DSpace at EWHA: Crime Prediction using Machine Learning

Title: Crime Prediction using Machine Learning

Authors: 오연주

Issue Date: 2018

Department/Major: Graduate School, Department of Statistics

Publisher: Ewha Womans University Graduate School

Degree: Master

Advisors: 오만숙

Abstract: Crime causes many social and economic problems worldwide, and Korea is no exception. Under these circumstances, crime prediction has become a practically and academically important issue. From a practical viewpoint, crime forecasting can support crime-related decisions, such as establishing policies and deciding how to distribute limited police resources efficiently. Indeed, several studies have shown that increasing police patrols in areas where a high crime rate is predicted leads to a substantially lower crime rate. In this paper, five machine learning algorithms are applied to predict future crime rates in the Republic of Korea. Machine learning methods have proven useful for finding patterns in time series data in various fields. However, despite their predictive ability, few studies have applied machine learning in the crime domain. To apply machine learning, the time series are restructured so that the problem can be approached as a supervised learning problem. Then linear regression, K-nearest neighbors, random forest, gradient boosting, and support vector regression are applied and compared with the standard time series ARIMA model. The results show that the machine learning approach outperforms the ARIMA model.

Abstract (translated from Korean): In Korean society over the past 10 years, the overall crime rate per 100,000 people, excluding traffic offenses, has increased by 11.2% on average, and the rate of crimes classified as violent crimes (murder, robbery, arson, and rape) has increased by 45.2%. With crime rates rising, predicting the future number of crimes is important for practical as well as academic reasons. From a practical standpoint, predicting the number of crimes can help in establishing crime-related policies and, above all, is valuable because it allows limited police resources to be used efficiently. Indeed, according to several studies, crime rates were lowered by increasing the number of police officers deployed in places where crimes were predicted to occur frequently. Previous studies on crime rate prediction have concentrated on the ARIMA time series model. However, the ARIMA model assumes a linear relationship between the explanatory and dependent variables and assumes homoscedastic errors, which departs from real data in some respects. This study therefore applied machine learning methods, which have recently shown high predictive power on time series data in various fields, to predict Korea's monthly crime rates for 2016.
To apply the machine learning methods, the time series data were converted into the form of supervised learning data, and then linear regression, K-nearest neighbors, random forest, gradient boosting, and support vector regression models were applied. The model with the best predictive power was selected using the RMSE (Root Mean Square Error) on the test data. In all analyses, the machine learning methods showed lower error values than the ARIMA model.
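The abstract does not say how the time series were converted into supervised learning data, so the following is only a minimal sketch of one common approach: a sliding window of lagged values as features, with the next value as the target. Everything in it is an assumption made for illustration, not the thesis's method: the synthetic series, the 12-lag window, the 12-month hold-out, the choice of k = 3, and the helper names `make_supervised`, `knn_predict`, and `rmse`. K-nearest neighbors is used because it is one of the five methods named and can be written with the standard library alone.

```python
import math


def make_supervised(series, n_lags):
    """Turn a 1-D series into (X, y) pairs: each row of X holds the
    n_lags previous values, and y is the value that follows them."""
    X, y = [], []
    for t in range(n_lags, len(series)):
        X.append(series[t - n_lags:t])
        y.append(series[t])
    return X, y


def knn_predict(X_train, y_train, x, k):
    """Predict by averaging the targets of the k training windows
    closest to x in Euclidean distance."""
    nearest = sorted(zip(X_train, y_train),
                     key=lambda pair: math.dist(pair[0], x))[:k]
    return sum(target for _, target in nearest) / len(nearest)


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Synthetic monthly series (NOT the thesis data): trend plus a 12-month cycle.
series = [100 + 0.5 * t + 10 * math.sin(2 * math.pi * t / 12)
          for t in range(72)]

n_lags, horizon = 12, 12  # assumed window length and hold-out size
X, y = make_supervised(series, n_lags)

# Hold out the last 12 rows as the test set; train on everything before.
X_train, y_train = X[:-horizon], y[:-horizon]
X_test, y_test = X[-horizon:], y[-horizon:]

# One-step-ahead predictions: each test window uses observed lag values.
predictions = [knn_predict(X_train, y_train, x, k=3) for x in X_test]
test_rmse = rmse(y_test, predictions)
print(f"test RMSE: {test_rmse:.3f}")
```

Under this setup, comparing models amounts to computing `rmse` on the same hold-out rows for each candidate and keeping the one with the lowest value, which is the selection rule the abstract describes.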