DSpace at EWHA: Predicting Gross Box Office Revenue of Domestic Films

Title: Predicting Gross Box Office Revenue of Domestic Films

Other Titles: 한국 영화 산업의 박스 오피스 매출 예측 모형 연구 (A Study of Box Office Revenue Prediction Models for the Korean Film Industry)

Authors: 한수지

Issue Date: 2013

Department/Major: Graduate School, Department of Statistics

Publisher: The Graduate School of Ewha Womans University

Degree: Master

Advisors: 송종우

Abstract: This paper aims to predict the gross box office revenue of domestic films using Korean film data from 2008 to 2011. The data are analyzed using three data mining regression methods: linear regression, Random Forest, and Gradient Boosting. First, the three regression methods used in this paper are briefly explained. Second, relevant information on domestic films from the four-year period with revenue above KRW 500 million is explored, and relevant variables are generated accordingly. Side-by-side box plots and other relevant measures are used to evaluate the reasonableness and relevance of the chosen variables. Some of the selected variables are categorized to make the analysis more effective and to overcome the limits of the available data sources; a clustering method is applied to divide these variables into groups. Third, three regression models are proposed to predict a score for gross box office revenue rather than the revenue itself. To do so, the dependent variable, gross box office revenue, is grouped into ten percentile groups and scored. The model with the strongest predictive power is selected based on the mean and standard deviation of the residual sum of squares of each model. In the chosen model, each variable is tested using its p-value at the 5 percent significance level (95 percent confidence). Finally, important determinants of domestic box office success are discussed based on the selected regression model.
This thesis studies theatrical revenue in the Korean film industry. Since the 2000s the Korean film industry has grown so rapidly that the period has been called a renaissance, and it is regarded as a field with further growth ahead. This study collects data on the Korean film industry from 2008 to 2011 and builds data mining regression models to predict the theatrical revenue of films. First, the three regression models used in the study (the linear model, Random Forest, and Gradient Boosting) are briefly explained to aid understanding. Second, a data frame is built from the four years of Korean film data, restricted to films with total theatrical revenue of KRW 500 million or more, and box plots and other tools are used to describe the data and discuss the validity of the selection. Some of the variables used to build the prediction models are grouped, both to overcome the limitation of insufficient data and to build more effective models. This step uses clustering, a data mining method for cluster analysis. Third, rather than predicting revenue itself, the study assigns scores to ten percentiles of the revenue distribution and takes prediction of that score as its final goal. Once the data have been prepared in this way, the three regression methods are applied, the performance of the models is compared, and the factors that most strongly influence the Korean film industry are discussed based on the finally selected model.
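The scoring and model-selection steps described in the abstract can be illustrated with a minimal sketch. This is not the thesis code: the function names (percentile_scores, rss, summarize_rss), the equal-sized grouping rule, and the revenue figures are all assumptions made for illustration, and the abstract does not say how ties or group boundaries were handled.

```python
from statistics import mean, stdev


def percentile_scores(revenues, n_groups=10):
    """Assign each revenue a score from 1 (lowest group) to n_groups (highest)."""
    n = len(revenues)
    # Rank films by revenue; ties keep their original relative order.
    order = sorted(range(n), key=lambda i: revenues[i])
    scores = [0] * n
    for rank, idx in enumerate(order):
        # Map rank 0..n-1 onto groups 1..n_groups of (nearly) equal size.
        scores[idx] = rank * n_groups // n + 1
    return scores


def rss(actual, predicted):
    """Residual sum of squares between actual and predicted scores."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted))


def summarize_rss(rss_by_model):
    """Return {model name: (mean RSS, standard deviation of RSS)}."""
    return {name: (mean(v), stdev(v)) for name, v in rss_by_model.items()}


# Hypothetical revenues in KRW billions (illustrative, not thesis data).
revenues = [0.6, 1.2, 0.9, 15.0, 3.4, 7.8, 2.1, 45.0, 0.7, 5.5,
            1.8, 22.0, 0.8, 9.9, 4.2, 2.9, 1.1, 12.5, 6.3, 30.0]
scores = percentile_scores(revenues)
```

Under these assumptions, each model would be fitted on repeated train/test splits, rss would be computed on each test split, and summarize_rss would give the mean and standard deviation used to compare the three models; a lower mean with a smaller spread would favor a model.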