DSpace at EWHA: Comparison of Classical Classification Methods with Deep Learning Method

Title: Comparison of Classical Classification Methods with Deep Learning Method

Authors: 김정윤

Issue Date: 2017

Department/Major: Graduate School, Department of Statistics

Publisher: Ewha Womans University Graduate School

Degree: Master

Advisors: 이동환

Abstract: Deep learning is a method of learning features of data by using multiple processing layers based on neural networks. Following improvements to its algorithms, deep learning has recently been widely used and studied beyond the electronics industry (for example, cars and robots), in fields such as health care and energy. In statistics, deep learning is often used to handle large data sets when analyzing a categorical outcome variable. Accordingly, packages for deep learning have been developed for the statistical software R. However, the performance and convenience of these R packages have scarcely been verified. This research therefore addresses classification problems using Deepnet, one of the R packages for deep learning, and compares its performance with five methods commonly used in classification: logistic regression, lasso, linear discriminant analysis, support vector machine, and random forest. To compare the performance of the classification methods, misclassification rates were calculated on three real data sets and in a simulation study. By this criterion, the Deepnet package showed relatively good performance across the various data sets and the simulation study. This study is therefore expected to offer some guidelines for achieving good classification performance, especially for deep learning users in R.
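The comparison criterion named in the abstract, the misclassification rate, can be illustrated with a short sketch. The abstract does not say how the thesis computed it, so the code below assumes the standard definition: the fraction of test observations whose predicted class differs from the true class. The function name, the method labels, and the toy label vectors are hypothetical and are not data or results from the thesis.

```python
def misclassification_rate(y_true, y_pred):
    """Return the fraction of observations whose predicted class differs from the true class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one observation is required")
    errors = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return errors / len(y_true)


# Made-up test labels and predictions for two hypothetical classifiers
# (illustration only, not taken from the thesis).
y_test = [0, 1, 1, 0, 1, 0, 0, 1]
predictions = {
    "method_a": [0, 1, 0, 0, 1, 0, 1, 1],
    "method_b": [0, 1, 1, 0, 1, 0, 0, 0],
}

# Compute one rate per method; the method with the lowest rate performs best
# under this criterion.
rates = {name: misclassification_rate(y_test, pred) for name, pred in predictions.items()}
best = min(rates, key=rates.get)

for name, rate in rates.items():
    print(f"{name}: {rate:.3f}")
print("lowest misclassification rate:", best)
```

In a comparison like the one described, each method would be fitted on the same training data and scored this way on the same held-out data, so that the rates are directly comparable.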