DSpace at EWHA: Preprocessing for Artificial Neural Networks Using Self-Organizing Map-Based Outlier Detection

Title: Preprocessing for Artificial Neural Networks Using Self-Organizing Map-Based Outlier Detection (자기조직화지도 기반 이상치 탐색을 이용한 인공신경망 기법의 전처리)

Authors: 김가은

Issue Date: 2010

Department/Major: Graduate School, Department of Business Administration

Publisher: Graduate School of Ewha Womans University

Degree: Master

Advisors: 신경식

Abstract: Pre-processing is the data mining step that improves the quality of sample data and makes analysis easier. Because of its importance, many pre-processing methods have been studied, most of them based on statistical techniques. However, existing statistical pre-processing methods either apply only to a limited range of data distributions or can lose data when applied to data with many variables. This study proposes a pre-processing method that overcomes these limitations: it can be used with a wide variety of data distributions and loses comparatively little data. In the first step, the data are clustered using a self-organizing map (SOM). In the second step, within each cluster produced by the SOM, an outlier detection procedure removes the data points that lie farthest from the cluster center. The data with outliers removed are then used for data modeling, and the results are compared with those obtained under existing pre-processing methods to verify the performance of the proposed pre-processing. The data used in this study come from companies in the manufacturing industry.
Previous studies of bankruptcy prediction models have focused on modeling methods that yield more accurate predictions or on variable selection methods for modeling, and comparatively little research has addressed the pre-processing stage. This thesis therefore applies the new pre-processing method while building a bankruptcy prediction model, compares the results, and aims to show that clustering outlier detection is an effective pre-processing step for prediction models.
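The two-step procedure described in the abstract (cluster with a SOM, then drop the points farthest from each cluster center) can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not state the grid size, the distance metric, the training schedule, or how many points are removed per cluster. The sketch assumes Euclidean distance, a small rectangular grid, linearly decaying learning rate and neighborhood width, the SOM unit's weight vector as the "cluster center", and a fixed fraction of points dropped per cluster. The names `train_som`, `remove_cluster_outliers`, and `drop_fraction` are hypothetical.

```python
import math
import random


def train_som(data, rows=2, cols=2, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small rectangular SOM with online updates.

    Returns a list of (grid_position, weight_vector) pairs.
    All hyperparameters here are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Initialise each unit's weights from a randomly chosen data row.
    units = [((r, c), list(rng.choice(data)))
             for r in range(rows) for c in range(cols)]
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total_steps
            lr = lr0 * (1.0 - frac)                    # decaying learning rate
            sigma = max(sigma0 * (1.0 - frac), 0.1)    # decaying neighborhood
            # Best-matching unit: the unit whose weights are closest to x.
            bmu_pos, _ = min(units, key=lambda u: math.dist(u[1], x))
            for pos, w in units:
                # Gaussian neighborhood on the grid around the BMU.
                h = math.exp(-math.dist(pos, bmu_pos) ** 2 / (2 * sigma ** 2))
                for i in range(len(w)):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return units


def remove_cluster_outliers(data, units, drop_fraction=0.1):
    """Assign each row to its nearest SOM unit, then drop the rows that lie
    farthest from that unit's weight vector (assumed cluster center).

    Returns the sorted indices of the rows that are kept.
    """
    clusters = {}
    for idx, x in enumerate(data):
        k = min(range(len(units)), key=lambda j: math.dist(units[j][1], x))
        clusters.setdefault(k, []).append((math.dist(units[k][1], x), idx))
    keep = []
    for members in clusters.values():
        members.sort()  # nearest first, farthest last
        n_drop = int(len(members) * drop_fraction)
        keep.extend(idx for _, idx in members[:len(members) - n_drop])
    return sorted(keep)
```

In use, the rows selected by `remove_cluster_outliers` would be passed on to the prediction model (in the thesis, an artificial neural network for bankruptcy prediction), and the model's accuracy compared against the same model trained on data cleaned with a conventional statistical pre-processing method. Dropping a fixed fraction per cluster is only one possible rule; a distance threshold per cluster would be an equally plausible reading of the abstract.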