DSpace at EWHA: A GA-based Input Vector Normalization Approach in Back-Propagation Neural Network Modeling for Bankruptcy Prediction



Title: A GA-based Input Vector Normalization Approach in Back-Propagation Neural Network Modeling for Bankruptcy Prediction

Other Titles: A Study on Optimization-Based Input Variable Normalization for Back-Propagation Neural Network Models: Focusing on Bankruptcy Prediction Models

Authors: 태추월

Issue Date: 2011

Department/Major: Graduate School, Department of Business Administration

Publisher: The Graduate School, Ewha Womans University

Degree: Doctor

Advisors: 신경식

Abstract: The back-propagation neural network (BPN) algorithm is considered one of the most appropriate methods for bankruptcy prediction because of its learning capability and its strong performance on nonlinear data. Despite its wide application, several major issues must be settled before it is used, such as the network topology, the learning parameters, and the normalization methods for the input and output vectors. Previous studies on bankruptcy prediction, however, have concentrated on optimizing the network topology and learning parameters to improve prediction performance, and the benefits of data normalization are often overlooked. This dissertation therefore focuses on enhancing BPN performance by optimizing the normalization of the input data. Genetic algorithms (GAs) are particularly suitable for multi-parameter optimization problems with an objective function subject to numerous hard and soft constraints. They can also be used as data mining and knowledge discovery tools for uncovering previously unknown patterns. A GA was thus used to optimize the normalization process in order to improve BPN performance. Two GA-based normalization approaches are presented in this study. The first approach is a GA-optimized nonlinear fuzzy normalization. The most representative normalization method for BPN with bankruptcy data is linear scaling, which brings the inputs to a common range and prevents information loss from the data. This method is limited, however, in that it cannot ease the complicated relationships among the data. An alternative is to normalize the data for the BPN using fuzzy set theory, because a fuzzy membership function can represent continuous and complicated values as degrees of membership and can represent concepts that fall under more than one category.
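To make the contrast between linear scaling and fuzzy membership concrete, here is a minimal sketch. It assumes a standard quadratic S-shaped membership function as a stand-in; the abstract does not give the exact functional form used in the dissertation, and the function names, the boundary parameters `a` and `c`, and the sample values are all hypothetical.

```python
def min_max(x, lo, hi):
    """Linear scaling of x into [0, 1], clipped at the bounds."""
    if hi <= lo:
        raise ValueError("hi must exceed lo")
    return min(1.0, max(0.0, (x - lo) / (hi - lo)))


def s_membership(x, a, c):
    """Quadratic S-shaped membership: 0 at or below a, 1 at or above c.

    This is an assumed, illustrative form, not the dissertation's function.
    """
    if c <= a:
        raise ValueError("c must exceed a")
    b = (a + c) / 2.0  # crossover point where membership equals 0.5
    if x <= a:
        return 0.0
    if x >= c:
        return 1.0
    if x <= b:
        return 2.0 * ((x - a) / (c - a)) ** 2
    return 1.0 - 2.0 * ((x - c) / (c - a)) ** 2


# Made-up values standing in for one financial ratio.
ratios = [-0.5, 0.0, 0.25, 0.5, 0.75, 1.0, 3.0]
linear = [min_max(r, 0.0, 1.0) for r in ratios]
fuzzy = [s_membership(r, 0.0, 1.0) for r in ratios]

for r, lin, fz in zip(ratios, linear, fuzzy):
    print(f"x={r:5.2f}  linear={lin:.3f}  fuzzy={fz:.3f}")
```

Both mappings land in [0, 1], but the membership function compresses values near the boundaries and stretches those near the crossover point. The boundaries `a` and `c` are the kind of parameters a search procedure such as a GA could tune.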
Polynomial-based nonlinear fuzzy membership functions were used to normalize the data into the range [0, 1], and a GA was used to find the optimal boundary value of each fuzzy parameter. The second approach is a GA-based generalized normalization method. A number of normalization methods have been developed and applied to machine learning (ML) algorithms, such as min-max, z-score, mean, median, range, decimal, tangent, ordinal, and frequency. Each normalization method has its own strengths and limitations, and there is no universal way of deciding a priori, from the features of the data, which method works best. In many cases it is therefore very difficult to determine the optimal normalization method for a given domain and algorithm. This study thus proposes a GA-based generalized normalization transform, defined as a linearly weighted combination of several normalization techniques, with a GA used to find the optimal weights for the combination. The proposed methods were evaluated experimentally and compared with benchmark methods to demonstrate their advantages.
Research on bankruptcy prediction using artificial intelligence techniques has been under way for a long time. The back-propagation neural network in particular, as a nonlinear learning technique, is widely used to develop bankruptcy prediction models because its predictive performance is better than that of other artificial intelligence techniques. Important factors that must be considered before using a back-propagation neural network include the network structure, the learning parameters, and the normalization method. Many previous studies that applied back-propagation neural networks to bankruptcy prediction proposed methods for optimizing the network structure and learning parameters to improve performance, but research on input variable normalization has rarely been attempted. Data normalization is a step in data preprocessing and plays a very important role in data analysis for developing knowledge-based systems. Applied to classification analysis, normalization can improve learning performance; applied to cluster analysis, it can reduce the influence of unusually large values on the clustering results. Each normalization technique has its own advantages and disadvantages, and the performance of a learning algorithm differs depending on which technique is used. Moreover, because there is no guarantee that one particular normalization technique always outperforms the others, it is very important to select or develop a normalization technique suited to the domain and to the characteristics of the algorithm.
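Bankruptcy prediction data in particular draw on the financial records of many firms, so the relationships between the independent variables and the dependent variable are complex and noisy, and the data need to be normalized into a form that is easy for a neural network to learn. This study therefore focuses on input variable normalization, proposes normalization methods based on a genetic algorithm, and examines how normalization affects the back-propagation neural network. First, the study proposes a nonlinear fuzzy normalization method based on a genetic algorithm. The normalization technique most commonly used in bankruptcy prediction models built on back-propagation neural networks is linear normalization. Linear normalization, however, only adjusts the data to a uniform range and is limited in its ability to ease the complex relationships within the data. A fuzzy set, by contrast, expresses the degree to which an element belongs to the set and helps to distinguish cases whose boundaries are ambiguous, so applying it to input normalization can reduce the complexity of the input data. Accordingly, this study used nonlinear fuzzy membership functions to normalize the input variables into a form suitable for the neural network, and used a genetic algorithm to find the boundary values of each fuzzy set so that the input normalization is optimized. Second, the study proposes a generalized normalization technique based on a genetic algorithm. There are many normalization techniques, each with its own limitations and advantages, so there is no guarantee that any one technique always outperforms the others. In addition, because there is no criterion for selecting the optimal normalization technique, choosing one suited to the domain and to the characteristics of the algorithm is also difficult. To overcome these problems, the study first generalized several different normalization techniques by combining them with equal weights, and then used a genetic algorithm to find the optimal weight for each technique so that the input normalization is optimal or near optimal. The proposed methods were tested in experiments on bankruptcy prediction data, and their validity was verified by comparing them with existing methods. The significance of this study is that it improved the performance of the back-propagation neural network by using the search and optimization capability of genetic algorithms to normalize data that, like bankruptcy prediction data, are noisy and have complex relationships between the input variables and the dependent variable. The study nevertheless has several limitations. First, optimization should also be attempted for normalization techniques other than those used here. Second, the data used in this study come from Korean construction firms that are not subject to external audit, so the results cannot be generalized to all bankruptcy prediction data. Future research should test the validity of the proposed methods by applying them to industries with different characteristics.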
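The generalized normalization transform, a linearly weighted combination of several normalization techniques with GA-chosen weights, can be sketched roughly as follows. Everything here is an assumption for illustration: the three normalizers, the toy data, the GA operators (elitism, tournament selection, arithmetic crossover, Gaussian mutation), and above all the fitness function. The dissertation evaluates candidates by BPN performance; this sketch swaps in a simple midpoint-threshold accuracy so that it stays self-contained.

```python
import math
import random
import statistics


def min_max(xs):
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]


def z_score_sigmoid(xs):
    """z-score followed by a logistic squash into (0, 1)."""
    mu, sd = statistics.mean(xs), statistics.pstdev(xs)
    return [1.0 / (1.0 + math.exp(-(x - mu) / sd)) for x in xs]


def decimal_scaling(xs):
    """Divide by the smallest power of ten that brings every |x| below 1."""
    peak, j = max(abs(x) for x in xs), 0
    while peak / 10 ** j >= 1:
        j += 1
    return [x / 10 ** j for x in xs]


def combine(columns, weights):
    """Weighted sum of normalized columns, weights rescaled to sum to 1."""
    total = sum(weights)
    return [sum(w * col[i] for w, col in zip(weights, columns)) / total
            for i in range(len(columns[0]))]


def fitness(weights, columns, labels):
    """Proxy fitness (assumed): accuracy of a midpoint-threshold rule."""
    z = combine(columns, weights)
    m1 = statistics.mean(v for v, y in zip(z, labels) if y == 1)
    m0 = statistics.mean(v for v, y in zip(z, labels) if y == 0)
    cut = (m0 + m1) / 2.0
    sign = 1.0 if m1 >= m0 else -1.0
    hits = sum((sign * (v - cut) > 0) == (y == 1) for v, y in zip(z, labels))
    return hits / len(labels)


def evolve(columns, labels, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    k = len(columns)

    def score(w):
        return fitness(w, columns, labels)

    # Start from equal weights plus random candidates.
    pop = [[1.0] * k] + [[rng.random() + 1e-6 for _ in range(k)]
                         for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        nxt = pop[:2]                                  # elitism: keep the two best
        while len(nxt) < pop_size:
            p1 = max(rng.sample(pop, 3), key=score)    # tournament selection
            p2 = max(rng.sample(pop, 3), key=score)
            a = rng.random()
            child = [a * u + (1 - a) * v for u, v in zip(p1, p2)]
            child = [max(1e-6, g + rng.gauss(0, 0.1)) for g in child]  # mutation
            nxt.append(child)
        pop = nxt
    return max(pop, key=score)


# Hypothetical single feature and bankruptcy labels (1 = bankrupt).
xs = [12, 15, 18, 22, 30, 45, 80, 150, 400, 900]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
columns = [min_max(xs), z_score_sigmoid(xs), decimal_scaling(xs)]

best = evolve(columns, labels)
print("weights:", [round(w, 3) for w in best])
print("proxy accuracy:", fitness(best, columns, labels))
```

Because the best candidates are carried over unchanged each generation, the returned weights can never score worse on the proxy than the equal-weight starting point. In a setting closer to the dissertation, `fitness` would instead train and validate a BPN on the combined inputs.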