DSpace at EWHA: A Study on Measuring Similarity between Symbolic Data in Case-Based Reasoning Models

A Study on Measuring Similarity between Symbolic Data in Case-Based Reasoning Models

Title: A Study on Measuring Similarity between Symbolic Data in Case-Based Reasoning Models (사례기반추론 모형에서 기호 데이터 간의 유사도 측정에 관한 연구)

Authors: 이연님

Issue Date: 2003

Department/Major: Graduate School, Department of Business Administration

Publisher: Graduate School, Ewha Womans University

Degree: Master

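Abstract: Case-based reasoning (CBR) is an artificial intelligence technique that solves a new problem on the basis of experience gained in solving past problems. Unlike many other artificial intelligence techniques, which extract and apply general knowledge, CBR treats each individual case as a unit of knowledge and uses that case-specific knowledge. Because of this, and in contrast to traditional reasoning systems, CBR is known to be a highly effective problem-solving methodology in domains where fixed rules are hard to find, where a problem is hard to express with a limited set of rules, and where knowledge is hard to generalize. In recent years CBR has been widely used as a tool for problem solving and learning, and successful CBR systems have been developed in a variety of fields. As the use of CBR models grows, so does the demand for better performance, and the main performance measure of a CBR model is how quickly it retrieves the most similar case. To improve performance, it is therefore very important to decide how to carry out the two key algorithmic components of CBR: case indexing and case retrieval. The core question is how to measure the similarity between cases, that is, how similar a past case is to the given new case. This study addresses this similarity measurement problem, which bears directly on the performance of a CBR model, and focuses on the situation where the case features are symbolic data. The conventional approach simply counts the features whose values match exactly and uses that count as the similarity value. We instead measure similarity with a similarity matrix that predefines an appropriate similarity value between each pair of values a feature can take. The conventional approach involves practically no similarity computation and implies that all values of a feature are equally distant from one another, so it gives insufficient information about the similarity between cases. A similarity matrix, by contrast, assigns a specific similarity value to each pair of feature values and so provides more information. To assign more appropriate values to the similarity matrix, this study does not rely on human judgment, as is currently done, but searches for the values with a genetic algorithm. The idea is to use the genetic algorithm, which is both a search algorithm and an optimization technique, to overcome the burden of finding appropriate values for the many entries of the similarity matrix within the limits of human cognitive ability. We tested the approach on a CBR model for small-business credit evaluation.

Case-based reasoning (CBR) is a problem-solving methodology in artificial intelligence. CBR uses prior cases to find suitable solutions for new problems. While other major artificial intelligence techniques rely on making associations along generalized relationships between problem descriptors and conclusions, CBR benefits from utilizing case-specific knowledge of previously experienced, concrete problem situations.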
CBR shows significant promise for improving the effectiveness of complex and unstructured decision making. The CBR paradigm has been used effectively in domains where decision problems are open-ended and where no clear-cut methods are available to solve them. CBR has been successfully applied to planning, diagnosis, law, and decision making, among other areas. As CBR has been used with great success, the demand for higher performance has grown. The success of a CBR system depends mainly on effective and efficient retrieval of similar cases for a new problem, so indexing and matching are very important to CBR. Through the retrieval process, similar cases that are potentially useful for the current problem are retrieved from the case base. The degree of similarity between the input case and the target case is usually calculated with a similarity function. In this thesis, we discuss hybrid modeling of CBR using genetic algorithms (GAs) to retrieve more relevant cases. Our particular interest lies in symbolic retrieval functions. The most widely used symbolic retrieval function so far is exact-match counting, which involves no real similarity calculation. Our approach aims to calculate similarity more effectively by using a similarity matrix. The objective is to retrieve the case most similar to the new problem to be solved, and this requires a more concrete similarity value between the input case and the target case, which the similarity matrix provides. This study proposes hybrid CBR models that use genetic algorithms for an effective CBR system. The GA-CBR model uses genetic algorithms to find optimal or near-optimal values of the similarity matrix. We apply these values to the matching and ranking procedure of CBR.
We use genetic algorithms to extract knowledge that can guide effective retrieval of useful cases. The proposed approach is demonstrated by an application to a credit evaluation model.
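
The contrast between exact-match counting and a similarity matrix, and the idea of searching for the matrix values with a genetic algorithm, can be illustrated with a minimal Python sketch. This is not the thesis's implementation. The symbolic values (`low`, `medium`, `high`), the toy case base, all function names, and the GA operators (truncation selection, one-point crossover, single-gene mutation, leave-one-out accuracy as fitness) are assumptions chosen only for illustration.

```python
import random
from itertools import combinations

VALUES = ["low", "medium", "high"]      # hypothetical symbolic values
PAIRS = list(combinations(VALUES, 2))   # unordered pairs of distinct values


def exact_match_similarity(a, b):
    """Baseline: fraction of features whose values match exactly."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def build_matrix(genes):
    """Symmetric similarity matrix: 1.0 on the diagonal, genes elsewhere."""
    m = {v: {w: 1.0 if v == w else 0.0 for w in VALUES} for v in VALUES}
    for (v, w), g in zip(PAIRS, genes):
        m[v][w] = m[w][v] = g
    return m


def matrix_similarity(a, b, matrices):
    """Mean of per-feature similarities looked up in each feature's matrix."""
    return sum(m[x][y] for x, y, m in zip(a, b, matrices)) / len(a)


def decode(genes, n_features):
    """Split a flat chromosome into one similarity matrix per feature."""
    k = len(PAIRS)
    return [build_matrix(genes[i * k:(i + 1) * k]) for i in range(n_features)]


def fitness(genes, cases, n_features):
    """Leave-one-out accuracy of retrieving the single most similar case."""
    matrices = decode(genes, n_features)
    hits = 0
    for i, (features, label) in enumerate(cases):
        rest = cases[:i] + cases[i + 1:]
        nearest = max(
            rest, key=lambda c: matrix_similarity(features, c[0], matrices)
        )
        hits += nearest[1] == label
    return hits / len(cases)


def evolve(cases, n_features, pop_size=20, generations=30, seed=0):
    """Toy GA over matrix entries in [0, 1); assumes at least two genes."""
    rng = random.Random(seed)
    n_genes = n_features * len(PAIRS)
    pop = [[rng.random() for _ in range(n_genes)] for _ in range(pop_size)]

    def score(genes):
        return fitness(genes, cases, n_features)

    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        parents = pop[:pop_size // 2]          # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            p, q = rng.sample(parents, 2)
            cut = rng.randrange(1, n_genes)    # one-point crossover
            child = p[:cut] + q[cut:]
            child[rng.randrange(n_genes)] = rng.random()  # mutate one gene
            children.append(child)
        pop = parents + children
    return max(pop, key=score)


# Made-up case base: two symbolic features and a credit label per case.
CASES = [
    (("low", "low"), "bad"),
    (("low", "medium"), "bad"),
    (("medium", "low"), "bad"),
    (("high", "high"), "good"),
    (("high", "medium"), "good"),
    (("medium", "high"), "good"),
]

if __name__ == "__main__":
    best = evolve(CASES, n_features=2)
    print("evolved matrix entries:", [round(g, 2) for g in best])
    print("leave-one-out accuracy:", fitness(best, CASES, n_features=2))
```

Under exact-match counting, `medium` is as far from `high` as `low` is, whereas a matrix entry such as 0.5 for the pair (`medium`, `high`) lets the retrieval step rank partially similar cases above unrelated ones. The GA here only searches the off-diagonal entries. A real study would need a proper validation split rather than the leave-one-out score on a six-case toy set.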