DSpace at EWHA: Data Classification and Prediction Using Decision Tree Algorithms

Title: Data Classification and Prediction Using Decision Tree Algorithms (결정 트리 알고리즘을 이용한 데이터 분류 및 예측)

Authors: 윤혜성

Issue Date: 2001

Department/Major: Graduate School of Science and Technology, Department of Computer Science

Publisher: Ewha Womans University, Graduate School of Science and Technology

Degree: Master

Abstract: Data mining is commonly adopted as a strategic methodology for acquiring new knowledge that is latent in a database and has not yet been put to strategic use. The mining process must be automatic, and the patterns it discovers must be meaningful in the sense that they lead to some advantage, usually an economic one. Classification of large data sets is an important data mining problem. Existing classification algorithms, however, do not scale well: most of them assume that the training data fits in main memory. In most data mining applications it is important to have a range of techniques available so that the data can be approached from many different angles, which increases the chance of obtaining useful results. These techniques must therefore also be applicable to data held in very large databases.
Many classification models have been proposed in the literature. Decision trees are especially attractive in a data mining environment for three reasons. First, their intuitive representation makes them easy for people to understand. Second, they can be constructed relatively quickly compared with other methods. Third, the accuracy of decision tree classifiers is comparable or superior to that of other models. This thesis therefore restricts its attention to decision tree classifiers. It explains the importance of data mining, whose aim is to extract potentially useful, non-trivial information from databases in the form of patterns or models. To show how closely data mining relates to everyday life, the thesis analyzes data from three of its most common application areas: finance (credit), retail (sales and marketing), and medical treatment. The analysis proceeds in four stages. 1. Identify the preprocessing work required for data mining. 2. Classify the information with a data mining tool. 3. Apply a decision tree algorithm as the classification method, since it makes the analysis easy to understand and explain. 4. Present conclusions and propose a new analysis and result prediction for data mining applications that support key decision-making.
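The four stages in the abstract can be illustrated with a minimal sketch in Python. This is not the thesis's implementation: the record does not say which tool, splitting criterion, or data the author used. The Gini impurity criterion, the toy credit records (income, debt, decision), and all function names below are assumptions chosen only for illustration.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively grow a tree; leaves are the majority class label."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth))

def predict(tree, row):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

# Hypothetical credit records: (income, debt, decision). Not data from the thesis.
raw = [(50, 10, "approve"), (60, 5, "approve"), (20, 15, "reject"),
       (25, None, "reject"), (30, 20, "reject"), (55, 8, "approve")]

# Stage 1: preprocessing, here simply dropping records with missing values.
clean = [r for r in raw if None not in r]
rows = [r[:2] for r in clean]
labels = [r[2] for r in clean]

# Stages 2 and 3: classify the records by fitting a decision tree.
tree = build_tree(rows, labels)
print(tree)                     # (0, 30, 'reject', 'approve')

# Stage 4: predict the class of a new, unseen record.
print(predict(tree, (45, 12)))  # approve
```

The fitted tree reads as a single rule (income at most 30 leads to "reject", otherwise "approve"), which illustrates the abstract's first point that decision trees are easy for people to interpret. Note that this sketch holds all training data in memory, which is exactly the scalability limitation the abstract raises.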