DSpace at EWHA: A Comparison of Clustering Methods Using Simulated and Real Data

Title: A Comparison of Clustering Methods Using Simulated and Real Data (Simulated Data와 Real Data를 이용한 Clustering 기법 비교)

Authors: 김은미

Issue Date: 2009

Department/Major: Graduate School, Department of Statistics

Publisher: Ewha Womans University Graduate School

Degree: Master

Advisors: 송종우

Abstract: As the volume of data grows exponentially, extracting useful information from it has become very important. Among the many analysis approaches available for this purpose, clustering has recently drawn particular attention. Clustering groups similar observations into a small number of clusters and characterizes each cluster, which helps in understanding the overall structure of the data. This thesis describes the characteristics of three of the most commonly used clustering techniques: k-means, PAM (Partitioning Around Medoids), and hierarchical clustering. It also evaluates the performance of the three methods on both simulated and real data, using the adjusted Rand index as a measure of consistency.
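
For readers unfamiliar with the evaluation measure named in the abstract, the following is a minimal sketch of how an adjusted Rand index can be computed from two labelings of the same observations. It is not taken from the thesis: the function name `adjusted_rand_index` and the toy label lists are assumptions made for illustration, and the formula used is the standard Hubert and Arabie form.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings (illustrative helper,
    not code from the thesis)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)

    # Contingency counts: points falling in (reference class, cluster) cells,
    # plus the row and column totals of that table.
    cell_counts = Counter(zip(labels_true, labels_pred))
    row_counts = Counter(labels_true)
    col_counts = Counter(labels_pred)

    # Number of point pairs that share a cell, a row, or a column.
    sum_cells = sum(comb(c, 2) for c in cell_counts.values())
    sum_rows = sum(comb(c, 2) for c in row_counts.values())
    sum_cols = sum(comb(c, 2) for c in col_counts.values())
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two points: treat as trivially consistent

    expected = sum_rows * sum_cols / total_pairs  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings use a single cluster
    return (sum_cells - expected) / (max_index - expected)


# Hypothetical toy labels: the second labeling is the first with names swapped.
reference = [0, 0, 1, 1]
relabeled = [1, 1, 0, 0]
score = adjusted_rand_index(reference, relabeled)
```

A score of 1 indicates identical partitions regardless of how the clusters are named, and values near 0 indicate agreement no better than chance. In practice a library routine such as `sklearn.metrics.adjusted_rand_score` in scikit-learn could be used instead of a hand-written helper.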