DSpace at EWHA: Application of Cluster Analysis Based on Model-Based Clustering


Full metadata record

DC Field	Value	Language
dc.contributor.author	엄동란	-
dc.creator	엄동란	-
dc.date.accessioned	2016-08-25T02:08:53Z	-
dc.date.available	2016-08-25T02:08:53Z	-
dc.date.issued	2007	-
dc.identifier.other	OAK-000000028099	-
dc.identifier.uri	https://dspace.ewha.ac.kr/handle/2015.oak/175027	-
dc.identifier.uri	http://dcollection.ewha.ac.kr/jsp/common/DcLoOrgPer.jsp?sItemId=000000028099	-
dc.description.abstract	Modern society produces an enormous amount of information, but information is of little value unless it is useful to the person using it. It is up to each individual to explore and examine data in order to find useful information and extract the knowledge they are looking for. In a rapidly changing society, data become more useful when observations are divided into related groups and the characteristics of each group are examined from a multidimensional perspective, rather than treating the data as one-dimensional material. Identifying such characteristics in a large body of data is not easy, and it is harder still when no prior information is available. Multivariate statistical analysis addresses this situation: it can extract information from the data without prior knowledge and group observations that share common characteristics. Methods developed for this purpose include principal component analysis, factor analysis, canonical correlation analysis, discriminant analysis, and cluster analysis. This thesis examines cluster analysis, a multivariate method that is easy to compute and interpret, and applies it to real data. First, to work with the notion of distance, a distance matrix is computed from the correlation coefficients of the data. Multidimensional scaling (MDSCALE) is then applied to this distance matrix to reduce the high-dimensional data to a low-dimensional representation. The reduced data are displayed graphically, and model-based clustering (MCLUST) is proposed for dividing them into an appropriate number of clusters and is applied to real data.	-
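dc.description.tableofcontents	Ⅰ. Introduction = 1 Ⅱ. Model-Based Clustering (Model-Based Hierarchical Clustering) = 2 A. Overview of Cluster Analysis = 2 B. Hierarchical and Nonhierarchical Clustering Methods = 4 Ⅲ. MCLUST = 6 Ⅳ. Clustering of Infectious Disease Data = 8 A. Data Description = 8 B. Analysis Results = 12 C. Summary of Results = 22 Ⅴ. Clustering of Population Migration Data = 23 A. Data Description = 23 B. Analysis Results = 26 C. Summary of Results = 30 Ⅵ. Overall Summary and Conclusions = 31 References = 32 ABSTRACT = 34	-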
dc.format	application/pdf	-
dc.format.extent	745721 bytes	-
dc.language	kor	-
dc.publisher	Graduate School, Ewha Womans University	-
dc.subject.ddc	310	-
dc.title	Application of Cluster Analysis Based on Model-Based Clustering	-
dc.type	Master's Thesis	-
dc.creator.othername	Uhm, Tong Ran	-
dc.format.page	ⅶ, 34 p.	-
dc.identifier.thesisdegree	Master	-
dc.identifier.major	Department of Statistics, Graduate School	-
dc.date.awarded	August 2007	-
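
The abstract describes a three-step pipeline: compute a distance matrix from correlation coefficients, reduce it to a low-dimensional configuration with multidimensional scaling, then divide that configuration into clusters with model-based clustering (MCLUST). The Python/NumPy code below is a minimal sketch of that pipeline under stated assumptions, not the author's code. Several choices are assumptions and do not come from the thesis. The rows of `X` are taken to be the objects being clustered (for example regions), and the distance is d = sqrt(2(1 − r)). The reduction uses classical (Torgerson) MDS. Clustering uses a Gaussian mixture with unconstrained covariances, fitted by EM from a deterministic farthest-point start, and the number of clusters is chosen by BIC = 2·loglik − p·log n, where larger is better. MCLUST itself compares several covariance parameterizations, whereas this sketch fits only one. All function names are hypothetical.

```python
import numpy as np


def correlation_distance(X):
    """Distance matrix between the rows (objects) of X from Pearson correlation.

    Assumption: d_ij = sqrt(2 * (1 - r_ij)); the abstract only states that the
    distance matrix is derived from correlation coefficients.
    """
    R = np.corrcoef(X)  # rows of X are treated as the objects to cluster
    return np.sqrt(np.clip(2.0 * (1.0 - R), 0.0, None))


def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: k-dimensional coordinates from distances."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centring matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centred squared distances
    vals, vecs = np.linalg.eigh(B)            # eigenvalues in ascending order
    top = np.argsort(vals)[::-1][:k]          # indices of the k largest
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


def log_gauss(Y, mu, S):
    """Log-density of N(mu, S) evaluated at every row of Y."""
    L = np.linalg.cholesky(S)
    z = np.linalg.solve(L, (Y - mu).T)        # whitened residuals
    return (-0.5 * np.sum(z ** 2, axis=0) - np.sum(np.log(np.diag(L)))
            - 0.5 * Y.shape[1] * np.log(2 * np.pi))


def e_step(Y, pi, mu, S):
    """Posterior membership probabilities and total log-likelihood."""
    logp = np.column_stack([np.log(pi[g]) + log_gauss(Y, mu[g], S[g])
                            for g in range(len(pi))])
    m = logp.max(axis=1, keepdims=True)       # log-sum-exp for stability
    Z = np.exp(logp - m)
    s = Z.sum(axis=1, keepdims=True)
    return Z / s, float(np.sum(m + np.log(s)))


def farthest_point_init(Y, G):
    """Deterministic starting means (assumption, not MCLUST's initialisation)."""
    idx = [int(np.argmax(((Y - Y.mean(axis=0)) ** 2).sum(axis=1)))]
    while len(idx) < G:
        d2 = ((Y[:, None, :] - Y[idx][None, :, :]) ** 2).sum(axis=2)
        idx.append(int(np.argmax(d2.min(axis=1))))
    return Y[idx].copy()


def fit_gmm(Y, G, n_iter=100, ridge=1e-6):
    """EM for a G-component Gaussian mixture with unconstrained covariances.

    Returns hard labels and BIC = 2 * loglik - n_params * log(n) (larger is better).
    """
    n, d = Y.shape
    mu = farthest_point_init(Y, G)
    S = np.stack([np.cov(Y.T) + ridge * np.eye(d) for _ in range(G)])
    pi = np.full(G, 1.0 / G)
    for _ in range(n_iter):
        Z, _ = e_step(Y, pi, mu, S)
        nk = Z.sum(axis=0) + 1e-10            # effective cluster sizes
        pi = nk / n
        mu = (Z.T @ Y) / nk[:, None]
        for g in range(G):
            diff = Y - mu[g]
            S[g] = (Z[:, g, None] * diff).T @ diff / nk[g] + ridge * np.eye(d)
    Z, loglik = e_step(Y, pi, mu, S)
    n_params = (G - 1) + G * d + G * d * (d + 1) // 2
    return Z.argmax(axis=1), 2.0 * loglik - n_params * np.log(n)


def cluster_objects(X, max_G=5, k=2):
    """Correlation distance -> MDS -> model-based clustering, G chosen by BIC."""
    Y = classical_mds(correlation_distance(X), k)
    fits = {G: fit_gmm(Y, G) for G in range(1, max_G + 1)}
    best_G = max(fits, key=lambda G: fits[G][1])
    return Y, best_G, fits[best_G][0]
```

For example, `Y, best_G, labels = cluster_objects(X, max_G=5)` returns the two-dimensional coordinates for plotting, the number of clusters selected by BIC, and a hard cluster label for each row of `X`. The small ridge term keeps the covariance matrices invertible but does not guard against degenerate components that collapse onto a few points. A large `max_G` on a small sample can therefore favour spurious fits, and a dedicated implementation is preferable for real analyses.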