DSpace at EWHA: A Level-Based Data Mining System for Discovering Generalized Association Rules

Title: A Level-Based Data Mining System for Discovering Generalized Association Rules (일반화된 연관규칙 발견을 위한 레벨 기반 데이터마이닝 시스템)

Authors: 김온실

Issue Date: 2002

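Department/Major: Graduate School of Science and Technology, Department of Computer Science

Publisher: Ewha Womans University, Graduate School of Science and Technology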

Degree: Master

Abstract: Association analysis, a data mining technique, discovers correlations or co-occurrences among transactional events. Given a large database of transactions, where each transaction is a set of items, together with taxonomy information on those items, we can find more general associations between items at higher levels of the taxonomy. Working with items at higher levels of the taxonomy has several advantages over single-level approaches. First, generalized items accumulate more support, which makes it easier to find new rules. Second, a rule stated in terms of higher-level concepts is easier to conceptualize. A prior approach to mining generalized association rules replaces each transaction with an "extended transaction" that contains every item in the original transaction as well as all ancestors of each item. The computation this approach requires grows exponentially with the transaction size. In this thesis, we propose a generalization method that replaces each item with its ancestor at a chosen level instead of extending the original transaction with all ancestors of its items. The user selects the appropriate level within the taxonomy. We also design and implement a data mining system that satisfies these conditions, and we test it experimentally on sample data. The results show that selecting specific items yields more valuable rules carrying useful information.

Among data mining techniques, which extract hidden patterns from large volumes of data, association rule mining discovers the kinds of items that are likely to occur together within a single transaction in a database. Using a concept hierarchy (taxonomy) during association rule mining makes it possible to find rules with a broader meaning. These are called generalized association rules, and they can reveal important rules that were previously overlooked. The existing approach to generalized association rules adds to each transaction all taxonomy ancestors of every item in the candidate itemsets and then computes support over the extended transactions. This makes the growth in computation, already one of the drawbacks of association rule techniques, even more pronounced. This thesis proposes a method that reduces the complexity of rule generation by mining association rules at a specific level of interest to the user rather than at every level of the taxonomy.

However, because generalizing every item to a single level is impractical, the method lets the user specify a separate generalization level for items of interest so that the desired rules can be found. A mining system applying the proposed method was designed and implemented, and experiments were run that varied the uniform generalization level and selected separate generalization levels for specific items. The results show that the higher the generalization level, the more rules tend to satisfy the thresholds and be discovered, and that when specific items are selected, rules containing those items are found, confirming that rules of the form the user wants are generated. We therefore conclude that generalizing according to the proposed method, which reduces computation, uncovers rules with a broader meaning that did not appear before, and that letting the user choose a separate generalization level for specific items produces useful rules suited to the user's purpose.
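The contrast between the two approaches described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the system implemented in the thesis: the taxonomy, the sample transactions, all function names, the convention that level 0 is the root (most general) level, and the restriction to item pairs are hypothetical choices made for the example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical taxonomy (child -> parent); roots have no entry.
PARENT = {
    "jacket": "outerwear",
    "ski pants": "outerwear",
    "outerwear": "clothes",
    "shirt": "clothes",
    "shoes": "footwear",
    "hiking boots": "footwear",
}

# Hypothetical sample transactions.
TRANSACTIONS = [
    {"jacket", "hiking boots"},
    {"ski pants", "hiking boots"},
    {"shirt", "shoes"},
    {"jacket", "shoes"},
]


def path_from_root(item):
    """Return [root, ..., item] by following PARENT links upward."""
    path = [item]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path[::-1]


def generalize(item, level):
    """Ancestor of `item` at depth `level` (assumption: root is level 0).

    An item that already sits at or above that depth is returned unchanged.
    """
    path = path_from_root(item)
    return path[min(level, len(path) - 1)]


def generalize_transaction(transaction, default_level, overrides=None):
    """Level-based generalization: replace each item by one ancestor.

    `overrides` maps an original item to its own level, modelling the
    user-selected level for items of interest (assumed interface).
    """
    overrides = overrides or {}
    return frozenset(
        generalize(item, overrides.get(item, default_level))
        for item in transaction
    )


def extend_transaction(transaction):
    """Prior approach: keep every item and add all of its ancestors."""
    extended = set()
    for item in transaction:
        extended.update(path_from_root(item))
    return frozenset(extended)


def pair_support(transactions):
    """Support (fraction of transactions) of every co-occurring item pair."""
    counts = Counter()
    for t in transactions:
        counts.update(combinations(sorted(t), 2))
    return {pair: c / len(transactions) for pair, c in counts.items()}
```

Calling `pair_support([generalize_transaction(t, 1) for t in TRANSACTIONS])` counts support over transactions that are no larger than the originals, whereas `extend_transaction` grows a two-item transaction such as `{"jacket", "hiking boots"}` to five items before any counting starts. Passing `overrides={"hiking boots": 1}` with `default_level=0` keeps that one item at a more specific level while everything else is generalized to the root.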