DSpace at EWHA: Structural Classification and Analysis of Data with Multiple Hierarchies

Title: Structural Classification and Analysis of Data with Multiple Hierarchies (다중계층구조를 갖는 데이터의 구조적 분류 및 분석)

Authors: 박영선

Issue Date: 2003

Department/Major: Graduate School of Science and Technology, Department of Computer Science

Publisher: Ewha Womans University, Graduate School of Science and Technology

Degree: Master

Abstract: OLAP (On-Line Analytical Processing) is a technique for collecting, managing, processing, and presenting multidimensional data for analysis and decision support; users access the database directly, compose queries, and report on and analyze the data. To deliver the requested data quickly and accurately along multiple dimensions, database systems frequently pre-compute and store part of the analysis results, namely aggregates over some subsets of dimensions and their corresponding hierarchies, which improves query response time. Depending on the data, however, the pre-computed data can grow to thousands of times the size of the original data, a phenomenon known as data explosion [2]. Data explosion has several causes, but it can be more severe when a dimension contains multiple hierarchies. At the same time, many OLAP queries can be answered only if each dimension contains multiple hierarchies. OLAP queries follow the drill-down/roll-up/slice/dice paradigm and therefore exhibit navigation patterns that follow the hierarchies of a dimension, so these hierarchies are what make exact results and browsing possible for operations such as roll-up and drill-down [3]. In real-world applications, the hierarchies of a dimension are often unbalanced or share levels, which results in complex, irregular structures. If the server does not support such structures, system performance degrades and users obtain results they did not intend. OLAP systems therefore need to support multiple hierarchies in order to compute exact results and browse the data.
In this thesis, we first classify the structures of multiple hierarchies, so that a system can anticipate the data a user holds and support it appropriately. Second, for each dimension with multiple hierarchies, we establish new formulas for the number of summary (aggregate) tables and the size of the data cube. Finally, we present theorems on structural models of multiple hierarchies for analyzing the data growth they cause. This comprehensive classification and analysis of multiple hierarchies is a new attempt that had not been made before. With these results, users can estimate the data growth caused by multiple hierarchies and choose a pre-computation strategy appropriate to their environment.
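The effect of multiple hierarchies on the number of summary tables can be illustrated with the standard lattice-counting argument: each summary table picks one level (or the implicit top level ALL) from every dimension, so the count is the product of the number of distinct levels per dimension. The sketch below is only an illustration under that textbook assumption and is not the formula established in the thesis; the dimension names, level names, and the helper functions `distinct_levels` and `count_summary_tables` are hypothetical.

```python
from math import prod


def distinct_levels(hierarchies):
    """Distinct levels of one dimension across all its hierarchies, plus 'ALL'.

    Each hierarchy is a list of level names from finest to coarsest.
    Levels shared by several hierarchies are counted once.
    """
    levels = {"ALL"}  # implicit top level (dimension fully aggregated)
    for path in hierarchies:
        levels.update(path)
    return levels


def count_summary_tables(dimensions):
    """Number of possible group-by combinations (summary tables).

    Assumption: one level is chosen per dimension, so the total is the
    product of the per-dimension level counts.
    """
    return prod(len(distinct_levels(h)) for h in dimensions.values())


# Hypothetical schema: 'time' has a single hierarchy.
single = {
    "time": [["day", "month", "year"]],
    "product": [["item", "category"]],
}

# Same schema, but 'time' has a second hierarchy that shares the 'day' level.
multiple = {
    "time": [["day", "month", "year"], ["day", "week"]],
    "product": [["item", "category"]],
}

print(count_summary_tables(single))    # 4 * 3 = 12
print(count_summary_tables(multiple))  # 5 * 3 = 15
```

In this toy schema, adding one extra hierarchy to a single dimension raises the count from 12 to 15; the increase compounds multiplicatively as more dimensions gain additional hierarchies.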