DSpace at EWHA: A Study on a MOLAP Chunk-Based Storage Method for Incremental Update

Title: A Study on a MOLAP Chunk-Based Storage Method for Incremental Update (점진적 갱신을 위한 MOLAP 청크 기반 저장 방법에 대한 연구)

Other Titles: (A) MOLAP Storage Scheme for Incremental Update

Authors: 김수현

Issue Date: 2002

Department/Major: Graduate School of Science and Technology, Department of Computer Science

Keywords: incremental update; MOLAP; chunk; storage method

Publisher: Ewha Womans University, Graduate School of Science and Technology

Degree: Master

Abstract: OLAP (Online Analytical Processing) is a technology that analyzes data multidimensionally and delivers the results to users online and quickly. MOLAP (multidimensional OLAP), one of the storage schemes for OLAP data, stores data and precomputed results in a multidimensional array called a cube. OLAP data receives periodic insertions, and in a MOLAP system both the base cube and the precomputed results must then be updated. MOLAP systems provide two ways to manage insertions: full processing, for structural changes to the cube, and incremental update, which reprocesses only the changed part. Both have the problem that every periodic insertion requires repeated access to the existing base cube and recomputation of the aggregates, and the time this takes delays query responses. A storage method that supports incremental update is therefore needed, one that reduces base-cube accesses and updates aggregates quickly.

This study proposes such a method: a MOLAP chunk storage method that uses a chunk structure based on the time dimension together with time-series order. The time-dimension-based structure exploits the fact that, when real-world data is stored, new attributes are added periodically along the time dimension far more often than structural changes occur across all dimensions. MOLAP divides the multidimensional array into small chunks to process operations efficiently, and we propose storing these chunks in time-series order. When data is stored chunk by chunk in time-series order, sparse chunks are stored compressed, and an index is used to maintain chunk location information. The UB-tree and grid file indexes, which were designed with insertions in mind, have been widely used in ROLAP systems; we applied them to the MOLAP cube and compared them. Storing chunks in time-series order lets the inserted cube be applied quickly without regenerating or reorganizing chunks, eliminates access to the base cube, and minimizes updates to summary tables. Aggregates are updated by computing only over the inserted part and merging it into the existing results, rather than by recomputing everything. Finally, through performance evaluation, an index suited to incremental update is proposed.

OLAP is a process and methodology that analyzes and queries data stored in a data warehouse. The MOLAP cube storage scheme stores data and precalculated results in a multidimensional array called a cube, for speed. Data is usually added periodically to the data warehouse to include more recent information about the organization's business activities. OLAP data must therefore be updated after the data warehouse changes, to keep the two synchronized. A MOLAP system, however, must reload the cube's data and recalculate the aggregations.
There are two methods for managing changed data in a MOLAP system. The first is full processing, which applies structural changes. The second is incremental update, which adds new data to the cube. Both methods repeatedly access the base cube and recalculate summary tables, so the cost of applying changed data needs to be reduced. This paper proposes a storage scheme for incremental updates in a chunk-based MOLAP system that uses time order, because most new data is added periodically. In the chunk structure based on time order, the user defines the range of the time dimension. Once the length of the time dimension in a chunk is decided, the lengths of the other dimensions are chosen so that a chunk fits a disk block. For user queries that analyze data over time, the proposed storage scheme holds more valid data per chunk than the existing method. In this paper, the UB-tree and the Grid File are applied to the MOLAP data cube. When data is stored in time order, the cube does not need to be regenerated or reconstructed, and when data is inserted, summary tables are updated incrementally. This makes the storage scheme an effective way to manage a data cube incrementally. The method is compared with row-major order. In the test, the inserted cube is appended to the base cube without cube regeneration, whereas with row-major order the cube must be regenerated to include the inserted cube. The UB-tree and Grid File indexes were applied to chunks and their sizes compared. Based on the performance test, we propose the index best suited to incremental update.
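
The ideas in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the names `chunk_shape` and `TimeOrderedCube` are hypothetical, the cube is reduced to two dimensions (time by item), sparse chunks are modeled as dictionaries rather than a real compression format, and a plain dictionary stands in for the UB-tree or Grid File chunk index. The sketch shows three things: choosing the other dimension's chunk length from a user-chosen time length so that a dense chunk fits one disk block, appending new time periods as new chunks without touching existing ones, and updating a summary table from the inserted data only.

```python
from typing import Dict, List, Tuple


def chunk_shape(block_bytes: int, cell_bytes: int, time_len: int) -> Tuple[int, int]:
    """Pick (time_len, other_len) so that one dense chunk fits in one disk block."""
    cells_per_block = block_bytes // cell_bytes
    other_len = cells_per_block // time_len
    if other_len < 1:
        raise ValueError("time_len is too long for one block")
    return time_len, other_len


class TimeOrderedCube:
    """Toy 2-D cube (time x item) whose chunks are appended in time order."""

    def __init__(self, n_items: int, time_len: int, other_len: int) -> None:
        self.n_items = n_items
        self.time_len = time_len
        self.other_len = other_len
        # Append-only chunk list; each sparse chunk keeps only non-zero cells.
        self.chunks: List[Dict[Tuple[int, int], float]] = []
        # Assumption: a dict stands in for the chunk index (UB-tree / Grid File).
        self.index: Dict[Tuple[int, int], int] = {}
        # Summary table: SUM over all time steps, per item.
        self.item_totals: List[float] = [0.0] * n_items
        self.n_time_blocks = 0

    def append_period(self, slab: List[List[float]]) -> None:
        """Append `time_len` new time steps; existing chunks are never read."""
        assert len(slab) == self.time_len
        assert all(len(row) == self.n_items for row in slab)
        t_block = self.n_time_blocks
        for i_block, start in enumerate(range(0, self.n_items, self.other_len)):
            stop = min(start + self.other_len, self.n_items)
            chunk: Dict[Tuple[int, int], float] = {}
            for t, row in enumerate(slab):
                for i in range(start, stop):
                    if row[i] != 0:
                        chunk[(t, i - start)] = row[i]
                        # Incremental aggregate update from inserted cells only.
                        self.item_totals[i] += row[i]
            self.index[(t_block, i_block)] = len(self.chunks)
            self.chunks.append(chunk)
        self.n_time_blocks += 1

    def get(self, t: int, i: int) -> float:
        """Look up one cell through the chunk index."""
        pos = self.index[(t // self.time_len, i // self.other_len)]
        return self.chunks[pos].get((t % self.time_len, i % self.other_len), 0.0)


# Usage: 64-byte blocks, 8-byte cells, 2 time steps per chunk.
shape = chunk_shape(block_bytes=64, cell_bytes=8, time_len=2)
cube = TimeOrderedCube(n_items=6, time_len=shape[0], other_len=shape[1])
cube.append_period([[1, 0, 0, 2, 0, 0], [0, 0, 3, 0, 0, 4]])  # base cube
base_chunks = list(cube.chunks)
cube.append_period([[0, 5, 0, 0, 0, 0], [6, 0, 0, 0, 0, 7]])  # inserted cube
```

Because new periods only ever extend the time dimension, the chunks of the base cube keep their positions and the index only gains entries. A row-major layout would instead interleave old and new cells, which is the regeneration cost the abstract refers to.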