DSpace at EWHA: A Study on MOLAP Cube Storage Structures for High-Speed Query Processing

Title: A Study on MOLAP Cube Storage Structures for High-Speed Query Processing (고속 질의 처리를 위한 MOLAP 큐브 저장 구조 연구)

Authors: 임윤선

Issue Date: 2003

Department/Major: Graduate School of Science and Technology, Department of Computer Science

Publisher: Ewha Womans University, Graduate School of Science and Technology

Degree: Master

Abstract: MOLAP (multidimensional online analytical processing) accelerates multidimensional data analysis by storing data in a multidimensional array and accessing cells by their position. The speed of MOLAP operations such as slice and dice varies significantly with the scheme used to map the multidimensional array onto disk. Zhao et al. proposed a MOLAP cube storage scheme that divides a cube into small chunks of equal side length, compresses sparse chunks, and stores the chunks in row-major order of their chunk indexes. Because sparse chunks are compressed, dense chunks are no longer aligned to disk block boundaries, and the relative position of a chunk cannot be obtained in constant time without an index. In this thesis, we develop a variant of their cube storage scheme that places chunks in a different order. We also propose a bitmap index for chunk-based MOLAP cube storage in which chunk order can be computed efficiently. Our cube storage accelerates slice and dice operations by aligning chunks to physical disk block boundaries and clustering neighboring chunks, using Z-indexing for the clustering. Experiments show that the proposed scheme is efficient for the three- to five-dimensional cubes frequently used in business data analysis. The bitmap index can be constructed while the corresponding cube is generated. It retains the relative position of chunks so that a chunk can be retrieved in constant time, and it packs as many chunks as possible into each index block to minimize the number of index searches for OLAP operations such as range queries. We show that the proposed index is efficient in both time and space by comparing it with multidimensional indexes such as the UB-tree and the grid file.

MOLAP (multidimensional online analytical processing) is a technology for multidimensional data analysis. To speed up query processing, it stores data in a multidimensional array called a cube and accesses the data through array indexes. A cube can be stored on disk in various ways, and the method chosen strongly affects the speed of slice and dice, the main MOLAP operations. To process these operations efficiently, Zhao et al. proposed dividing the multidimensional array into small chunks and compressing the sparse ones before storing them [1]. This approach not only uses storage space efficiently but also gives every dimension equally fair query processing speed. However, real-world data is sparse and clustered, so data density is not uniform across chunks. Compressing chunks according to their density therefore produces chunks of varying size. Because chunks that were designed to match the disk block size to speed up OLAP operations end up varying in size, a new storage scheme that can access the disk efficiently is needed. In addition, since chunk sizes are not uniform and empty chunks are not stored, chunk position and size information is lost, so an index is needed to access chunks quickly. This study presents a method that improves the speed of slice and dice by placing and storing chunks according to density and adjacency and by indexing the chunk-based MOLAP cube with a bitmap. The proposed storage structure uses chunk density to align chunks to disk block boundaries as far as possible, and uses Z-indexing to cluster adjacent low-density chunks, thereby speeding up disk I/O. Experiments show that the proposed cube storage scheme is efficient for storing the three- to five-dimensional cubes commonly used in the analysis of general business data. The proposed bitmap index for chunk-based MOLAP cubes can be built at the same time as the cube. It preserves the relative positions of chunks at the index level so that chunks can be looked up in constant time, and it packs position information for as many chunks as possible into each index block, greatly reducing the number of index accesses when processing the main OLAP operations. The time and space efficiency of the index was verified by comparison with the multidimensional indexing techniques UB-tree and grid file.
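The two ideas named in the abstract, Z-indexing of chunks and a bitmap that records which chunks are stored, can be illustrated with a minimal sketch. This is not the thesis's implementation: the names `z_index` and `ChunkBitmap`, the use of a Python integer as the bitmap, and the 4 x 4 example grid are all assumptions made for illustration. The rank here is computed by counting bits, which is not constant time; a real index would presumably keep precomputed counts per index block to achieve that.

```python
def z_index(coords, bits):
    """Interleave the bits of chunk coordinates into one Z-order (Morton) key.

    coords: tuple of non-negative chunk coordinates, one per dimension.
    bits:   number of bits needed for the largest coordinate.
    """
    key = 0
    ndim = len(coords)
    for bit in range(bits):
        for dim, c in enumerate(coords):
            # Bit `bit` of dimension `dim` goes to position bit * ndim + dim.
            key |= ((c >> bit) & 1) << (bit * ndim + dim)
    return key


class ChunkBitmap:
    """Hypothetical bitmap over Z-ordered chunk slots.

    Bit z is set when the chunk with Z-order key z is non-empty and stored.
    """

    def __init__(self):
        self.bits = 0  # a Python int serves as an arbitrary-length bitmap

    def mark(self, z):
        """Record that the chunk with key z is stored."""
        self.bits |= 1 << z

    def position(self, z):
        """Return the ordinal of chunk z among stored chunks, or None if empty."""
        if not (self.bits >> z) & 1:
            return None
        below = self.bits & ((1 << z) - 1)  # keep only bits for keys < z
        return bin(below).count("1")        # rank = number of stored chunks before z


# Example: a 2-D cube split into a 4 x 4 grid of chunks (2 bits per coordinate).
index = ChunkBitmap()
for chunk in [(0, 0), (1, 1), (2, 3)]:
    index.mark(z_index(chunk, bits=2))

print(z_index((2, 3), bits=2))                   # Z-order key of chunk (2, 3)
print(index.position(z_index((2, 3), bits=2)))   # its ordinal among stored chunks
print(index.position(z_index((1, 0), bits=2)))   # None: chunk (1, 0) is empty
```

Because neighboring chunks tend to receive nearby Z-order keys, storing chunks in key order keeps spatial neighbors close together on disk, and the bitmap rank gives the ordinal of a chunk among the stored ones without storing an entry for every empty chunk.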