DSpace at EWHA: An Efficient ROLAP Cube Generation Method for Large-Scale Data


Title: An Efficient ROLAP Cube Generation Method for Large-Scale Data (대용량 데이터를 위한 효율적인 ROLAP 큐브 생성 방법)

Authors: 송지숙

Issue Date: 2002

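Department/Major: Graduate School of Science and Technology, Department of Computer Science

Publisher: Ewha Womans University, Graduate School of Science and Technology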

Degree: Master

Abstract: ROLAP (Relational Online Analytical Processing) is a set of techniques for analyzing data stored in relational tables multidimensionally and delivering the results to users online. To support online query processing, most ROLAP systems precompute part of the analysis results as aggregate (summary) tables. Existing methods sort the tables several times during this process; for large tables, costly external sorting is hard to avoid, and it is a major cause of poor summary-table generation performance. One way to reduce the sorting overhead is to avoid generating summary tables directly from the relations: the data is first stored in an array as an intermediate step, the aggregation is precomputed there, and the result is converted back into tables. Prior research has already shown that this approach substantially outperforms sort-based ROLAP algorithms. The algorithm proposed in this thesis uses the array-based approach to eliminate the sorting overhead and accesses data directly through the positional information of the array, which speeds up ROLAP summary-table generation. However, generating a cube from an array wastes memory, because invalid cells must also be kept in memory during aggregation. The waste is worse for sparse data: with large, sparse data, the memory required for cube generation can exceed the memory actually available. The proposed algorithm uses arrays while reusing memory efficiently, so that large, sparse data can be handled effectively. Its efficiency is demonstrated through theoretical analysis and experiments.
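The general array-based idea described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's algorithm: it does not include the memory-reuse scheme, and the names `build_summary_tables`, `offset`, and `decode`, as well as the assumption that dimension values are already integer-coded, are hypothetical choices made only for illustration. It loads a relation into a dense array, aggregates every group-by through positional offsets, and emits summary-table rows without any sorting step.

```python
from itertools import combinations


def offset(coords, sizes):
    """Row-major position of a cell with the given coordinates."""
    pos = 0
    for c, s in zip(coords, sizes):
        pos = pos * s + c
    return pos


def decode(pos, sizes):
    """Inverse of offset(): recover the coordinates from a position."""
    coords = []
    for s in reversed(sizes):
        pos, c = divmod(pos, s)
        coords.append(c)
    return coords[::-1]


def build_summary_tables(rows, dim_sizes):
    """Compute SUM summary tables for every subset of dimensions.

    rows: iterable of (d0, d1, ..., measure), dimensions integer-coded
          in range(dim_sizes[i]) (an assumption of this sketch).
    Returns {dims_tuple: [(coord..., total), ...]}.
    """
    n = len(dim_sizes)
    total_cells = 1
    for s in dim_sizes:
        total_cells *= s

    # Dense base array; None marks an invalid cell with no source row.
    # These cells still occupy memory, which is the waste the abstract notes.
    base = [None] * total_cells
    for *coords, measure in rows:
        p = offset(coords, dim_sizes)
        base[p] = measure if base[p] is None else base[p] + measure

    tables = {}
    for k in range(n + 1):
        for dims in combinations(range(n), k):
            sizes = [dim_sizes[d] for d in dims]
            agg_cells = 1
            for s in sizes:
                agg_cells *= s
            agg = [None] * agg_cells
            # Aggregate by position: no sorting of the input is needed.
            for p, value in enumerate(base):
                if value is None:
                    continue
                full = decode(p, dim_sizes)
                q = offset([full[d] for d in dims], sizes)
                agg[q] = value if agg[q] is None else agg[q] + value
            # Convert the array back into table rows, skipping invalid cells.
            tables[dims] = [
                tuple(decode(q, sizes)) + (v,)
                for q, v in enumerate(agg)
                if v is not None
            ]
    return tables
```

Because rows are emitted in positional order, each summary table comes out ordered by its dimension coordinates as a side effect. The dense `base` list also makes the memory problem visible: its length is the product of the dimension sizes, regardless of how few source rows exist.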