DSpace at EWHA: An Adaptive Indexing Technique for XPath Query Processing on XML Documents


Title: An Adaptive Indexing Technique for XPath Query Processing on XML Documents (XML 문서의 XPath 질의 처리를 위한 적응적인 색인 기법)

Authors: 정현숙

Issue Date: 2004

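Department/Major: Graduate School of Science and Technology, Department of Computer Science

Publisher: Ewha Womans University, Graduate School of Science and Technology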

Degree: Master

Abstract: The freely definable tags and nested structural information of XML documents offer many advantages for information retrieval and document management. Accordingly, XML data is the subject of active research ranging from storage techniques, indexing techniques, and query language design and processing to query optimization. Among these, indexing techniques have been studied extensively because they promise good query speed. This thesis proposes an adaptive method that indexes XML documents so that queries containing relative paths can be processed as well as simple paths. It also proposes a way to index the very large, continually growing collection of XML documents stored in a data warehouse. The approach is divided into three stages: storing the structural information of XML documents in a relational database, building the index, and processing queries against the index. The specific steps are as follows. First, structural information is extracted from the XML documents and stored in an RDBMS. Documents accumulate in the data warehouse in large volumes, and by the nature of a data warehouse they are rarely updated once stored and simply keep accumulating. Second, an index is built on the basis of a Trie. Because the index targets data warehouse documents, the thesis proposes processing the elements of the many documents and creating and storing an index table for each node of the Trie. Third, a node numbering scheme is applied to the index so that queries containing relative paths can be processed as well as simple paths. Fourth, XPath queries of various forms are processed through the index.

The emergence of the Web has increased interest in XML data. XML query languages such as XQuery and XPath use label paths to traverse irregularly structured data. Without a structural summary and efficient indexes, query processing can be quite inefficient because it requires an exhaustive traversal of the XML data. To overcome this inefficiency, several path indexes have been proposed in the research community. Traditional indexes generally record all label paths from the root element of the XML data. Such path indexes can degrade performance because of their large size and because partial-matching path queries that start with the descendant-or-self axis ("//") require exhaustive navigation. Data warehouses can store a large number of XML documents. Because many path indexes are possible, their combined size makes them difficult to store. In this paper, I propose an adaptive path index for XML data. First, for simple paths, a Trie is used as the index structure. To overcome the size problem in data warehouses, I propose a naming table, kept separate from the index tables and keyed by paths such as /A/B/C, so that the index tables can be accessed efficiently. Second, for relative paths, a node numbering scheme is used to determine ancestor-descendant relationships. In this way, navigation can start from any node in the index. The index can therefore support many queries containing relative paths, such as //A//B, /A/B, //A//B//C//D, /A/B/C//D, and //A//B[X=v1].
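The two ideas in the abstract, a Trie over label paths and a node numbering scheme for ancestor-descendant tests, can be illustrated with a minimal Python sketch. It is not the thesis's implementation. The abstract does not say which numbering scheme is used, so the (start, end) interval numbering here is an assumption, and the names `TrieNode`, `build_index`, `lookup_simple`, and `lookup_descendant` are hypothetical. The RDBMS storage and the naming table are left out; the per-node `entries` list stands in for a per-node index table.

```python
import xml.etree.ElementTree as ET


class TrieNode:
    """One trie node per distinct label path, e.g. /A/B."""

    def __init__(self, label):
        self.label = label
        self.children = {}   # child label -> TrieNode
        self.entries = []    # (start, end) numbers of elements on this path


def build_index(xml_text):
    """Build a label-path trie and number every element with (start, end)."""
    trie_root = TrieNode("")
    counter = 0

    def visit(elem, parent):
        nonlocal counter
        node = parent.children.setdefault(elem.tag, TrieNode(elem.tag))
        counter += 1
        start = counter          # assigned when the element is entered
        for sub in elem:
            visit(sub, node)
        counter += 1             # end number assigned when the element is left
        node.entries.append((start, counter))

    visit(ET.fromstring(xml_text), trie_root)
    return trie_root


def lookup_simple(trie_root, path):
    """Answer a simple path such as /A/B by walking the trie label by label."""
    node = trie_root
    for label in path.strip("/").split("/"):
        node = node.children.get(label)
        if node is None:
            return []
    return sorted(node.entries)


def entries_by_label(trie_root, label):
    """Collect the entries of every trie node carrying the given label."""
    found, stack = [], [trie_root]
    while stack:
        node = stack.pop()
        if node.label == label:
            found.extend(node.entries)
        stack.extend(node.children.values())
    return found


def is_ancestor(anc, desc):
    """Interval containment: anc encloses desc exactly when it is an ancestor."""
    return anc[0] < desc[0] and desc[1] < anc[1]


def lookup_descendant(trie_root, anc_label, desc_label):
    """Answer //anc//desc using only the stored numbers, not the document."""
    ancestors = entries_by_label(trie_root, anc_label)
    descendants = entries_by_label(trie_root, desc_label)
    return sorted(d for d in descendants
                  if any(is_ancestor(a, d) for a in ancestors))


DOC = "<A><B><C/></B><D><B/></D></A>"
index = build_index(DOC)
print(lookup_simple(index, "/A/B"))        # [(2, 5)]
print(lookup_descendant(index, "A", "B"))  # [(2, 5), (7, 8)]
```

In this sketch a simple path costs one trie step per label, while a query such as //A//B compares stored intervals instead of traversing the document. Longer relative paths such as //A//B//C//D could chain the same containment test, and a value predicate such as [X=v1] would need element values stored alongside the numbers; neither is shown here.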