DSpace at EWHA: Semantic-based Similarity Checking between XML Tags (XML태그의 의미적 유사성 검사)

Title: XML태그의 의미적 유사성 검사

Other Titles: Semantic-based Similarity Checking between XML Tags

Authors: 李蕙受

Issue Date: 2000

Department/Major: Graduate School of Science and Technology, Department of Computer Science

Keywords: XML tags; computer; similarity; checking

Publisher: Ewha Womans University, Graduate School of Science and Technology

Degree: Master

Abstract: The success of XML (eXtensible Markup Language) rests primarily on its flexibility: anyone can define the structure of XML documents so that they represent information in the desired form. This biggest advantage of XML is at the same time its biggest handicap. XML is so flexible that XML documents cannot automatically be given an underlying semantics. Different tag sets, different names for elements or attributes, and different document structures in general complicate the task of classifying and clustering XML documents precisely. In this thesis, we design and implement a system that checks the semantic similarity between XML tags. First, the system extracts the underlying semantics of tags and then expands the synonym set of each tag using the WordNet thesaurus and a user-defined word library that supports abbreviations and compound words in XML tags. Second, to take the relative importance of XML tags in XML documents into account, we extend the conventional vector space model, the most widely used document model in the information retrieval field. Using this method, we were able to check the similarity between XML documents that are represented with different tags.
The main factor that allows XML (eXtensible Markup Language) to establish itself as a standard for Web documents is its flexibility: users can describe their own document types. However, this flexibility, XML's greatest strength, is at the same time its greatest weakness. The problem it causes is that different XML document authors use different tag names and structures to express the same meaning. That is, because of different tag sets, different names for elements and attributes, or different document structures, documents expressed with different tags are easily regarded as belonging to different classes. This thesis therefore extracts the semantic information inherent in XML tags, expands each tag into its most semantically similar synonyms using the WordNet thesaurus and a user-defined term dictionary, and compares and analyzes the semantic similarity between the expanded tags of two XML documents. The semantic similarity is then assigned as a weight, extending the vector space model, a conventional method for representing unstructured documents, so that it takes the semantic similarity of tags into account. By designing and implementing a system that checks the semantic similarity of XML tags, this thesis uses that semantic similarity to check the similarity between two XML tags.
As a result, just as one can heuristically judge whether two XML documents are similar, the proposed method was able to determine the similarity of XML documents. The thesis makes two main contributions. First, it applies the vector space model, the most widely used document representation in information retrieval, to XML, a semi-structured document format, in a way that fully reflects the semantic information carried by tags. Second, in applications that classify XML documents or cluster them into groups of similar documents, the semantics-based tag comparison method proposed here could serve as a data preprocessing step for XML documents.
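
The two steps described in the abstract (synonym expansion of tags, then a vector space model weighted by tag similarity) can be illustrated with a minimal Python sketch. This is not the thesis's implementation, and everything in it is an assumption made for illustration: the small `ABBREVIATIONS` and `SYNONYMS` tables stand in for the user-defined word library and for WordNet, the Jaccard overlap in `tag_similarity` is one possible similarity measure, and `document_similarity` generalizes cosine similarity with pairwise tag weights rather than reproducing the thesis's own formula.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for the user-defined word library (abbreviation -> word).
ABBREVIATIONS = {"auth": "author", "pub": "publisher", "yr": "year"}

# Hypothetical stand-in for WordNet synonym sets; a real system would query a thesaurus.
SYNONYMS = [
    {"author", "writer", "creator"},
    {"title", "name", "heading"},
    {"price", "cost"},
    {"year", "date"},
]


def split_tag(tag):
    """Split a compound tag such as 'bookTitle' or 'pub_yr' into lowercase words,
    replacing known abbreviations with their full form."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", tag)
    return [ABBREVIATIONS.get(p.lower(), p.lower()) for p in parts]


def expand(word):
    """Return the word together with every synonym listed for it."""
    expanded = {word}
    for group in SYNONYMS:
        if word in group:
            expanded |= group
    return expanded


def tag_similarity(tag_a, tag_b):
    """Jaccard overlap of the two tags' expanded word sets (an assumed measure)."""
    set_a = set().union(*(expand(w) for w in split_tag(tag_a)))
    set_b = set().union(*(expand(w) for w in split_tag(tag_b)))
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def document_similarity(tags_a, tags_b):
    """Cosine similarity over tag-frequency vectors, where each pair of tags
    contributes in proportion to its semantic similarity."""
    freq_a, freq_b = Counter(tags_a), Counter(tags_b)

    def weighted_dot(f1, f2):
        return sum(
            c1 * c2 * tag_similarity(t1, t2)
            for t1, c1 in f1.items()
            for t2, c2 in f2.items()
        )

    norm = math.sqrt(weighted_dot(freq_a, freq_a) * weighted_dot(freq_b, freq_b))
    return weighted_dot(freq_a, freq_b) / norm if norm else 0.0


if __name__ == "__main__":
    print(tag_similarity("bookTitle", "name"))  # 0.75
    print(document_similarity(["author", "title", "price"],
                              ["writer", "name", "cost"]))  # 1.0
```

In this sketch, two documents that share no tag names can still score as similar, because the comparison runs over expanded synonym sets rather than literal tag strings. A plain cosine measure over raw tag names would give 0 for the second example.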