DSpace at EWHA: A Study on the Development of a Classification Model for Terminological Relationships

Title: 용어관계의 분류 모형 개발에 관한 연구

Other Titles: A Study on the Development of a Classification Model for Terminological Relationships

Authors: 백지원

Issue Date: 2005

Department/Major: Graduate School, Department of Library and Information Science

Publisher: Ewha Womans University Graduate School

Degree: Doctor

Advisors: 정연경

Abstract: The purpose of this study is to show the limitations of terminological relationships in the current information environment and to propose a solution that results in rich and refined terminological resources. The study has three main parts: first, a detailed analysis, based on an extensive literature review, to build an inventory of semantic relationships; second, the development of a methodology for classifying terminological relationships, together with a classification model; and third, a set of suggestions.
First, a full list of existing relationships was compiled. Through a comprehensive literature review, a total of 727 semantic relationships were collected from 29 knowledge organization systems (KOS) of 7 types and analyzed comparatively. In general, many relationships have been created according to the purposes and criteria of individual systems. This not only makes semantic relationships difficult to understand and use but also hinders semantic interoperability, both between KOS of the same kind and between KOS of different kinds. Based on this analysis, a classification model was devised. All of the relationships were categorized into the three basic thesaural relationships: equivalence, hierarchical, and associative. Most relationships could be assigned to one of these three categories, but many synonymous relationships appeared under several different relation names. The results showed that all three relationship types, and above all the associative relationship, need to be classified in a more refined scheme.
Second, to build a classification model for terminological relationships, two opposite approaches were used: a top-down approach that analyzed relationships starting from the three main relationship types, and a complementary bottom-up approach that started from individual relationship cases identified through an extensive examination of existing semantic relationships.
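The top-down step of sorting collected relationship labels into the three basic thesaural categories can be sketched as a lookup table plus a grouping pass. This is a minimal illustration under stated assumptions: the sample labels and the mapping table are hypothetical (ISO 2788-style thesaurus tags, common lexical relations, and a few UMLS Semantic Network relation names), not the thesis's 727-relationship inventory. Following the abstract, antonymy is placed under equivalence and whole/part (meronymy) under hierarchy.

```python
from collections import Counter

# HYPOTHETICAL sample of (source, relationship label) pairs, for illustration only.
SAMPLE_RELATIONSHIPS = [
    ("thesaurus standard", "USE"),
    ("thesaurus standard", "UF"),
    ("thesaurus standard", "BT"),
    ("thesaurus standard", "NT"),
    ("thesaurus standard", "RT"),
    ("linguistics", "synonym"),
    ("linguistics", "antonym"),
    ("linguistics", "hypernym"),
    ("linguistics", "meronym"),
    ("UMLS Semantic Network", "causes"),
    ("UMLS Semantic Network", "treats"),
    ("UMLS Semantic Network", "prevents"),
    ("UMLS Semantic Network", "diagnoses"),
    ("UMLS Semantic Network", "location_of"),
]

# Illustrative mapping of labels onto the three basic thesaural categories.
CATEGORY_OF = {
    "USE": "equivalence", "UF": "equivalence",
    "synonym": "equivalence", "antonym": "equivalence",
    "BT": "hierarchical", "NT": "hierarchical",
    "hypernym": "hierarchical", "meronym": "hierarchical",
    "RT": "associative", "causes": "associative", "treats": "associative",
    "prevents": "associative", "diagnoses": "associative",
    "location_of": "associative",
}


def categorize(relationships):
    """Group distinct labels by category; labels missing from the table go to 'unmapped'."""
    groups = {}
    for _source, label in relationships:
        category = CATEGORY_OF.get(label, "unmapped")
        groups.setdefault(category, set()).add(label)
    return groups


def distinct_label_counts(relationships):
    """Count how many distinct labels fall into each category."""
    return Counter({cat: len(labels) for cat, labels in categorize(relationships).items()})
```

In this toy sample the associative category collects the most distinct labels; any label not covered by the table surfaces under `"unmapped"` instead of being silently dropped, which is where a bottom-up, case-by-case review would pick up.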
The approaches should be applied differently to each relationship type. The principles for designing the classification model are as follows:
· For the equivalence relationship, the dimension along which meaning varies is the key factor in determining the relationship type, so equivalence relationships should be subcategorized by semantic dimension.
· For the hierarchical relationship, each term pair should be subcategorized by inserting node labels chosen for that individual pair.
· For the associative relationship, facet analysis has often been proposed as the way to subcategorize, but no verified, reliable methodology exists.
This study therefore used a series of tests to examine both the potential and the limits of facet analysis for associative relationships. The tests covered three different terminology groups and four different thesauri, and used two sets of facet indicators that differed in how specific they were to a domain. The results showed that more specifically designed facet indicators should be used for more domain-specific terminological resources, such as those covering a narrow area of the natural sciences. Overall, facet analysis can be used to classify associative relationships if the facet indicators are well defined and carefully applied. Several suggestions were made, including a facet indicator model for the associative relationship, as well as categorization by the level of RT importance and by topical domain.
The key features of the proposed classification model are as follows:
· The classification model is multi-level. The top level is Ranganathan's PMEST, the most basic, fundamental facet categories, which apply generally to all kinds of domains.
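The second level consists of selective properties, which are more detailed and descriptive than the basic facets. These can be further broken down into subcategories to create a classification scheme suited to a specific purpose.
· The classification model can serve as a relationship registry usable across many domains and for many purposes.
· The model can serve as a foundation for designing Korean terminological resources that correspond to multilingual language resources.
Further study is needed in several areas:
· Each term, terminological relationship, and facet indicator should be clearly defined.
· The question of intellectual versus automatic term correlation should be examined.
· Adequate interface design is needed for environments in which terminological relationships are classified in detail.
This study shows that terminological relationships, the foundation of every knowledge organization system, have long been used according to convention without clear guidelines and are therefore inadequate for today's information environment, and it proposes a classification model for terminological relationships as a solution. To this end, cases of terminological relationships found in many existing knowledge organization systems, together with theoretical studies of terminological relationships, were collected extensively to identify the variety of relationship types. The study then re-examined the nature of terminological relationships and set out to develop a classification model that allows existing relationships to be defined and categorized more clearly. It further explored ways to apply the resulting model to information retrieval and other areas. The study is organized as follows. First, the nature and overall landscape of terminological relationships were examined. The definition and characteristics of terminological relationships were established, and research trends and distinctive features were described from the perspectives of linguistics, computer science, and library and information science, the fields in which terminological relationships are mainly discussed, together with an analysis of how these perspectives relate to one another. In addition, an analysis and comparison of how terminological relationships are used in the main types of knowledge organization systems built on them showed that almost all terminological relationships can be summarized by the basic thesaurus categories: equivalence, hierarchical, and associative relationships. Relationship types were then analyzed on the basis of an exhaustive collection of real cases in which terminological relationships appear. Twenty-nine theoretical and practical sources dealing with terminological relationships were selected in seven categories (relationships in linguistics, thesaurus construction standards, terminology standards, theoretical studies of terminological relationships, thesauri, semantic networks and ontologies, and database relationships), and the relationships appearing in them were categorized into the three basic thesaurus relationship types. Statistical and content analysis showed that associative relationships, more than equivalence or hierarchical relationships, are the most varied and lack clear criteria for their assignment, making them the most problematic type.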
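The multi-level model for associative relationships summarized in the English abstract (Ranganathan's PMEST at the top level, more specific selective properties beneath it, then finer subcategories) can be sketched as a nested lookup table that attaches a facet path to an RT pair. Only the PMEST level comes from the abstract; the second-level property names, the facet indicators, and the example terms are hypothetical placeholders, not the thesis's facet indicator set.

```python
# Top level: Ranganathan's fundamental categories (PMEST), as in the abstract.
# Second level ("selective properties") and the facet indicators below it are
# HYPOTHETICAL placeholders chosen for illustration.
FACET_MODEL = {
    "Personality": {"organism": ("host_of", "pathogen_of")},
    "Matter": {"constituent": ("contains", "made_of")},
    "Energy": {"process": ("causes", "prevents"), "operation": ("treats", "diagnoses")},
    "Space": {"location": ("located_in", "occurs_in")},
    "Time": {"sequence": ("precedes", "follows")},
}


def path_for(indicator):
    """Return (fundamental category, selective property, indicator), or None if unknown."""
    for category, properties in FACET_MODEL.items():
        for prop, indicators in properties.items():
            if indicator in indicators:
                return (category, prop, indicator)
    return None


def classify_rt(term_a, term_b, indicator):
    """Qualify an RT pair with its facet path; the path is None when no indicator matches."""
    return (term_a, "RT", term_b, path_for(indicator))


# Hypothetical plant-pathology pair:
# classify_rt("fungicide", "powdery mildew", "prevents")
```

Keeping RT as the relation and attaching the facet path as a qualifier means a pair whose indicator is not in the model degrades to an ordinary, unqualified RT instead of being lost, which matches the idea that facet analysis refines the associative category rather than replacing it.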
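Accordingly, in designing the classification model, the proposals for equivalence and hierarchical relationships were based on theory and case analysis, whereas for associative relationships a series of experiments on subclassification was designed and carried out. Second, based on this analysis, a classification model for terminological relationships was designed and a classification method proposed. The criteria and principles of classification were established first, followed by the scope, construction method, and structure of the model. In the model, equivalence relationships are divided into complete synonymy, quasi-synonymy, and antonymy. Because linguistic semantic analysis matters more for quasi-synonymy than for the other types, it was proposed to adopt facet divisions derived from linguistic analysis for quasi-synonymy. Among hierarchical relationships, the whole/part relationship was judged to need more detailed analysis, because it has many subtypes and is hard to distinguish from the associative relationship. Based on case analysis, the proposed model divides whole/part relationships into physical and non-physical relationships; physical relationships are further divided into body organs and general objects, and non-physical relationships into geographic location, academic discipline, and hierarchical organization. These categories were subdivided further at lower levels. A model of facet indicators that can be used to subcategorize individual hierarchically related term pairs was also presented. The associative relationship has attracted the most criticism, yet the only semantic solution proposed so far has been facet analysis. Since almost no studies had established a practical methodology for applying facet analysis or demonstrated its effect, this study examined experimentally the potential and limits of facet analysis for associative relationships. To compare the target thesauri, index terms from high school textbooks were chosen as test data. To compare subject domains, descriptors in the 'Social Problems' category were extracted for the humanities and social sciences from the National Library of Korea Subject Headings (<국립중앙도서관 주제명표목표>) and the Unesco thesaurus: a structured list of descriptors for indexing and retrieving literature in the fields of education, science, social science, culture and communication, and facet analysis of their associative relationships was attempted. For the natural sciences, terms in the 'Plant Pathology' area were retrieved from the AGROVOC Multilingual Agricultural Thesaurus and their associative relationships were facet-analyzed. To determine the effect of facet specificity, two facet models were prepared for classifying associative relationships. One is a general-purpose, cross-domain facet set that combines the results of this study's case analysis with the facets proposed by 한상길 (1999); the other is a highly specific facet set that uses the relationships of the UMLS (Unified Medical Language System) Semantic Network as facet properties. When the facet models were applied to each test term group, humanities and social science terms could be facet-analyzed at the level of broad categories, but applying lower-level facets was considerably harder, because their meanings and relationships could be interpreted in many ways and their semantic scope was often vague. In contrast, applying domain-specific facets to natural science terms made it possible to classify their associative relationships. Taking this into account, a model set of facet indicators for classifying associative relationships was presented, based on an exhaustive integration and analysis of facet analysis cases. From the theoretical analysis and the analysis of actual relationship types, the study concluded that facet classification of hierarchical and associative relationships should be discussed separately, that inverse relationships need to be defined when classifying associative relationships, and that the actual occurrence patterns of relationships should take priority in classification; it also proposed classification by importance and by category as alternatives to facet analysis.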
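The whole/part subdivision described above (physical: body organs and general objects; non-physical: geographic location, academic discipline, and hierarchical organization) can be represented as a two-level tree used to label individual term pairs. This is a minimal sketch, assuming the tree stops at the second level (the thesis subdivides further); the function names and the example term pairs are hypothetical.

```python
# Whole/part subdivision as described in the abstract; the thesis's further,
# lower-level subdivisions are omitted here.
PART_WHOLE_TREE = {
    "physical": ("body organ", "general object"),
    "non-physical": ("geographic location", "academic discipline", "hierarchical organization"),
}


def leaf_paths(tree):
    """List every (branch, leaf) path in the two-level tree."""
    return [(branch, leaf) for branch, leaves in tree.items() for leaf in leaves]


def classify_part_whole(part, whole, kind):
    """Attach a whole/part subtype path to a term pair; reject unknown subtypes."""
    for branch, leaf in leaf_paths(PART_WHOLE_TREE):
        if leaf == kind:
            return {"part": part, "whole": whole, "relation": "whole/part", "path": (branch, leaf)}
    raise ValueError(f"unknown whole/part subtype: {kind!r}")


# Hypothetical term pairs:
# classify_part_whole("heart", "cardiovascular system", "body organ")
# classify_part_whole("Seoul", "Korea", "geographic location")
```

Raising an error for an unknown subtype, rather than guessing, reflects the point that whole/part pairs are easily confused with associative ones: a pair that fits none of the leaves is a candidate for review, not for automatic filing.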
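Third, the main applications of the classification model were identified: supporting information retrieval, serving as a registry of terminological relationships, providing a basis for extension to the language system as a whole, and serving as a standard basis for Korean-language resources. Regarding facet analysis, the study identified the need for support that makes facet analysis easier to apply and proposed ways to provide it. It also argued that future research on terminological relationships requires stronger interdisciplinary cooperation, and it identified the research areas that library and information science in particular should lead. Finally, it proposed the following topics for future research: automating the determination of terminological relationships, applying research results on terminological relationships from other disciplines, and discussing whether, and to what effect, the existing three relationship types should be retained.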
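The registry role, combined with the finding that associative relationships need declared inverses, can be sketched as a small registry that records each relationship type with its category and inverse, checks that inverses point back to each other, and stores every statement in both directions. The class and method names, and relation names such as `caused_by`, are hypothetical illustrations rather than anything specified in the thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationType:
    name: str
    category: str  # "equivalence", "hierarchical" or "associative"
    inverse: str   # name of the inverse relation; equal to `name` if symmetric


class RelationRegistry:
    """HYPOTHETICAL registry of relationship types with declared inverses."""

    def __init__(self):
        self._types = {}

    def register(self, name, category, inverse):
        self._types[name] = RelationType(name, category, inverse)

    def inverse_of(self, name):
        return self._types[name].inverse  # raises KeyError if `name` is unregistered

    def check_consistency(self):
        """Return names whose inverse is unregistered or does not point back to them."""
        problems = []
        for rel in self._types.values():
            other = self._types.get(rel.inverse)
            if other is None or other.inverse != rel.name:
                problems.append(rel.name)
        return sorted(problems)

    def add_statement(self, store, subject, relation, obj):
        """Add a triple to `store` (a set) together with its inverse triple."""
        store.add((subject, relation, obj))
        store.add((obj, self.inverse_of(relation), subject))


registry = RelationRegistry()
registry.register("BT", "hierarchical", "NT")
registry.register("NT", "hierarchical", "BT")
registry.register("RT", "associative", "RT")  # symmetric: its own inverse
registry.register("causes", "associative", "caused_by")
registry.register("caused_by", "associative", "causes")
```

A shared registry of this kind lets different KOS refer to the same relationship definitions, and the consistency check catches a one-directional relation (for example, registering `causes` without `caused_by`) before it breaks navigation in the reverse direction.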