DSpace at EWHA: Development of a Bio Sequence Information Database and Integrated Search System Using Web Services

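Title: Development of a Bio Sequence Information Database and Integrated Search System Using Web Services (웹 서비스를 이용한 바이오 서열 정보 데이터베이스 및 통합 검색 시스템 개발)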

Authors: 이수정

Issue Date: 2003

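Department/Major: Graduate School of Science and Technology, Department of Computer Science

Publisher: Ewha Womans University, Graduate School of Science and Technology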

Degree: Master

Abstract: As biological instruments and techniques have advanced, the volume of biological data and the number of hosts providing it have grown rapidly. These data are as distributed and heterogeneous as the research and development communities that produce them, so integrating biological databases and making them interoperable has become an important problem. Much research has addressed this problem, but most integrated systems are based on links (cross-references) or on data warehousing, and therefore have difficulty staying up to date when the schema or the data of a source changes. To overcome this inefficiency, integrated systems built on web service technology, which can provide services regardless of changes in platform or schema, have been proposed.

Following this approach, this thesis develops an integrated search system for bio sequence data based on web service technology. First, we built a relational database of nucleotide and protein sequence data, limited to two species (rice and pig), that can provide the data in several formats, including BSML, GenBank, and FASTA. Then, acting as a service provider, we published the retrieval modules of this database as a public SOAP service using WSDL, SOAP, and UDDI, and made external databases searchable together by calling their web services in parallel. When the information a user requests does not exist in the constructed database, the system searches the external databases, parses the results, and stores them in the constructed database automatically.

The developed system lets users traverse heterogeneous and disparate data resources through web service technology and provides an integrated, easy-to-use user interface. As web service technology matures and spreads and more database hosts are exposed as web services, the interoperability of distributed databases in the bio field should improve as well.
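The search flow described in the abstract (check the constructed database first, query external databases in parallel on a miss, then store what was found) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the source names, accession numbers, and sequences below are hypothetical, and plain functions stand in for the SOAP calls to external web services.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for external web service clients. In the system
# described in the abstract these would be SOAP calls; here they are plain
# functions over made-up data.
def search_source_a(accession):
    return {"X0001": "ATGGCC"}.get(accession)

def search_source_b(accession):
    return {"P0002": "MKTAYI"}.get(accession)

EXTERNAL_SOURCES = {"source_a": search_source_a, "source_b": search_source_b}

def integrated_search(accession, local_db, sources=EXTERNAL_SOURCES):
    """Return (sequence, origin), checking the local database first."""
    if accession in local_db:
        return local_db[accession], "local"

    # Local miss: submit one query per external source so they run in parallel.
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {name: pool.submit(fn, accession) for name, fn in sources.items()}
        for name, future in futures.items():
            sequence = future.result()
            if sequence is not None:
                # Store the hit so the next query is answered locally.
                local_db[accession] = sequence
                return sequence, name

    return None, None

if __name__ == "__main__":
    db = {}
    print(integrated_search("X0001", db))  # answered by an external source
    print(integrated_search("X0001", db))  # answered from the local database
```

A dictionary plays the role of the relational database here. A real system would also need to parse each source's response format before storing it, which this sketch leaves out.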