
Discovery of Disease-Related Genes Using Literature Data Mining
Graduate School, Division of Molecular Life Sciences
The Graduate School, Ewha Womans University
With the completion of the Human Genome Project, many diseases recorded in databases such as LocusLink and OMIM have already been mapped to regions of the human genome, yet their candidate genes remain unknown. To find these genes, we used the controlled vocabularies annotated in MEDLINE, whose records are curated and classified by experts in each subject area and indexed with the dedicated MeSH search vocabulary. We calculated the degree of association between disease-related MeSH C terms and chemical terms, between chemical terms and protein function, and between pathological conditions and protein-function terms, and then used these results to relate medical terms to protein function. As a result, we developed a scoring system that judges which GO terms are most strongly related to a given disease in OMIM and which RefSeq genes are most likely to be involved. Using this system, we developed a method for pinpointing disease-related genes at a specific chromosomal locus and built a website where the results for various diseases can be viewed.

Although many inherited diseases currently recorded in the respective databases (LocusLink, OMIM) are already linked to a region of the human genome, many have no known associated gene. The public availability of the human genome draft sequence has fostered new strategies to map molecular functional features of gene products to complex phenotypic descriptions, such as those of genetically inherited diseases. Owing to recent progress in the systematic annotation of genes using controlled vocabularies, we have developed a scoring system for the possible functional relationships of human genes to genetically inherited diseases that have been mapped to chromosomal regions without assignment of a particular gene. To support and rationalize the manual association of known or inferred functional features of genes with phenotypic features, the first phase of the data-mining process combines information from MEDLINE and a protein sequence database to derive relationships between pathological conditions and terms describing protein function. We used a three-step procedure.
First, we computed the associations between pathological conditions and chemical terms using MEDLINE, a database of indexed journal citations and abstracts of the biomedical literature, which currently contains more than 12 million entries indexed with MeSH terms by experts in each field. We consider the relationship between two terms strong if they occur together in many abstracts. Second, we calculated the relationships between chemical terms and terms describing protein function. We used the NCBI RefSeq database, which contains more than 15,000 genes whose function is annotated with terms from a controlled functional vocabulary. Third, we combined the associations of functional terms to chemical terms with the previously established associations of pathological conditions to chemical terms, to derive the aforementioned relations between pathological conditions and protein-function terms. As a result, a scoring system was developed that finds specific relationships between GO terms and an OMIM-based disease. The system was used to discover disease-related genes in a locus, and the results for various diseases are shown on the website (∼hera).
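The three-step procedure can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the abstract does not give the scoring formula, so the Jaccard normalization of co-occurrence counts, the max-of-products rule for chaining the two association tables, and the averaging used to score a gene are all assumptions made here for illustration. The function names, the gene names (`GENE_A` etc.), and the toy records are likewise hypothetical.

```python
from collections import Counter
from itertools import product


def cooccurrence_scores(records, left_terms, right_terms):
    """Score (left, right) term pairs by how often they share a record.

    Assumed normalization: Jaccard index, i.e. records containing both
    terms divided by records containing at least one of them.
    The two vocabularies are assumed to be disjoint sets.
    """
    term_counts = Counter()
    pair_counts = Counter()
    for terms in records:
        terms = set(terms)
        lefts = terms & left_terms
        rights = terms & right_terms
        term_counts.update(lefts | rights)
        pair_counts.update(product(lefts, rights))
    return {
        (a, b): n / (term_counts[a] + term_counts[b] - n)
        for (a, b), n in pair_counts.items()
    }


def combine(disease_chem, chem_func):
    """Chain disease-chemical and chemical-function scores.

    Assumed rule: keep the best product over all shared chemical terms.
    """
    combined = {}
    for (disease, chem), s1 in disease_chem.items():
        for (chem2, func), s2 in chem_func.items():
            if chem == chem2:
                key = (disease, func)
                combined[key] = max(combined.get(key, 0.0), s1 * s2)
    return combined


def score_gene(disease, gene_functions, disease_func):
    """Average the disease-function scores over a gene's annotations."""
    if not gene_functions:
        return 0.0
    total = sum(disease_func.get((disease, f), 0.0) for f in gene_functions)
    return total / len(gene_functions)


# Toy data (made up): each record is the set of index terms on one entry.
diseases = {"Anemia", "Diabetes Mellitus"}
chemicals = {"Iron", "Heme", "Glucose"}
functions = {"iron ion transport", "heme binding", "glucose transport"}

citations = [
    {"Anemia", "Iron"},
    {"Anemia", "Iron", "Heme"},
    {"Anemia", "Heme"},
    {"Diabetes Mellitus", "Glucose"},
]
annotated_entries = [
    {"Iron", "iron ion transport"},
    {"Iron", "Heme", "heme binding"},
    {"Glucose", "glucose transport"},
]

# Step 1: pathological conditions vs. chemical terms.
disease_chem = cooccurrence_scores(citations, diseases, chemicals)
# Step 2: chemical terms vs. protein-function terms.
chem_func = cooccurrence_scores(annotated_entries, chemicals, functions)
# Step 3: derive pathological condition vs. protein-function scores.
disease_func = combine(disease_chem, chem_func)

# Rank hypothetical candidate genes in a locus for one disease.
locus_genes = {
    "GENE_A": {"heme binding"},
    "GENE_B": {"glucose transport"},
    "GENE_C": {"iron ion transport", "heme binding"},
}
ranking = sorted(
    locus_genes,
    key=lambda g: score_gene("Anemia", locus_genes[g], disease_func),
    reverse=True,
)
print(ranking)
```

Taking the maximum over shared chemical terms lets a single strong chain dominate; summing the products instead would reward function terms reached through many weaker chemical links. Either choice is compatible with the abstract, which only states that terms co-occurring in many abstracts are considered strongly related.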
Appears in Collections:
General Graduate School > Division of Life and Pharmaceutical Sciences > Theses_Master

