DSpace at EWHA: Prediction of MicroRNA Relevance to cancer based on Gene Set Analysis

Title: Prediction of MicroRNA Relevance to cancer based on Gene Set Analysis

Authors: 서채화

Issue Date: 2010

Department/Major: Graduate School, Division of Life and Pharmaceutical Sciences, Life Science Major

Publisher: The Graduate School, Ewha Womans University

Degree: Master

Advisors: 이상혁

Abstract: MicroRNAs (miRNAs), a family of small non-coding RNAs 20-22 nt in length, are an important class of regulators involved in diverse cellular processes such as developmental control, apoptosis, cell differentiation, and proliferation. Their expression patterns in cancer are well studied, but their roles in tumorigenesis, cancer progression, and metastasis remain relatively unexplored, partly because target genes are difficult to identify experimentally. Target gene identification therefore relies mostly on predictive algorithms that exploit sequence complementarity. However, these programs tend to produce several hundred target genes per miRNA, including many false positives, so it is difficult to predict the roles of miRNAs from target gene prediction alone. Gene set analysis (GSA) is an alternative approach that applies statistical methods to sets of genes rather than to individual genes. In this study, we applied gene set analysis to infer the cancer-relatedness of microRNAs. Here, GSA requires two kinds of gene sets: the target genes of miRNAs and the genes differentially expressed in various types of cancer. MicroRNA-regulated genes were deduced in two ways. First, we used publicly available target prediction programs (MICROCOSM, TargetScan, PITA, and miBridge). These yield the direct targets of miRNAs, although many false positives may be included. Second, we included indirect targets, since important regulators may also lie among the genes downstream of miRNA targets. For this network expansion, we took the downstream genes of only the experimentally validated targets in miRecords, to keep the number of target genes manageable in practice. Pathway, transcription regulation, and protein-protein interaction networks were used. The lists of differentially expressed genes (DEGs) in various types of cancer were obtained from the ONCOMINE database, where DEGs are organized by study ID.
Two-way GSAs were performed to test whether the targets of a miRNA were enriched or depleted among the DEGs of a specific cancer study, and vice versa. Statistical significance was assessed with the Fisher exact test followed by false discovery rate correction. To enhance the predictive power, we also analyzed the targets common to the various prediction programs. All miRNA-study pairs were stored in a database. Several miRNA-study pairs were statistically significant regardless of the prediction program used. To find supporting evidence for our computational results (the relevance of miRNA targets to cancer DEGs), we examined miRNA expression profiles in various cancers using the miR2Disease and PhenomiR databases. In addition, the cancer-relatedness of the target genes was examined using gene-disease databases such as OMIM, GAD, and IPA (Ingenuity Pathway Analysis), and the cancer pathways in KEGG. All these results are being integrated into a web-based database. The deduced miRNA-cancer associations are expected to provide important clues for the cancer research community.

MicroRNAs are small non-coding RNAs of 20-22 nucleotides that are known to regulate biological processes such as cell proliferation and development. Although previous studies have revealed miRNA expression patterns and some target genes in specific cancers, much remains unknown about the roles of miRNAs in cancer onset, progression, and metastasis. Knowing the target genes is important for elucidating miRNA function, but because experimental identification is difficult, target prediction programs based on sequence complementarity are used. However, these programs predict tens to hundreds of target genes per miRNA, many of them false positives, so predicting miRNA function from target genes is not an easy task. One way to reduce the noise from such false positives is Gene Set Analysis (GSA), which analyzes sets of genes rather than genes one at a time. In this study, to investigate the cancer relevance of miRNAs, we used GSA to analyze the relationship between miRNA target genes and lists of genes differentially expressed in cancer (Figure 6). Two methods were used to obtain the lists of genes affected by miRNA regulation. The first is to predict miRNA target genes computationally; in this study we used several well-known, publicly available programs together (MICROCOSM, TargetScan, PITA, miBridge).
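The second method includes not only the direct target genes of a miRNA but also indirect targets downstream of them. Because applying this to computationally predicted targets would make the gene sets too large, only experimentally validated miRNA targets were expanded, using molecular networks such as pathway, transcription regulation, and protein-protein interaction networks. The lists of genes differentially expressed in cancer were taken from the ONCOMINE database, which collects and reanalyzes cancer-related microarray data; all of the differentially expressed gene lists, which are organized by study, were downloaded. Gene set analysis with the Fisher exact test and the false discovery rate method was then applied to the miRNA-regulated gene sets and the gene sets differentially expressed in various cancers, and the statistically significant miRNA-study pairs were identified and compiled. To increase the reliability of the predictions, GSA was applied in both directions. In addition, an analysis restricted to the targets common to the prediction programs was carried out (common), and miRNA-study pairs that were significant regardless of the prediction program used were collected separately under the category "combine". Supporting evidence for the resulting miRNA-study associations was sought in two ways. First, miRNA expression data in cancer were checked using the miR2Disease and PhenomiR databases. Second, evidence that the predicted target genes are related to cancer was sought in the disease gene databases OMIM and GAD, the cancer-related pathways in KEGG, and the disease gene data from Ingenuity. The results of this study are expected to be useful for inferring the cancer relevance of miRNAs.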
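The statistical step described in the abstract (a Fisher exact test on each miRNA-study pair, followed by false discovery rate correction) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names (`hypergeom_tail`, `benjamini_hochberg`, `gsa_pairs`), the toy gene identifiers, and the choice of the Benjamini-Hochberg procedure as the FDR method are all assumptions made for illustration, and the abstract does not state how the background gene universe was defined.

```python
from math import comb


def hypergeom_tail(k, n_targets, n_degs, n_universe, upper=True):
    """One-sided Fisher exact p-value for an overlap of size k.

    upper=True  gives P(X >= k), i.e. enrichment of targets among DEGs.
    upper=False gives P(X <= k), i.e. depletion.
    """
    total = comb(n_universe, n_degs)
    lowest = max(0, n_degs - (n_universe - n_targets))
    highest = min(n_targets, n_degs)
    ks = range(k, highest + 1) if upper else range(lowest, k + 1)
    hits = sum(
        comb(n_targets, i) * comb(n_universe - n_targets, n_degs - i)
        for i in ks
    )
    return hits / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjustment stays monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def gsa_pairs(targets_by_mirna, degs_by_study, universe):
    """Test every miRNA-study pair for enrichment, then apply FDR correction.

    Returns tuples of (mirna, study, overlap, p_value, q_value).
    """
    rows = []
    for mirna, targets in targets_by_mirna.items():
        t = targets & universe
        for study, degs in degs_by_study.items():
            d = degs & universe
            k = len(t & d)
            p = hypergeom_tail(k, len(t), len(d), len(universe))
            rows.append((mirna, study, k, p))
    qvalues = benjamini_hochberg([row[3] for row in rows])
    return [row + (q,) for row, q in zip(rows, qvalues)]


if __name__ == "__main__":
    # Hypothetical toy data: 10 background genes, one miRNA, one study.
    universe = {f"g{i}" for i in range(10)}
    targets = {"miR-A": {"g0", "g1", "g2", "g3"}}
    degs = {"study1": {"g0", "g1", "g2", "g3", "g4"}}
    for row in gsa_pairs(targets, degs, universe):
        print(row)
```

Calling `hypergeom_tail` with `upper=False` gives the depletion test instead of the enrichment test. Restricting both gene sets to a shared `universe` before counting matters in practice, because the p-value depends strongly on which genes are considered eligible to appear in either list.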