Integrative network analysis of omics data for inferring functional elements

Authors
서지혜
Issue Date
2012
Department/Major
Graduate School, Division of Life and Pharmaceutical Sciences, Life Science Major
Publisher
Ewha Womans University Graduate School
Degree
Doctor
Advisors
이상혁, 김완규
Abstract
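To integrate biological pathway data, we first collected human pathway data from the widely used but dissimilar KEGG, BioCarta, Reactome, and NCI-Nature Curated databases, and built a database model that integrates the information in these heterogeneous databases without loss. The integrated database was then constructed at two levels, the pathway level and the entity level. We also developed a built-in pathway visualization module for using the database efficiently, which makes the study of complex networks possible. Beyond pathway data, to analyze large amounts of complex, interconnected biological data effectively and to form biological hypotheses, we developed MONGKIE (MOdular Network Generation and visualization platform with Knowledge Integration Environment), a network analysis and visualization platform for integrated network visualization that provides appropriate data analysis methods. First, we created a well-defined and comprehensive visualization model so that diverse biological entities and the relationships among them can be modeled and visualized as network structures. To provide an integrated environment for network analysis and visualization of diverse biological data, the platform was developed so that any biological data that can be modeled as a network structure can be used, including interaction networks such as genetic or physical interaction networks and regulatory networks, as well as gene-drug-disease association data. In particular, it supports visualization of various kinds of biology-specific nodes (biological entities) and edges (relationships), and it includes an integrated visual editor UI, data-to-visual mapping, and various layout functions to help biologists visualize data efficiently from their own perspectives. Beyond visualization alone, it enables analysis of network data, which helps extract meaningful information from large amounts of data. To find important biological meaning through network analysis in a specific biological context, we performed network analysis of human embryonic stem cells (hESC) and human neural stem cells (hNSC). Using ChIP-chip data, we predicted the targets of the HMG protein SOX2, which is thought to play an important role in both cell types, and performed functional analysis of those targets. We also predicted SOX2 cofactors through promoter analysis of the targets and constructed a network among them. Through network clustering analysis, we extracted cell type-specific network clusters in hESC and hNSC and checked their expression using expression data. In another biological context, non-small cell lung cancer (NSCLC) adenocarcinoma, data at many different ‘ome’ levels, from genome to proteome, were produced from a matched cell line pair derived from a never-smoker female patient and from the tissues of six patients at the author's institutions, KOBIC (Korean Bioinformation Center) and ESCSB (Ewha Research Center for Systems Biology), and deep sequencing technology was used to carry out a comprehensive, detailed analysis of cellular processes. 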
In this study, we derived NSCLC-specific biological meaning from the final products of the multi-omics data through various network analyses using MCODE, an algorithm that reflects biological networks well and applies a vertex weighting scheme based on the clustering coefficient.

In recent years, high-throughput studies of biological systems, including large-scale "OMICS" approaches such as genomics, proteomics and transcriptomics, have produced a greatly increased volume of complex and interconnected biological data. In a biological context, many different types of relationships can be measured, such as physical or genetic interactions. When large collections of diverse relationships are generated from several different high-throughput experimental analyses of a single biological system, integrated network analysis can prove particularly useful for inferring functional elements. One of the biggest challenges in the study of biological regulatory networks is the systematic organization and integration of the complex interactions taking place within various biological pathways. Currently, information on biological pathways is dispersed across multiple databases in various formats. hiPathDB is an integrated pathway database that combines the curated human pathway data of NCI-Nature PID, Reactome, BioCarta and KEGG. hiPathDB provides two different types of integration. The pathway-level integration, conceptually a simple collection of individual pathways, was achieved by devising a gene-centric pathway model that takes the distinct features of the four databases into account and then reformatting all pathways in accordance with that model. The entity-level integration creates a single unified pathway that encompasses all pathways by merging common components. Although detailed molecular-level information such as complex formation or post-translational modification tends to be lost, this integration makes it possible to investigate signaling networks across all pathways and allows pathway cross-talk to be identified. 
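The entity-level integration described above, in which pathways are merged through their common components, can be illustrated with a minimal sketch. It assumes each pathway is already reduced to gene-to-gene edges, and every name in it (`integrate_entity_level`, `crosstalk_genes`, the gene and pathway labels) is hypothetical; it is not hiPathDB's actual data model or code.

```python
from collections import defaultdict


def integrate_entity_level(pathways):
    """Merge pathways into one network by treating equal gene IDs as one node.

    `pathways` maps a pathway name to a list of (source_gene, target_gene) edges.
    Returns (edges, membership): each merged edge maps to the set of pathways
    containing it, and each gene maps to the set of pathways it appears in.
    """
    edges = defaultdict(set)
    membership = defaultdict(set)
    for name, interactions in pathways.items():
        for source, target in interactions:
            edges[(source, target)].add(name)
            membership[source].add(name)
            membership[target].add(name)
    return dict(edges), dict(membership)


def crosstalk_genes(membership):
    """Genes that occur in two or more pathways (candidate cross-talk points)."""
    return sorted(g for g, names in membership.items() if len(names) > 1)


# Toy input with placeholder gene and pathway names (illustrative only).
pathways = {
    "pathway_A": [("gene1", "gene2"), ("gene2", "gene3")],
    "pathway_B": [("gene3", "gene4"), ("gene4", "gene5")],
}
edges, membership = integrate_entity_level(pathways)
print(crosstalk_genes(membership))
```

In this toy input only `gene3` occurs in both pathways, so it is the single node through which the merged network connects them. Details such as complexes or modification states are deliberately absent, which mirrors the information loss the abstract mentions.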
Another strength of hiPathDB is its built-in pathway visualization module, which supports interactive, exploratory study of complex networks. The layout algorithm is optimized so that pathways can be visualized almost automatically. Given the sheer volume and heterogeneity of biological data, extracting meaningful information from omics-scale data and using it to answer biological questions insightfully is a major challenge. Integrated network visualization combined with data analysis methods is therefore key to understanding and analyzing large-scale interconnected biological data. We present MONGKIE (MOdular Network Generation and visualization platform with Knowledge Integration Environment), an integrated network visualization and analysis platform for exploring and analyzing interconnected biological data interactively within a knowledge integration environment. Although it is optimized for exploring interaction networks, such as genetic or physical interaction networks, regulatory networks, and pathways tightly integrated with hiPathDB, it can easily be applied to any biological data that can be modeled as a network structure, such as gene-drug-disease association networks, by using the data integration methods that MONGKIE provides. To represent diverse types of biological entities and relationships, MONGKIE supports various domain-specific types of nodes and edges. Users can customize the visual properties of nodes and edges through the integrated visual editor UI and can set their visual aspects from attribute values through data-to-visual mapping. MONGKIE is designed for both the visualization and the analysis of biological networks, with seamless integration between the two. 
MONGKIE incorporates knowledge integration and network analysis modules, such as interaction network import/export, an interaction manager, gene ID conversion, expression overlay, network clustering, gene set enrichment analysis, and pathway integration and visualization. MONGKIE is a Java-based application built on top of the NetBeans Platform, which supports a modular (plug-in) architecture; it is therefore platform-independent and can be extended with additional functionality with little programming effort. In this research, I aimed to gain biological insight from multi-omics data obtained by high-throughput experiments for specific stem cell types and certain cancers by performing network analysis of these data. It is well known that the transcription factor SOX2 is essential for early development as well as for the propagation of undifferentiated embryonic stem cells (ESC). In addition, SOX2 has an essential role in the development of neural stem cells (NSC). To elucidate how the regulatory mechanisms of SOX2 differ between ESC and NSC, we performed a ChIP-chip experiment to identify SOX2 target genes in human NSC. The result was compared with equivalent, publicly available data from human ESC. Target genes differed significantly between ESC and NSC. Gene set analysis showed that the target genes were enriched in different GO categories. We hypothesized that there must be cell type-specific cofactors for SOX2, and verified this using a random permutation test. These cell type-specific cofactors include several transcription factors that are well-known essential factors in ESC and NSC. We also constructed cell type-specific SOX2 target gene networks and expanded them using protein-protein interactions with the cofactors. Network clustering analysis was then performed, and cell type-specific clusters were observed. In the ESC network cluster, TGF-beta signaling, closely related to SMAD3, SMAD4, and SP1, was significant. 
In the NSC network cluster, by contrast, several genes (CREB, TP53, HDAC1, YY1, SMAD2, and STAT3) stood out, and EP300 appears to act as a network hub. Finally, we analyzed the gene expression profiles of the cell type-specific network clusters. The results should provide useful information for understanding the role of SOX2 in the differentiation of ESC into NSC. In an additional study, I analyzed networks at many different ‘ome’ levels, from genome to proteome, in a matched cell line pair and a patient sample derived from a never-smoker female patient with non-small cell lung cancer (NSCLC) adenocarcinoma, using deep sequencing techniques. I examined their functional significance from the perspective of regulatory elements and networks and identified many modules that may play critical roles in cancer.
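The abstract names MCODE and its clustering-coefficient-based vertex weighting but gives no parameters, so the following is only a minimal sketch of that weighting step as it is commonly described: each vertex is scored by the level k of the highest k-core of its immediate neighborhood multiplied by the density of that core. The function names and the toy graph are hypothetical, the graph is assumed to be undirected with no self-loops, and density is computed for a simple graph, which may differ from the convention used in the thesis.

```python
def highest_k_core(adj):
    """Return (k, vertices) for the highest non-empty k-core of an undirected graph."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    k, best = 0, set(adj)
    while adj:
        k_next = k + 1
        changed = True
        while changed:
            # Peel off every vertex whose degree is below the next core level.
            low = [v for v, ns in adj.items() if len(ns) < k_next]
            changed = bool(low)
            for v in low:
                for u in adj.pop(v):
                    adj[u].discard(v)
        if adj:
            k, best = k_next, set(adj)
    return k, best


def mcode_vertex_weights(adj):
    """Weight each vertex by k * density of the highest k-core of its neighborhood."""
    weights = {}
    for v in adj:
        hood = adj[v] | {v}
        sub = {u: adj[u] & hood for u in hood}  # induced neighborhood subgraph
        k, core = highest_k_core(sub)
        n = len(core)
        m = sum(len(sub[u] & core) for u in core) // 2  # edges inside the core
        density = m / (n * (n - 1) / 2) if n > 1 else 0.0
        weights[v] = k * density
    return weights


# Toy graph (illustrative only): a triangle a-b-c with a pendant vertex d on c.
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
weights = mcode_vertex_weights(graph)
print(weights)
```

In the toy graph the three triangle vertices receive weight 2.0 and the pendant vertex receives 1.0, so densely connected regions score higher. The full algorithm would go on to grow clusters outward from the highest-weighted seed vertices, which this sketch omits.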
Appears in Collections:
Graduate School > Division of Life and Pharmaceutical Sciences > Theses_Ph.D