DSpace at EWHA: Development of computational methods for identifying cancer biomarkers from high-throughput data

Title: Development of computational methods for identifying cancer biomarkers from high-throughput data

Authors: 전유경

Issue Date: 2015

Department/Major: Graduate School, Department of Life Science

Publisher: The Graduate School, Ewha Womans University

Degree: Doctor

Advisors: 이상혁; 김완규

Abstract: Cancer is a complex disease that arises when genomic changes cause the regulation of cell growth and differentiation to fail. Diverse types of genomic aberrations are known, including copy number variations (e.g., amplifications and deletions), mutations, genome rearrangements, and epigenetic changes. Recent advances in high-throughput technologies such as microarrays and next-generation sequencing (NGS) have led to revolutionary changes in cancer genome research by producing huge amounts of data at reasonable cost. For example, The Cancer Genome Atlas (TCGA), one of the representative cancer genome consortia, generated multi-platform genomic data for over 500 patients per tumor type across more than 20 tumor types. As sequencing costs fall further, many individual hospitals and laboratories are also generating genomic data for their own cancer patients. Aggregating and analyzing this volume of data is a major challenge even with state-of-the-art computational techniques. One of the most important goals of cancer genome analysis is to identify biomarkers for early detection, precise diagnosis of subtypes, and monitoring of treatment results, which can be accomplished through a mechanistic understanding of tumorigenesis and tumor development. A typical genomic study yields a large number of genomic alterations, and it is critical to distinguish driver aberrations from passenger mutations. Proper bioinformatics tools, databases, and algorithms are essential for successful interpretation of genomic data. In this study, we developed two updated versions of a database for the functional investigation of microRNAs (miRNAs), which are known to be important regulators and biomarkers in cancer. miRGator v2.0 integrated a variety of publicly available miRNA-related microarray data sets. This database presents associations between groups of miRNA-related genes intuitively through a network visualization tool.
miRGator v3.0, the latest update, incorporated deep sequencing data and implemented several novel tools for exploring massive data sets. The miR-Seq browser provides sequence alignments together with secondary structure information, from which miRNA editing and modification events can be readily identified. Both databases provide means to explore miRNA–target mRNA relationships and their expression correlation simultaneously, so that the molecular function of a miRNA can be inferred through its target mRNAs. Dysregulation of miRNA expression is regarded as a promising diagnostic and prognostic marker in various types of cancer. We propose a statistical method for integrating miRNA and mRNA expression profiles with miRNA–target mRNA relationships. The proposed approach was applied to the TCGA glioblastoma multiforme (GBM) expression profiles, and 37 miRNAs were predicted to be functional in GBM. For ten candidate miRNAs, we tested their influence on cell proliferation in 12 GBM cell lines and obtained five miRNAs that reduced proliferation in multiple cell lines. This indicates that predicting disease-related miRNAs by combining gene expression profiles with target information yields high-confidence candidate miRNAs, possibly including many novel ones. Identifying oncogenes and tumor suppressors is always an important task in cancer research. Using our own gene expression data and those of the TCGA colon adenocarcinoma (COAD) cohort, we identified the SLC22A18 gene as a candidate tumor suppressor that is down-regulated in most colorectal cancer (CRC) patients. We further demonstrated experimentally that SLC22A18 inhibits colony formation and induces G2/M arrest, consistent with tumor suppressor activity. Survival data from TCGA showed worse long-term survival for patients with low SLC22A18 expression, which may reflect its functional role in vivo and its clinical value as a prognostic marker.
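The emergence of drug resistance is the most critical cause of treatment failure in cancer. Developing alternative drugs to treat refractory tumors usually requires a good grasp of the molecular mechanisms involved, which is often hampered by the difficulty of obtaining drug-resistant cells. We developed a microfluidic chip system for accelerating tumor cell evolution. The chip is composed of ~500 connected hexagonal compartments that provide continuous concentration gradients of nutrients and an anticancer drug. By growing U-87 MG cells, a line derived from a glioblastoma patient, in doxorubicin-containing medium, we further demonstrated that resistant cells emerge within a week. Subsequent exome sequencing identified 61 candidate mutations responsible for resistance development. Gene Ontology terms for molecular function agree well with previous knowledge. Analyzing the mutation and expression data from deep sequencing, we identified three mechanisms of resistance development. For important candidates, we verified the consequences of loss-of-function variants by siRNA knockdown experiments. This combination of nanochip and deep sequencing technologies offers a promising platform for overcoming cancer drug resistance.

Abstract (translated from Korean): Cancer is a complex disease that arises when genomic alterations cause the regulation of cell growth and differentiation to fail. Diverse types of genomic alterations are known, including copy number variations (amplifications and deletions), mutations, genome rearrangements, and epigenetic changes. Recent advances in high-throughput technologies such as microarrays and next-generation sequencing have brought revolutionary changes to cancer genome research by making it possible to generate large volumes of data at reasonable cost. TCGA, a representative cancer genome consortium, has produced multi-dimensional genomic data for more than 500 patients per tumor type across more than 20 tumor types. As sequencing costs continue to fall, many hospitals and individual laboratories are also producing genomic data from cancer patients. Collecting and analyzing such large volumes of data is not easy even with state-of-the-art computational techniques. One of the most important goals of cancer genome research is to establish biomarkers for early detection of cancer, to diagnose subtypes precisely, and to monitor treatment outcomes, which can be achieved through an understanding of the mechanisms of tumor formation and development. Genomic studies commonly yield a large number of genetic alterations, and it is important to distinguish the alterations that transform normal cells into cancer from passenger alterations that arise by chance. Appropriate bioinformatics tools, databases, and algorithms are essential for successful interpretation of genomic data. In this study, we built two successive updated versions of a database for the functional study of miRNAs, which are known to be important regulators and biomarkers in cancer.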
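miRGator v2.0 systematically integrated a variety of publicly available miRNA-related microarray data and presented the associations between groups of miRNA-related genes intuitively through a network visualization tool. miRGator v3.0, the latest update, integrated sequencing data and implemented several new tools for exploring large-scale data. The miR-Seq browser provides short-read sequence alignments together with secondary structure, making it possible to identify miRNA diversity such as miRNA editing and modification. Both databases provide a means of inferring miRNA function through target genes by analyzing miRNA–target gene relationships and the correlation of their expression. Dysregulated miRNA expression is regarded as a promising diagnostic and prognostic marker in various cancer types. We proposed a statistical method for integrating miRNA and mRNA expression profiles with miRNA–target gene relationship information. Applying it to TCGA brain tumor expression data, we predicted 37 miRNAs to be functional in brain tumors. For 10 of these candidate miRNAs, we tested the effect on cell proliferation in 12 brain tumor cell lines and obtained 5 miRNAs that reduced proliferation in multiple cell lines. This showed that predicting disease-related miRNAs by combining expression profiles with target information can yield strong candidate miRNAs, including many novel ones. Identifying oncogenes and tumor suppressor genes is also important in cancer research. By analyzing our own data and TCGA colorectal cancer expression data, we identified SLC22A18, which shows low expression in most colorectal cancer patients, as a candidate tumor suppressor gene. We further demonstrated experimentally that SLC22A18 inhibits colony formation and promotes G2/M cell cycle arrest, consistent with tumor suppressor function. In the survival data of TCGA colorectal cancer patients, those with low SLC22A18 expression showed a poor long-term prognosis, indicating its functional role in vivo and its clinical value as a prognostic marker. The emergence of drug resistance is the most important cause of cancer treatment failure. Developing alternative drugs for refractory cancer requires an understanding of molecular mechanisms, which is hindered by the difficulty of obtaining drug-resistant cells. We developed a microfluidic chip system that consists of about 500 connected hexagonal compartments and accelerates the evolution of cancer cells through continuous concentration gradients of nutrients and an anticancer drug. By culturing the U-87 MG brain tumor cell line under doxorubicin, we demonstrated that resistance is acquired within one week. Genome sequencing identified 61 mutations, and Gene Ontology analysis of molecular function gave results consistent with existing knowledge. In addition, analysis of genome and transcriptome sequencing data revealed three doxorubicin resistance mechanisms, and for key candidate genes, loss-of-function mutations were verified by siRNA knockdown experiments. This combination of nanochip and sequencing technologies will provide a promising platform for overcoming drug resistance.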
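
Illustrative note: the abstract states that miRNA and mRNA expression profiles were integrated with miRNA–target mRNA relationships, but it does not specify the statistical method. The sketch below is therefore not the thesis's method. It shows one common way such an integration could be set up, assuming that a functional miRNA tends to be negatively correlated with its predicted targets across samples. The function names (`pearson`, `target_shift_score`), the gene labels, and the toy expression values are all hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: treat as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def target_shift_score(mirna_profile, mrna_profiles, targets, n_perm=1000, seed=0):
    """Mean miRNA-mRNA correlation over predicted targets, with a one-sided
    permutation p-value against random gene sets of the same size.

    Assumption (not from the thesis): a functional miRNA shows a more
    negative mean correlation with its targets than with random genes.
    """
    corr = {gene: pearson(mirna_profile, profile)
            for gene, profile in mrna_profiles.items()}
    target_genes = [gene for gene in corr if gene in targets]
    if not target_genes:
        raise ValueError("no predicted target has an expression profile")
    observed = sum(corr[g] for g in target_genes) / len(target_genes)

    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    genes = list(corr)
    hits = 0
    for _ in range(n_perm):
        sample = rng.sample(genes, len(target_genes))
        if sum(corr[g] for g in sample) / len(sample) <= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (hits + 1) / (n_perm + 1)
    return observed, p_value


# Toy data (hypothetical): one miRNA and six genes measured in five samples.
mirna = [1.0, 2.0, 3.0, 4.0, 5.0]
mrnas = {
    "T1": [5.0, 4.0, 3.0, 2.0, 1.0],   # predicted target, anti-correlated
    "T2": [10.0, 8.0, 7.0, 4.0, 2.0],  # predicted target, anti-correlated
    "N1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "N2": [2.0, 2.0, 3.0, 2.0, 2.0],
    "N3": [3.0, 1.0, 4.0, 1.0, 5.0],
    "N4": [2.0, 3.0, 2.0, 4.0, 3.0],
}
observed, p_value = target_shift_score(mirna, mrnas, targets={"T1", "T2"})
print(observed, p_value)
```

On real data, the target set would come from a target prediction resource and the profiles from matched miRNA and mRNA measurements of the same samples. Testing many miRNAs this way would also call for a multiple-testing correction, which the sketch omits.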