DSpace Collection:

DSpace Collection: https://dspace.ewha.ac.kr/handle/2015.oak/251401
Sun, 05 Apr 2026 03:51:24 GMT
2026-04-05T03:51:24Z

Transcriptome-Based Patient Stratification for Predicting Immunotherapy Response in Lung Adenocarcinoma
https://dspace.ewha.ac.kr/handle/2015.oak/270809
Title: Transcriptome-Based Patient Stratification for Predicting Immunotherapy Response in Lung Adenocarcinoma
Ewha Authors: 이경미
Abstract: The field of cancer treatment has seen a paradigm shift from non-specific chemotherapy to precision-targeted therapies, culminating in the advent of immune checkpoint inhibitors (ICIs). These therapies activate the immune system to identify and destroy cancer cells, addressing the limitations of earlier approaches, including the toxicity of chemotherapy and the resistance associated with targeted therapies. Chemotherapy, the first generation of cancer treatment, indiscriminately targeted both healthy and cancerous cells, leading to severe side effects. Targeted therapies, the second generation, offered greater specificity by focusing on genetic mutations within cancer cells. However, their efficacy was limited to patients with certain mutations, and resistance often developed. In contrast, ICIs provide a versatile treatment applicable across diverse cancer types, offering hope for broader and more durable responses. Lung cancer, a major contributor to global cancer incidence and mortality, is broadly categorized into non-small cell lung cancer (NSCLC) and small cell lung cancer (SCLC). NSCLC, representing 80-85% of all lung cancer cases, comprises lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), and large cell carcinoma (LCC). Among these, LUAD constitutes approximately 40% of all lung cancer cases, making it the most common subtype worldwide. Immunotherapy has become a cornerstone of lung cancer treatment, with numerous FDA-approved ICIs targeting PD-1, PD-L1, and CTLA-4 for first- or second-line treatment.
Despite their importance, ICIs are effective in only 20-30% of patients, and reliable biomarkers for predicting treatment response remain lacking. Lung cancer is also a highly heterogeneous malignancy. Cancer heterogeneity refers to the diversity in molecular and phenotypic characteristics observed both between tumors (intertumoral heterogeneity) and within a single tumor (intratumoral heterogeneity). Even within the same tumor tissue, cells with distinct properties coexist, and each patient exhibits unique molecular and phenotypic variations. This heterogeneity significantly affects tumor progression, treatment response, and recurrence, making cancer treatment more challenging. There is therefore an increasing need to uncover the mechanisms of ICIs and to identify predictive biomarkers of treatment response in the context of lung cancer's high heterogeneity. This thesis presents two studies that analyze transcriptomic data from LUAD patients treated with ICIs to identify predictive factors for ICI response. In the first study, we used non-negative matrix factorization (NMF) clustering of transcriptomic data to classify 387 LUAD patients treated with ICIs into six subgroups. Each subgroup had a distinct immune profile, and these profiles were associated with immunotherapy outcomes. Various immune-related scores were computed, and genes highly expressed in specific subgroups were analyzed to identify significant molecular characteristics. To validate these subgroups, we performed clustering on 516 LUAD patients from The Cancer Genome Atlas (TCGA) dataset, identifying three subgroups. We observed high similarity between corresponding subgroups in the two cohorts, supporting the robustness of our classification.
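The NMF clustering step described above can be illustrated with a minimal sketch. It assumes a non-negative patients-by-genes expression matrix, uses the standard Lee-Seung multiplicative updates for the squared-error objective, and assigns each patient to the factor with the largest weight. The function names, the toy matrix, and the choice of two factors are illustrative assumptions; the abstract does not state which NMF implementation, rank-selection procedure, or preprocessing the study used.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared reconstruction error ||V - WH||^2."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, k, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (patients x genes) as W (patients x k) times H (k x genes)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # Update H: H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)] for a in range(k)]
        # Update W: W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)] for i in range(n)]
    return w, h


def assign_clusters(w):
    """Label each patient with the index of its dominant factor."""
    return [max(range(len(row)), key=row.__getitem__) for row in w]


# Hypothetical toy data: 4 patients x 4 genes with two expression programs.
expression = [[4.0, 4.0, 0.0, 0.0],
              [2.0, 2.0, 0.0, 0.0],
              [0.0, 0.0, 3.0, 3.0],
              [0.0, 0.0, 6.0, 6.0]]
w, h = nmf(expression, k=2)
print(assign_clusters(w))
```

In practice the number of factors is usually chosen by comparing clustering stability across several candidate values; that step is omitted here.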
We also identified characteristics strongly associated with ICI response in each of the six subgroups and validated these findings using data from another cohort of lung cancer patients treated with ICIs. Notably, we identified the subgroup with the poorest prognosis and proposed potential combination therapies tailored to this high-risk subgroup. This study underscores the significance of characterizing the distinct molecular and genetic features of heterogeneous LUAD and their potential as predictors of ICI response. In the second study, we developed a transcriptome-based model to predict ICI response in LUAD patients. Using data from 85 LUAD patients, we integrated tumor mutation burden (TMB) with transcriptomic data to construct a machine learning model that predicts ICI response. Patients were classified by PD-L1 status, and TMB was identified as an effective predictor in PD-L1-negative patients. In the PD-L1-positive group, we used the XGBoost algorithm to build an ensemble machine learning model, which demonstrated high predictive accuracy using input variables such as gene expression, gene set scores, and immune cell composition scores. Key predictors included B cell- and T cell-related signatures, and analysis of non-responders revealed significantly elevated CTLA-4 expression, suggesting that combining anti-PD-1 and anti-CTLA-4 therapy may benefit this subgroup. This study presents a model that predicts ICI response using transcriptomic data and machine learning algorithms. The predictive factors identified by the model, along with the analysis of non-responders, provide insights that could help improve ICI outcomes for LUAD patients. Together, these studies provide a comprehensive framework for understanding the molecular and immune landscapes of LUAD patients treated with ICIs and offer actionable insights for personalized immunotherapy strategies.
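By integrating multi-omics data with advanced analytical approaches, this work highlights potential avenues to overcome the limitations of current immunotherapies, ultimately contributing to improved outcomes for lung cancer patients.

Abstract (Korean, translated): Unlike conventional first-generation chemotherapy and second-generation targeted therapy, which attack the cancer directly, immune checkpoint inhibitors (ICIs) are third-generation anticancer agents that stimulate the immune system so that immune cells selectively attack cancer cells. Because they can overcome several limitations of first- and second-generation agents, they are drawing attention as a new paradigm in cancer treatment. Early first-generation chemotherapy attacked cancer cells and normal cells indiscriminately, causing a range of side effects. Second-generation targeted therapies, developed later, achieved higher specificity by targeting particular gene mutations in cancer cells, but they were effective only when the therapeutic target was expressed and were limited by the emergence of resistance. In contrast, ICIs are applicable to diverse cancer types and are regarded as a new treatment option from which durable responses can be expected. Lung cancer ranks first worldwide in both cancer incidence and cancer mortality and is broadly divided into non-small cell lung cancer (NSCLC) and small cell lung cancer (SCLC). NSCLC accounts for 80-85% of all lung cancers and includes lung adenocarcinoma, squamous cell carcinoma, and large cell carcinoma; of these, lung adenocarcinoma accounts for about 40% of all lung cancers and is known as the most common type worldwide. Lung cancer is the cancer type in which ICIs are applied most actively: various ICIs targeting PD-1, PD-L1, and CTLA-4 have received FDA approval as first- or second-line treatments, underscoring the importance of immunotherapy in this disease. However, ICIs show therapeutic efficacy in only 20-30% of patients, and appropriate biomarkers for predicting efficacy are lacking. Lung cancer is also highly heterogeneous from patient to patient; cancer heterogeneity refers to diversity in molecular and phenotypic characteristics between tumors or within a tumor. Even within the same tumor tissue, cells with different properties coexist, and each patient shows unique molecular and phenotypic differences. This heterogeneity strongly influences cancer progression, treatment response, and recurrence and is one of the factors that make cancer difficult to treat. Accordingly, there is a growing need to elucidate the mechanisms of ICIs and to discover predictors of treatment response in light of the high heterogeneity of lung cancer. This thesis presents two studies that analyze transcriptome data from lung adenocarcinoma (LUAD) patients treated with ICIs to explain the factors that predict ICI response. In the first study, NMF clustering of transcriptome data was performed on 387 LUAD patients treated with ICIs, classifying them into six subgroups. Each subgroup showed a different immune profile, and these profiles were closely associated with differences in immunotherapy prognosis. In addition, various indices previously known to correlate with immunotherapy were calculated, and genes highly expressed in specific subgroups were analyzed to identify the significant genetic and molecular characteristics of each subtype. Next, to evaluate whether the subgroups derived from the 387 LUAD patients reflect the features of LUAD in general, the same clustering was applied to 516 LUAD patients from TCGA, classifying them into three subgroups. Examining the correlations and shared characteristics between the subgroups of the two cohorts showed that subgroups with similar characteristics existed and that the similarity between them was high. Furthermore, characteristics highly correlated with immunotherapy responsiveness were identified in each of the six subgroups and validated using data from another set of lung cancer patients treated with ICIs. In particular, the subgroup with the poorest prognosis was identified, and a combination therapy suited to that group was proposed.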
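The significance of this study lies in identifying the distinct lung cancer characteristics of individual patients in light of the heterogeneity of LUAD and in exploring, for each characteristic, potential factors that predict prognosis under ICI treatment. In the second study, a transcriptome-based model was developed to predict response to ICIs in LUAD patients. Using data from 85 LUAD patients, tumor mutation burden (TMB) and transcriptome data were integrated to build a machine learning model that predicts which patients will respond to ICIs. Patients were divided into PD-L1-positive and PD-L1-negative groups, and in the PD-L1-negative group TMB was confirmed to be an effective indicator for predicting ICI response. In the PD-L1-positive group, an ensemble machine learning model was built using the XGBoost algorithm. Machine learning is an artificial intelligence technique that can quickly and accurately analyze complex patterns that previously could not be found, and using it made it possible to predict ICI responsiveness with high accuracy. The model used a variety of input variables, including gene expression, gene set scores, and cell composition scores, and showed high predictive power in the PD-L1-positive group. Among these input variables, the predictors the model relied on most included B cell- and T cell-related signatures. Analysis of non-responders, that is, patients who did not respond to ICIs despite receiving high prediction scores from the model, revealed significantly elevated CTLA-4 expression. This suggests that combining anti-PD-1 therapy with anti-CTLA-4 therapy may be effective for these patients. This study presents a model that can predict responsiveness to immunotherapy using transcriptome data and machine learning algorithms, and its significance lies in the fact that the model's predictors and the analysis of non-responders provide information that can help improve ICI response in LUAD patients. Together, these results provide an important foundation for understanding the molecular and immunological characteristics of LUAD patients who received immunotherapy and offer useful information for designing personalized immunotherapy strategies. By combining multi-omics data with advanced analytical techniques, they point to new possibilities for overcoming the limitations of current immunotherapy and are ultimately expected to contribute to improved treatment outcomes for lung cancer patients.
Wed, 01 Jan 2025 00:00:00 GMT
https://dspace.ewha.ac.kr/handle/2015.oak/270809
2025-01-01T00:00:00Z

Advancements in Drug Discovery and Disease Mechanism Analysis via Extensive Datasets
https://dspace.ewha.ac.kr/handle/2015.oak/268896
Title: Advancements in Drug Discovery and Disease Mechanism Analysis via Extensive Datasets
Ewha Authors: 류지나
Abstract: As data volumes continue to expand, the biotechnology industry increasingly benefits from computational strategies that surpass traditional methods, enhancing efficiency and reducing costs. This approach facilitates the development of efficient treatment strategies by utilizing chemical structures and omics data. Expanding the chemical space correlates with higher success rates in hit discovery, potentially accelerating new drug development.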
Advanced deep learning models capable of analyzing extensive compound data require refined methods for selecting essential information to enhance learning accuracy. Omics data enables the interpretation of complex mechanisms of disease and drug actions through molecular profiling, with significance verified across diverse datasets based on the integration of large-scale public data. This thesis explores the application of computational techniques and deep learning in drug discovery and disease mechanism description through three chapters, underlining the use of chemical structures and transcriptome data. The first chapter proposes the Atom-Pair Map (APM) approach, an advanced molecular representation for virtual screening. APMs are generated based on the three-dimensional structure and physicochemical features of compounds. The construction of Atom-pair neural net (APNet), a deep learning model leveraging APM as input, further solidifies APM's role in enhancing the accuracy of compound-target interaction predictions. APM's demonstrated superiority across various datasets highlights its potential as an alternative to conventional molecular representations and its ability to improve predictive modeling in drug discovery. The second chapter investigates the roles of OSCAR and 5-ASA in osteoarthritis (OA), offering comprehensive insights into their roles in disease pathology and therapeutic potential. Comparative transcriptome analyses revealed significant shifts in gene expression profiles following OSCAR over-expression, with notable reversals upon 5-ASA treatment. The identification of Flip-DEGs, genes reversed by 5-ASA, highlights the compound's ability to modulate critical pathways implicated in OA. The dual role of 5-ASA in counteracting OSCAR-induced OA underscores its potent anti-inflammatory action and restoration of cartilage homeostasis, as analyzed through downstream genes of transcription factors and drug-induced transcriptomes. 
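The Flip-DEG idea above (genes whose OSCAR-induced change is reversed by 5-ASA) can be sketched as a simple filter over two sets of log fold changes. This is only an illustration under stated assumptions: the threshold of 1.0, the gene names, and the rule "both changes exceed the threshold and have opposite signs" are hypothetical, since the abstract does not give the exact selection criteria.

```python
def flip_degs(lfc_disease, lfc_treatment, threshold=1.0):
    """Return genes whose disease-induced change is reversed by treatment.

    lfc_disease:   gene -> log fold change, OSCAR over-expression vs. control
    lfc_treatment: gene -> log fold change, OSCAR + 5-ASA vs. OSCAR alone
    threshold:     hypothetical minimum absolute log fold change for both contrasts
    """
    flipped = {}
    for gene, d in lfc_disease.items():
        t = lfc_treatment.get(gene)
        if t is None:
            continue  # gene not measured in the treatment contrast
        # Both changes must be large enough and point in opposite directions.
        if abs(d) >= threshold and abs(t) >= threshold and d * t < 0:
            flipped[gene] = "up_then_down" if d > 0 else "down_then_up"
    return flipped


# Hypothetical values for four placeholder genes.
oscar_vs_control = {"GENE_A": 2.0, "GENE_B": -1.5, "GENE_C": 1.8, "GENE_D": 0.2}
asa_vs_oscar = {"GENE_A": -1.7, "GENE_B": 1.2, "GENE_C": 1.1, "GENE_D": -2.0}
print(flip_degs(oscar_vs_control, asa_vs_oscar))
```

A real analysis would also apply a significance cutoff (for example an adjusted p-value) to each contrast before testing for reversal.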
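The third chapter applies non-negative matrix factorization (NMF) to the transcriptome data of patients with non-alcoholic fatty liver disease for clustering. This analysis identified distinct clusters, each representing a unique transcriptomic signature that suggests different underlying mechanisms and potential disease outcomes. It outlines the complex pathophysiological mechanisms of the disease (such as hepatitis, liver fibrosis, and metabolic imbalance) cluster by cluster and validates the derived transcriptome signatures by comparing them with established disease models to assess their clinical applicability. In summary, this thesis presents advancements in computational drug discovery and disease mechanism interpretation through systematic analysis of large-scale data. It proposes a novel molecular representation method based on the physicochemical properties of compounds and underscores the utility of large-scale data in developing treatment strategies for complex chronic diseases.

Abstract (Korean, translated): Data accumulated through technological progress are expanding in both quality and quantity and have become the foundation of biological analysis. They make it possible to uncover the complex mechanisms of diseases and compounds at the in silico level and to devise effective treatment strategies with high efficiency and low cost. The data used in systematic analysis fall broadly into structure-activity data for compounds and omics data. For compounds, the larger the chemical space under analysis, the higher the hit discovery success rate, and the new drug development process can be shortened dramatically. Deep learning models have been developed to analyze vast compound data efficiently, but improving learning accuracy requires methods that can select the necessary information from the data. For omics, molecular profiling based on public data enables interpretation of the complex mechanisms of diseases and drugs. Using datasets of diverse sources and types allows consistent validation and analysis at a meaningful level. Across three chapters, this thesis describes methods for elucidating drug and disease mechanisms through systematic analysis and the application of deep learning, using chemical structures and transcriptome data. The first chapter addresses an atom-pair-based molecular representation of compounds for virtual drug screening. This representation is generated from the three-dimensional structure and physicochemical properties of a compound and reflects compound properties that directly or indirectly influence binding to the target protein. Building a deep learning model on top of it confirmed that the atom-pair-based representation can improve the accuracy of compound-target protein interaction prediction. The representation performs well across various datasets and can contribute to predictive modeling in drug development as an alternative to existing molecular representations. The second chapter analyzes the roles of the OSCAR gene and 5-ASA in osteoarthritis, a chronic disease, and presents a pathological interpretation and a treatment strategy.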
Comparative analysis of transcriptomes from chondrocytes in which the OSCAR gene was over-expressed and which were then treated with 5-ASA revealed the major changes in gene expression. In particular, selecting Flip-DEGs, the genes restored by 5-ASA, uncovered a mechanism that alleviates osteoarthritis, and transcription factors and drug targets involved in the anti-inflammatory action of 5-ASA and its restoration of cartilage homeostasis were proposed on the basis of transcription factor and drug-induced transcriptome data. In the third chapter, non-negative matrix factorization is applied to the transcriptomes of patients with non-alcoholic fatty liver disease to classify clusters of the disease model. This yielded individual clusters with unique transcriptomic patterns, and different mechanisms and disease prognoses were presented for the clusters. The mechanisms of non-alcoholic fatty liver disease that appear in combination, such as hepatitis, liver fibrosis, and metabolic imbalance, were separated and analyzed cluster by cluster, and the derived transcriptomic patterns were compared with published disease models to verify their general applicability in the clinic. In conclusion, this thesis presents virtual drug screening and the interpretation of disease mechanisms through systematic analysis of large-scale data. It proposes a molecular representation based on a physicochemical understanding of compounds and indicates the utility of large-scale omics data for proposing treatment strategies in complex chronic diseases.
Mon, 01 Jan 2024 00:00:00 GMT
https://dspace.ewha.ac.kr/handle/2015.oak/268896
2024-01-01T00:00:00Z

Prediction of drug effects based on graph convolution network models
https://dspace.ewha.ac.kr/handle/2015.oak/266767
Title: Prediction of drug effects based on graph convolution network models
Ewha Authors: 한지연
Abstract: Computational approaches play important roles in all stages of drug development. However, predicting drug effects is still extremely difficult for several reasons: i) target genes are unclear for many traditional drugs, ii) off-target binding is frequently observed, and iii) many target genes are connected to complex biological networks that could lead to unexpected side effects. Nevertheless, advances in computational methods and pharmacogenomic technologies have made it possible to predict drug effects reliably enough to be useful in the pharmaceutical industry. Biological networks that encompass protein-protein interactions, drug-target binding, drug-disease associations, and gene regulation by proteins have been a popular resource to mine for novel drug targets and drug-disease associations. The recent introduction of deep learning methods has brought revolutionary changes in computational drug discovery.
Another key aspect is the rapid expansion of pharmacogenomic data, which can be mined with state-of-the-art machine learning methods. Thus, computational predictive modeling based on machine learning algorithms has become the mainstream approach to forecasting drug effects. Diverse data modalities, including chemical structure, target gene/protein properties, drug-target interactions, and experimental parameters (in vitro, in vivo, clinical), have been actively explored to infer novel drug targets, efficacies, and side effects. This thesis focuses on predicting two types of drug effects, namely drug synergy and toxicity, using various deep learning algorithms and feature types such as chemical structure, genetic interactions, and pharmacogenomic data. The first part examines the synergistic effect of drug combinations, a concept gaining prominence because of protracted drug development timelines. Since testing the efficacy of all conceivable drug pairs is practically impossible given time and budget limits, computational methods are gaining importance by proposing promising candidates with enhanced accuracy. Moreover, given the variability of drug effects across cell and tissue types, contextualizing synergistic effects at the cellular level is critical. This study introduces a composite deep-learning-based model, Drug Synergy Prediction by Integrated GCN (DRSPRING), which forecasts drug synergy effects by leveraging pharmacogenomic profiles and molecular properties. The primary algorithm employed is the graph convolutional network (GCN), which is well suited to representing both chemical structures and genetic networks as graphs, i.e., collections of nodes and edges. A key challenge is to overcome data scarcity, since the pharmacogenomic profiles of the LINCS project cover only a limited portion of chemical space. This study augments the drug-induced gene expression data with predicted profiles.
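To make the graph view of a molecule concrete, the sketch below implements one generic graph convolution layer in the widely used form ReLU(D^-1/2 (A + I) D^-1/2 X W), where A is the adjacency matrix, X holds node features, and W is a weight matrix. It is an illustration only: the abstract does not describe the layer design actually used in DRSPRING, and the three-atom graph, one-hot features, and identity weights are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, feats, weight):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adj)
    # Add self-loops so each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization of the adjacency matrix.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    out = matmul(matmul(norm, feats), weight)
    return [[max(0.0, x) for x in row] for row in out]


# Hypothetical 3-atom chain (for example C-C-O): atoms are nodes, bonds are edges.
adjacency = [[0.0, 1.0, 0.0],
             [1.0, 0.0, 1.0],
             [0.0, 1.0, 0.0]]
atom_features = [[1.0, 0.0],   # carbon
                 [1.0, 0.0],   # carbon
                 [0.0, 1.0]]   # oxygen
identity_weight = [[1.0, 0.0],
                   [0.0, 1.0]]
for row in gcn_layer(adjacency, atom_features, identity_weight):
    print(row)
```

After one layer, each atom's vector mixes its own features with those of its bonded neighbors; the same operation applies to a gene network when genes are nodes and interactions are edges.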
Thus, DRSPRING consists of two computational modules: module 1 (M1), which predicts drug-induced gene expression profiles, and module 2 (M2), which predicts drug synergy scores. Both adopt GCN-based deep modeling methods. Loewe scores from the DrugComb database were used as the training targets representing the synergy of drug pairs. Importantly, DRSPRING can be applied to any drug pair and cell line as long as basal gene expression data are available. The practical applicability and reliability of DRSPRING were demonstrated by exploring combinations of NCI-approved cancer drugs in breast and lung cancer. The second part focuses on drug-induced liver toxicity as a representative adverse drug effect. Liver toxicity is one of the most common reasons for failure in drug development because the liver is the primary organ for detoxification in drug metabolism. Despite its importance, accurate prediction of liver toxicity is hampered because i) liver toxicity information is scattered across many resources, ii) there are no curation standards for describing liver toxicity, and iii) experimental data and annotations exist at different levels, namely cell assays, in vivo toxicity in animal models, and clinical findings in humans. The diverse molecular mechanisms of liver toxicity add to the difficulty of prediction. In this study, hepatotoxicity information is aggregated from nine related databases and systematically categorized into three classes (in vitro, in vivo, and clinical) based on the experimental setting. MedDRA terms are used for standardized annotation. Subsequently, a GCN-based prediction model is developed to predict hepatotoxicity for novel drugs, using the compiled knowledge base as the training data. A hepatotoxicity database (HTP-DB) was then constructed to provide the prediction scores as well as the curated information.
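HTP-DB consists of two integral components: the hepatotoxicity reference knowledge base (HTP-KB) and the liver toxicity prediction module (HTP-Pred). Through a user-friendly web interface, HTP-DB offers comprehensive information on drug-induced hepatotoxicity, which should be valuable for drug development. In summary, this thesis navigates the complexities of drug effects, specifically investigating drug pair synergies and side effects. Both chapters use GCN-based predictive models that take into account the contextual nuances of biological networks, thereby advancing the convergence of cheminformatics and bioinformatics.

Abstract (Korean, translated): Computational analysis plays an important role at every stage of drug development. However, accurately predicting drug effects remains very difficult for several reasons that can lead to unexpected side effects: i) the target genes of marketed drugs are unclear, ii) drug interactions with off-target entities are observed, and iii) drug target genes are connected to complex biological networks. Nevertheless, advances in various computational methods and the use of pharmacogenomic data have helped the pharmaceutical industry predict drug effects more reliably. In particular, the use of biological networks that include protein-protein interactions, drug-target binding, drug-disease associations, and gene regulation by proteins is drawing attention not only for identifying direct drug-disease associations but also for proposing new drug targets. In addition, the recent introduction of deep learning methods has brought revolutionary change to drug development through computational models, and the rapid expansion of pharmacogenomic data has supported that process as training data. As a result, computational predictive modeling based on machine learning algorithms has emerged as the mainstream approach to drug effect prediction. Furthermore, diverse data types, including chemical structure, target gene/protein properties, and drug-target interactions, together with information on the experimental setting (in vitro, in vivo, clinical), are being used to infer new drug targets, efficacy, and side effects. This thesis focuses on predicting two types of drug effects, drug synergy and toxicity, using various deep learning algorithms and biological and chemical data such as chemical structure, genetic interactions, and pharmacogenomic data. The first chapter predicts the synergistic effect of drug combinations, a concept gaining attention as drug development timelines lengthen. Experimentally testing synergy for every possible drug pair is unrealistic in terms of material and time resources, and approaches that instead use computational models to propose promising candidates with high accuracy are emerging. It is also important to predict drug synergy at the cellular level, taking into account the diversity of the underlying biological networks across cells and tissues. This study introduces DRSPRING (Drug Synergy Prediction by Integrated GCN), which predicts synergy using pharmacogenomic profiles and drug structures. The main algorithm used is the graph convolutional network (GCN), which has the advantage of representing drug structures and gene networks effectively.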
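In this process, to address the scarcity of pharmacogenomic profiles provided by the LINCS project, additional drug-induced gene expression values were generated to augment the data and facilitate the training of DRSPRING. DRSPRING therefore consists of two computational modules, module 1 (M1), which predicts drug-induced gene expression profiles, and module 2 (M2), which predicts drug synergy scores, both using the GCN algorithm. Loewe scores from the DrugComb database were used as the training data. DRSPRING can be applied to any drug pair and cell line for which basal gene expression data exist, and its practical applicability and reliability were demonstrated by exploring combinations of NCI-approved cancer drugs in breast and lung cancer. The second chapter focuses on drug-induced liver toxicity as a representative example of adverse drug effects. The liver is the primary organ for detoxification in drug metabolism, and liver toxicity is a critical issue and hurdle that can determine the course of drug development. Despite this importance, accurate prediction of liver toxicity remains difficult because i) liver toxicity information is scattered across many sources, ii) no curation standard exists for describing liver toxicity, and iii) experimental data at different levels, such as cell assays, animal models, and clinical data, are interpreted differently. The diverse molecular mechanisms of liver toxicity add further to the difficulty of prediction. In this study, liver toxicity information was collected from nine related databases and systematically categorized into three classes, in vitro, in vivo, and clinical, according to the experimental setting. MedDRA terms were used as standardized annotations. A GCN-based prediction model was then developed that is trained on the accumulated data and can predict hepatotoxicity for new drugs. To deliver the curated information and the prediction model efficiently to a range of users, a hepatotoxicity database (HTP-DB) was built; it consists of two components, the hepatotoxicity knowledge base (HTP-KB) and the liver toxicity prediction module (HTP-Pred). The HTP-DB web server, with its user-friendly web interface, provides comprehensive information on drug-induced hepatotoxicity and is expected to contribute to drug development. In this way, this thesis explores the complexity of drug effects and studies the synergy of drug combinations and adverse drug effects. Both chapters take into account the context of the biological networks specific to each cell line or experimental subject, make active use of GCN-based prediction models to exploit chemical structures and biological network structures efficiently, and verify their effectiveness.
Mon, 01 Jan 2024 00:00:00 GMT
https://dspace.ewha.ac.kr/handle/2015.oak/266767
2024-01-01T00:00:00Z

Computational drug discovery using drug-induced transcriptome data
https://dspace.ewha.ac.kr/handle/2015.oak/264508
Title: Computational drug discovery using drug-induced transcriptome data
Ewha Authors: 이한비
Abstract: About 95% of drug candidates fail during pre-clinical and clinical trials despite the enormous cost and time invested. To choose the right candidates for clinical development, it is critical to fully understand a drug's mechanism of action (MoA) during the early stages of drug discovery, i.e., hit identification, hit-to-lead, and lead optimization.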
Omics-based molecular profiling allows a comprehensive understanding of a drug’s MoA as it did in many studies of genetic perturbation, disease subtypes, cell differentiation, etc. Particularly, computational analysis of drug-induced transcriptome has been applied to various problems of early drug discovery including drug-target interaction (DTI), drug repositioning (DR), side effect/toxicity prediction, and elucidation of unknown off-targets. All these approaches contribute to selecting more promising candidates of desired properties throughout the drug discovery stages. In this thesis, I applied computational analyses to i) elucidate transcriptome-wide MoA of a drug candidate, ii) identify DR candidates for non-alcoholic steatohepatitis (NASH), and iii) predict DTI using deep neural network (DNN). In chapter I, I analyzed both bulk- and single-cell RNA (scRNA) transcriptome of asthma mouse models treated by a novel phosphoinositide 3-kinase delta (PI3Kδ) inhibitor. This led to the elucidation of therapeutic mechanisms distinct from dexamethasone, a corticosteroid conventionally used to treat asthma. In addition, receptor-ligand analysis of scRNA-sequencing data predicted unique cell-cell interactions. The PI3Kδ inhibitor modulates pathways related to airway remodeling, type 2 inflammation and Th17 immune response. These results suggest that the PI3Kδ inhibitor may be effective in treating corticosteroid-resistant asthma. In chapter II, I predicted DR candidates for NASH by comparing drug-induced and disease expression profiles. This approach is called ‘CMAP analysis’, which searches for drugs that show an inverse transcriptomic pattern of disease. CMAP analysis is based on the hypothesis that drugs reversing disease transcriptomic pattern are likely to drive a cell or tissue from a disease state closer to a normal state. I compared expression profiles of NASH mouse models (11 sets, 70 profiles) and patients (9 sets, 285 profiles). 
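The reversal hypothesis behind this analysis can be sketched as a scoring problem: a drug whose expression signature is anti-correlated with the disease signature is ranked first. The sketch below uses a plain Pearson correlation over shared genes as a stand-in; this is a simplification chosen for illustration, not the scoring method of the thesis (Connectivity Map style analyses typically use rank-based enrichment statistics), and all gene names, drug names, and values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)


def reversal_score(disease_sig, drug_sig):
    """Correlation over shared genes; values near -1 mean the drug inverts the disease pattern."""
    genes = sorted(set(disease_sig) & set(drug_sig))
    return pearson([disease_sig[g] for g in genes], [drug_sig[g] for g in genes])


def rank_candidates(disease_sig, drug_sigs):
    """Sort drugs from most reversing (most negative score) to least."""
    scores = {drug: reversal_score(disease_sig, sig) for drug, sig in drug_sigs.items()}
    return sorted(scores.items(), key=lambda item: item[1])


# Hypothetical log fold changes: disease vs. normal, and drug-treated vs. vehicle.
disease = {"G1": 2.0, "G2": -1.0, "G3": 1.0, "G4": -2.0}
drugs = {
    "drug_x": {"G1": -2.0, "G2": 1.0, "G3": -1.0, "G4": 2.0},
    "drug_y": {"G1": 2.0, "G2": -1.0, "G3": 1.0, "G4": -2.0},
    "drug_z": {"G1": 1.0, "G2": 1.0, "G3": -1.0, "G4": -1.0},
}
for name, score in rank_candidates(disease, drugs):
    print(name, round(score, 3))
```

With many disease datasets, as in the mouse and patient profiles mentioned above, one reasonable design is to score each drug against every disease signature and aggregate the ranks, so that candidates are favored when they reverse the pattern consistently.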
Using CMAP analysis, I listed the top 9 candidates, seven of which inhibited lipid accumulation in AML12 cells. These seven candidates included four MEK inhibitors, suggesting MEK as a promising target for treating NASH. In chapter III, I developed a novel DNN model to predict DTI based on large-scale drug-induced expression profiles. I evaluated three different target features for DTI prediction: pathway memberships, protein-protein interaction (PPI) networks, and genetically perturbed expression profiles (gene knockdown). Additionally, the DNN model was compared with other machine learning models, i.e., naïve Bayes, logistic regression, and random forest. The DNN model showed more robust and better performance (AUC 0.85) than the other machine learning models (AUC 0.66-0.76). In conclusion, I introduced three approaches for computational drug discovery using drug-induced transcriptome data. I showed that transcriptomics technology can provide a new perspective on common challenges in the early stages of drug development. The approaches introduced in this thesis provide opportunities for a more efficient drug discovery process.

Abstract (Korean, translated): Drug development consumes a great deal of time and money, yet 95% of candidates fail in the clinical and pre-clinical stages. Deriving suitable candidates requires a thorough understanding of a candidate's mechanism of action in the early stages of drug development. Because omics-based molecular profiling supports a comprehensive understanding of biological mechanisms, it has long been used not only for disease subtype classification and cell differentiation studies but also for interpreting drug mechanisms of action. In particular, computational analysis of drug-induced transcriptomes is applied in early drug development to problems such as drug-target protein interaction prediction, drug repositioning, and prediction of adverse drug effects. These approaches help derive candidates with suitable efficacy and thereby lead to successful drug development. This thesis presents three methods for computational drug discovery using drug-induced transcriptomes. First, the thesis explores the mechanism of action of a newly developed PI3Kδ inhibitor in severe asthma through integrated analysis of bulk RNA sequencing and single-cell RNA sequencing data. Asthma model transcriptome data were collected from public databases and compared with the transcriptome of a severe asthma model to identify mechanisms specific to severe asthma. In addition, transcriptome data from treatment with dexamethasone, a mainstay asthma therapy, and from treatment with the PI3Kδ inhibitor were compared to infer therapeutic mechanisms specific to the PI3Kδ inhibitor.
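This study suggests that the newly developed PI3Kδ inhibitor improves severe asthma by modulating not only type 2 cytokines but also the airway remodeling pathway and the IL-17 immune response, both of which are associated with severe asthma. Second, candidate drugs for treating non-alcoholic steatohepatitis (NASH) were derived through a drug repositioning approach. Drug signatures collected from a large-scale drug-induced transcriptome database were compared with the NASH signature to predict candidates that could reverse the disease-induced changes in gene expression. In addition, targets that could drive the disease-associated expression changes were predicted, which identified MEK inhibitors as candidate drugs for treating NASH. Cell experiments confirmed that four MEK inhibitors suppress lipid accumulation and inhibit hepatic stellate cell activation. Finally, the thesis presents a new deep neural network (DNN) model that predicts drug-target protein interactions by applying a DNN algorithm to large-scale drug-induced transcriptomes. Target protein features were derived from three different kinds of data and trained together with the drug-induced transcriptomes, and the resulting performance was compared. The best-performing model was selected as the final model and showed more stable and better performance than other machine learning models (AUROC 0.85). This thesis introduces computational drug discovery methods that use drug-induced transcriptomes. Drug-induced transcriptomes were used to infer the mechanism of action of a new drug, and drug repositioning yielded candidates for treating NASH. A DNN model was also built to predict drug-target protein interactions. Solving these three problems confirmed the usefulness of computational approaches based on drug-induced transcriptomes. In conclusion, this thesis confirms that drug-induced transcriptomes can offer a new perspective on problems in the early stages of drug development and thereby provide opportunities to make drug development more efficient.
Sun, 01 Jan 2023 00:00:00 GMT
https://dspace.ewha.ac.kr/handle/2015.oak/264508
2023-01-01T00:00:00Z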