DSpace at EWHA: Drug Target Discovery Based on Large-Scale Drug-Induced Transcriptome Data

Title: Drug Target Discovery Based on Large-Scale Drug-Induced Transcriptome Data (대규모 약물 유도 전사체 데이터 기반 약물 표적 탐색)

Authors: 정혜수

Issue Date: 2019

Department/Major: Graduate School, Interdisciplinary Program in Bioinformatics

Publisher: The Graduate School, Ewha Womans University

Degree: Master

Advisors: 김완규

Abstract (Korean, translated): In drug repositioning, it is very important to know which target proteins a drug interacts with. This study predicted drug targets with two in silico methods, a protein-protein interaction (PPI) network and machine learning, using the large-scale drug-induced transcriptome data of the Connectivity Map (CMap) and drug target information collected from public databases. In the PPI network method, drug targets were predicted with four previously known network measures and algorithms, under the assumption that a drug's targets lie relatively close to its differentially expressed genes in the network. All four methods predicted drug targets with a median AUROC of 0.7 or higher, but the median AUPR was 0.01 or lower, which suggests a high proportion of false-positive predictions. The machine learning method used a deep neural network model. First, grid search was used to find the model with the best combination of hyperparameters. When drug target prediction was compared between the network methods and the deep neural network, the deep neural network showed a far higher AUROC (median 0.97). Its AUPR (median 0.026) was also higher than that of the network methods, but still low by common standards, so the predictions were judged to include many false positives. The results were also compared with another machine learning model, random forest, which recorded an AUROC of 0.96 and an AUPR of 0.021, so the deep neural network was slightly better overall. Finally, the cause of the many false positives in the deep neural network results was investigated, and class imbalance in the data was found to contribute.

Abstract (English): Identification of drug targets plays an important role in the drug repositioning process. This study proposes two in silico drug target prediction methods that use the Connectivity Map (CMap), a large-scale drug-induced transcriptome database, together with known drug target information from open resources. The first method is based on a protein-protein interaction (PPI) network. Drug targets were predicted with four different network measures or algorithms, under the assumption that drug targets are likely to be network neighbors of the differentially expressed gene (DEG) group. Each approach ranked known targets well (median AUROC > 0.7), but the predictions included many false positives (median PRAUC < 0.01). The second method is a drug target prediction model based on a deep neural network (DNN), currently one of the most promising machine learning methods. The model was optimized by grid search, and its performance was then compared with that of the other methods. In contrast to the network approaches described above, the DNN model was far superior in all evaluation metrics (median AUROC ~ 0.97, median PRAUC ~ 0.023). The DNN model also performed slightly better than a random forest model. However, the low median PRAUC indicates that many incorrect predictions remained, which could result from data imbalance.
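
The abstract reports a high AUROC together with a very low AUPR (PRAUC) and attributes this to data imbalance. The following minimal sketch illustrates how the two metrics can diverge when positives are rare. It is not the thesis's code or data: the drug, the 10,000 candidate proteins, and the rank of 300 are hypothetical values chosen for illustration, and AUPR is approximated here by average precision.

```python
def auroc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of precision@k at the rank k of each positive (assumes distinct scores)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precisions = []
    for k, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions)


# Hypothetical drug: 10,000 candidate proteins, one known target ranked 300th.
n_candidates = 10_000
target_rank = 300
scores = [n_candidates - i for i in range(n_candidates)]  # strictly decreasing
labels = [1 if i == target_rank - 1 else 0 for i in range(n_candidates)]

print(f"AUROC = {auroc(labels, scores):.4f}")
print(f"AUPR  = {average_precision(labels, scores):.4f}")
```

In this toy case the true target outranks 9,700 of the 9,999 non-targets, so AUROC is about 0.97, yet 299 non-targets are listed before it, so average precision is only 1/300, about 0.003. With a single positive among thousands of candidates, a ranking can look strong by AUROC while still placing hundreds of false positives ahead of the true target.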