DSpace at EWHA: In silico prediction of the full United Nations Globally Harmonized System eye irritation categories of liquid chemicals by IATA-like bottom-up approach of random forest method

Title: In silico prediction of the full United Nations Globally Harmonized System eye irritation categories of liquid chemicals by IATA-like bottom-up approach of random forest method

Authors: Kang, Yeonsoo; Jeong, Boram; Lim, Doo-Hyeon; Lee, Donghwan; Lim, Kyung-Min

Ewha Authors: 임경민 (Lim, Kyung-Min); 이동환 (Lee, Donghwan)

SCOPUS Author ID: 임경민 (Lim, Kyung-Min); 이동환 (Lee, Donghwan)

Issue Date: 2021

Journal Title: JOURNAL OF TOXICOLOGY AND ENVIRONMENTAL HEALTH-PART A-CURRENT ISSUES

ISSN: 1528-7394; 1087-2620

Citation: JOURNAL OF TOXICOLOGY AND ENVIRONMENTAL HEALTH-PART A-CURRENT ISSUES vol. 84, no. 23, pp. 960 - 972

Keywords: Eye irritation potential; machine-learning; physicochemical descriptor; random forest; in silico

Publisher: TAYLOR & FRANCIS INC

Indexed: SCIE; SCOPUS

Document Type: Article

Abstract: As an alternative to the in vivo Draize rabbit eye irritation test, this study aimed to construct an in silico model that predicts the complete United Nations (UN) Globally Harmonized System (GHS) of Classification and Labeling of Chemicals eye irritation categories [eye damage (Category 1), irritating to eye (Category 2), and nonirritating (No category)] for liquid chemicals, using an Integrated Approaches to Testing and Assessment (IATA)-like two-stage random forest approach. Liquid chemicals (n = 219) with 34 physicochemical descriptors and quality in vivo data, with no missing values, were collected. Seven machine learning algorithms (Naive Bayes, Logistic Regression, First Large Margin, Neural Net, Random Forest (RF), Gradient Boosted Tree, and Support Vector Machine) were examined for ternary categorization of eye irritation potential in a single run with 10-fold cross-validation. RF, which performed best, was further improved by applying the 'bottom-up approach' concept of IATA, namely separating No category first and then discriminating Category 1 from Category 2. The best-performing training dataset achieved an overall accuracy of 73%, and the correct prediction rates for Category 1, Category 2, and No category were 80%, 50%, and 77%, respectively, for the test dataset. The prediction model was further validated with an external dataset of 28 chemicals, for which an overall accuracy of 71% was achieved.
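The bottom-up, two-stage decision described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the two-value feature vectors, and the threshold rules standing in for the two fitted binary models (random forests in the study) are all hypothetical, chosen only to show the control flow and how overall and per-category correct-prediction rates might be computed.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

NO_CAT, CAT2, CAT1 = "No category", "Category 2", "Category 1"
Features = Sequence[float]


def two_stage_predict(
    x: Features,
    is_irritant: Callable[[Features], bool],   # stage 1: irritant vs. No category
    is_category1: Callable[[Features], bool],  # stage 2: Category 1 vs. Category 2
) -> str:
    """Bottom-up cascade: separate No category first, then split Category 1 from 2."""
    if not is_irritant(x):
        return NO_CAT
    return CAT1 if is_category1(x) else CAT2


def per_category_accuracy(
    y_true: List[str], y_pred: List[str]
) -> Tuple[float, Dict[str, float]]:
    """Return overall accuracy and the fraction correctly predicted per true category."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    overall = sum(correct.values()) / len(y_true)
    return overall, {c: correct[c] / n for c, n in total.items()}


# Hypothetical stand-ins for two fitted binary classifiers (assumption:
# in practice each stage would be a trained model, not a threshold rule).
stage1 = lambda x: x[0] > 0.5
stage2 = lambda x: x[1] > 0.5

# Made-up toy data, for illustration only.
X = [(0.1, 0.9), (0.9, 0.9), (0.9, 0.1), (0.8, 0.2), (0.2, 0.1)]
y = [NO_CAT, CAT1, CAT2, CAT1, NO_CAT]

pred = [two_stage_predict(x, stage1, stage2) for x in X]
overall, by_cat = per_category_accuracy(y, pred)
print(overall, by_cat)
```

Passing the stage models in as plain callables keeps the cascade independent of any particular learning library, so either stage could be swapped for a different classifier without touching the decision logic.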