DSpace at EWHA: Recent Deep Learning Methods for Tabular Data

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	송종우	-
dc.contributor.author	황예진	-
dc.creator	황예진	-
dc.date.accessioned	2023-02-24T16:30:57Z	-
dc.date.available	2023-02-24T16:30:57Z	-
dc.date.issued	2023	-
dc.identifier.other	OAK-000000201885	-
dc.identifier.uri	https://dcollection.ewha.ac.kr/common/orgView/000000201885	en_US
dc.identifier.uri	https://dspace.ewha.ac.kr/handle/2015.oak/264324	-
dc.description.abstract	Deep learning has made great strides on unstructured data such as text, images, and audio. For tabular data, however, machine learning algorithms such as ensemble methods still tend to outperform deep learning. To match the predictive power of these algorithms, several deep learning methods for tabular data have recently been proposed. In this paper, we review the latest deep learning models for tabular data and compare their performance on several datasets. We also compare the latest boosting methods with these deep learning methods and offer guidelines for users who analyze tabular datasets.;Korean abstract (translated): Deep learning has advanced considerably on unstructured data such as text and images, but on structured data organized in rows and columns it has not yet stood out to the same degree. For prediction on structured data, ensemble models based on gradient-boosted decision trees, such as XGBoost, CatBoost, and LightGBM, currently tend to perform well on both regression and classification problems. To surpass the performance of these machine learning algorithms, papers on deep-learning-based methods for structured data analysis have recently been proposed. This paper explains the principles of five deep learning methods proposed for structured data prediction (TabNet, NODE, GrowNet, AutoInt, SAINT). We then apply these models to 10 real datasets and compare their predictive performance. We also report the results of machine learning boosting algorithms on the same datasets in order to compare the performance of the deep learning models and the boosting models.	-
dc.description.tableofcontents	Ⅰ. Introduction, 1; Ⅱ. Deep Learning Methods for Tabular Data, 3; A. TabNet, 3; B. NODE, 4; C. GrowNet, 5; D. AutoInt, 6; E. SAINT, 7; Ⅲ. Boosting Methods, 10; A. XGBoost, 10; B. CatBoost, 10; C. LightGBM, 11; Ⅳ. Performance Comparison, 12; A. Datasets, 12; B. Preprocessing, 14; C. Results, 15; 1. Regression Problems, 15; 2. Classification Problems, 17; Ⅴ. Conclusion, 19; Bibliography, 20; Abstract (in Korean), 23	-
dc.format	application/pdf	-
dc.format.extent	734867 bytes	-
dc.language	eng	-
dc.publisher	The Graduate School of Ewha Womans University	-
dc.subject.ddc	500	-
dc.title	Recent Deep Learning Methods for Tabular Data	-
dc.type	Master's Thesis	-
dc.format.page	iv, 23 p.	-
dc.identifier.thesisdegree	Master	-
dc.identifier.major	Graduate School, Department of Statistics	-
dc.date.awarded	2023. 2	-