DSpace at EWHA: Computational Models for Document Classification of Literatures in Mathematical Reviews Database

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	이준엽	-
dc.contributor.author	장희정	-
dc.creator	장희정	-
dc.date.accessioned	2021-07-28T16:30:17Z	-
dc.date.available	2021-07-28T16:30:17Z	-
dc.date.issued	2021	-
dc.identifier.other	OAK-000000180747	-
dc.identifier.uri	http://dcollection.ewha.ac.kr/common/orgView/000000180747	en_US
dc.identifier.uri	https://dspace.ewha.ac.kr/handle/2015.oak/257914	-
dc.description.abstract	As the number of published papers keeps growing, classifying them all manually is impractical. This thesis provides an automatic classifier for mathematical literature that takes a document's title as input and outputs its Mathematics Subject Classification (MSC) codes. For this work, we build a new dataset of mathematical literature labeled with 63 mathematics subjects. The thesis addresses two main challenges: text mining and document classification. For text mining, we propose a global lexicon of mathematical literature, built by unifying word forms and removing meaningless words. For document classification, we compare three models: a linear model, a simple deep learning model, and a multi-label classification model.	-
dc.description.tableofcontents	1 Introduction; 2 Dataset and Problem Description: 2.1 Dataset Description, 2.2 Problem Description; 3 Document Classification Models: 3.1 Linear Model, 3.2 Fully Connected Neural Network Model, 3.3 Multi-label Classification Model; 4 Implementation Issues: 4.1 Text Preprocessing (4.1.1 Text Cleaning, 4.1.2 Text Normalization), 4.2 English Document Pick-up, 4.3 Text Mining Process (4.3.1 Stopword Removal, 4.3.2 Create Global Lexicon, 4.3.3 Bag of Words (BoW)), 4.4 Input for Model Train; 5 Experimental Results: 5.1 Linear Model (Model 1), 5.2 Fully Connected Neural Network Model (Model 2), 5.3 Multi-label Classification Model (Model 3), 5.4 Comparison between Models; 6 Conclusions and Future Work	-
dc.format	application/pdf	-
dc.format.extent	10810004 bytes	-
dc.language	eng	-
dc.publisher	Graduate School, Ewha Womans University	-
dc.subject.ddc	500	-
dc.title	Computational Models for Document Classification of Literatures in Mathematical Reviews Database	-
dc.type	Master's Thesis	-
dc.creator.othername	Jang, Hee jung	-
dc.format.page	iii, 37 p.	-
dc.identifier.thesisdegree	Master	-
dc.identifier.major	Department of Mathematics, Graduate School	-
dc.date.awarded	2021-08	-
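The abstract and the outline (4.1 Text Preprocessing through 4.4 Input for Model Train) describe turning each title into a bag-of-words vector over a global lexicon, with MSC subjects as multi-label targets. Below is a minimal Python sketch of that kind of pipeline, under stated assumptions. It is not the author's implementation. The stopword list, the crude plural-folding rule that stands in for "unifying words", count-valued BoW features, and multi-hot targets over two-digit (top-level) MSC classes are all hypothetical choices made for illustration.

```python
import re

# Hypothetical stopword list; the thesis's actual list is not given in this record.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def normalize(title):
    """Lowercase a title and keep only alphabetic runs (simple cleaning/normalization)."""
    return re.findall(r"[a-z]+", title.lower())


def unify(word):
    """Crude plural folding, an ASSUMED stand-in for the thesis's word unification.

    Skips short words and endings such as -ss, -is, -us, -es
    (e.g. "class", "analysis", "series") that a naive rule would mangle.
    """
    if len(word) > 3 and word.endswith("s") and not word.endswith(("ss", "is", "us", "es")):
        return word[:-1]
    return word


def tokens(title):
    """Normalize, drop stopwords, then unify the remaining word forms."""
    return [unify(t) for t in normalize(title) if t not in STOPWORDS]


def build_lexicon(titles):
    """Global lexicon: every token in the corpus, mapped to a sorted column index."""
    vocab = sorted({t for title in titles for t in tokens(title)})
    return {word: i for i, word in enumerate(vocab)}


def bag_of_words(title, lexicon):
    """Count vector over the lexicon; tokens outside the lexicon are ignored."""
    vec = [0] * len(lexicon)
    for t in tokens(title):
        if t in lexicon:
            vec[lexicon[t]] += 1
    return vec


def multi_hot(msc_codes, labels):
    """Multi-label target: 1 for each label whose two-digit MSC prefix occurs (assumption)."""
    present = {code[:2] for code in msc_codes}
    return [1 if label in present else 0 for label in labels]


if __name__ == "__main__":
    # Illustrative titles and MSC codes, not taken from the thesis dataset.
    titles = ["On the Chromatic Number of Graphs",
              "Prime Numbers in Arithmetic Progressions"]
    lexicon = build_lexicon(titles)
    print(list(lexicon))
    print(bag_of_words(titles[0], lexicon))
    print(multi_hot(["11N13", "11B25"], ["05", "11"]))
```

Run as a script, this prints the sorted lexicon `['arithmetic', 'chromatic', 'graph', 'number', 'prime', 'progression']`, the vector `[0, 1, 1, 1, 0, 0]` for the first title, and the target `[0, 1]`. Folding "numbers" into "number" is what lets both titles share a lexicon column; a real system would more likely use a proper lemmatizer than this suffix rule.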