DSpace at EWHA: Korean BERT Ensemble for Sentiment Analysis

Title: Korean BERT Ensemble for Sentiment Analysis

Authors: 이예진

Issue Date: 2022

Department/Major: Graduate School, Interdisciplinary Program in Big Data Analytics

Publisher: Ewha Womans University Graduate School

Degree: Master

Advisors: 신경식

Abstract: Advances in information and communication technology (ICT) have made active online interaction and communication possible. People express political opinions on news articles and write reviews of products and services, and the vast amount of unstructured text data generated through these interactions is increasingly recognized as an asset. Consequently, text mining, which collects and analyzes unstructured data to extract insights that support decision-making, has drawn growing attention. Sentiment analysis, also known as opinion mining, is a text mining technique that extracts and quantifies the positive and negative subjective sentiment people express in text data. Research on sentiment analysis can be broadly divided into the periods before and after the advent of Google's BERT (Bidirectional Encoder Representations from Transformers). BERT has drawn attention as a state-of-the-art (SOTA) technique and has been used in many studies, producing better results than the models that preceded it. Although many studies have shown that BERT outperforms other machine learning models in sentiment analysis and in other NLP fields, it leaves room for improvement: models in the fine-tuning stage exhibit low bias but high variance.
This thesis aims to address the problem of low bias and high variance by proposing a Korean BERT ensemble model for sentiment analysis. The ensemble combines several Korean BERT models (KoBERT, KcBERT, ALBERT, RoBERTa and BERT-kor-based) using both voting and bagging. Experiments demonstrate the validity of the Korean BERT ensemble model in comparison with the single models.
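The abstract does not specify how voting and bagging are implemented, so the following is only a minimal sketch of the general techniques, not the thesis's method. It assumes binary labels (0 = negative, 1 = positive); the function names, the example probabilities, and the use of three models are all hypothetical and chosen purely for illustration.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence


def hard_vote(labels_per_model: Dict[str, Sequence[int]]) -> List[int]:
    """Majority vote over each model's predicted label for every sample."""
    columns = zip(*labels_per_model.values())
    # With an odd number of binary voters there is always a strict majority.
    return [Counter(votes).most_common(1)[0][0] for votes in columns]


def soft_vote(probs_per_model: Dict[str, Sequence[float]],
              threshold: float = 0.5) -> List[int]:
    """Average the positive-class probabilities, then apply a threshold."""
    columns = zip(*probs_per_model.values())
    return [int(sum(p) / len(p) >= threshold) for p in columns]


def bootstrap_indices(n_samples: int, seed: int) -> List[int]:
    """Draw n_samples indices with replacement (one bagging resample)."""
    rng = random.Random(seed)
    return [rng.randrange(n_samples) for _ in range(n_samples)]


# Hypothetical positive-class probabilities for three reviews,
# standing in for the outputs of three fine-tuned models.
probs = {
    "KoBERT": [0.9, 0.4, 0.2],
    "KcBERT": [0.8, 0.6, 0.1],
    "RoBERTa": [0.7, 0.3, 0.6],
}
labels = {name: [int(p >= 0.5) for p in ps] for name, ps in probs.items()}

print(hard_vote(labels))  # majority label per review
print(soft_vote(probs))   # label from averaged probabilities
print(bootstrap_indices(5, seed=0))  # indices for one resampled training set
```

In a bagging setup, each ensemble member would be fine-tuned on its own bootstrap resample of the training set, and the members' predictions would then be combined with one of the voting functions. Averaging over several high-variance models is the usual motivation for this design, which matches the low-bias, high-variance problem the abstract describes.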