DSpace at EWHA: Pattern Exploiting Train 방법을 활용한 속성 카테고리 감성분류 연구

Title: Pattern Exploiting Train 방법을 활용한 속성 카테고리 감성분류 연구

Other Titles: Aspect Category Sentiment Classification Using the Pattern Exploiting Train Method

Authors: 윤지희

Issue Date: 2023

Department/Major: Graduate School, Interdisciplinary Program in Big Data Analytics

Publisher: The Graduate School, Ewha Womans University

Degree: Master

Advisors: 신경식

Abstract: With the recent surge in activity on social media platforms, users share a wide range of information, such as personal feelings, opinions, and evaluations, in text form. This activity provides abundant data on consumers' sentiments and attitudes, which companies, governments, and research institutes use for decision-making and analysis. Sentiment analysis extracts and quantifies sentiment from such data, and by analyzing positive and negative reviews it helps identify how products and services can be improved. Conventional sentiment analysis has focused mainly on classifying the sentiment of a text and estimating its intensity, which makes it hard to understand why people hold a particular sentiment. Aspect-based Sentiment Analysis (ABSA), in contrast, identifies the reason for a sentiment as well as the sentiment itself, enabling finer-grained analysis. For example, for the sentence "It was good that the employees were so kind," conventional sentiment analysis yields only the label "positive," whereas aspect-based sentiment analysis also recovers the reason: the customer responds positively to the aspect "employee." These detailed results make it possible to analyze and improve products, services, and policies at a finer level. Although Aspect Category Sentiment Classification (ACSC) is highly useful and widely studied, data problems remain a limitation. Aspect-based sentiment analysis has less labeled data than general sentiment analysis because its labels must be assigned at a much finer granularity, and the resulting complexity makes labeling quite expensive (Zhang et al., 2022). In addition, class imbalance commonly arises because there are many aspect categories.
For example, product reviews may involve several aspect categories, such as performance, design, price, and weight, and datasets built from such categories are likely to be unevenly distributed across aspects. A small amount of data and an imbalanced distribution make it difficult to learn the sentiment toward each aspect correctly, which can hurt the performance of aspect category sentiment classification. Building datasets for this task is difficult, and imbalance among aspect categories is unavoidable. Addressing this with a model that performs well on little data is more cost-effective than increasing the amount of labeled data. This thesis therefore uses a prompt-based method, which can perform well with only a small amount of data, to improve aspect category sentiment classification from few labeled examples and to broaden its applicability across fields. Specifically, it applies Pattern Exploiting Training (PET), a prompt-based method that trains on cloze-style (fill-in-the-blank) prompts and soft-labels unlabeled data to further improve performance, to aspect category sentiment classification and examines whether strong performance can be achieved with only a small dataset.
In the experiments, the PET-ACSC model performed well even with as few as 15 labeled examples, and with only 105 labeled examples it reached an accuracy of 80.78% and a Macro F1 of 0.7202. It also outperformed a conventionally fine-tuned BERT model. Within the PET-ACSC model, differences between individual patterns were small, and the experiments showed that training the patterns together performed better than training each pattern separately.
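
The abstract describes PET as cloze-style prompting followed by soft labeling of unlabeled data. As a rough illustration only, the Python sketch below builds cloze prompts for a review and an aspect, then averages label scores from several patterns into one soft label. The pattern wording, the verbalizer words, the temperature, and the scores are all assumptions made up for this example; they are not taken from the thesis, and no language model is actually called.

```python
import math

# Hypothetical cloze patterns; "[MASK]" is the blank a masked language
# model would be asked to fill in. These are not the thesis's patterns.
PATTERNS = [
    "{review} The {aspect} was [MASK].",
    "{review} In terms of {aspect}, it is [MASK].",
]

# Hypothetical verbalizer: one word per sentiment label for the blank.
VERBALIZER = {"positive": "good", "neutral": "okay", "negative": "bad"}


def build_prompt(pattern, review, aspect):
    """Fill a cloze pattern with a review sentence and an aspect category."""
    return pattern.format(review=review, aspect=aspect)


def softmax(scores, temperature=1.0):
    """Turn a {label: score} dict into probabilities that sum to 1."""
    top = max(scores.values())
    exps = {k: math.exp((v - top) / temperature) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def soft_label(per_pattern_scores, temperature=2.0):
    """Average label scores across patterns, then soften with a temperature.

    Each item of per_pattern_scores stands in for the scores one
    pattern-specific model assigns to the verbalizer words.
    """
    labels = list(per_pattern_scores[0])
    count = len(per_pattern_scores)
    mean = {l: sum(s[l] for s in per_pattern_scores) / count for l in labels}
    return softmax(mean, temperature)


# Illustrative usage with made-up scores (not experimental results).
review = "The staff were so kind."
prompts = [build_prompt(p, review, "staff") for p in PATTERNS]
scores = [
    {"positive": 2.0, "neutral": 0.0, "negative": -1.0},
    {"positive": 1.0, "neutral": 0.5, "negative": -0.5},
]
print(prompts)
print(soft_label(scores))
```

In a full pipeline, the soft labels produced this way for unlabeled reviews would serve as training targets for a final classifier; that step is omitted here.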