DSpace at EWHA: Conditional Generative Adversarial Network (CGAN)-Based Collaborative Filtering Recommender System

Title: Conditional Generative Adversarial Network (CGAN)-Based Collaborative Filtering Recommender System

Other Titles: Conditional Generative Adversarial Network based Collaborative Filtering Recommender System

Authors: 강소이

Issue Date: 2021

Department/Major: Graduate School, Interdisciplinary Program in Big Data Analytics

Publisher: The Graduate School, Ewha Womans University

Degree: Master

Advisors: 신경식

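Abstract: As information technology has advanced, the volume of information has grown daily, widening the range of choices across diverse items. However, information overload makes it difficult for users to find the information they want. As the Internet has become widespread, users prefer to rely on visualized systems that save information retrieval and learning time rather than reading through the available information and judging it comprehensively themselves. Accordingly, recommender systems, which recommend personalized products matched to consumers' needs and interests, are growing in importance as an essential business technology. Among representative recommender models, collaborative filtering is used in many fields because of its strong performance, but it also has limitations. The sparsity problem, in which performance degrades when user-item preference information is insufficient, is its most significant limitation. In addition, real rating data are mostly skewed toward high scores and are therefore severely imbalanced. When collaborative filtering is applied to imbalanced data, the model overlearns the dominant classes and recommendation performance degrades. Various methods have been studied in the collaborative filtering field to address these problems. However, most studies on sparsity can be applied only when additional external data are available, such as users' personal information (for example, social networks) or item characteristics, which limits their usefulness. Traditional oversampling techniques train repeatedly on the same data, so they are prone to overfitting, and the repeated data act as noise during training, lowering recommendation performance. Moreover, most existing studies on multi-class imbalance use binary-class imbalance techniques, but some of these methods cannot be applied directly to high-dimensional data and can be less effective or even harmful. This study proposes a collaborative filtering model that uses a Conditional Generative Adversarial Network (CGAN). The aim is to raise model accuracy by using CGAN to address the sparsity problem that arises when implementing collaborative filtering while also mitigating the imbalance found in real data. First, CGAN was used to generate realistic synthetic data for the empty entries of the user-item matrix. Higher accuracy is expected from this model than from training on the original sparse matrix alone. In this process, the condition vector y was used to capture the distribution of the minority classes and to generate data reflecting their characteristics. Collaborative filtering was then applied, and hyperparameter tuning was used to maximize the performance of the recommender system. The baselines were the traditional oversampling techniques SMOTE, BorderlineSMOTE, SVM-SMOTE, and ADASYN, together with a GAN that generates data from a random latent vector. The results confirmed that the proposed model outperforms both the original sparse real data and the existing oversampling techniques, and demonstrated that it achieves the highest prediction accuracy on the RMSE and MAE evaluation metrics. The significance of the proposed model is that it addresses the predictive limitations of recommender systems that use real data, and it is expected to contribute to building effective recommender systems when applied not only to film industry data but also to other domains.
With the development of information technology, the amount of available information increases daily. However, having access to so much information makes it difficult for users to find the information they seek.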
Users want a visualized system that reduces information retrieval and learning time, saving them from personally reading and judging all available information. As a result, recommendation systems are an increasingly important technology. Collaborative filtering performs well and is used in various fields, but it has limitations. Sparsity, which arises when user-item preference information is insufficient, is the main limitation of collaborative filtering. In addition, real-world rating data are mostly skewed toward high scores, resulting in severe imbalance. Applying collaborative filtering to imbalanced data causes the model to overlearn the dominant classes, which leads to poor recommendation performance. Various methods have been studied to solve these problems. However, most attempts to solve the sparsity problem are of limited use because they can be applied only when additional data are available, such as users' personal information, social networks, or item characteristics. In addition, traditional oversampling techniques are likely to cause overfitting because they repeat the same data, which acts as noise in learning and reduces recommendation performance. Furthermore, most existing work on multi-class imbalance uses binary-class imbalance techniques, but some of these methods cannot be applied directly to high-dimensional data and may be less effective or even harmful. We propose a collaborative filtering model using a conditional generative adversarial network (CGAN). Using CGAN, we generate realistic synthetic data to populate the empty entries of the user-item matrix while mitigating the imbalance found in real data. Condition vectors are used to capture the distributions of the minority classes and to generate data reflecting their characteristics. We then apply collaborative filtering and tune hyperparameters to maximize the recommendation performance.
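The conditioning step described above might be sketched as follows. This is only an illustration under stated assumptions, not the thesis implementation: the function names (`samples_needed`, `one_hot`, `generator_inputs`), the toy ratings, the latent dimension of 8, and the choice to top up every rating class to the size of the largest class are all hypothetical. The generator and discriminator networks are omitted; the sketch shows only how a random latent vector z could be concatenated with a one-hot condition vector y so that generation targets a chosen minority class.

```python
import random
from collections import Counter


def samples_needed(ratings, classes):
    """Count how many synthetic samples each class needs to match the largest class.

    Balancing to the largest class is an assumption made for this sketch.
    """
    counts = Counter(ratings)
    largest = max(counts[c] for c in classes)
    return {c: largest - counts[c] for c in classes}


def one_hot(label, classes):
    """Encode a rating class as a one-hot condition vector y."""
    return [1.0 if c == label else 0.0 for c in classes]


def generator_inputs(deficit, classes, latent_dim, rng):
    """Build (label, z + y) pairs: Gaussian latent vector z followed by condition y."""
    rows = []
    for label in classes:
        for _ in range(deficit[label]):
            z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
            rows.append((label, z + one_hot(label, classes)))
    return rows


# Toy ratings on a 1-5 scale, skewed toward high scores (hypothetical data).
classes = [1, 2, 3, 4, 5]
ratings = [5, 5, 5, 4, 5, 4, 3, 5, 4, 1]

deficit = samples_needed(ratings, classes)
rows = generator_inputs(deficit, classes, latent_dim=8, rng=random.Random(0))

print(deficit)    # per-class number of synthetic samples requested
print(len(rows))  # total number of generator input vectors
```

In a full CGAN, each concatenated vector would be passed to a trained generator, and the discriminator would receive the same condition vector alongside real or generated samples. A plain GAN baseline would use z alone.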
This process should improve the accuracy of the model by addressing the sparsity problem of collaborative filtering while mitigating the imbalance found in real data. Higher accuracy is expected than with standard oversampling models. The proposed model outperforms existing oversampling techniques as well as the original sparse real-world data, and it achieves the highest prediction accuracy on the root mean square error (RMSE) and mean absolute error (MAE) metrics. The model addresses the predictive limitations of recommendation systems that use real data, and it should contribute to building effective recommendation systems not only for film industry data but also for other domains.
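For reference, the two evaluation metrics named in the abstract can be computed as in the minimal sketch below. The function names and the toy rating values are assumptions for illustration and are not taken from the thesis. For both metrics, lower values indicate more accurate predictions.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between actual and predicted ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def mae(actual, predicted):
    """Mean absolute error between actual and predicted ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical held-out ratings and model predictions.
actual = [5.0, 3.0, 4.0, 1.0]
predicted = [4.0, 3.0, 5.0, 3.0]

print(rmse(actual, predicted))
print(mae(actual, predicted))
```

RMSE squares each error before averaging, so it penalizes large misses more heavily than MAE does. Reporting both shows whether a model's errors are uniformly small or dominated by a few large ones.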