DSpace at EWHA: Collaborative Filtering Hybrid Recommendation System with Improved Similarity

Title: 개선된 유사도 기반의 협업필터링 하이브리드 추천시스템

Other Titles: Collaborative Filtering Hybrid Recommendation System with Improved Similarity

Authors: Shan, Ziyao

Issue Date: 2022

Department/Major: Graduate School, Interdisciplinary Program in Big Data Analytics

Publisher: The Graduate School, Ewha Womans University

Degree: Master

Advisors: 신경식

Abstract: In recent years, information systems have evolved alongside the growth in daily data throughput.
Although the amount of information has grown exponentially as users request new data, finding information that suits a given user in this vast ocean of data is not easy. Recommendation systems emerged in response. Since Karlgren's research team at Columbia University first proposed the concept of a "recommendation system" in 1990, recommender systems have been studied extensively worldwide. Recommendation systems are currently divided into three categories: content-based, collaborative filtering, and hybrid recommendation. Among them, collaborative filtering is the most representative recommendation algorithm and is already used in many fields. Traditional collaborative filtering algorithms, however, suffer from low accuracy, and existing studies have therefore focused solely on improving collaborative filtering recommendations. In that literature, collaborative filtering is divided into user-based and item-based recommendation, and most previous studies improved the similarity calculation of these two algorithms. However, traditional collaborative filtering and most of its improved variants use only three kinds of data (user, item, and rating), while data such as tags, which carry information about items, have not been properly utilized. When item preference information is insufficient, the reliability of the recommendation system drops. These algorithms also cannot remove unnecessary data variables, which affects recommendation accuracy. Therefore, this study combines Personal Rank with collaborative filtering to address the accuracy and performance problems of recommendation systems based on collaborative filtering.
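As a rough illustration of the item-based collaborative filtering baseline mentioned above, the following sketch computes cosine similarity between items from user ratings and scores unseen items for one user. It is a minimal sketch under stated assumptions, not the thesis's implementation: the function names, the choice of cosine similarity, and the toy ratings are all hypothetical.

```python
from collections import defaultdict
from math import sqrt


def item_cosine_similarity(ratings):
    """Map {user: {item: rating}} to {item: {other_item: cosine similarity}}."""
    dot = defaultdict(lambda: defaultdict(float))  # dot products over co-rating users
    norm = defaultdict(float)                      # squared norm of each item vector
    for user_ratings in ratings.values():
        for i, r_i in user_ratings.items():
            norm[i] += r_i * r_i
            for j, r_j in user_ratings.items():
                if i != j:
                    dot[i][j] += r_i * r_j
    return {
        i: {j: d / sqrt(norm[i] * norm[j]) for j, d in row.items()}
        for i, row in dot.items()
    }


def recommend(ratings, sim, user, top_n=3):
    """Rank items the user has not rated by similarity-weighted ratings."""
    seen = ratings[user]
    scores = defaultdict(float)
    for i, r in seen.items():
        for j, s in sim.get(i, {}).items():
            if j not in seen:
                scores[j] += s * r
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Toy ratings (hypothetical, for illustration only).
ratings = {
    "u1": {"A": 5, "B": 4},
    "u2": {"A": 4, "B": 5, "C": 2},
    "u3": {"B": 1, "C": 5},
}
sim = item_cosine_similarity(ratings)
print(recommend(ratings, sim, "u1"))
```

Note that this baseline uses only user, item, and rating data, which is the limitation the abstract points out: tag information plays no part in the similarity.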
In addition, we present a hybrid recommendation algorithm model that applies the Cascade method to address these problems and increase accuracy. We implement the recommendation algorithm by combining a Personal Rank-based method with the target-user data (user Id, movieId, tag) used in item-based collaborative filtering. This combination is a "Cascade" hybrid recommendation technique, which belongs to the class of pipelined hybrid models. First, the Personal Rank-based method calculates, for the target user, a probability (the preference probability) for each user and tag, and each record in the data set is corrected using this preference probability. Item-based collaborative filtering is then computed on the corrected data set to recommend the final items. We analyze system performance by comparing Precision, Recall, and F1 scores; following Herlocker and colleagues, higher Precision, Recall, and F1 scores indicate a more reliable system. Experimental results show that the new hybrid algorithm outperforms the traditional item-based collaborative filtering algorithm. This study thus develops an improved recommendation algorithm that supplements the thinking pattern of traditional recommendation algorithms so that future recommendation systems can make more accurate recommendations to users, and it offers a new angle from which to approach recommendation algorithms.
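The first stage of the cascade and the evaluation metrics can be sketched as follows. This is a hedged illustration only: it assumes Personal Rank is run as a random walk with restart on a graph with user-tag and tag-movie edges built from (user Id, movieId, tag) triples, and the graph layout, node prefixes, `alpha`, iteration count, and all function names are assumptions rather than details given in the abstract. How the resulting probabilities are used to correct the data set is not specified in the abstract, so that step is left out.

```python
from collections import defaultdict


def build_graph(triples):
    """Build an undirected graph with user-tag and tag-movie edges (assumed layout)."""
    graph = defaultdict(set)
    for user, movie, tag in triples:
        u, m, t = f"u:{user}", f"m:{movie}", f"t:{tag}"
        for a, b in ((u, t), (t, m)):
            graph[a].add(b)
            graph[b].add(a)
    return {node: sorted(neighbors) for node, neighbors in graph.items()}


def personal_rank(graph, root, alpha=0.85, iterations=50):
    """Random walk with restart: probability of visiting each node from `root`."""
    rank = {node: 0.0 for node in graph}
    rank[root] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in graph}
        for node, neighbors in graph.items():
            if not neighbors:
                continue
            share = alpha * rank[node] / len(neighbors)  # split mass evenly
            for nb in neighbors:
                nxt[nb] += share
        nxt[root] += 1.0 - alpha  # with probability 1 - alpha, restart at root
        rank = nxt
    return rank


def precision_recall_f1(recommended, relevant):
    """Precision, Recall, and F1 of a recommendation list against relevant items."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Toy tagging data (hypothetical, for illustration only).
triples = [(1, 10, "funny"), (1, 11, "dark"), (2, 10, "funny"), (2, 12, "funny")]
graph = build_graph(triples)
rank = personal_rank(graph, "u:1")
tag_probs = {n: p for n, p in rank.items() if n.startswith("t:")}
print(tag_probs)
print(precision_recall_f1(["m:10", "m:12"], ["m:12", "m:13"]))
```

In a cascade arrangement, probabilities such as `tag_probs` would feed the second stage (item-based collaborative filtering), and the three metrics would then be compared between the hybrid and the plain item-based baseline.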