DSpace at EWHA: AI 어학 애플리케이션의 개별화 추천 알고리즘

Title: AI 어학 애플리케이션의 개별화 추천 알고리즘

Other Titles: Recommendation Algorithms for AI Language Learning Applications: Diversity versus Similarity in Recommendation Systems

Authors: 송주영

Issue Date: 2022

Department/Major: Department of Educational Technology, Graduate School

Publisher: The Graduate School, Ewha Womans University

Degree: Master

Advisors: 소효정

Abstract: Artificial intelligence (AI) has recently driven significant developments in the field of education, because AI can provide personalized learning materials according to learner characteristics, proficiency level, and learning context, and can compensate for learner weaknesses. In particular, AI-based recommendation systems (RS) support decision-making under information overload by drawing on learner characteristics and preferences. The educational recommendation system is a concept that has been discussed since the early stages of artificial intelligence in the 1960s and 1970s and is also called adaptive educational hypermedia (Montebello, 2018). RS for education have typically been designed to recommend appropriate items based on learners' preferences (Dascalu et al., 2016). This can be explained from the perspective of cognitive dissonance theory (Festinger, 1964): when people are presented with information that is inconsistent with their beliefs and values, they tend to experience cognitive dissonance and an aversive emotional state. People are therefore likely to pursue only information that is consistent with their preferences and to exclude information that they do not prefer. However, as recommendation systems have continued to be studied in the field of education, critical views of RS that simply match learners' preferences have also emerged. A recommendation system based on learners' preferences can strengthen learners' information bias and reduce opportunities for discovery learning (Buder & Schwind, 2012). In addition, Holmes et al. (2019) argued that biased educational data can only provide biased learning to learners. In this regard, recent studies have proposed that diversity, novelty, popularity, and serendipity should be considered as criteria for forming the recommendation list (Nguyen et al., 2018; Vargas & Castells, 2011; Xu et al., 2020). 
Previous studies have verified those criteria empirically in product and movie recommendation. In the field of education, by contrast, the problem of preference-based recommendation has been raised, but experimental research on it is insufficient. In previous studies on educational recommendation systems, RS have been developed based on learners' preferences, ontology, and learning styles (Chen et al., 2019; Dascalu et al., 2016; Tarus et al., 2018). However, most previous studies focus on how learning content is designed and recommended according to learning style within a single mobile learning application; few studies have examined how to recommend AI language learning applications from among a variety of applications. Therefore, this study selected AI language applications as the recommendation target for two reasons. First, AI language applications, as a type of informal learning, are difficult for learners to choose among because learning goals, time, and learning support are ill-structured. Second, although the number of AI language applications has been increasing rapidly, the number of learners who use them remains small because information about the applications is lacking (Jung, 2019). Thus, the purpose of this study was to develop two types of RS (RS with diversity and RS with similarity) and to investigate their accuracy and learner satisfaction. To develop the RS algorithms, diversity and similarity were measured in terms of learning styles (how learners learn) and achievement goals (why learners learn). On this basis, the study aims to provide implications for designing RS in education. The specific research questions are as follows. [Research Question 1] Do RS with diversity and RS with similarity differ in accuracy? [Research Question 2] Do RS with diversity and RS with similarity differ in learner satisfaction? 
[Research Question 3] What causes the difference between RS with diversity and RS with similarity as perceived by learners? To address these questions, 100 participants aged 19 to 39 took part in the first experiment, which was used to construct a database of learner profiles. Data from the 82 learners who fully completed the surveys were then analyzed with respect to learning styles and achievement goals. The participants completed a satisfaction survey after using four to six AI language learning applications. Based on the survey results, two types of RS algorithms were designed using user-based collaborative filtering. At this stage, the similarity score and diversity score were calculated from the 82 learners' learning styles and achievement goals. For the second experiment, comparable participants had to be selected in order to compare the two algorithms. Therefore, the learners were divided into two groups (RS with similarity and RS with diversity) based on demographic information such as gender, age, and occupation. In-depth interviews were then conducted with 12 learners who received RS with diversity and 11 learners who received RS with similarity from among the participants in the second experiment, and the interviews were analyzed through thematic analysis. The results of this research are as follows. First, the RMSE values of RS with similarity and RS with diversity were 0.91 and 1.26, respectively (Research Question 1). RMSE is a commonly used measure of RS performance; the lower the value, the higher the predictive power. Thus, RS with similarity has higher predictive power in terms of accuracy. However, both values were markedly smaller than the 1.86 obtained from random prediction. This result shows that both RS based on user-based collaborative filtering are more accurate than random recommendation. 
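The abstract does not give the formulas behind the two algorithms, so the following is only a minimal sketch of how user-based collaborative filtering and an RMSE check could look. Several things here are assumptions for illustration and are not taken from the thesis: learner profiles are represented as plain vectors of learning-style and achievement-goal scores, cosine similarity is used to compare them, the "similarity" mode averages the ratings of the k most similar learners while the "diversity" mode averages those of the k least similar learners, and all names (`cosine_similarity`, `predict_rating`, `rmse`) and sample values are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length profile vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict_rating(target, profiles, ratings, item, k=2, mode="similarity"):
    """Predict the target learner's rating of `item` from k neighbours.

    mode="similarity": use the k learners most similar to the target.
    mode="diversity":  use the k learners least similar to the target
                       (an assumed reading of "RS with diversity").
    Returns None when nobody in `profiles` has rated the item.
    """
    scored = [
        (cosine_similarity(target, profiles[user]), ratings[user][item])
        for user in profiles
        if item in ratings.get(user, {})
    ]
    if not scored:
        return None
    # Most similar first for "similarity", least similar first otherwise.
    scored.sort(key=lambda pair: pair[0], reverse=(mode == "similarity"))
    neighbours = scored[:k]
    return sum(rating for _, rating in neighbours) / len(neighbours)


def rmse(predicted, actual):
    """Root mean squared error; lower means more accurate predictions."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


# Hypothetical data: four profile scores per learner, one rated app.
profiles = {"a": [5, 1, 4, 2], "b": [4, 2, 4, 2], "c": [1, 5, 2, 4]}
ratings = {"a": {"app1": 5}, "b": {"app1": 4}, "c": {"app1": 2}}
target = [5, 1, 5, 1]

print(predict_rating(target, profiles, ratings, "app1", mode="similarity"))  # 4.5
print(predict_rating(target, profiles, ratings, "app1", mode="diversity"))   # 3.0
```

In this toy data the target learner resembles learners "a" and "b" most closely, so the similarity mode averages their ratings, while the diversity mode draws on the least similar learner "c" first. Passing lists of predicted and observed ratings to `rmse` gives the kind of accuracy figure the abstract reports (0.91, 1.26, and 1.86), though the thesis's own computation may differ.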
Second, a Wilcoxon test showed no significant difference in learner satisfaction between RS with similarity and RS with diversity (Research Question 2). This result indicates that RS with diversity found a point at which learner satisfaction was not lowered relative to RS with similarity. Third, learners' satisfaction with both RS with diversity and RS with similarity was affected by learning style and achievement goals (Research Question 3). In addition, this research analyzed learners' perceptions of RS with diversity, their preferred recommendation types, and other opinions. The implications of this study are as follows. First, this research compared RS with diversity and RS with similarity for the first time in the field of educational technology. Second, this research expanded the target of RS by selecting AI language learning applications as the recommended items. Third, this research empirically examined the evaluation of RS in education. In particular, it provided useful information for designing educational RS through in-depth interviews. The limitations of this research and suggestions for future research are as follows. First, the experiments were conducted with adult participants, and the number of participants was limited. Future research should verify effectiveness by extending the experiment to elementary and secondary school students. Second, the study is limited in the language types covered by the AI language learning applications. Follow-up studies need to consider learners' prior knowledge and behavior in language learning applications, so that items are recommended based on learner experience. Third, this study compared two types of RS; future work could develop a hybrid RS that considers the optimal level of diversity and similarity. Fourth, this study conducted a static analysis of learners' satisfaction through a satisfaction survey. 
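Learners' satisfaction should be analyzed in more varied ways by examining their learning satisfaction and behavior in real time. Despite these limitations, this research is meaningful in that two types of recommendation methods were designed and their accuracy and satisfaction were analyzed. It is expected that, by discussing educational recommendation approaches and strategies, this research will prompt follow-up studies on diversity-based recommendation systems in the field of education. In addition, research should be conducted to systematically evaluate the development process by expanding the research target.
Korean abstract (translated): Recent advances in artificial intelligence (AI) technology are bringing about major changes in the field of education. This is because AI makes it possible to provide individualized learning materials according to learners' characteristics, level, and context, and to compensate for learners' weaknesses. In particular, AI-based recommendation systems are attracting attention because they support decision-making amid a flood of information based on users' characteristics and preferences. The recommendation system in education is a concept that has been discussed since the early stages of AI in the 1960s and 1970s and is also called an adaptive educational hypermedia system (Montebello, 2018). Educational recommendation systems to date have aimed to recommend learning items that match learners' preferences (Dascalu et al., 2016). This can be explained from the perspective of cognitive dissonance theory. When people process information that does not fit their thoughts and beliefs or that they do not prefer, they enter a state of cognitive dissonance and feel psychological discomfort (Festinger, 1964). People therefore tend to pursue only information that matches their preferences or to exclude information they do not prefer. However, as recommendation systems have continued to be studied in education, critical views of learner-preference-based recommendation systems have also emerged. Buder and Schwind (2012) pointed out that recommendation systems based on learners' preferences can reinforce learners' information bias and reduce opportunities for discovery learning. In addition, Holmes et al. (2020) state that, in educational contexts, biased data can provide learners with only biased learning. In this regard, recent recommendation system research holds that diversity, novelty, popularity, and serendipity should be considered as criteria for composing recommendation lists (Nguyen et al., 2018; Vargas & Castells, 2011; Xu et al., 2020). Various studies have considered and empirically verified these criteria in product and movie recommendation. In education, however, although the problems of preference-based recommendation have been raised, related experimental research is lacking. Meanwhile, existing research on educational recommendation systems has discussed a range of recommendation targets, such as e-learning course recommendation based on learners' preferences and learning material recommendation based on learning styles or ontologies (Chen et al., 2019; Dascalu et al., 2016; Tarus et al., 2018). 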
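However, while previous studies have explored how learning content within a single application is designed and recommended according to learning style in mobile learning settings, little work has explored how to recommend among multiple applications according to learner characteristics. This study therefore selected AI language learning applications as the recommendation target for two reasons. First, in informal learning such as AI language learning applications, learning goals, time, and learning resources are unstructured, which makes it difficult for learners to choose. Second, although AI language learning applications have recently increased sharply, few learners use them because of a lack of information about the applications (Jung, 2019). Accordingly, this study devised two types of recommendation algorithms (diversity-based and similarity-based recommendation) and compared them in terms of accuracy and satisfaction. To devise the algorithms, the criteria for diversity and similarity were based on learning styles (the learning-method aspect) and achievement goals (the learning-goal aspect). Through this, the study aims to provide implications for how educational materials are recommended. The research questions are as follows. [Research Question 1] Do diversity-based and similarity-based recommendation differ in accuracy? [Research Question 2] How does learner satisfaction differ between diversity-based and similarity-based recommendation? [Research Question 3] What causes the difference in learner satisfaction between diversity-based and similarity-based recommendation? To address these questions, a total of 100 learners aged 19 to 39 were recruited for the first experiment, which was conducted for learner profiling. The survey responses of the 82 learners who responded conscientiously were then analyzed on the basis of learning styles and achievement goals. Participants in the first experiment used four to six AI language learning applications and completed a satisfaction survey on the apps. Based on the survey results, two types of recommendation algorithms were devised using user-based collaborative filtering. At this stage, learning style and achievement goal scores were used to measure users' similarity and diversity scores. In the second experiment, securing group homogeneity was important for providing similarity-based and diversity-based recommendations. Participants were therefore assigned to two groups of 30 each using demographic information (gender, age, occupation, and education level). In addition, in-depth interviews were conducted with 11 participants who received similarity-based recommendations and 12 who received diversity-based recommendations, and a thematic analysis was performed. The results are summarized as follows. First, the RMSE (root mean squared error) values were 0.91 for similarity-based recommendation and 1.26 for diversity-based recommendation. RMSE is a commonly used measure of recommendation system performance, and a lower value indicates an algorithm with higher predictive power. On this basis, similarity-based recommendation is more accurate. However, both methods were more accurate than random recommendation (RMSE = 1.86). This means that recommendation through user-based collaborative filtering provides more accurate recommendations than random recommendation. Second, the Wilcoxon test showed no significant difference in learner satisfaction. This means that diversity-based recommendation found a point at which it did not lower learner satisfaction, to a degree comparable with similarity-based recommendation. Third, for both types of recommendation algorithm, learning style and learning goals were found to affect learner satisfaction. In addition, learners' perceptions of the diversity-based recommendation list, their preferred recommendation direction, and other opinions were identified. The significance of this study is as follows. First, this is the first study in the field of educational technology in Korea to compare a diversity-based recommendation algorithm with a similarity-based one. 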
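Second, by selecting AI language learning applications as the recommendation target, it broadened the scope of recommendation system research. Third, by having learners actually use the recommendation algorithms, it increased the authenticity of the recommendation system evaluation. In particular, the in-depth interviews yielded useful information for designing educational recommendation systems. The limitations of this study and suggestions for follow-up research are as follows. First, the experiments were conducted with adults, and the sample size was limited. Follow-up research should therefore verify effectiveness by extending the participants to elementary and secondary school students. Second, this study is limited in the language types of the AI language learning applications. Follow-up research needs to refine the recommendation system according to learners' prior learning level in a specific language, their experience of learning with apps, and similar factors. Third, this study compared two types of recommendation systems; follow-up research should examine ways of appropriately combining the two types of algorithms. Fourth, this study analyzed learner satisfaction in a static form through a survey. Follow-up research needs to analyze learner satisfaction from multiple angles by examining learners' learning satisfaction and behavior in real time. Despite these limitations, this study is meaningful in that it designed two types of recommendation methods and analyzed their accuracy and satisfaction. By discussing educational technology approaches and strategies for recommendation systems, it is hoped that this study will serve as a basis for follow-up research in education on diversity-based recommendation systems and on ways of recommending AI language learning applications. In addition, research should be conducted that extends the participants beyond adults, pilots a recommendation system in practice, and evaluates it systematically from development through evaluation.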