DSpace at EWHA: A Comparison of Text Network Analysis Results across Analysis Conditions

Title: A Comparison of Text Network Analysis Results across Analysis Conditions (분석 조건에 따른 텍스트 네트워크 분석 결과 비교)

Other Titles: Exploring Variations in Keyword Extraction for Text Network Analysis Using VOSviewer

Authors: 이주호

Issue Date: 2024

Department/Major: Graduate School, Department of Education

Keywords: text network analysis, VOSviewer, teacher professionalism, teacher competency

Publisher: Graduate School, Ewha Womans University

Degree: Master

Advisors: 최윤정

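Abstract (Korean, translated): Text network analysis is a method that extracts core words (keywords) from a linguistic text and identifies the semantic structure embedded in the text through the relationships among them (배진아, 이준구, 2022). In text network analysis, keywords are the central indicators for interpretation, so the process of extracting them, that is, picking out from the many words in a text those that carry its core information or themes, is critical. Depending on the research design, text network analysis can yield different results even when the same text is analyzed (김방희, 김진수, 2014). To examine such differences empirically, this study varied the analysis options provided by VOSviewer and compared the results. A total of 1,392 papers were collected from the KCI-Korea Journal Database in Web of Science using ‘teacher professionalism’ (교사전문성) and ‘teacher competency’ (교사역량) as search terms. Using the analysis element, the counting method, and the relevance score as variables, 14 analysis conditions were defined; a text network analysis was run under each condition, and the outputs were compared. The main findings are as follows.

First, changing the analysis element (author keywords, title + abstract, title, or abstract) produced marked differences in network properties. Unlike author keywords, titles and abstracts yielded keywords that included not only content-related words but also words indicating the purpose or method of the research. When abstracts were included (title + abstract, abstract), the number of keywords in the analysis was far larger, and the number of co-occurrence relationships between keywords differed by an even wider margin. The resulting networks were complex, making it difficult to identify the core content of the texts. By contrast, analyses of author keywords or titles included fewer keywords and presented the core information of the texts concisely. Clusters contained far more keywords in the title + abstract and abstract analyses, but the number of clusters did not differ by analysis element.

Second, changing how word occurrences and co-occurrence relationships were counted had a more pronounced effect when abstracts were included than when author keywords or titles were analyzed. In the author-keyword analysis, the counting method affected link strength values, which were higher under full counting. In this case, however, the counting method did not affect word occurrence counts, so the extracted keywords were identical; nevertheless, more clusters were generated under fractional counting. When titles alone were analyzed, the keywords were identical regardless of the counting method, and the number of clusters remained relatively stable. For title + abstract and abstract, by contrast, the results differed substantially by counting method. When abstracts were included, the counting method affected word occurrence counts: full counting produced more keywords, and both the number and the strength of co-occurrence relationships were higher. Full counting also produced roughly twice as many clusters. Applying binary counting to title + abstract and abstract separated the clusters more clearly, but the networks still contained so many nodes and links that the connections between words were hard to follow.

Third, the effect of applying the relevance score depended on whether abstracts were included. Title + abstract and abstract showed similar patterns, whereas titles showed the opposite. In the title-based analysis, applying the relevance score narrowed the keywords down to words carrying important information, giving better results. In the title + abstract and abstract analyses, by contrast, some words with general meanings that could not specify the topic of a text were removed, but key words that the texts treated as important were also excluded from the analysis.

Based on these findings, the study offers the following considerations for future text network analyses. First, among the conditions tested, the semantic structure of the texts was revealed most effectively by (1) analyzing author keywords and (2) analyzing titles with the relevance score applied; under these conditions the central content of the texts could be grasped quickly. Second, title + abstract and abstract ultimately showed very similar patterns, and whenever abstracts were included, the results changed considerably with the counting method and relevance score settings. Third, analyses that include abstracts require the number of keywords to be reduced; to simplify the network, binary counting should be applied together with an adjusted minimum number of occurrences. Fourth, when analyzing abstracts, researchers should keep in mind that the relevance score may fail to identify major words: high-frequency words that co-occur with many other words are likely to be removed.

This study is distinctive in that it combined the variables that affect analysis results into analysis conditions and empirically demonstrated how the results differ across those conditions. Because the analysis was based on real rather than simulated data, the findings cannot be generalized, but the study contributes by proposing considerations for future text network analysis research. Follow-up studies that explore further variables in keyword selection and define analysis conditions more finely are expected to enrich the discussion.

Abstract (English): Text network analysis involves extracting keywords from texts and examining their relationships to understand the semantic structure inherent in the texts. Keywords serve as the central indicators for interpretation, so selecting them is crucial for capturing the core information or themes embedded in a text. However, the many decisions involved in keyword selection, each of which can change the results, have received little discussion. This study examines how results differ when factors related to keyword selection are varied in VOSviewer. For the analysis, a total of 1,392 documents were collected from the Korea Journal Database (KCI) in Web of Science, using “teacher professionalism” and “teacher competency” as search keywords. Fourteen analysis conditions were established from VOSviewer's analysis options: “analysis elements”, “counting method”, and “relevance score”.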
Network analysis was conducted under each condition, and the results were compared across conditions.

First, differences were observed when the analysis element was set to author keywords, title + abstract, title, or abstract. Author keywords yielded topic-related words as keywords, whereas title or abstract analyses also included words indicating the research purpose or method. The number of keywords and the values describing co-occurrence relationships between keywords also differed. Including abstracts led to a much higher number of keywords and an even larger increase in co-occurrence relationships; the resulting complexity of the network visualization made it difficult to identify the core content. Conversely, analyzing author keywords or titles produced fewer keywords, allowing the information in the texts to be grasped relatively quickly.

Second, changing how word occurrences and co-occurrence relationships were counted had a more pronounced effect when abstracts were analyzed than when author keywords or titles were analyzed. In the author-keyword analysis, the counting method influenced link strength values, which were higher under full counting. It did not, however, influence word occurrence counts, so the keywords included in the network matched completely; the number of clusters did differ, with more clusters generated under fractional counting. Analyzing titles alone showed no significant differences between counting methods. In contrast, results for title + abstract and abstract varied considerably with the counting method: full counting produced more keywords, more co-occurrence relationships, and stronger links. The number of clusters also roughly doubled under full counting, forming a complex network in which the central content was hard to discern.

Lastly, the effect of applying the relevance score depended on whether abstracts were included. Title + abstract and abstract showed similar patterns, while titles showed the opposite. The relevance score aims to distinguish words with general meanings from words with specific meanings by considering word co-occurrence patterns, removing words that do not help specify the content of the text. Applying the relevance score to titles gave better results by retaining keywords that carried important information. For title + abstract and abstract, however, removing some general words also removed keywords that the texts treated as important.

Among the analysis conditions in this study, the semantic structure of the texts was identified most effectively by analyzing ‘author keywords’ and ‘titles with the relevance score applied’. Title + abstract and abstract exhibited highly similar patterns, with results varying substantially depending on the counting method and relevance score settings. This study differs from previous research in empirically presenting variations in analysis results across combinations of factors related to keyword selection. Although the results cannot be generalized because actual rather than simulated data were used, the study proposes considerations for future text network analysis research.
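The counting-method findings can be illustrated with a small, hedged sketch. The Python below is not VOSviewer code, and the helper names (`term_occurrences`, `keyword_links`, `select_terms`) and the toy documents are hypothetical. It mirrors the contrasts the study reports (binary vs full counting for abstracts, full vs fractional counting for author keywords) under two assumptions: binary counting counts a term at most once per document while full counting counts every repetition, and fractional counting gives each keyword pair in a document with n distinct keywords a weight of 1/(n − 1). VOSviewer's exact weighting may differ.

```python
from collections import Counter
from itertools import combinations


def term_occurrences(docs, method="full"):
    """Count term occurrences across documents.

    'binary': a term counts at most once per document.
    'full':   every repetition within a document is counted.
    """
    counts = Counter()
    for doc in docs:
        counts.update(set(doc) if method == "binary" else doc)
    return counts


def keyword_links(docs, method="full"):
    """Sum co-occurrence link strengths for keyword pairs in the same document.

    'full':       each co-occurring pair adds 1 per document.
    'fractional': each pair adds 1/(n-1), where n is the number of distinct
                  keywords in the document (assumed weighting), so every keyword
                  contributes a total strength of 1 per document.
    """
    links = Counter()
    for doc in docs:
        terms = sorted(set(doc))
        n = len(terms)
        for pair in combinations(terms, 2):  # no pairs when n < 2, so no division by zero
            links[pair] += 1.0 if method == "full" else 1.0 / (n - 1)
    return links


def select_terms(docs, method, min_occurrences):
    """Keep terms whose occurrence count meets a minimum-occurrence threshold."""
    counts = term_occurrences(docs, method)
    return sorted(t for t, c in counts.items() if c >= min_occurrences)


# Hypothetical abstracts, already reduced to candidate terms; words repeat within a text.
abstracts = [
    ["teacher", "competency", "teacher", "training", "teacher"],
    ["teacher", "professionalism", "training", "training"],
    ["competency", "study", "study"],
]
print(select_terms(abstracts, "full", 2))    # -> ['competency', 'study', 'teacher', 'training']
print(select_terms(abstracts, "binary", 2))  # -> ['competency', 'teacher', 'training']

# Hypothetical author-keyword lists; a keyword appears at most once per paper.
keywords = [
    ["teacher professionalism", "teacher competency", "professional development"],
    ["teacher competency", "curriculum"],
]
print(term_occurrences(keywords, "full") == term_occurrences(keywords, "binary"))  # -> True
print(sum(keyword_links(keywords, "full").values()))        # -> 4.0
print(sum(keyword_links(keywords, "fractional").values()))  # -> 2.5
```

Because author keywords do not repeat within a paper, binary and full occurrence counts coincide for them and only the link strengths change (higher under full counting), which echoes the author-keyword results. Words repeated inside an abstract inflate full counts, so combining binary counting with a minimum-occurrence threshold drops terms such as the hypothetical "study" that are frequent in only one abstract.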