DSpace at EWHA: The Effects of Marking on Computer-based Reading Test Performance

Title: The Effects of Marking on Computer-based Reading Test Performance

Other Titles: The Effects of the Marking Function in Computer-Based Reading Assessment on Test Scores (컴퓨터기반 독해 평가의 표시 기능이 평가 점수에 미치는 영향)

Authors: 이신혜

Issue Date: 2012

Department/Major: Graduate School, Department of English Education

Publisher: The Graduate School, Ewha Womans University

Degree: Master

Advisors: 신상근

Abstract: The present study investigated the effects of marking activities on Korean EFL college students’ computer-based reading test performance. It has been suggested that a major difficulty of computer-based reading tests is the inability to apply marking strategies (Chin, 2005; Khalifa & Weir, 2009; Lee, 2004; O’Hara & Sellen, 1997; Paek, 2005). Marking activities such as underlining and annotating during reading have been reported to help students better comprehend and retain the content of reading passages (Nist & Hogrebe, 1987). However, a marking function is currently unavailable in the great majority of computerized reading tests, and this has been considered the most prominent feature differentiating the computer-based reading test from the conventional paper-based test. The construct validity of computer-based reading tests should, in this sense, be called into question: given the differing testing conditions, an appropriate interpretation of test-takers’ reading abilities may not be possible in a computer-based reading test (Bachman, 2000). The present study therefore created a computer-based reading test on a tablet PC with an added feature to enable marking. The function was designed to let participants make markings on the screen-based reading test in a manner similar to a paper-based testing situation. This was an effort not only to reduce the gap between the differing test formats but also to present a way to establish the construct validity of the computer-based reading test. The study included 89 Korean college students, who were divided into three test groups: a paper-based group, a marking-enabled tablet PC group, and a regular tablet PC group. The paper-based group took a paper version of a reading test, while the two tablet PC-based test groups were differentiated by the presence or absence of the marking function.
All three test groups initially took a paper-based test so that the participants’ normal marking activities on a conventional paper-based test could be analyzed. This analysis was conducted to determine whether the participants actually performed marking activities. To establish whether the inability to mark on a computer-based test was indeed a challenge for participants, it was necessary to analyze the extent of their use of marking during testing. The participants’ marking activities were investigated by counting the number of participants who used marking on the test and by examining the quantity and quality of the markings. The marking activities of the paper-based group and the marking-enabled tablet PC group were analyzed in the same way as in the initial paper-based test: the number of participants who marked on their tests, the quantity of markings, and the quality of markings were analyzed for the two test groups that had the option to mark. In addition, the correlation between marking activities and test performance was analyzed to investigate whether the act of marking had affected the participants’ test scores. The paper-based group was included in the study so that conventional marking activities could be compared with those of the marking-enabled tablet PC group. The inclusion of the paper-based test can also be viewed as an extension of prior CBT-PBT comparability studies. The participants’ marking activities were divided into two categories: marking on the text in reading passages and marking on test items. The former included underlining or any other type of marking involving sentences or words in the reading passages; the latter included annotations, consisting of brief Korean translations of English words in the test items, as well as crosses (Xs) or other marks drawn on item options.
A quantitative figure for the markings in the reading passages was derived by dividing the number of marked words by the total number of words in the reading passages. The amount of marking on test items was obtained by dividing the number of marked items by the total number of test items. Besides examining the quantity of markings, the present study also investigated their quality. A quality marking was defined as a mark made on a sentence containing information relevant to the answer to a test item. To examine the participants’ marking quality, the sentences containing the answer to each test item were first identified. A marking quality ratio was then derived for each participant by dividing the number of marked sentences that matched the sentences identified as relevant by the total number of marked sentences. In this way, the proportion of marked sentences that were relevant was obtained for each individual participant. Further analysis was conducted on the three test groups to investigate whether differences in test performance existed. The study analyzed the extent to which the respective testing conditions influenced the three groups’ test performance, both overall and by proficiency level and item type. In addition, the perceptions of the two test groups taking a tablet PC-based test were elicited through a questionnaire and a structured interview. The study observed that the majority of participants employed marking strategies in the paper-based test. With the exception of a few participants, over half of the participants marked both the reading passages and the test items. In addition, a sizable number of participants in the marking-enabled tablet PC group used marking on the tablet PC-based test and, in the case of text in the reading passages, did so to an even greater extent than the paper-based group. These results indicate that marking activities were frequently utilized in an actual testing situation.
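The three marking measures defined in the abstract (the quantity of text marking, the quantity of item marking, and the marking quality ratio) can be illustrated with a minimal sketch. The function names, the representation of sentences as integer indices, and the sample numbers are assumptions made for illustration only; they are not taken from the thesis. How a participant with no marked sentences was treated is not stated in the abstract, so returning `None` in that case is also an assumption.

```python
from typing import Optional, Set


def text_marking_quantity(marked_words: int, total_words: int) -> float:
    """Marked words divided by the total number of words in the passages."""
    return marked_words / total_words


def item_marking_quantity(marked_items: int, total_items: int) -> float:
    """Marked test items divided by the total number of test items."""
    return marked_items / total_items


def marking_quality(marked_sentences: Set[int],
                    relevant_sentences: Set[int]) -> Optional[float]:
    """Share of a participant's marked sentences that are answer-relevant.

    Sentences are identified by hypothetical integer indices.
    Assumption: the ratio is undefined (None) when nothing was marked.
    """
    if not marked_sentences:
        return None
    matched = marked_sentences & relevant_sentences
    return len(matched) / len(marked_sentences)


# Hypothetical participant: the numbers below are invented for illustration.
text_ratio = text_marking_quantity(marked_words=120, total_words=600)
item_ratio = item_marking_quantity(marked_items=12, total_items=30)
quality = marking_quality(marked_sentences={3, 8, 15, 21},
                          relevant_sentences={8, 21, 40})
print(text_ratio, item_ratio, quality)
```

Under these definitions, a participant who marks many sentences indiscriminately has a high text marking quantity but a low quality ratio, which is why the two measures are reported separately.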
Furthermore, the results indicated that the marking function in the tablet PC-based test was also actively employed by many of the marking-enabled tablet PC group participants. For both the paper-based test and the tablet PC-based test, the quantity of marking made by the participants provided evidence that the amount of marking differed among participants by marking type. In particular, the amount of marking differed between the paper-based group and the marking-enabled tablet PC group. The marking-enabled tablet PC group marked significantly more of the text in passages than the paper-based group. On the other hand, although the paper-based group used more marking on test items than on text in reading passages, there was no statistically significant difference between the two groups’ marking of test items. An analysis of the participants’ marking quality indicated that both relevant and irrelevant sentences were marked. In the paper-based test, less than half of the participants’ text markings applied to sentences containing information relevant to the answer to a test item. There was no significant difference in marking quality between high- and low-proficiency participants. This was also the case with the paper-based group and the marking-enabled tablet PC group: the markings of those two test groups appeared on content both relevant and irrelevant to the answers to test items. The analysis of the relationship between marking activities and test performance indicated that the amount of marking on test items correlated positively with test performance. In the preliminary paper-based test taken by all participants, the quantity of marking on test items was related to test performance.
This finding also held for the paper-based group: it was the quantity of marking on test items, rather than the amount of marking on text or the quality of the marking, that was found to correlate with test performance. In the tablet PC-based test, on the other hand, the amount of marking on both the text in the reading passages and the test items correlated with test performance. To analyze whether the marking activities had affected the test scores, the test performance of all three test groups was compared by proficiency level and item type. No significant differences were found between the test groups in terms of proficiency level or item type. Neither the paper-based group nor the marking-enabled tablet PC group significantly outperformed the regular tablet PC group, which had no access to the marking function. The participants’ test performance was thus not greatly influenced by the differing testing conditions. The perceptions of the two tablet PC-based test groups indicated that they still favored the paper-based testing format over the tablet PC-based test. The regular tablet PC group was challenged by the inability to use marking strategies on the test, as well as by eye fatigue caused by the small font size and sentence spacing. The marking-enabled tablet PC group also preferred the paper-based test, owing to technological constraints in the tablet PC’s marking function and the stylus pen. However, both groups also noted advantages of the tablet PC-based test. Those in the two tablet PC groups who had previously taken computer-based tests all responded that they preferred the tablet PC-based test over the computer-based test. The regular tablet PC group responded that the ability to adjust the tablet PC’s screen to a convenient viewing height was helpful.
The marking-enabled tablet PC group found the marking function advantageous despite its technical issues; they felt that it offered a testing environment similar to a paper-based test. The study’s findings indicate that marking strategies are frequently employed in test situations: the majority of participants used marking on the preliminary paper-based test, and a good number of the marking-enabled tablet PC group also used the marking function actively. This suggests a role for the marking function in screen-based reading tests. The value of the marking function lies in its potential to enhance the authenticity of computer-based tests. It would facilitate the use of marking strategies that test-takers ordinarily use in conventional paper-based reading tests. Providing a marking function in screen-based reading tests may also reduce threats to the construct validity of conventional computer-based reading tests. The present study’s attempt to create a marking function in a screen-based reading test can be suggested as one way of establishing the construct validity of the conventional computer-based reading test. The present study also touched on the potential role of the tablet PC as an assessment tool. By incorporating the multi-touch feature in reading assessment, authentic reading conditions comparable to those of paper-based tests may be created. The tablet PC may offer an attractive alternative to computers in assessing reading comprehension.
This thesis examined whether marking strategies play an important role in reading test situations and investigated their effect on computer-based reading test scores. It has been argued that current computer-based reading tests, because they do not allow markings such as underlining or annotating on passages and items, may hinder test-takers from demonstrating their reading ability and obtaining higher reading test scores (Choi & Tinkler, 2002; Khalifa & Weir, 2009; Koh & Kim, 2009; Lee, 2004; Paek, 2005).
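It has also been argued that a reading situation differing in this way from the conventional paper-based test acts as construct-irrelevant variance, a factor that can undermine the construct validity of computer-based reading tests, making it difficult to measure test-takers’ true reading ability (Bachman, 2000; Roever, 2001). Accordingly, this thesis implemented a current computer-based reading test on a tablet PC that supports a marking function, in order to examine how use of the marking function affects screen-based reading test situations and, with respect to the construct validity of computer-based reading tests, to propose a way of validly measuring test-takers’ reading ability through such a function. To this end, the study first analyzed the use of marking strategies such as underlining and note-taking in an actual test situation, to gauge the importance of marking strategies and to investigate their relationship with reading test scores. A total of 89 Korean college students took two tests, a questionnaire, and an interview. The participants were divided into a paper-based group, a marking-enabled tablet PC group, and a tablet PC group without the marking function. The paper-based group and the marking-enabled tablet PC group were compared in their use of marking strategies and their test scores across the two test media. The tablet PC group without the marking function was compared in test scores with the two groups that were able to mark. The students’ marking strategies were divided into ‘passage marking’ and ‘item marking’. The quantity and quality of marking were analyzed for all participants in Test 1, a paper-based test, and in Test 2 for the two groups able to mark: the paper-based group and the marking-enabled tablet PC group. The results were as follows. First, in both Test 1 and Test 2, the majority of students marked the passages and the test items. In Test 2 in particular, a larger number of students in the marking-enabled tablet PC group used the marking function on the passages. This indicated that marking was an important strategy in test situations as well. It further showed that students actively used the marking function provided for on-screen reading. The analysis of marking quantity showed that the amount of marking on passages and items varied from student to student in both Test 1 and Test 2. In Test 2 in particular, there was a statistically significant difference in the amount of passage marking between the paper-based and tablet PC-based conditions. On average, students in the marking-enabled tablet PC group marked more of the passages than students in the paper-based group. Students in the paper-based group marked more test items than those in the marking-enabled tablet PC group, but the difference was not statistically significant. The finding that the marking-enabled tablet PC group marked more on the passages indicated that students used marking strategies not only in paper-based testing but also on passages presented on screen. As for why this group marked fewer test items than the paper-based group, a likely cause emerged in the subsequent interviews, where students reported that the stylus pen was inconvenient for writing short words or notes. The analysis of marking quality showed that, in both Test 1 and Test 2, students’ markings fell on sentences related to the answers to test items as well as on unrelated sentences. This indicated that students’ markings are not confined to sentences containing important information but may be applied to various parts of a passage. In Test 2, too, marking quality did not differ between the marking-enabled tablet PC group and the paper-based group, nor between high- and low-proficiency students within the two groups.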
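Both groups, and both the high- and low-proficiency students within them, marked parts related to the answers to test items as well as parts that were not. The analysis of the correlation between marking strategy use and test scores showed that, in Test 1, only the amount of marking on test items was significantly related to test scores; the amount of passage marking and marking quality were not. Similar results were obtained for the paper-based group in Test 2. In the tablet PC-based reading test in Test 2, however, the amounts of marking on both passages and test items, though not marking quality, were significantly related to test scores. This suggests that, among the marking strategies used in test situations, the amount of marking on test items may be significantly correlated with test scores. Second, a comparison of Test 2 scores across the groups showed that the paper-based group scored highest, but not significantly higher than the other groups. High-proficiency students in each group scored higher than low-proficiency students regardless of testing condition. There was no interaction effect between testing condition and proficiency level, and the differing testing conditions themselves had no effect on test scores. Group differences in scores by item type (‘finding the main idea’, ‘finding details’, and ‘making inferences’) were also analyzed. As with the Test 2 total scores, there were no significant score differences among the groups for any item type. Test scores did not rise or fall markedly across the different testing conditions. Third, a questionnaire asking students in the two tablet PC groups to compare the tablet PC-based test with other test media revealed that students in both groups still preferred paper-based testing. The marking-enabled tablet PC group in particular rated reading and marking on paper more positively, even though they had been able to mark while reading on screen. Students in the two groups who had previously taken conventional computer-based reading tests found the tablet PC-based test more satisfactory than those tests, but overall paper-based testing was still preferred over the tablet PC-based test. The responses of the marking-enabled tablet PC group in particular related to the technical limitations of the tablet PC-based test used in this study. Test-takers reported that the stylus pen provided did not resemble a real pen, which made marking on the screen difficult, and that being unable to adjust the color and thickness of on-screen markings was inconvenient. The students nevertheless suggested that, if these technical limitations were addressed, the tablet PC could remedy the shortcomings of conventional computer-based reading tests. The finding that students use marking strategies not only on paper-based tests but also on screen-based tests when a marking function is provided has implications for the development and production of current computer-based tests. Only a negligible difference in test scores was found between the groups with and without access to marking, suggesting that the inability to use marking strategies is not a major concern in screen-based reading tests. Nevertheless, including a marking function in computer-based tests has implications for enhancing test authenticity and construct validity. A marking function that lets students use, in computer-based tests, the marking strategies they use in conventional paper-based tests could help create testing conditions similar to paper-based testing and contribute to valid measurement of test-takers’ reading ability. In addition, the significant correlation found between the amount of marking on test items and test scores points to the importance of item marking in test situations.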
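Marking strategies applied to the passage, such as underlining, can be added easily and quickly, and may therefore be habitual acts rather than aids to accurate comprehension of the passage (Idstein & Jenkins, 1972; McAndrew, 1983). Likewise, in this study neither the amount of passage marking nor the amount of marking on parts related to the answers was found to have a significant effect on test scores. In test situations, beyond marking the parts related to the answers, marking on test items should also be emphasized as a means of choosing answers efficiently. Through the students’ questionnaire and interview responses, this study also identified the potential of the tablet PC as an assessment tool. Although students still preferred paper-based testing because of technical limitations in the stylus pen and the tablet PC’s marking function, those who had experienced computer-based reading tests were satisfied with the tablet PC-based test. This indicated that a test incorporating such a marking function could be one alternative for overcoming the shortcomings of computer-based reading tests.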