DSpace at EWHA: Comparison and Analysis of Computer Programs for Essay Scoring

Title: 에세이 채점을 위한 컴퓨터 프로그램의 비교·분석

Other Titles: Comparison and Analysis of Computer Programs for Essay Scoring

Authors: 최윤정

Issue Date: 2005

Department/Major: Graduate School, Department of Education

Publisher: The Graduate School, Ewha Womans University

Degree: Master

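Abstract: Constructivism, which emerged in the late 20th century, changed the educational climate by bringing learner-centered and level-differentiated education. With its rise, the need and demand for essay-type items have grown. Because students can approach the question freely and construct their responses freely, essay-type items allow assessment suited to constructivist education and can measure students' abilities in analysis, criticism, organization, synthesis, problem solving, and creativity. However, teachers do not use essay-type items actively in assessment, because of problems with intra-rater and inter-rater reliability and the difficulty of developing and evaluating scoring criteria.
This study introduces eight essay scoring programs developed abroad, analyzes their common features, and compares their differences. It also analyzes the problems and limitations of each program. The programs share three features. First, every program evaluates essays with an analytic scoring method. Second, every program scores essays on the basis of the way human raters score them. Third, every program verifies the validity of its scoring by comparing raters' scores with the program's scores.
The programs also share four problems. The first is the selection of scoring criteria. Because all of the programs score essays analytically, they all have scoring criteria. Since an essay's score varies with the criteria, test developers must specify the learning objectives behind an essay-type item when constructing it and select scoring criteria that fit the theoretical background and those objectives. The second is the level and number of training essays. Every program learns the process by which people score essays and then scores essays in the same way, so training requires essays that people have already scored. However, many programs offer no theoretical background or explanation for what level the training essays should be at or how many are appropriate. Training essays are the tool through which the computer learns the raters' scoring process, and their level and number determine the range of what the computer learns, so both need sufficient discussion. The third is the experts who score the training essays. In all of the programs, raters and content experts score the training essays, and raters supplement the program by checking aspects the computer fails to consider. However, no information is given on the raters' level of expertise, on what training they received, on how many raters scored the training essays, or on whether inter-rater and intra-rater reliability were established. This information must be provided whenever essays are scored. The fourth is the method of verifying scoring validity. All of the programs compared rater scores with program scores to verify validity. Researchers state that they chose this comparison because no particular alternative exists, but validating a program by comparing it with raters is methodologically problematic.
The comparison and analysis indicate that the essay scoring programs applicable in Korea are C-rater, E-rater, IntelliMetric, BETSY, and SEAR. Because C-rater scores on content, it can be used as a tool for performance assessment. C-rater cannot score examples drawn from personal experience or personal opinions, but its scoring validity is high for tests in subjects such as science, mathematics, and language.
E-rater and IntelliMetric can be used as assessment tools for essay and composition education, and Criterion (the instructional version of E-rater) and MY Access! (the instructional version of IntelliMetric) can be used as online learning tools for essay and composition education. BETSY and SEAR are suited to the short-answer, cloze, and completion items among supply-type items. If open-ended items are scored with a program such as BETSY or SEAR and the teacher then checks the results, teachers' time and effort can be reduced. Developing a computer program for essay scoring first requires the computerization of the Korean language. In addition, education scholars, Korean linguists, and engineers need to cooperate in developing such a program. Developing a single essay scoring program takes a great deal of effort, time, and money. Even so, essay scoring programs are worthwhile because they can reduce teachers' time and effort, ensure consistent and accurate scoring, and improve students' writing ability by giving them more opportunities to practice logical writing. For essay scoring programs to be developed in Korea, their development needs to be encouraged and actively funded at the national level.

Since the beginning of the 21st century, the paradigm of educational evaluation has changed remarkably under the influence of constructivism, which was introduced in the late 20th century. Under the new paradigm, schools provide students with open, differentiated, and student-centered curricula. These changes have led to a growing demand for essay-type questions, one kind of supply-type question, because they suit constructivist evaluation: students can approach questions freely and construct their answers freely. In addition, essay-type questions can help assess a student’s ability to analyze, criticize, organize, synthesize, create, and solve problems. On the other hand, there are problems with the consistency and reliability of scoring and with the development and evaluation of scoring criteria. To address these problems, automated essay scoring programs were developed in the United States. Such programs contribute to solving practical issues with essay-type questions. First, they ensure intra-rater and inter-rater reliability. Second, they help teachers save time and energy. Third, they give students quicker feedback and guide them toward better performance.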
This study introduced eight automated essay scoring programs developed in other countries, analyzed what they have in common and how they differ, and examined the problems and limitations of each program. They have three things in common: first, they use analytic scoring; second, they are based on the way people score essays; and third, they are verified by comparing raters' scores with the programs' scores. The eight programs also share the same problems: the establishment of valid scoring criteria, the level and number of training essays, the experts who score the training essays, and the method of verifying the automated scoring. The analysis and comparison led to the conclusion that C-rater, E-rater, IntelliMetric, BETSY, and SEAR can be applied in Korean schools. C-rater, with its focus on content, can be used as a tool for performance assessment. Although it cannot evaluate personal experiences and opinions, its scoring is highly valid for specific questions in science, mathematics, and language. E-rater and IntelliMetric will be useful for essay and writing classes, while Criterion and MY Access! can be effective tools for performance assessment of essays, writing, and language. A Korean equivalent of Criterion and MY Access! would offer students more opportunities for logical essay writing and would develop their writing skills through quick and detailed results and feedback. It would also reduce the time and energy teachers spend designing tests and scoring answers. BETSY and SEAR are well suited to supply-type questions (short-answer, cloze, and completion types), though not to essay-type questions. These programs reduce teachers’ workloads, because teachers only have to verify the scores for open questions. The computerization of the Korean language is a precondition for developing automated essay scoring systems.
Also, experts in education, the Korean language, and engineering should work together to develop automated programs. Despite the considerable effort, time, and cost that development requires, computer programs for essay scoring can make a real difference in schools. They can save teachers’ time and energy, ensure the consistency and reliability of scoring, and create more opportunities for logical essay writing. To develop an automated essay scoring program in Korea, the government should take an active role in promoting and funding it.
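
The abstract notes that every program is verified by comparing raters' scores with the program's scores. As a rough illustration only, the sketch below shows three statistics often used for that kind of comparison: exact agreement, adjacent agreement, and the Pearson correlation. The thesis does not specify which statistics the eight programs report, so the function names, the one-point tolerance, and the sample scores are all hypothetical.

```python
from math import sqrt


def _check(human, machine):
    # Both score lists must be non-empty and of equal length.
    if not human or len(human) != len(machine):
        raise ValueError("score lists must be non-empty and of equal length")


def exact_agreement(human, machine):
    """Proportion of essays given the identical score by rater and program."""
    _check(human, machine)
    return sum(h == m for h, m in zip(human, machine)) / len(human)


def adjacent_agreement(human, machine, tolerance=1):
    """Proportion of essays whose two scores differ by at most `tolerance`."""
    _check(human, machine)
    return sum(abs(h - m) <= tolerance for h, m in zip(human, machine)) / len(human)


def pearson_r(human, machine):
    """Pearson correlation between rater scores and program scores."""
    _check(human, machine)
    n = len(human)
    mean_h = sum(human) / n
    mean_m = sum(machine) / n
    cov = sum((h - mean_h) * (m - mean_m) for h, m in zip(human, machine))
    var_h = sum((h - mean_h) ** 2 for h in human)
    var_m = sum((m - mean_m) ** 2 for m in machine)
    if var_h == 0 or var_m == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_h * var_m)


# Hypothetical scores for six essays on a 1-5 scale (not data from the thesis).
human_scores = [4, 3, 5, 2, 4, 3]
program_scores = [4, 3, 4, 2, 5, 3]

print(exact_agreement(human_scores, program_scores))
print(adjacent_agreement(human_scores, program_scores))
print(pearson_r(human_scores, program_scores))
```

High agreement on such statistics shows only that the program reproduces the raters' scores. It says nothing about whether those raters were themselves reliable, which is the methodological concern the abstract raises.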