DSpace at EWHA: 수학과 성취평가제에서 Bookmark 방법의 적용가능성 탐색 연구

Title: 수학과 성취평가제에서 Bookmark 방법의 적용가능성 탐색 연구

Other Titles: A Study on the Applicability of the Bookmark Method to the Mathematics Achievement Evaluation System : Based on Comparative Analysis of Ebel Method and Angoff Method

Authors: 윤혜미

Issue Date: 2014

Department/Major: Graduate School, Department of Mathematics Education

Publisher: The Graduate School, Ewha Womans University

Degree: Master

Advisors: 김래영

Abstract: The Achievement Evaluation System evaluates "what a learner has achieved and to what extent," judging each learner's achievement level against the achievement criteria that the curriculum sets out for each subject (Korea Institute of Curriculum & Evaluation, 2013). To evaluate learners under this system, their achievement levels must therefore be judged stably, regardless of the characteristics of the subject group and the difficulty of the evaluation tool. In preparation for the introduction of the Achievement Evaluation System for general high school subjects in the 2014 school year, a standard setting method was piloted in the second semester of the 2012 school year to work out an application plan suited to schools; that method was a modification of the Ebel (1972) and Angoff (1971) methods (Park Seon-hwa et al., 2012). Both methods, however, are based on classical test theory, so the estimates of item characteristics and of subjects' abilities can change with the characteristics of the evaluation tool.
As a result, there is a latent risk of misjudging subjects' achievement levels. The Bookmark method, in contrast, is based on item response theory, whose advantage is that the estimates of item characteristics and of subjects' abilities are not affected by the characteristics of the subject group or the difficulty of the evaluation tool (Seong Tae-je, 2001). This study administered two tests of different difficulty, both assessing the same achievement criteria, to the same subjects and examined which of the Ebel, modified Angoff, and Bookmark methods judges subjects' achievement levels most stably, thereby exploring the applicability of the Bookmark method to the Mathematics Achievement Evaluation System. To that end, the study posed the following research questions. Research Question 1. How are the final cut scores for each achievement level set under the Ebel, modified Angoff, and Bookmark methods, and does the resulting percentage of subjects at each achievement level differ with test difficulty? Research Question 2. Does the percentage of subjects classified at the same achievement level differ by standard setting method? Research Question 3. Which of the three standard setting methods (Ebel, modified Angoff, and Bookmark) has the highest classification consistency in judging subjects' achievement levels? To answer Research Question 1, two tests of different difficulty assessing the same achievement criteria were administered to the same subjects, and the final cut scores for each achievement level were set under each of the three standard setting methods.
Subjects were then classified into five achievement levels (A, B, C, D, and E), and the percentage of subjects at each level was compared across the two test difficulties. To answer Research Question 2, agreement statistics and the Kappa coefficient were computed to compare whether the percentage of subjects classified at the same achievement level on the two tests of different difficulty differed by standard setting method. To answer Research Question 3, the three standard setting methods were compared for classification consistency in judging subjects' achievement levels, and the differences among the standard setting methods in whether those judgments agreed were tested statistically. The research findings were as follows. For Research Question 1, the final cut scores were lower on the relatively harder test than on the easier test at every achievement level of all three standard setting methods, except for Achievement Level B (achievement rate of at least 80% and below 90%) under the Bookmark method. In addition, under all three standard setting methods the percentage of subjects at every achievement level differed with test difficulty. For Research Question 2, only the Bookmark method showed a moderate level of agreement in judging subjects' achievement levels; the Ebel and modified Angoff methods showed a low level of agreement.
For Research Question 3, the classification consistency of the Bookmark method was not high in absolute terms but was higher than that of the Ebel and modified Angoff methods, and statistically significant differences in the agreement of subjects' achievement-level judgments were found only between the Bookmark method and the other two methods. These findings led to the following conclusions and implications. Under the Ebel, modified Angoff, and Bookmark methods alike, the percentage of subjects at every achievement level differed with test difficulty. Therefore, to assess learners' achievement levels in line with the purpose of the Mathematics Achievement Evaluation System, regular examinations need items that fit the achievement criteria of the mathematics curriculum and vary in difficulty. Because the classification consistency of the Bookmark method, though not high, exceeded that of the other two methods, and because statistically significant differences in the agreement of achievement-level judgments appeared only between the Bookmark method and the other two, the Bookmark method appears to reduce the risk of misjudging subjects' achievement levels relative to the other two methods. It is therefore worth considering the Bookmark method as the standard setting method for the Mathematics Achievement Evaluation System. Finally, to improve the applicability and generalizability of these findings, follow-up studies should develop mixed-format testing tools containing items that fit the achievement criteria of the mathematics curriculum and vary in difficulty, and should be conducted with students at different achievement levels.
Once the Achievement Evaluation System is introduced in mathematics, information on how far students have attained the achievement criteria must be given to them as their evaluation results, and feedback on teaching and learning must be provided on the basis of those results. This requires achievement levels that are set through a systematic procedure. Because setting achievement levels requires reliable and valid cut scores, continued research is needed on standard setting methods suited to the Mathematics Achievement Evaluation System.
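
The abstract mentions agreement statistics and the Kappa coefficient for comparing the achievement levels that the same subjects receive on the two tests, but this record does not give the formulas. The following is a minimal sketch, assuming the agreement statistic is the proportion of exact agreement and the Kappa coefficient is Cohen's unweighted kappa. The function name and the sample level lists are hypothetical illustrations, not data from the thesis.

```python
from collections import Counter

# The five achievement levels named in the abstract.
LEVELS = ["A", "B", "C", "D", "E"]


def agreement_and_kappa(easy_levels, hard_levels, levels=LEVELS):
    """Return (observed agreement, Cohen's kappa) for paired classifications.

    easy_levels[i] and hard_levels[i] are the achievement levels assigned to
    the same subject on the easier and the harder test (assumed pairing).
    """
    if len(easy_levels) != len(hard_levels) or not easy_levels:
        raise ValueError("need two non-empty lists of equal length")

    n = len(easy_levels)

    # Observed agreement: share of subjects placed at the same level twice.
    p_observed = sum(a == b for a, b in zip(easy_levels, hard_levels)) / n

    # Chance agreement: product of the marginal proportions, summed over levels.
    easy_counts = Counter(easy_levels)
    hard_counts = Counter(hard_levels)
    p_chance = sum(
        (easy_counts[lvl] / n) * (hard_counts[lvl] / n) for lvl in levels
    )

    if p_chance == 1.0:
        # Kappa is undefined when every subject falls in one level on both tests.
        return p_observed, float("nan")

    kappa = (p_observed - p_chance) / (1.0 - p_chance)
    return p_observed, kappa


if __name__ == "__main__":
    # Hypothetical classifications for eight subjects, for illustration only.
    easy = ["A", "A", "B", "B", "C", "C", "D", "E"]
    hard = ["A", "B", "B", "C", "C", "D", "D", "E"]
    p_o, kappa = agreement_and_kappa(easy, hard)
    print(f"observed agreement = {p_o:.3f}, kappa = {kappa:.3f}")
```

Computing both values once per standard setting method would give the kind of comparison described for Research Questions 2 and 3, since kappa discounts the agreement that the marginal level distributions would produce by chance.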