A Comparison of Measurement Accuracy among Scoring Methods for Multidimensional Tests

Other Titles
A Comparison of estimate accuracy among weighting methods for scoring multidimensional test
Issue Date
Graduate School, Department of Education
The Graduate School, Ewha Womans University
The purpose of this study is to propose scoring methods that can reflect the multidimensionality of a multidimensional test, which has characteristics distinct from those of a unidimensional test, and to examine how consistent and reliable the test scores produced by the proposed methods are under various test conditions. To this end, four scoring methods for multidimensional tests were proposed and applied to simulated data, and the resulting test scores were compared in terms of reliability, standard error of measurement, and correlation with the true score. The results are as follows. First, in terms of the reliability coefficient, the method that uses the item discrimination of classical test theory as the item weight provided test scores with high reliability across various test conditions. Second, in terms of the standard error of measurement, the method that uses the item discrimination of multidimensional item response theory as the item weight yielded consistent estimates. This result was the same in all test conditions. Third, the correlation analysis with the true score showed that the method that uses the item discrimination of multidimensional item response theory as the item weight produced the test scores closest to the true score.
From the age at which children learn to read and write, paper-and-pencil tests play an important role in their lives. Test scores provide valuable information to students, parents, teachers, administrators, and test practitioners. Testing practitioners therefore have a responsibility to establish an appropriate scoring method for a given test and to provide an interpretation of the scores for their users. Most cognitive achievement tests measure, to different degrees, multiple skills or traits. If a test measures multiple skills or traits, its score needs to be calculated in a way that reflects this feature. A test score is a composite of item scores, so it is influenced by the item weighting method. Item weighting methods can thus offer a way to reflect the characteristics of a test. In the field of educational measurement, different weighting methods have been developed to derive test scores in an attempt to better estimate an examinee's ability on a specific trait. The purposes of this study are to suggest weighting methods for a multidimensional test and to compare the proposed methods in terms of reliability, standard error of measurement (SEM), and the correlation coefficient between the true score and each test score estimated by the proposed weighting methods.
Four weighting methods were proposed on the basis of a literature review: (1) the number-correct scoring method, in which all items are weighted one (CTT); (2) the weighted number-correct scoring method using item discrimination from classical test theory (WCTT); (3) the weighted number-correct scoring method using item discrimination from multidimensional item response theory (MIRT); and (4) the weighted number-correct scoring method using item discrimination from unidimensional item response theory (UIRT). To compare the weighting methods, simulation data for 1,000 examinees were used. Response data and ability parameters (θ) were obtained with the RESGEN 4 program. On the assumption that the number of dimensions (2, 3), test length (24, 36, 48), and the degree of correlation among dimensions (0.3, 0.5, 0.7) can influence the result of the comparison, these three factors were considered in constructing the simulation data. As a result, simulation data were generated in 16 test conditions. The procedure was repeated 100 times for each of the four tests, with a different seed number for each repetition. The results of this study are summarized as follows. First, in the analysis of reliability, WCTT yielded the highest reliability in most test conditions. Second, in the analysis of SEM, the lowest estimate was found for the MIRT method, which means that MIRT provides the most consistent test scores. This result was the same in every condition. Third, in the analysis of the correlation coefficient between the true score and each test score, MIRT also yielded the highest correlation across test conditions, except when the correlation among the dimensions was high (0.7). This result supports the study of Rotou et al. (2001). In conclusion, this study suggests that the MIRT method can provide useful information under various test conditions. The reliability of MIRT was not the highest, but it was at an acceptable level.
The comparisons of SEMs and of correlations with the ability parameters also showed that test scores from the MIRT method are robust and close to the true score. However, the WCTT method can be recommended as an alternative to MIRT. This is because the WCTT method provides not only the highest reliability but also SEMs as sound as those of the MIRT method, which produced the most consistent test scores. In addition, the correlation coefficients between WCTT and MIRT scores were very high. Considering that the WCTT method is more accessible to teachers who may find the MIRT method abstruse, the WCTT method can be used for scoring multidimensional tests.
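
The CTT and WCTT scoring rules and the three comparison criteria named in the abstract can be illustrated with a minimal Python sketch. It is not the procedure used in the thesis, and several pieces are assumptions: the `simulate` function is a hypothetical toy generator standing in for RESGEN 4 output (two dimensions, each item loading on one dimension), the CTT item discrimination is taken to be the corrected item-total correlation, reliability is computed as coefficient alpha, SEM as the score standard deviation times the square root of one minus reliability, and the mean of the two abilities serves as a stand-in for the true score. The MIRT and UIRT weights are omitted because they require IRT parameter estimation.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def variance(x):
    """Sample variance (n - 1 denominator)."""
    m = sum(x) / len(x)
    return sum((a - m) ** 2 for a in x) / (len(x) - 1)

def weighted_scores(responses, weights):
    """Weighted number-correct score: sum of weight * item score per examinee."""
    return [sum(w * u for w, u in zip(weights, row)) for row in responses]

def item_discriminations(responses):
    """Corrected item-total correlation per item (assumed CTT discrimination)."""
    totals = [sum(row) for row in responses]
    out = []
    for i in range(len(responses[0])):
        item = [row[i] for row in responses]
        rest = [t - u for t, u in zip(totals, item)]  # total without item i
        out.append(pearson(item, rest))
    return out

def coefficient_alpha(responses, weights):
    """Coefficient alpha for the composite of weighted item scores."""
    k = len(weights)
    item_vars = [variance([w * row[i] for row in responses])
                 for i, w in enumerate(weights)]
    total_var = variance(weighted_scores(responses, weights))
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def sem(responses, weights):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    sd = math.sqrt(variance(weighted_scores(responses, weights)))
    return sd * math.sqrt(1 - coefficient_alpha(responses, weights))

def simulate(n_examinees=1000, n_items=24, r=0.5, seed=1):
    """Hypothetical toy generator: two abilities correlated at r, 0/1 responses."""
    rng = random.Random(seed)
    # Each item: (dimension it loads on, discrimination a, difficulty b).
    items = [(i % 2, rng.uniform(0.5, 2.0), rng.uniform(-1.0, 1.0))
             for i in range(n_items)]
    responses, thetas = [], []
    for _ in range(n_examinees):
        common = rng.gauss(0, 1)
        # A shared component gives the two abilities correlation r.
        theta = [math.sqrt(r) * common + math.sqrt(1 - r) * rng.gauss(0, 1)
                 for _ in range(2)]
        thetas.append(theta)
        responses.append([
            1 if rng.random() < 1 / (1 + math.exp(-a * (theta[d] - b))) else 0
            for d, a, b in items])
    return responses, thetas

responses, thetas = simulate()
unit = [1.0] * len(responses[0])         # CTT: every item weighted one
wctt = item_discriminations(responses)   # WCTT: discrimination as item weight
true_proxy = [sum(t) / len(t) for t in thetas]  # assumed true-score stand-in
for name, w in (("CTT", unit), ("WCTT", wctt)):
    scores = weighted_scores(responses, w)
    print(name,
          "reliability:", round(coefficient_alpha(responses, w), 3),
          "SEM:", round(sem(responses, w), 3),
          "r(true):", round(pearson(scores, true_proxy), 3))
```

Because weighting changes the scale of the composite, SEM values from differently weighted scores are not directly comparable unless the scores are first placed on a common scale; the sketch reports them on their raw scales only.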
