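DSpace at EWHA: A Comparative Analysis of Item Parameter and Person Parameter Estimates from the Graded Response Model, Rating Scale Model, and Partial Credit Model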


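Title: A Comparative Analysis of Item Parameter and Person Parameter Estimates from the Graded Response Model, Rating Scale Model, and Partial Credit Model (등급반응모형, 평정척도모형, 부분점수모형의 문항모수와 피험자모수 추정치 비교분석)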

Authors: 임미경

Issue Date: 2001

Department/Major: Graduate School, Department of Education

Publisher: The Graduate School, Ewha Womans University

Degree: Master

Abstract: The purpose of this study is to investigate the similarity of item parameters and person (trait) parameters across five polytomous IRT models: Samejima's (1969) graded response model (GRM), Muraki's (1990) rating scale model (MRSM), Masters's (1982) partial credit model (PCM), Andrich's (1978) rating scale model (ARSM), and Muraki's (1992) generalized partial credit model (GPCM). The data set is a sample (N = 1,739) of responses to the Preferred Task Difficulty Scale, a subscale of the Academic Failure Tolerance Scale constructed by Kim (1994) for Korean students. The scale has 8 items, each with six response categories labeled 'disagree strongly', 'disagree', 'disagree a little', 'agree a little', 'agree', and 'agree strongly'.

The MULTILOG program was used to estimate the parameters of the GRM, PARSCALE was used for the MRSM and GPCM, and BIGSTEPS was used for the PCM and ARSM. Item parameters (item discrimination and the five step values associated with the categories) and person parameters were analyzed. Each parameter was compared across the models using correlation coefficients and the F-ratio from a repeated-measures ANOVA.

The results are as follows. Among the models that estimate item discrimination, most category parameter estimates were highly correlated between the MRSM and GPCM. However, the F test showed that only the first category parameter estimates did not differ significantly among the GRM, MRSM, and GPCM. For item 8, a misfitting item, the item parameter estimates diverged widely across models. The equal spacing of the five item category parameters was well maintained in the GRM, MRSM, ARSM, and GPCM, but not in the PCM. Based on these comparisons of item parameter estimates, the PCM may lead to conclusions that differ from those of the other four models when Likert-scale responses are analyzed.
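To make concrete how a discrimination parameter and five category parameters define a six-category item, the following is a minimal sketch of the category probabilities under the GPCM, the PCM (treated here as the GPCM with discrimination fixed at 1), and the GRM. It is an illustration under stated assumptions, not the thesis's estimation procedure: the function names, the parameter values, and the omission of the scaling constant D = 1.7 that some programs apply are all assumptions made for this example.

```python
import math

def gpcm_probs(theta, a, steps):
    """Category probabilities under the generalized partial credit model.

    theta : person parameter
    a     : item discrimination
    steps : step parameters b_1..b_m (m + 1 categories result)
    """
    # Cumulative sums of a * (theta - b_v); the empty sum for category 0 is 0.
    cum = [0.0]
    for b in steps:
        cum.append(cum[-1] + a * (theta - b))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(cum)
    weights = [math.exp(c - top) for c in cum]
    total = sum(weights)
    return [w / total for w in weights]

def pcm_probs(theta, steps):
    """Partial credit model: GPCM with discrimination fixed at 1."""
    return gpcm_probs(theta, 1.0, steps)

def grm_probs(theta, a, thresholds):
    """Category probabilities under the graded response model.

    thresholds must be in increasing order.
    """
    # Boundary curves P*(X >= k), padded with 1 (k = 0) and 0 (k = m + 1).
    boundary = [1.0]
    boundary += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    boundary.append(0.0)
    return [boundary[k] - boundary[k + 1] for k in range(len(boundary) - 1)]

# Hypothetical, equally spaced category parameters for one six-category item.
example_steps = [-2.0, -1.0, 0.0, 1.0, 2.0]
example_a = 1.2  # hypothetical discrimination

for name, probs in [
    ("GPCM", gpcm_probs(0.0, example_a, example_steps)),
    ("PCM", pcm_probs(0.0, example_steps)),
    ("GRM", grm_probs(0.0, example_a, example_steps)),
]:
    print(name, [round(p, 3) for p in probs])
```

With equally spaced, symmetric category parameters and a person parameter of 0, each model returns six probabilities that sum to 1 and are symmetric around the middle of the scale, although the numerical values differ between the models.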
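Correlation coefficients among the person parameter estimates of the five models ranged from .965 to 1.000. Spearman's rank correlation coefficients, which ranged from .984 to 1.000, were higher than these correlation coefficients. Although the person parameter estimates of the five models are not numerically equal, the five models rank examinees in nearly the same order. Based solely on the comparisons of person parameter estimates, there appears to be no clear advantage in selecting one model over another.

Abstract (translated from the Korean): As assessment formats change, stepwise scoring of constructed-response items and performance tasks is becoming more common. Polytomous item response theory is valuable as a test theory that can better reflect examinees' detailed, intermediate levels of ability. It also offers a way to apply item response theory to the analysis of multiple response categories in psychological tests that measure the affective domain. The purpose of this study is to apply five polytomous item response models to the same real test data and to determine whether the models yield the same estimates of item parameters and person parameters. The data analyzed are the responses of 1,739 elementary, middle, and high school students nationwide to the 8 items of the Preferred Task Difficulty Scale, which is part of the Academic Failure Tolerance Scale. The response scale is a six-point Likert scale ranging from 'strongly disagree' to 'strongly agree'. Five polytomous item response models applicable to this response scale were selected: the graded response model, Muraki's rating scale model, the partial credit model, Andrich's rating scale model, and the generalized partial credit model. Parameters of the graded response model were estimated with the MULTILOG program, those of Muraki's rating scale model and the generalized partial credit model with the PARSCALE program, and those of the partial credit model and Andrich's rating scale model with the BIGSTEPS program. The estimated item and person parameters were compared using correlation coefficients among the model-specific estimates and a repeated-measures ANOVA. The dependent variables compared were the item parameters (discrimination and five item category difficulties) and the person parameter.

The conclusions of the study are as follows. For the three models that take discrimination into account (the graded response model, Muraki's rating scale model, and the generalized partial credit model), the item parameter estimates did not differ significantly across models for the first category difficulty, whereas discrimination and the second, third, fourth, and fifth category difficulties did differ across models. However, the correlations among the estimates were very high across the three models, and especially high between Muraki's rating scale model and the generalized partial credit model. The discrimination estimates of the graded response model and Muraki's rating scale model were very similar, and for every item the generalized partial credit model yielded somewhat lower discrimination estimates than these two models. The equal spacing of item category difficulties, which supports the soundness of the test scale, was not well maintained in the partial credit model but was relatively well maintained in the graded response model, Muraki's rating scale model, Andrich's rating scale model, and the generalized partial credit model. Therefore, when a Likert-scale test is analyzed, the partial credit model may lead to an interpretation of item parameters that differs from that of the other models. The person parameter estimates of the five models differed significantly at the .01 level. Even so, when examinees were ranked by each model's estimates, the rank correlation coefficients were generally .98 or higher, so the person parameter estimates can be judged to be very similar across models. A limitation of this study is that the effects of the estimation programs and of the parameter estimation methods were not fully controlled when the models were compared.

Research that addresses this limitation is needed, and it is recommended that model comparison studies be conducted on simulated data under a variety of conditions so that the comparison results can be generalized.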
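The contrast between numerically different estimates and a nearly identical rank order can be illustrated with a small sketch comparing Pearson's and Spearman's correlations. The two lists below are hypothetical person parameter estimates, not data from the thesis, and the second list is generated as a monotone but nonlinear function of the first purely for illustration. The helper names are assumptions for this example.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical estimates of the same seven examinees under two models.
theta_model_a = [-2.1, -1.3, -0.4, 0.0, 0.6, 1.2, 2.5]
# Monotone, nonlinear transformation: values change, rank order does not.
theta_model_b = [0.8 * t + 0.1 * t ** 3 for t in theta_model_a]

print("Pearson :", round(pearson(theta_model_a, theta_model_b), 3))
print("Spearman:", round(spearman(theta_model_a, theta_model_b), 3))
```

In this constructed example the rank correlation is 1 because both lists order the examinees identically, while the Pearson correlation falls below 1 because the relationship between the two sets of values is not linear. This is the same pattern the abstract reports, with rank correlations at or above the product-moment correlations.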