DSpace at EWHA: Significance of the Work Classification Method for Standard Setting in Clinical Performance Examinations

Title: Significance of the Work Classification Method for Standard Setting in Clinical Performance Examinations

Other Titles: Work Classification Method for Standard-setting on Clinical Performance Examination

Authors: 이명진

Issue Date: 2011

Department/Major: Graduate School, Department of Medicine

Publisher: The Graduate School, Ewha Womans University

Degree: Doctor

Advisors: 한재진

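Abstract: Clinical performance assessment using standardized patients is increasingly being adopted as part of the summative evaluation for granting a medical license or graduation from medical school. In Korea, beginning with the 2010 national medical licensing examination, a clinical skills test assessing examinees' clinical performance has been administered alongside the existing written test. To judge reasonably, on the basis of such assessments, whether an examinee graduating from medical school has the competence needed to provide primary care as a physician, a reliable and valid standard must be set. Setting a sound standard for clinical performance assessment in medical education can be difficult and complex. Among the various standard-setting methods applicable to both written and performance tests, a growing number of studies have adopted methods that consider examinee performance and have reported defensible, reproducible standards as a result. The work classification method, in which expert judges make judgments about the performance of a sample of examinees, has been applied in both education and medicine. In this study, the work classification method was applied to clinical performance examination data from the Seoul-Gyeonggi CPX Consortium to set passing scores, and its significance was evaluated. The method was examined in terms of its applicability, the reliability of the standard setting, and the realism of the passing scores; the effect of iterative rating, the behavior of the judges, and the feasibility of the standard-setting process were also studied. Twenty judges took part in a one-day standard-setting session. The judges were divided into two panels, and standards were set for two common cases and five cases per panel. Judges made pass/fail decisions on 50 checklists selected evenly across the full score range. A regression equation and coefficient of determination (R²) were computed for the judges' pass-decision rate and the checklist score. The checklist score corresponding to a 50% pass-decision rate in the regression equation was taken as the passing score for that case. The regression model fitted for each case was most often quadratic or linear; a cubic model was adopted for one case. The coefficient of determination was 0.55 or higher, meaning that at least 55% of the variation in mean checklist scores can be explained by the judges' pass-decision rates. Passing scores ranged from 44.9 to 56.3 points, and passing scores generally tended to be lower for more difficult cases. Pass rates ranged from 79.8% to 94.0% and were mostly above 85%, but two cases showed 78.9% and 79.8%, which suggests that the cause needs to be analyzed. The generalizability coefficients were 0.88 and 0.89 for panels A and B, respectively, indicating good reliability. It was also predicted that a good generalizability coefficient would still be obtained if the number of judges were reduced to six. The time judges needed to rate one case did not exceed one hour. In this study, case-specific passing scores for a clinical performance examination were determined from experts' judgments of students' clinical performance as reflected in the checklists, and the method was found not only to fit the purpose of the assessment but also to have good reliability and high feasibility.
Based on concerns related to physician competency and patient safety, various methods of assessing clinical performance have been incorporated with growing frequency into medical education and even into high-stakes licensure examinations. Robust methods are therefore needed to justify and defend the passing scores for these assessments. Setting standards for complex performance assessments is difficult.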
A number of standard-setting approaches are currently available for both written and performance tests. Methods that focus on review of examinee work have been employed more frequently, often resulting in the establishment of defensible, reproducible standards. Work classification methods establish standards based on judges' classifications of samples of examinee work and have been used to set passing scores for performance assessments in both educational and medical contexts. The purpose of the present study was to describe and evaluate a work classification method for setting performance standards for a clinical skills examination. Twenty judges set standards for 12 cases from a year-4 undergraduate clinical performance examination (CPX). To ensure that the standard was reasonably reproducible, multiple judges, divided into two panels, were employed. For each of the 12 cases, judges reviewed 50 case checklists and determined whether the performance reviewed reflected someone who had achieved the minimal competence level required to practice medicine as a primary care physician (pass or fail). For each checklist, the percentage of "pass" decisions was calculated. Regression analyses for three models (linear, quadratic, and cubic) were performed for each case. The credibility of the standard-setting procedure was studied in terms of three aspects: its applicability, its reliability, and the extent to which the standard was deemed realistic. Furthermore, we investigated the impact of the iterative process, the judges' behavior, and the practicality of the process. We selected a regression model for each case based on statistical significance; quadratic and linear models were chosen for most of the 12 cases. Model-data fit (R²) values, which indicate the percentage of variance in checklist scores accounted for by panelists' ratings, were above .55.
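The regression step described above can be sketched in a few lines of Python. This is a minimal illustration, not the thesis's analysis: it covers only the linear model (the study also fitted quadratic and cubic models), it assumes the checklist score is the dependent variable and the percentage of "pass" decisions the predictor, and the function names (`fit_linear`, `r_squared`, `passing_score`) and the sample data are made up for the example.

```python
from statistics import mean


def fit_linear(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(x, y, intercept, slope):
    """Share of the variance in y reproduced by the fitted line."""
    my = mean(y)
    ss_tot = sum((yi - my) ** 2 for yi in y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return 1.0 - ss_res / ss_tot


def passing_score(pass_pct, scores, target_pct=50.0):
    """Checklist score predicted at the target percentage of pass decisions.

    Assumption: score is regressed on pass percentage (linear model only).
    """
    intercept, slope = fit_linear(pass_pct, scores)
    cut = intercept + slope * target_pct
    return cut, r_squared(pass_pct, scores, intercept, slope)


# Made-up data for illustration only (NOT taken from the thesis):
# one entry per reviewed checklist.
pass_pct = [0.0, 10.0, 35.0, 60.0, 85.0, 100.0]  # % of judges voting "pass"
scores = [31.0, 38.0, 46.0, 52.0, 61.0, 68.0]    # checklist score

cut, r2 = passing_score(pass_pct, scores)
print(f"passing score = {cut:.1f}, R^2 = {r2:.2f}")
```

With real data there would be 50 checklists per case, and the choice among linear, quadratic, and cubic models would follow the significance testing the abstract mentions.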
Iterative steps in the training procedure reduced the variance among the judges' ratings. The final passing scores for the twelve cases ranged from 44.9 to 56.3, and the expected pass rates ranged from 79.8% to 94.0%. In general, lower passing scores tended to be selected for cases with higher difficulty levels. In two of the twelve cases, the expected pass rates were below 80%, and further analysis of the cause of these unexpected results is needed. Consistency among panelists was reasonable, and the reliabilities (expressed as generalizability coefficients) were 0.88 and 0.89 for the two panels, respectively. Such high generalizability coefficients indicate that the work classification method can be utilized. The maximum time commitment for each judge was less than 1.0 work hour per case. This model for setting performance standards successfully produced useful standards for the examination, was feasible and practical, and can be used to set performance standards for other standardized patient examinations.
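The link between panel size and the generalizability coefficient can be illustrated with a one-facet decision study. The sketch below rests on assumptions the abstract does not confirm: checklists are fully crossed with judges, judges are the only facet, and each panel had 10 judges (an equal split of the 20). The function names `g_coefficient` and `project_g` are hypothetical, and the study's own design and variance components may differ.

```python
def g_coefficient(var_object, var_residual, n_judges):
    """Relative G coefficient for a one-facet crossed design.

    var_object:   variance among the rated objects (here, checklists)
    var_residual: object-by-judge interaction plus error variance
    n_judges:     number of judges whose ratings are averaged
    """
    return var_object / (var_object + var_residual / n_judges)


def project_g(g_observed, n_observed, n_new):
    """Project an observed G coefficient to a different number of judges.

    Recovers the single-judge error-to-object variance ratio from the
    observed coefficient, then re-averages that error over n_new judges.
    """
    ratio = n_observed * (1.0 - g_observed) / g_observed
    return 1.0 / (1.0 + ratio / n_new)


# Reported coefficients for the two panels; the panel size of 10 is an assumption.
for g in (0.88, 0.89):
    print(f"G = {g:.2f} with 10 judges -> {project_g(g, 10, 6):.2f} with 6 judges")
```

Because the error term shrinks in proportion to the number of judges, the projected coefficient falls only gradually as the panel gets smaller, which is the kind of relationship a decision study is meant to quantify.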