DSpace at EWHA: Simulation Studies for Comparison of Confidence Intervals in One-sample Correlated Data

Title: Simulation Studies for Comparison of Confidence Intervals in One-sample Correlated Data

Authors: 주이진

Issue Date: 2008

Department/Major: Department of Statistics, Graduate School

Publisher: Graduate School, Ewha Womans University

Degree: Master

Abstract: In biomedical research, one-sample correlated data are frequently observed, and interest has focused on constructing asymptotic confidence intervals for the proportion of interest. However, in small samples, which are also common in the biomedical field, the performance of asymptotic confidence intervals is unreliable. To address this, Kang, Lee and Lesaffre proposed four new confidence intervals whose skewness is corrected through the Edgeworth expansion of the studentized test statistic. In this thesis, I compare these new confidence intervals with existing ones in a simulation study under various distributional assumptions for one-sample correlated data. In correlated binary data, confidence intervals that assume independent observations and rely on large-sample theory generally perform worse, especially in small samples, because groups of data are correlated with each other even when each observation is independent. To compensate for this weakness, Fleiss (1979) replaced … with …, and Hall (1992) suggested improved error terms that incorporate the third and fourth moments based on the Edgeworth expansion; these corrections are applied in three of the new intervals. The simulation study is conducted on small-sample data to test eight confidence intervals, four existing ones and four proposed in the earlier paper, and to measure how much performance has improved. In addition to correlated binary data, the original concern, I also test the intervals on mixture binomial data and on autoregressive correlated data. As in the earlier paper, coverage probability and average length are used to compare the intervals. Coverage probability was estimated from 50,000 confidence intervals computed on 50,000 simulated data sets.
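To make the two performance measures concrete: coverage probability is the fraction of simulated intervals that contain the true proportion, and average length is their mean width. The Python sketch below assumes a beta-binomial model for correlated binary data (k clusters of m observations with intra-cluster correlation rho) and uses a Wald interval with a cluster-robust ratio-estimator variance as the interval under test. The generator, the interval, and the function names are hypothetical illustrations, not the eight intervals compared in the thesis; only the default of 50,000 replications follows the abstract.

```python
import math
import random
from statistics import NormalDist


def simulate_clusters(k, m, p, rho, rng):
    """Hypothetical generator: k clusters of size m from a beta-binomial model.

    Each cluster draws its own success probability from Beta(a, b) with mean p,
    so observations within a cluster are correlated with correlation rho.
    """
    # For a beta-binomial, rho = 1 / (a + b + 1), hence a + b = 1/rho - 1.
    s = 1.0 / rho - 1.0
    a, b = p * s, (1.0 - p) * s
    counts = []
    for _ in range(k):
        pi = rng.betavariate(a, b)
        counts.append(sum(rng.random() < pi for _ in range(m)))
    return counts


def robust_wald_interval(counts, m, level=0.95):
    """Wald interval for the pooled proportion, using a cluster-robust
    (ratio-estimator) variance instead of the binomial p(1-p)/n."""
    k = len(counts)
    n = k * m
    p_hat = sum(counts) / n
    # Squared cluster residuals capture the extra-binomial variation.
    var = k / (k - 1) * sum((y - p_hat * m) ** 2 for y in counts) / n ** 2
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * math.sqrt(var)
    return p_hat - half, p_hat + half


def coverage_and_length(p, k, m, rho, n_sim=50_000, seed=1):
    """Monte Carlo estimates of coverage probability and average length."""
    rng = random.Random(seed)
    hits = 0
    total_len = 0.0
    for _ in range(n_sim):
        lo, hi = robust_wald_interval(simulate_clusters(k, m, p, rho, rng), m)
        hits += lo <= p <= hi  # True counts as 1
        total_len += hi - lo
    return hits / n_sim, total_len / n_sim
```

For example, `coverage_and_length(0.3, k=20, m=5, rho=0.2)` estimates both measures for p = 0.3 (a smaller `n_sim` runs faster). Swapping in a different interval function lets the same loop compare several intervals, which is the kind of comparison the abstract describes.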
Coverage probability is computed as the proportion of the 50,000 simulated confidence intervals that cover the true value of the proportion of interest; the closer this coverage is to the nominal 95 percent, the better the interval. The average length is the mean length of the 50,000 simulated confidence intervals, and shorter intervals are considered better. In the simulations, the newly proposed confidence intervals mostly performed better than the existing ones. The existing interval obtained by solving the quadratic equation in p produced the best results in many cases. However, the two confidence intervals that adopt the indirectly corrected error term suggested by Hall were more stable than the others under stronger skewness and dependence among correlated observations, under mixtures of distributions, and under time dependence, settings that are more common in practice than the unskewed, independent case.
In biostatistics, data in which several clusters exist and observations within each cluster follow a binomial distribution are frequently observed; they are called "Correlated Binary Data". In such data, even if each observation is independent, dependence between clusters makes the data correlated, so the accuracy of ordinary confidence intervals that assume independence drops sharply. Existing confidence intervals had been developed to overcome this weakness, and in 2007 Kang, Lee and Lesaffre proposed three new confidence intervals based on the Edgeworth expansion. In this thesis, these three intervals are compared by simulation not only on Correlated Binary Data, which assume dependence between clusters, but also on Mixture Binomial Data, which assume overlap between distributions, and on Auto-Regressive Data, which assume association among data within a cluster. As measures of performance, the Coverage Probability and Average Length used in Kang, Lee and Lesaffre's paper were adopted. Coverage Probability is the proportion of the 50,000 confidence intervals, obtained from 50,000 generated data sets, that contain the true parameter value; it is checked for closeness to the 95% confidence level. Average Length is the mean length of the sample confidence intervals, and shorter intervals are considered better. In each simulation, the newly proposed intervals generally performed better than the existing ones. In many cases, however, the best results came from the interval obtained by solving the quadratic equation in p. Even so, the new intervals were especially stable when stronger asymmetry and dependence were assumed within the same distribution.
Based on these results, it could be shown that the three newly proposed confidence intervals, especially the two that include the error term improved indirectly as proposed by Hall, can be applied well when correlation among observations, overlap between distributions, and time dependence, all of which frequently arise in practice, are present.
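The abstract reports that the interval obtained by solving a quadratic equation in p often performed best, but it does not state the equation. One familiar interval of this kind is the Wilson score interval, which takes as endpoints the two roots in p of (p_hat - p)^2 = z^2 p(1 - p)/n. The sketch below assumes that form and, as a further assumption, inflates the variance by a design effect deff = 1 + (m - 1)rho for equal-sized clusters. The names `quadratic_interval` and `design_effect` are hypothetical, and this is not necessarily the interval used in the thesis.

```python
import math
from statistics import NormalDist


def design_effect(m, rho):
    """Assumed variance inflation 1 + (m - 1) * rho for clusters of equal
    size m with intra-cluster correlation rho."""
    return 1.0 + (m - 1) * rho


def quadratic_interval(successes, n, deff=1.0, level=0.95):
    """Return the two roots in p of
        (p_hat - p)**2 = z**2 * deff * p * (1 - p) / n,
    i.e. a Wilson-type score interval with effective sample size n / deff.
    """
    p_hat = successes / n
    z = NormalDist().inv_cdf(0.5 + level / 2)
    c = z * z * deff / n
    # Expanding gives (1 + c) p^2 - (2 p_hat + c) p + p_hat^2 = 0, whose
    # discriminant simplifies to c^2 + 4 c p_hat (1 - p_hat) >= c^2 > 0,
    # so both roots are real and lie in [0, 1].
    center = (2 * p_hat + c) / (2 * (1 + c))
    half = math.sqrt(c * c + 4 * c * p_hat * (1 - p_hat)) / (2 * (1 + c))
    return center - half, center + half


# Example: 20 clusters of 5 observations with 30 successes, assumed rho = 0.2.
print(quadratic_interval(30, 100))
print(quadratic_interval(30, 100, deff=design_effect(5, 0.2)))
```

Unlike a Wald interval, the roots of this quadratic never fall outside [0, 1], which is one reason intervals of this type tend to behave well in small samples; setting `deff` above 1 widens the interval to reflect within-cluster correlation.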