DSpace at EWHA: Exact Chi-square Test for Equiprobable Multinomial Distribution

Title: Exact Chi-square Test for Equiprobable Multinomial Distribution

Authors: 李瑞眞

Issue Date: 2004

Department/Major: Graduate School, Department of Statistics

Publisher: Ewha Womans University Graduate School

Degree: Master

Advisors: 姜勝湖

Abstract: A fundamental problem in statistical inference is summarizing observed data in terms of a p-value. The p-value forms part of the theory of hypothesis testing and may be regarded as an index for judging whether to accept or reject the null hypothesis. A very small p-value is indicative of evidence against the null hypothesis, while a large p-value implies that the observed data are compatible with the null hypothesis. There is a long tradition of using the value 0.05 as the cut-off for rejection or acceptance of the null hypothesis. While this may appear arbitrary in some contexts, its almost universal adoption for testing scientific hypotheses has the merit of limiting the rate of false-positive conclusions to at most 5%. At any rate, no matter what cut-off one chooses, the p-value provides an important objective input for judging whether the observed data are statistically significant. Therefore it is crucial that this number be computed accurately. P-values based on the large-sample assumption are known as asymptotic p-values, while p-values based on deriving the true distribution of the test statistic are termed exact p-values. While one would prefer to use exact p-values for scientific inference, they often pose formidable computational problems and so, as a practical matter, asymptotic p-values are used in their place. For large and well-balanced data sets this makes very little difference, since the exact and asymptotic p-values are very similar. But for small, sparse, unbalanced, and heavily tied data, the exact and asymptotic p-values can be quite different and may lead to opposite conclusions concerning the hypothesis of interest.

A p-value either indicates clear evidence for rejecting the null hypothesis or implies that the observed data are consistent with the null hypothesis. Because the p-value provides an important basis for judging whether the observed data are statistically significant, it is important that it be computed accurately. The commonly used p-value, known as the asymptotic p-value, assumes a large sample, whereas the exact p-value is based on the true distribution of the test statistic. For large, well-balanced data sets the difference between the true distribution and the assumed distribution is very small, so the exact and asymptotic p-values are very similar. However, for data that are small, sparse, unbalanced, or heavy-tailed, the exact and asymptotic p-values can differ greatly and may lead to opposite conclusions about the hypothesis of interest, so this is an important issue. That is, because the asymptotic p-value can often lead to errors in important computational and practical problems, it is appropriate to use the exact p-value for accurate scientific inference. A direct comparison of the asymptotic p-value, the exact p-value, and the Monte Carlo estimate of the p-value shows that using the asymptotic test instead of the exact test can produce contradictory results, and may therefore lead to errors in hypothesis testing. Hence, when the observed data form a small sample, the exact chi-square test is of great importance when the p-value is used as the basis for judging statistical significance.
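The three p-values compared in the abstract can be illustrated with a small sketch for the equiprobable multinomial null (every cell has probability 1/k). The abstract does not give the thesis's own algorithm, so the following is only a brute-force illustration under stated assumptions: all function names are hypothetical, outcomes are ordered by Pearson's chi-square statistic, the exact p-value is obtained by enumerating every possible table (feasible only for small n and k), and the Monte Carlo estimate is a plain proportion of simulated tables.

```python
import math
import random


def sum_of_squares(counts):
    return sum(c * c for c in counts)


def pearson_statistic(counts):
    # With equal expected counts n/k, Pearson's X^2 reduces to k*sum(o^2)/n - n,
    # so ordering tables by X^2 is the same as ordering them by sum(o^2).
    n, k = sum(counts), len(counts)
    return k * sum_of_squares(counts) / n - n


def chi2_sf(x, df):
    # Upper tail of the chi-square distribution, i.e. Q(df/2, x/2), built from
    # the recurrence Q(a+1, y) = Q(a, y) + y^a * exp(-y) / Gamma(a+1).
    y = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def compositions(n, k):
    # Every way of placing n observations into k ordered cells.
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def multinomial_coefficient(counts):
    coeff = math.factorial(sum(counts))
    for c in counts:
        coeff //= math.factorial(c)
    return coeff


def exact_p_value(observed):
    # Exact tail probability: tables at least as extreme as the observed one,
    # weighted by their multinomial coefficients, out of k^n equally likely sequences.
    n, k = sum(observed), len(observed)
    threshold = sum_of_squares(observed)
    favourable = sum(
        multinomial_coefficient(table)
        for table in compositions(n, k)
        if sum_of_squares(table) >= threshold
    )
    return favourable / k ** n


def asymptotic_p_value(observed):
    # Large-sample approximation: chi-square with k - 1 degrees of freedom.
    return chi2_sf(pearson_statistic(observed), len(observed) - 1)


def monte_carlo_p_value(observed, draws=10_000, seed=0):
    # Simulate tables under the null and count those at least as extreme.
    n, k = sum(observed), len(observed)
    threshold = sum_of_squares(observed)
    rng = random.Random(seed)
    hits = 0
    for _ in range(draws):
        counts = [0] * k
        for cell in rng.choices(range(k), k=n):
            counts[cell] += 1
        hits += sum_of_squares(counts) >= threshold
    return hits / draws


if __name__ == "__main__":
    observed = [7, 1, 1, 1]  # illustrative small, unbalanced table (not from the thesis)
    print("asymptotic :", asymptotic_p_value(observed))
    print("exact      :", exact_p_value(observed))
    print("Monte Carlo:", monte_carlo_p_value(observed))
```

As a hand-checkable case, three observations that all fall in one of two cells (`[3, 0]`) give an exact p-value of 2/8 = 0.25, whereas the chi-square approximation with one degree of freedom gives roughly 0.083. This is the kind of small-sample discrepancy the abstract describes.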