DSpace at EWHA: Dimension Test in Fused Sliced Average Variance Estimation

Title: Dimension Test in Fused Sliced Average Variance Estimation

Authors: 안효인

Issue Date: 2019

Department/Major: Graduate School, Department of Statistics

Publisher: The Graduate School, Ewha Womans University

Degree: Master

Advisors: 유재근

Abstract: Sufficient dimension reduction (SDR) in regression aims to reduce the dimension of the data without loss of information. Its two main phases are the dimension test and basis estimation. The two classical and most widely known SDR methods are Sliced Inverse Regression (SIR; Li, 1991) and Sliced Average Variance Estimation (SAVE; Cook and Weisberg, 1991). To estimate the subspace for the regression Y|X (X∈R^p, Y∈R^1), SIR uses the first inverse moment E(X|Y), while SAVE uses cov(X|Y), which involves the second moment of X|Y. If Y is categorical, cov(X|Y) can be estimated by its sample moments within each category of Y. If Y is numeric, Y must first be categorized, a step called slicing. Once the sample estimator is constructed, the kernel matrix is spectrally decomposed, and the resulting eigenvalues are used to build the test statistic for the dimension test under the null hypothesis H_0: d=m, where d is the true dimension. In the general SDR context, the dimension test is conducted sequentially, starting from m=0 and incrementing m by 1 until the null is not rejected for the first time. The large-sample test differs across SDR methods; for SAVE, Shao et al. (2007) derived test statistics and their asymptotic distributions for the marginal dimension test. SAVE has, however, a critical weakness at the slicing step: there is no established rule for choosing the optimal number of slices. To alleviate this problem, the fusing approach of Cook and Zhang (2014) can be applied to SAVE; the resulting method is called Fused Sliced Average Variance Estimation (FSAVE). In FSAVE, the marginal dimension test is not feasible because the test statistic is built from a fused kernel matrix. This thesis presents the permutation test as an alternative for estimating the dimension in FSAVE; it does not require any complex derivation of an asymptotic distribution. Numerical studies confirmed that FSAVE with the permutation test often performs better than SAVE, especially when n is relatively small or p is relatively large.
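
The pipeline summarized in the abstract (slicing, kernel matrix, spectral decomposition, sequential permutation test) can be illustrated with a minimal NumPy sketch. It is not the thesis's implementation, and several choices are assumptions made only for illustration: the fused kernel is taken as the plain sum of SAVE kernels over a hypothetical grid of slice counts, the statistic is n times the sum of the smallest p-m eigenvalues, and the permutation scheme shuffles the rows of the last p-m eigen-directions while keeping Y and the first m directions fixed. All function names and parameters (`slice_grid`, `n_perm`, `alpha`) are hypothetical.

```python
import numpy as np

def standardize(X):
    """Center X and whiten it so that its sample covariance is the identity."""
    Xc = X - X.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    return Xc @ (vecs @ np.diag(vals ** -0.5) @ vecs.T)

def save_kernel(Z, y, h):
    """SAVE kernel sum_s f_s (I - cov(Z | slice s))^2 using h slices of sorted y."""
    n, p = Z.shape
    M = np.zeros((p, p))
    # Slices of (nearly) equal size, formed from the ordering of y.
    for idx in np.array_split(np.argsort(y), h):
        D = np.eye(p) - np.cov(Z[idx], rowvar=False, bias=True)
        M += (len(idx) / n) * (D @ D)
    return M

def fused_save_kernel(Z, y, slice_grid):
    """Assumed fusing rule: sum the SAVE kernels over several slice counts."""
    return sum(save_kernel(Z, y, h) for h in slice_grid)

def permutation_dimension_test(X, y, slice_grid=(2, 3, 4, 5),
                               n_perm=200, alpha=0.05, seed=0):
    """Test H_0: d = m for m = 0, 1, ... and stop at the first non-rejection."""
    rng = np.random.default_rng(seed)
    Z = standardize(X)
    n, p = Z.shape
    vals, vecs = np.linalg.eigh(fused_save_kernel(Z, y, slice_grid))
    vals, vecs = vals[::-1], vecs[:, ::-1]      # descending eigenvalue order
    pval = None
    for m in range(p):
        stat = n * vals[m:].sum()               # smallest p - m eigenvalues
        W1, W2 = Z @ vecs[:, :m], Z @ vecs[:, m:]
        exceed = 0
        for _ in range(n_perm):
            # Break any link between y and the last p - m directions.
            Zp = np.hstack([W1, W2[rng.permutation(n)]])
            pv = np.linalg.eigvalsh(fused_save_kernel(Zp, y, slice_grid))[::-1]
            exceed += int(n * pv[m:].sum() >= stat)
        pval = (exceed + 1) / (n_perm + 1)
        if pval > alpha:                        # first non-rejection: d_hat = m
            return m, pval
    return p, pval
```

Calling `permutation_dimension_test(X, y)` on an n-by-p predictor matrix `X` (with p at least 2) and a numeric response `y` returns the estimated dimension together with the last permutation p-value. Because the reference distribution comes from the permutations themselves, no asymptotic distribution of the fused statistic is needed, which is the advantage the abstract attributes to this approach.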