DSpace at EWHA: Fused Dimension Reduction for Multivariate Regression

Title: Fused Dimension Reduction for Multivariate Regression

Authors: 조유영

Issue Date: 2021

Department/Major: Graduate School, Department of Statistics

Publisher: Ewha Womans University Graduate School

Degree: Master

Advisors: 유재근

Abstract: Sufficient dimension reduction (SDR) in the regression of Y on a set of predictors X seeks to replace the original p-dimensional predictors X with a lower-dimensional linear projection, estimated through a kernel matrix. Sliced inverse regression (SIR) is one of the most popular SDR methods. It estimates the central subspace from the sample means of X within the levels of a categorized response Y; this categorization of the response is called slicing. A weakness of SIR is its sensitivity to the number of slices. Fused sliced inverse regression (FSIR) overcomes this weakness by fusing the kernel matrices obtained under several numbers of slices. We want to extend the FSIR approach to multivariate regression, but in that setting the number of slices grows exponentially. Hierarchical clustered fused sliced inverse regression (HCFSIR) avoids this problem by clustering Y with hierarchical clustering and fusing the resulting kernel matrices. Pooled fused sliced inverse regression (PFSIR) instead slices each univariate Yi separately and then fuses the kernel matrices; it is useful when the clustering result is poor. Numerical studies are conducted to assess how well the proposed methods estimate the central subspace.

Abstract (translated from Korean): Recent advances in computing have made it possible to compute on and model vast amounts of data. As the dimension of the data grows, however, the density of information decreases, which can lead to the curse of dimensionality, and the noise that comes with it weakens the explanatory power of the variables. Dimension reduction is a statistical approach for avoiding this problem, and we examine one of its branches, sufficient dimension reduction (SDR), in detail. SDR replaces the p-dimensional predictors with lower-dimensional, linearly transformed predictors through projection. Sliced inverse regression (SIR), a representative SDR method, categorizes the response by slicing and estimates the central subspace from the sample means of X at each level of the categorized Y. However, SIR is sensitive to the number of slices, and the curse of dimensionality can arise when it is applied to a multivariate Y, so we consider fused sliced inverse regression (FSIR), which is more robust than SIR. FSIR combines the kernel matrices obtained under several numbers of slices. Applying FSIR directly to each Yi of a multivariate Y, however, makes the number of slices grow exponentially. To address this, Hierarchical clustered FSIR (HCFSIR) combines the kernel matrices obtained from a hierarchically clustered Y. Another method is Pooled FSIR (PFSIR), which slices each Yi and then combines the kernel matrices; it is useful when the clustering result is poor.
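The slicing, fusing, and pooling steps described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration and not the thesis's implementation. The abstract does not state the exact fusing rule, so the sketch assumes that kernel matrices are fused by column-wise concatenation followed by a singular value decomposition, and that slices are equal-sized groups of the sorted response. All function names (`standardize`, `sir_kernel`, `fused_kernel`, `pooled_fused_kernel`, `directions`) are hypothetical. The hierarchical clustering variant is not shown.

```python
import numpy as np


def standardize(X):
    """Center X and whiten it; returns Z and the inverse square root of cov(X)."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
    return Xc @ inv_sqrt, inv_sqrt


def sir_kernel(Z, y, n_slices):
    """SIR kernel matrix: weighted outer products of the slice means of Z."""
    order = np.argsort(y)  # slice by sorting the univariate response
    M = np.zeros((Z.shape[1], Z.shape[1]))
    for idx in np.array_split(order, n_slices):
        m = Z[idx].mean(axis=0)
        M += (len(idx) / len(y)) * np.outer(m, m)
    return M


def fused_kernel(Z, y, max_slices):
    """Assumed fusing rule: concatenate SIR kernels for 2..max_slices slices."""
    return np.hstack([sir_kernel(Z, y, h) for h in range(2, max_slices + 1)])


def pooled_fused_kernel(Z, Y, max_slices):
    """Pooled variant: fuse per-coordinate kernels of a multivariate response."""
    return np.hstack(
        [fused_kernel(Z, Y[:, j], max_slices) for j in range(Y.shape[1])]
    )


def directions(kernel, inv_sqrt, d):
    """Leading d left singular vectors, mapped back to the original X scale."""
    U, _, _ = np.linalg.svd(kernel, full_matrices=False)
    return inv_sqrt @ U[:, :d]


# Simulated example: two responses, each depending on one coordinate of X.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
Y = np.column_stack([
    X[:, 0] + 0.1 * rng.normal(size=2000),
    X[:, 1] ** 3 + 0.1 * rng.normal(size=2000),
])
Z, inv_sqrt = standardize(X)
B = directions(pooled_fused_kernel(Z, Y, max_slices=6), inv_sqrt, d=2)
print(np.round(B, 2))  # columns should load mainly on the first two predictors
```

Because each per-response kernel is built from univariate slicing only, the number of slices grows linearly in the number of responses rather than exponentially, which is the motivation the abstract gives for pooling.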