DSpace at EWHA: Fused Reduction on Multidimensional Response Regression

Title: Fused Reduction on Multidimensional Response Regression

Authors: 최유리

Issue Date: 2019

Department/Major: Graduate School, Department of Statistics

Publisher: The Graduate School, Ewha Womans University

Degree: Master

Advisors: 유재근

Abstract: This thesis proposes multidimensional fused clustered least squares, a method that fuses partial least squares with unsupervised learning for multivariate data. Supervised methods such as ordinary least squares (OLS) and partial least squares (PLS) are one way to replace the original p-dimensional predictors with lower-dimensional, linearly transformed predictors in large p, small n data, but the sample covariances they rely on may not exist. As an alternative, k-means clustering, an unsupervised learning method, is considered. Within each cluster, the covariance of the response and the predictors is computed and projected onto the covariance matrix of the predictors. This method is called clustered least squares (CLS). The CLS results obtained with various numbers of clusters are then fused. Fused clustered least squares (FCLS) thus combines the two approaches, supervised and unsupervised learning, for efficient dimension reduction. We apply FCLS not only with p-dimensional predictors but also with r-dimensional responses. Because the dimension of the response is expanded as well, FCLS cannot be applied as readily as in previous research: the long chain of successive calculations can produce numerical errors, and the computation takes a long time. An additional step based on a different method is used to address these limitations. We compare FCLS with other methods while increasing the proportion of the computed covariance that is used from 60% to 99%, in order to identify potential advantages of FCLS in the multidimensional setting. In numerical studies, FCLS performs well compared with the other methods in the multidimensional setting, and its performance improves as a larger proportion of the computed covariance is used.
This thesis presents a methodology that combines partial least squares with cluster analysis for multidimensional X and Y. For data in which the number of predictors is large relative to the number of observations (large p, small n), ordinary least squares and partial least squares have been considered as ways to replace the multidimensional predictors, with as little loss of information as possible, by linearly transformed predictors whose dimension is smaller than the number of observations, but they have the problem that the covariance may not exist. To address this, the Fused Clustered Least Squares (FCLS) methodology, which incorporates clustering that does not use the response Y, was proposed in earlier work. We examine what effect FCLS, which was devised to reduce the p-dimensional predictors X effectively without loss of information, has when it is applied to data extended to a multidimensional Y. The dimension is first reduced with FCLS and then reduced a second time with singular value decomposition (SVD). In the simulation study, the proportion of the finally reduced covariance that is used is increased from 60% to 99%, and the dimension reduction performance of each method is compared according to the amount of information used.
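The pipeline described in the abstract (within-cluster covariances, fusion over several numbers of clusters, then a second reduction by SVD) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation. The function names `kmeans_labels`, `fused_kernel`, and `svd_reduce` are hypothetical. "Projected onto the covariance matrix of the predictors" is read here as left-multiplying the within-cluster cross-covariance by the sample covariance of X. The "proportion of computed covariance" is read as the cumulative share of the singular values. The abstract fixes none of these details.

```python
import numpy as np

def kmeans_labels(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on the rows of X; returns one label per row."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every row to every center (n x k).
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def fused_kernel(X, Y, max_clusters=4, seed=0):
    """Stack the CLS-type blocks for 1..max_clusters clusters side by side.

    ASSUMPTION: each block is Sigma_hat @ cov(X, Y | cluster); the thesis
    may define the projection and the fusion differently.
    """
    n, p = X.shape
    Xc = X - X.mean(axis=0)
    sigma = Xc.T @ Xc / n                      # p x p sample covariance of X
    blocks = []
    for k in range(1, max_clusters + 1):
        labels = kmeans_labels(X, k, seed=seed)
        for j in range(k):
            idx = labels == j
            if idx.sum() < 2:                  # skip degenerate clusters
                continue
            Xj = X[idx] - X[idx].mean(axis=0)
            Yj = Y[idx] - Y[idx].mean(axis=0)
            cov_xy = Xj.T @ Yj / idx.sum()     # p x r within-cluster covariance
            blocks.append(sigma @ cov_xy)      # no matrix inverse is needed
    return np.hstack(blocks)

def svd_reduce(kernel, proportion=0.9):
    """Keep leading left singular vectors up to the given cumulative share.

    ASSUMPTION: the share is measured on the singular values themselves.
    """
    U, s, _ = np.linalg.svd(kernel, full_matrices=False)
    cum = np.cumsum(s) / s.sum()
    d = min(int(np.searchsorted(cum, proportion)) + 1, len(s))
    return U[:, :d]                            # p x d orthonormal basis
```

With `X` of shape (n, p) and `Y` of shape (n, r), `svd_reduce(fused_kernel(X, Y), 0.6)` returns a p x d basis, and the reduced predictors would be `X @ basis`. Raising the proportion toward 0.99 can only keep the same number of directions or more, which mirrors the 60% to 99% sweep in the simulation study.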