DSpace at EWHA: A Study of Optimal Income Prediction Distributions Using Multiple Regression and Regression Trees






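Title: A Study of Optimal Income Prediction Distributions Using Multiple Regression and Regression Trees (다중회귀와 회귀나무를 활용한 최적소득예측분포 연구)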

Authors: 박예정

Issue Date: 2008

Department/Major: Graduate School, Department of Statistics

Publisher: Graduate School, Ewha Womans University

Degree: Master

Abstract: In this thesis, we fit models to predict the personal income of Koreans using multiple regression analysis and classification and regression trees (CART). To fit the prediction models, we used data from the Korea Labor and Income Panel Study (2005). Because the data contain extensive information about the respondents, we restricted the sample to employed respondents without physical or sensory disabilities. The multiple regression model includes 29 terms: additive terms, transformed terms (squares and square roots), and interaction terms. We interpreted the additive terms through their coefficients and used coplots (conditioning plots) to interpret the interaction effects. The CART model includes 8 variables and has 12 terminal nodes; we interpreted it by dividing respondents into three groups: high, middle, and low income. The models indicate that the personal income of Koreans is mainly affected by 'Jobclass', 'Gender', 'square of Age', 'Satisfaction with current work', 'Social and Economic Status', 'Location of work place', 'Regularity of working hours', 'Working hours', 'square root of Working hours', 'Marriage', 'Academic background', and several interaction terms, such as 'Gender' × 'square of Age' and 'Jobclass' × 'square of Age'. Although both models give similar interpretations of personal income, the multiple regression model predicts better.

This thesis studies the income distribution using data from the 8th wave (2005) of the Korea Labor and Income Panel Study. The income distribution was estimated with multiple regression and regression trees, using data on employed respondents with no physical limitations or sensory impairments. The final multiple regression model includes 'job type', 'gender', 'age', 'age squared', two items on overall job satisfaction, 'holding a certificate', five items on life satisfaction, 'smoking', 'drinking', 'socioeconomic status', 'economic circumstances at around age 14', 'height', 'timing of employment', 'timing of employment squared', 'workplace location', 'industry', 'job type', 'regularity of working hours', 'working hours', 'square root of working hours', 'marital status', and the interactions 'gender' × 'age', 'gender' × 'age squared', 'gender' × 'job type', 'gender' × 'smoking', 'age' × 'job type', 'age squared' × 'job type', 'age' × 'holding a certificate', and 'age' × 'timing of employment'. Terms involved in interactions were interpreted with coplots (conditioning plots), and terms not involved in any interaction were interpreted through their regression coefficients.
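The regression specification described above (additive terms plus squared, square-root, and interaction terms, fitted by least squares) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the thesis's code: it uses only three hypothetical predictors (a 0/1 gender indicator, age, and weekly working hours), synthetic data rather than the panel data, and ordinary least squares solved through the normal equations. The names `design_row`, `ols`, `simulate`, and `TERMS` are illustrative.

```python
import math
import random

# Hypothetical term order for the design matrix built by design_row().
TERMS = ["intercept", "gender", "age", "age^2", "hours", "sqrt(hours)", "gender*age^2"]


def design_row(gender, age_decades, hours_tens):
    """One row of a hypothetical design matrix: intercept, additive terms,
    transformed terms (age squared, square root of hours) and a
    gender x age-squared interaction. Age and hours are rescaled
    (decades, tens of hours) to keep the normal equations well conditioned."""
    a, h = age_decades, hours_tens
    return [1.0, float(gender), a, a * a, h, math.sqrt(h), gender * a * a]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n  # back substitution
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)


def simulate(n, beta, noise_sd=0.0, seed=0):
    """Generate synthetic (not panel) data from a known coefficient vector."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = design_row(rng.randint(0, 1), rng.uniform(2.0, 6.0), rng.uniform(2.0, 6.0))
        X.append(row)
        y.append(sum(b * v for b, v in zip(beta, row)) + rng.gauss(0.0, noise_sd))
    return X, y


if __name__ == "__main__":
    beta_true = [1.0, 0.5, 0.8, -0.1, 0.3, 0.2, -0.05]  # hypothetical coefficients
    X, y = simulate(500, beta_true, noise_sd=0.1, seed=1)
    for name, b in zip(TERMS, ols(X, y)):
        print(f"{name:>14} {b: .3f}")
```

With `noise_sd=0` the fit recovers the simulating coefficients up to rounding; with noise, nearly collinear pairs such as hours and sqrt(hours) are estimated less precisely. Because of the `gender*age^2` term, the fitted age profile differs by gender, which is why such effects are easier to read from a conditioning plot than from any single coefficient.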
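The regression tree model includes 'job type', 'gender', 'age', 'education', 'working hours', 'timing of employment', 'socioeconomic status', and one life-satisfaction item; these variables were interpreted by dividing income into three groups (high, middle, and low). Finally, we compared the predictive performance of the income distributions estimated by the two methods and found that the multiple regression model predicts better.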
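The regression tree side can be sketched the same way. The code below greedily grows a tree in the CART style: each split is the threshold on one predictor that minimizes the summed squared error of the two child nodes, and each terminal node predicts the mean response of the observations that reach it. It is a hypothetical illustration, not the thesis's implementation. Full CART also prunes the grown tree by cost-complexity pruning, which this sketch omits. The helper `mse` shows one possible way (mean squared prediction error on held-out data) to compare the predictive performance of two fitted models; all function names are illustrative.

```python
def sse(values):
    """Sum of squared deviations from the mean (node impurity for regression trees)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(X, y, min_leaf=1):
    """Return (feature, threshold, cost) for the split x[feature] <= threshold that
    minimizes the summed SSE of the two children, or None if no valid split exists."""
    best = None
    for j in range(len(X[0])):
        # Candidate thresholds: observed values, excluding the largest
        # so the right child is never empty.
        for t in sorted(set(row[j] for row in X))[:-1]:
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse(left) + sse(right)
            if best is None or cost < best[2]:
                best = (j, t, cost)
    return best


def grow(X, y, depth=0, max_depth=3, min_leaf=1):
    """Greedily grow a regression tree; each leaf stores the mean response."""
    mean = sum(y) / len(y)
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return {"leaf": mean}
    split = best_split(X, y, min_leaf)
    if split is None or split[2] >= sse(y):  # stop if no split reduces the SSE
        return {"leaf": mean}
    j, t, _ = split
    li = [i for i, row in enumerate(X) if row[j] <= t]
    ri = [i for i, row in enumerate(X) if row[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": grow([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth, min_leaf),
        "right": grow([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth, min_leaf),
    }


def predict(tree, row):
    """Route one observation down the tree and return its leaf mean."""
    while "leaf" not in tree:
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]


def count_leaves(tree):
    """Number of terminal nodes in the tree."""
    if "leaf" in tree:
        return 1
    return count_leaves(tree["left"]) + count_leaves(tree["right"])


def mse(y_true, y_pred):
    """Mean squared prediction error, one possible criterion for comparing models."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy data: one hypothetical predictor (weekly hours) and income in arbitrary units.
    X = [[10], [20], [30], [40], [50], [60]]
    y = [1.0, 1.2, 1.1, 3.0, 3.2, 3.1]
    tree = grow(X, y, max_depth=2)
    print("terminal nodes:", count_leaves(tree))
    print("prediction for 45 hours:", predict(tree, [45]))
```

`count_leaves` reports the number of terminal nodes, the quantity the abstract gives as 12 for the fitted CART model; in practice the final tree size would come from pruning rather than from a fixed `max_depth`.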