任勇彬
盧鉉京
盧鉉京
2016-08-25T01:08:03Z
2016-08-25T01:08:03Z
2005
OAK-000000010493
http://dspace.ewha.ac.kr/handle/2015.oak/172298
http://dcollection.ewha.ac.kr/jsp/common/DcLoOrgPer.jsp?sItemId=000000010493
Data mining has become popular and is now used in many applied fields. A data miner looks for information that is not intuitive: the further the information is from being obvious, the more value it potentially has. The new information must also be valid. If data miners look hard enough in a large collection of data, they are bound to find something of interest, but the finding must be legitimate and correct. If the process is over-optimized (meaning the results moved beyond the desired accuracy) or if the results are coincidental (meaning they occurred just by chance), this should be revealed in the output analysis after the process has completed. In most real samples, some values are missing. As a consequence, it is difficult to select the best model, and suitable treatment is needed at the first stage to avoid this difficulty. The main focus of this thesis is how to handle missing values on some variables in order to develop a compact model with good predictability.
First, we discuss efficient imputation methods. Then we discuss how to select a compact logistic regression model based on a data mart with incomplete observations. We construct two data sets: the whole data set, with missing values imputed by class averages, and a sub data set containing only the non-missing observations. We then develop reasonable logistic regression models based on each data set. Based on fit statistics and lift charts, we recommend the best model out of four candidate models. Our strategy is illustrated through a case study.
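The two data sets and the lift criterion described above can be sketched in a few lines. This is only an illustrative sketch, not the thesis's own procedure: it assumes that "class averages" means the mean of the observed values within each level of the response variable, and the function names, the field names `income` and `default`, and the toy records in the usage note are all hypothetical.

```python
from statistics import mean


def impute_by_class_mean(rows, feature, label):
    """Return copies of rows in which a missing `feature` value (None) is
    replaced by the mean of the observed values in the same `label` class.
    Assumption: "class" refers to the level of the response variable."""
    observed = {}
    for row in rows:
        if row[feature] is not None:
            observed.setdefault(row[label], []).append(row[feature])
    class_means = {cls: mean(vals) for cls, vals in observed.items()}
    imputed = []
    for row in rows:
        new_row = dict(row)
        if new_row[feature] is None:
            # A class with no observed values has no mean; the value stays None.
            new_row[feature] = class_means.get(new_row[label])
        imputed.append(new_row)
    return imputed


def complete_cases(rows, features):
    """Return copies of the rows that have no missing value in `features`."""
    return [dict(row) for row in rows
            if all(row[f] is not None for f in features)]


def top_fraction_lift(scores, labels, fraction=0.1):
    """Lift of the top-scored fraction: response rate among the highest
    scores divided by the overall response rate (labels are 0 or 1)."""
    overall_rate = sum(labels) / len(labels)
    if overall_rate == 0:
        raise ValueError("lift is undefined when there are no positive labels")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = max(1, int(len(ranked) * fraction))
    top_rate = sum(label for _, label in ranked[:k]) / k
    return top_rate / overall_rate
```

For example, given five toy records where `income` is 40.0, missing, and 60.0 in class 0, and 20.0 and missing in class 1, `impute_by_class_mean` fills the class 0 gap with 50.0 and the class 1 gap with 20.0, while `complete_cases` keeps only the three fully observed records. A logistic regression model would then be fitted to each data set and the resulting scores compared with `top_fraction_lift`.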
TABLE OF CONTENTS
Abstract = 5
1. Introduction = 6
2. Literature Review = 7
3. Case Study = 11
3.1. Prearrangement along Basics of Data Mining = 11
3.2. Data Mining Diagrams = 12
3.3. Statistical Results for Two Modified Data Sets = 19
3.4. Application Trained Logistic Regression = 27
3.5. Summary = 30
4. Concluding Remarks = 31
REFERENCES = 32
APPENDIX = 33
Acknowledgments = 43
application/pdf
849604 bytes
eng
The Graduate School of Ewha Womans University
model selection
logistic regression
data mining
A case study for improving misclassification rates using compact logistic regression
Master's Thesis
43 p.
Master
Graduate School, Department of Statistics
2005. 8