A case study for improving misclassification rates using compact logistic regression
- Graduate School, Department of Statistics
- model selection; logistic regression; data mining
- The Graduate School, Ewha Womans University
- Data mining has become popular and is used in many applied fields. A data miner looks for information that is not intuitive: the further the information is from being obvious, the more value it potentially has. The new information must also be valid. If data miners look hard enough in a large collection of data, they are bound to find something of interest, but it must be legitimate and correct. If the process is over-optimized (meaning the results moved beyond the desired accuracy) or if the results are coincidental (meaning they occurred merely by chance), this should be revealed in the output analysis after the process has completed. In most real samples, some values are missing. Consequently, it is difficult to select the best model, and suitable treatment of the missing values is needed at the first stage. The main focus of this thesis is how to handle missing values on some variables in order to develop a compact model with good predictability.
First, we discuss efficient imputation methods. Then we discuss how to select a compact logistic regression model based on a data mart with incomplete observations. We construct two data sets: the whole data set, with missing values imputed by class averages, and a sub data set containing only the observations without missing values. We then develop reasonable logistic regression models on each data set. Based on fit statistics and the lift chart, we recommend the best of the four candidate models. Our strategy is illustrated through a case study.
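The construction of the two data sets described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the thesis: the toy records, the column layout, and the function names `impute_class_means` and `complete_cases` are assumptions made only for illustration. The sketch covers the data preparation step alone, not the logistic regression fitting or the lift chart comparison.

```python
from statistics import mean

# Hypothetical toy records: (class label, x1, x2); None marks a missing value.
rows = [
    (1, 2.0, None),
    (1, 4.0, 10.0),
    (0, None, 6.0),
    (0, 1.0, 2.0),
    (0, 3.0, None),
]


def impute_class_means(rows):
    """Fill each missing value with the mean of that column within the row's class.

    Assumes every class has at least one observed value in every column;
    otherwise statistics.mean raises StatisticsError.
    """
    n_cols = len(rows[0]) - 1
    class_means = {}
    for label in {r[0] for r in rows}:
        members = [r for r in rows if r[0] == label]
        class_means[label] = [
            mean(r[j + 1] for r in members if r[j + 1] is not None)
            for j in range(n_cols)
        ]
    return [
        (r[0],) + tuple(
            class_means[r[0]][j] if r[j + 1] is None else r[j + 1]
            for j in range(n_cols)
        )
        for r in rows
    ]


def complete_cases(rows):
    """Keep only the rows with no missing predictor values."""
    return [r for r in rows if all(v is not None for v in r[1:])]


whole = impute_class_means(rows)  # whole data set, class-mean imputed
sub = complete_cases(rows)        # sub data set, non-missing observations only
```

In this toy example the whole data set keeps all five records, while the sub data set keeps only two, which shows the trade-off between sample size and reliance on imputed values that the model comparison has to weigh.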
- Appears in Collections:
- Graduate School > Department of Statistics > Theses_Master