Full metadata record

DC Field | Value | Language
dc.description.abstract | Data mining is increasingly popular and is used in many applied fields. A data miner looks for information that is not intuitive: the further the information is from being obvious, the more value it potentially has. The new information must also be valid. If data miners look hard enough in a large collection of data, they are bound to find something of interest, but it must be legitimate and correct. If the process is over-optimized (meaning the results moved beyond the desired accuracy) or if the results are coincidental (meaning they occurred just by chance), this should be revealed in the output analysis after the process has completed. In most real samples, values are commonly missing. Consequently, it is difficult to select the best model, and suitable treatment is needed at the first stage to avoid this difficulty. The main focus of this thesis is how to handle missing values in some variables in order to develop a compact model with good predictability. First, we discuss efficient imputation methods. Then, we discuss how to select a compact logistic regression model based on a data mart with incomplete observations. We construct two data sets, a whole data set in which missing values are imputed by class averages and a subset containing only the non-missing observations, and then develop reasonable logistic regression models based on each data set. Based on fit statistics and lift charts, we recommend the best model out of four candidate models. Our strategy is illustrated through a case study. | -
dc.description.tableofcontents | TABLE OF CONTENTS Abstract = 5 1. Introduction = 6 2. Literature Review = 7 3. Case Study = 11 3.1. Prearrangement along Basics of Data Mining = 11 3.2. Data Mining Diagrams = 12 3.3. Statistical Results for Two Modified Data Sets = 19 3.4. Application Trained Logistic Regression = 27 3.5. Summary = 30 4. Concluding Remarks = 31 REFERENCES = 32 APPENDIX = 33 Acknowledgments = 43 | -
dc.format.extent | 849604 bytes | -
dc.publisher | Ewha Womans University Graduate School (梨花女子大學校 大學院) | -
dc.subject | model selection | -
dc.subject | logistic regression | -
dc.subject | data mining | -
dc.title | A case study for improving misclassification rates using compact logistic regression | -
dc.type | Master's Thesis | -
dc.format.page | 43 p. | -
dc.identifier.major | Graduate School, Department of Statistics | - 8-
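
The two data sets described in the abstract (a whole data set with missing values imputed by class averages, and a subset of complete observations) can be illustrated with a minimal sketch. This is not the thesis's own code: the function names, the `income` and `default` fields, and the toy records are all hypothetical, and missing values are assumed to be encoded as `None`.

```python
from collections import defaultdict


def impute_by_class_mean(rows, feature, label):
    """Return copies of rows where a missing `feature` (None) is replaced
    by the mean of the observed values in the same `label` class."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        if row[feature] is not None:
            sums[row[label]] += row[feature]
            counts[row[label]] += 1
    class_means = {cls: sums[cls] / counts[cls] for cls in counts}

    imputed = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not modified
        if new_row[feature] is None:
            # A class with no observed values has no mean; the gap stays None.
            new_row[feature] = class_means.get(new_row[label])
        imputed.append(new_row)
    return imputed


def complete_cases(rows, feature):
    """Return copies of only those rows where `feature` is observed."""
    return [dict(row) for row in rows if row[feature] is not None]


# Hypothetical toy data: `default` is the binary target class.
records = [
    {"income": 40.0, "default": 0},
    {"income": None, "default": 0},
    {"income": 60.0, "default": 0},
    {"income": 20.0, "default": 1},
    {"income": None, "default": 1},
]

whole = impute_by_class_mean(records, "income", "default")  # imputed data set
subset = complete_cases(records, "income")                  # complete cases only
```

Under this sketch, a logistic regression model would then be fitted separately to `whole` and to `subset`, and the candidates compared on fit statistics and lift, as the abstract outlines.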
Appears in Collections:
Graduate School > Department of Statistics > Theses_Master
Files in This Item:
There are no files associated with this item.
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.