DSpace at EWHA: Analysis of Automobile Insurance Loss Ratios Using Data Mining Techniques

Full metadata record

DC Field	Value	Language
dc.contributor.advisor	송종우	-
dc.contributor.author	정현희	-
dc.creator	정현희	-
dc.date.accessioned	2016-08-25T11:08:21Z	-
dc.date.available	2016-08-25T11:08:21Z	-
dc.date.issued	2011	-
dc.identifier.other	OAK-000000067886	-
dc.identifier.uri	https://dspace.ewha.ac.kr/handle/2015.oak/186481	-
dc.identifier.uri	http://dcollection.ewha.ac.kr/jsp/common/DcLoOrgPer.jsp?sItemId=000000067886	-
dc.description.abstract	Data mining spans a wide range of fields and encompasses many analytical methods. Data have been stored in electronic form for several decades, and the volume accumulated to date is enormous. Most organizations now hold large-scale databases of this kind. How an organization processes the databases it holds closely affects its direction and performance. However, the unbounded growth of data makes it harder to find the information we want. From this perspective, data mining can be defined as the sequence of processes that extract useful knowledge from large volumes of data by exploring, discovering, and modeling the relationships, patterns, and rules present in the data. Among the many data mining techniques, this thesis introduces the ordinary single tree, Random Forest, and the Gradient Boosting Method. It also uses real automobile insurance data to estimate the loss ratio with these three methods and compares the strengths, weaknesses, and performance of each. By producing predicted loss ratios that minimize the error in the loss ratio, one of the key indicators in running an automobile insurance company, it aims to help companies decide which actions to take based on the results.;It is important to estimate and analyze the loss ratio in the insurance field. There are many data mining methods. To analyze the loss ratio for car insurance, this paper introduces three data mining methods: single tree, random forest, and gradient boosting. This paper also estimates the loss ratio for car insurance using these three methods and compares their strengths, weaknesses, and performance. The comparison uses real car insurance data.	-
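dc.description.tableofcontents	I. Introduction 1; II. General Theory 3; A. Decision Tree 3; 1. Construction of a Decision Tree 3; 2. General Theory of Regression Trees 4; a. Understanding Regression Trees 4; b. Stopping Rule for Regression Trees 6; B. Random Forest 7; 1. Bagging 7; 2. Random Forest 9; a. Algorithm 9; b. Considerations When Analyzing with Random Forest 10; c. Advantages of Random Forest 10; C. Gradient Boosting Method 11; 1. General Theory of Gradient Boosting 11; 2. Size of Trees 13; 3. Shrinkage 13; III. Empirical Analysis 14; A. Introduction to the Analysis Data 14; B. Data Preparation for the Analysis 15; 1. Explanatory Variables Used in the Analysis 15; a. Raw Data 15; b. Variable Transformation and Derived Variable Creation 16; c. Description of the Explanatory Variables Used 17; d. Additional Variable Descriptions 24; C. Analysis 26; 1. Correlation Between the Loss Ratio and Each Explanatory Variable 26; 2. Linear Model Analysis 26; 3. Gradient Boosting Method 27; 4. Random Forest 31; IV. Conclusion 32; References 33; Appendix. Linear Model Analysis Results 34; ABSTRACT 37	-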
dc.format	application/pdf	-
dc.format.extent	961776 bytes	-
dc.language	kor	-
dc.publisher	The Graduate School, Ewha Womans University	-
dc.title	Analysis of Automobile Insurance Loss Ratios Using Data Mining Techniques	-
dc.type	Master's Thesis	-
dc.format.page	viii, 37 p.	-
dc.identifier.thesisdegree	Master	-
dc.identifier.major	Graduate School, Department of Statistics	-
dc.date.awarded	2011. 8	-