DSpace at EWHA: Determining Sample Size for a Credit Card Fraud Prevention System

Title: Determining Sample Size for a Credit Card Fraud Prevention System

Authors: 이희진

Issue Date: 2005

Department/Major: Graduate School, Department of Statistics

Publisher: The Graduate School of Ewha Womans University

Degree: Master

Abstract: This study seeks a more efficient way to determine sample size, in order to propose a fraud detection model that can be used in combination with a credit card transaction approval system. The credit card, now so widespread that it is often called the third currency, is a token issued by a credit card company that the holder presents to buy goods or receive services repeatedly at member merchants. It is a convenient means of payment that removes the inconvenience of a cash-based society and supports the growth of e-commerce. Because of this convenience the credit card industry can be highly profitable, and in Korea it has grown rapidly since it began developing in the 1980s. When credit risk management is neglected, however, bad debts increase and losses grow, so every card company works to prevent fraudulent card use.
This thesis therefore aims at building a credit card fraud prevention system that learns from the individual types of fraudulent use and identifies the transactions with a high risk of fraud. In credit card data, non-fraud cases vastly outnumber fraud cases, so fraud is hard to detect from a randomly drawn sample. Because the way the sample is chosen strongly affects the performance and efficiency of a fraud detection model, the method used to determine the sample size is important, and this thesis proposes a method for determining a sample size that improves model efficiency. To judge how efficiently each sample supports fraud detection, two decision tree algorithms from data mining, C4.5 and CART, were applied and compared.
The introduction gives a general description of credit cards, and the methodology section explains the C4.5 and CART algorithms. The thesis then presents the empirical analysis, which is its main subject, and the resulting model performance. Two ways of constructing samples were analyzed separately: by the ratio of fraud cases and by weight. For samples constructed by fraud ratio, C4.5 and CART were compared as individual models. For samples constructed by weight, the differences were slight, so ten individual models were combined into an ensemble for each case, and a method is proposed on the basis of these results.
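The abstract does not spell out how the ratio-based samples were built. As a minimal sketch of one common way to obtain a sample with a chosen fraud ratio, the Python code below keeps every fraud record and randomly downsamples the non-fraud records. The function name `sample_with_fraud_ratio`, the `(features, is_fraud)` record layout, the 10% target ratio, and the synthetic data are all assumptions made for illustration and are not taken from the thesis.

```python
import random


def sample_with_fraud_ratio(records, fraud_ratio, seed=0):
    """Return a sample in which frauds make up roughly `fraud_ratio`.

    `records` is a list of (features, is_fraud) pairs. This layout is an
    assumption for the sketch, not the data format used in the thesis.
    All fraud records are kept; non-fraud records are downsampled.
    """
    if not 0 < fraud_ratio <= 1:
        raise ValueError("fraud_ratio must be in (0, 1]")
    rng = random.Random(seed)
    frauds = [r for r in records if r[1]]
    normals = [r for r in records if not r[1]]
    # Number of non-fraud records needed to reach the target ratio,
    # capped at the number actually available.
    n_normals = round(len(frauds) * (1 - fraud_ratio) / fraud_ratio)
    n_normals = min(n_normals, len(normals))
    sample = frauds + rng.sample(normals, n_normals)
    rng.shuffle(sample)
    return sample


# Synthetic data: 20 frauds among 10,000 transactions (0.2% fraud).
data = [((i,), i < 20) for i in range(10_000)]

# One sample in which about 10% of the records are frauds.
sample = sample_with_fraud_ratio(data, fraud_ratio=0.1)

# Ten differently seeded samples, e.g. one per member of an ensemble.
samples = [sample_with_fraud_ratio(data, 0.1, seed=s) for s in range(10)]
```

Each resulting sample could then be passed to a decision tree learner; varying `fraud_ratio` gives one way to compare how the class balance of the training sample affects detection performance.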