Credit Evaluation Model for Banks Using Data Mining Techniques
By Sharan Brahmanapally, Bachelor of Technology
A Project submitted in Partial
Fulfillment of the Requirements
For the Degree of
Master of Science
In the field of Industrial Engineering
Advisory Committee:
Dr. Hoo Sang Ko
Graduate School
Southern Illinois University Edwardsville
August, 2015
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
ABSTRACT
CHAPTER 1
INTRODUCTION
1.1 Introduction
1.1.1 Decision Process for Credit Evaluation
1.2 Problem Statement
1.3 Aim of the Project
1.4 Objectives of the Project
CHAPTER 2
LITERATURE REVIEW
2.1 Introduction
2.2 Theoretical Background
2.2.1 Decision Trees
2.2.2
It is a process of analyzing relationships among data from various perspectives and summarizing them into valuable information. It also helps banks look for hidden patterns in a group and discover unknown relationships in the data. These data mining techniques give the banking sector useful interpretations of its data and help it avoid customer attrition. An accurate, fraud-free prediction of credit approval is important to prospective homeowners, developers, investors, appraisers, tax assessors and other real estate market participants. People who are looking to buy a new home or other item tend to be more conservative with their budgets and with acquiring loans from financial institutions. The credit function is central to any banking system under uncertain market conditions. The lack of a general credit review system and of precise evaluation methods in banks is an important reason why an expert support system is necessary.
This project aims to evaluate the performance and accuracy of classification models for credit evaluation. The classification models are based on decision trees (J48 and CART), a Support Vector Machine (SVM), and Logistic Regression, along with ensemble methods. We used a credit approval dataset from the UCI repository to compare the accuracies of the various data mining techniques. All the developed models achieved more than 85% accuracy, and
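As a rough illustration of how the accuracies of competing classifiers can be compared, the sketch below runs k-fold cross-validation in plain Python. Everything in it is an assumption made for illustration: the applicant data is synthetic (it is not the UCI credit approval dataset), and the two models, a majority-class baseline and a depth-1 decision tree ("stump"), are simple stand-ins rather than the J48, CART, SVM or Logistic Regression models used in the project. The names `make_synthetic_applicants`, `train_majority`, `train_stump` and `cross_val_accuracy` are hypothetical.

```python
import random

def make_synthetic_applicants(n, seed=0):
    """Hypothetical data: (income, debt) features, label 1 = approved."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        income = rng.uniform(20, 120)  # made-up units (thousands)
        debt = rng.uniform(0, 60)
        noise = rng.gauss(0, 8)
        label = 1 if income - 1.2 * debt + noise > 30 else 0
        rows.append(((income, debt), label))
    return rows

def train_majority(train):
    """Baseline: always predict the most common training label."""
    ones = sum(y for _, y in train)
    majority = 1 if ones * 2 >= len(train) else 0
    return lambda x: majority

def train_stump(train):
    """Depth-1 decision tree: the single feature/threshold split
    with the highest training accuracy."""
    best = None
    for f in range(len(train[0][0])):
        for threshold in sorted({x[f] for x, _ in train}):
            for above in (0, 1):  # label predicted when x[f] > threshold
                correct = sum(
                    1 for x, y in train
                    if (above if x[f] > threshold else 1 - above) == y
                )
                if best is None or correct > best[0]:
                    best = (correct, f, threshold, above)
    _, f, t, above = best
    return lambda x: above if x[f] > t else 1 - above

def cross_val_accuracy(train_fn, rows, k=5, seed=1):
    """Mean accuracy over k folds; each fold is held out once for testing."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        test = folds[i]
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        predict = train_fn(train)
        hits = sum(1 for x, y in test if predict(x) == y)
        scores.append(hits / len(test))
    return sum(scores) / k

data = make_synthetic_applicants(200)
results = {
    "majority baseline": cross_val_accuracy(train_majority, data),
    "decision stump": cross_val_accuracy(train_stump, data),
}
for name, acc in results.items():
    print(f"{name}: {acc:.3f}")
```

Holding out each fold in turn means every model is scored on records it did not see during training, which is what makes the accuracies comparable across techniques.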
Undoubtedly, overall prediction accuracy is very important to banks. However, we should also consider which type of mistake we would rather avoid. For a bank, refusing a loan to someone who would actually have repaid causes only a small loss of interest income, whereas issuing a loan to someone who goes on to default causes a large loss on the unpaid balance. It is therefore reasonable to believe that correctly identifying the people who are going to default is the higher priority for banks. In light of this, QDA is also worth considering. A byproduct of the classification is the importance table produced by the random forest, which is shown
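The asymmetry between the two kinds of mistake can be made concrete with a small sketch. It compares two hypothetical sets of predictions that have the same accuracy but catch different numbers of defaulters. The labels, the predictions and the 5:1 cost ratio are all made-up assumptions for illustration; they are not results from any model discussed here.

```python
def confusion_counts(actual, predicted):
    """Count outcomes, treating 1 = 'will default' as the positive class."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["tp"] += 1  # defaulter correctly flagged
        elif a == 0 and p == 1:
            counts["fp"] += 1  # good payer wrongly rejected
        elif a == 0 and p == 0:
            counts["tn"] += 1  # good payer correctly approved
        else:
            counts["fn"] += 1  # defaulter wrongly approved
    return counts

def accuracy(c):
    return (c["tp"] + c["tn"]) / sum(c.values())

def default_recall(c):
    """Share of actual defaulters that the model caught."""
    return c["tp"] / (c["tp"] + c["fn"])

def total_cost(c, cost_missed_default=5.0, cost_rejected_good=1.0):
    """Assumed costs: a missed default hurts five times as much
    as turning away a customer who would have repaid."""
    return c["fn"] * cost_missed_default + c["fp"] * cost_rejected_good

# Hypothetical ground truth: 4 defaulters (1) and 6 good payers (0).
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
model_a = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]  # cautious about flagging defaults
model_b = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]  # flags more applicants

for name, predicted in (("model A", model_a), ("model B", model_b)):
    c = confusion_counts(actual, predicted)
    print(name, accuracy(c), default_recall(c), total_cost(c))
```

Both sets of predictions are right on 8 of 10 applicants, yet the second one misses no defaulters and so carries a much lower cost under the assumed ratio. This is why a model with strong recall on defaulters can be preferable even when its overall accuracy is not the highest.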
Many consumers fail to obtain their credit scores before applying for any type of financing. Credit scores play a big part in determining whether a consumer qualifies for financing, and they also affect the interest rates consumers will pay. The best way to obtain accurate credit scores is to order credit reports from one of the three major reporting agencies: TransUnion, Experian and Equifax. In short, the higher the credit score, the better the chance of loan approval for a car purchase.
So it is also attractive for Jules Kroll to take this opportunity to enter the credit rating industry. In addition, the “issuer pays” model, which is used by the big three rating agencies, lets companies shop around for the best ratings, putting pressure on the agencies to inflate their grades. As a result, it is very difficult to argue that the agencies can adequately represent the users’ side. Therefore, Jules Kroll wants to use another model that can assign unbiased and reliable ratings.
In this case study, we can see that the current state is as follows: there is an increasing number of competitors in the credit industry. More and more companies are entering this industry, making it even more difficult for the bank to compete and gain customer share. So, we performed a means-end analysis of the situation to gather as much information as possible to help the bank improve its services and be more efficient. The goal of this means-end analysis was to find a way to differentiate the bank from the others by any means. But the first step was to find out what’s important for the customers when we’re talking a
Lending evaluations by Santander are based on the credit background of the person (or company) who wants to borrow. By developing a credit-approval system, Santander Consumer Finance improved its understanding of these clients through an online database, which allowed real-time analysis to be used in determining interest rates for these business interactions.
On the other hand, my interest in the incorporation of artificial intelligence and machine learning by financial institutions stems from my work as a Credit Risk Analyst for BECU. I recently joined the institution, and am working on a year-long project to integrate the Synergy Component System, which will enable increased automation, improved attribution management, and reduced decision times for credit card and auto loans. Beyond testing plans and cases, I am most excited about designing decision matrices by analyzing member data such as credit quality, debt-to-income ratio, membership years, and trade lines to reach a credit decision. While I have already experienced academic research, this will be my first taste of applied industry research.
This example illustrates how data mining can help bankers enhance their businesses. Records that include information such as the age, sex, marital status, occupation, and number of children of the bank’s customers over the years are used in the mining process. First, an algorithm is used to identify characteristics that distinguish customers who took out a particular kind of loan from those who did not. Eventually, it develops “rules” by which it can identify customers who are likely to be good candidates for such a loan. These rules are then used to identify such customers in the remainder of the database. Next, another algorithm is used to sort the database into clusters or groups of people with many similar attributes, in the hope that these might reveal interesting and unusual patterns. Finally, the patterns revealed by these clusters are interpreted by the data miners, in collaboration with bank personnel.4
This project proposal sets out to influence Westpac’s decision to fund the most effective data-mining model. This model will help to differentiate the financial institution’s products in the currently saturated market. Westpac must appeal to all demographics, and it is extremely important to develop products for the growing generations. The proposed model will analyse the institution’s historical data to shed light on the risk of lending to millennials. By discussing past studies and research, the report will analyse how others have struggled to deal with this exact problem. The proposal follows the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology. The document also includes a plan and timetable to show how feasible this task will be.
My long-term career goal is to found my own financial company that specializes in applying advanced analytics tools to solve complex financial problems. Having worked as a financial analytics analyst for more than 6 months at EnovaFinancial, a Chicago-based consumer online financing company, I have reinforced my passion for financial analytics. Ranging from basic data query and reporting to predictive modeling and optimization, data analytics has assumed an important role in today’s financial services industry. Mastering data analytics could help financial institutions acquire the relevant information in the shortest amount of time and then make informed decisions. Since the personal loan industry is only one niche of financial services, I would like to broaden my financial knowledge by gaining a deeper understanding of its principles and their application to meet my long-term goal. Entering a full-time master in finance program therefore becomes the best option for me.
The Comparisons of Classification Accuracy of Statistical Software Performance for Default of Credit Card Clients
As a result of this method, a bank can identify a credit risk at an early stage
My job in the personal loan center was to assist in maintaining the database and generating evaluations. One project in which I was involved used a personal credit scoring model to forecast risk and detect bank fraud based on loan documentation in the database. During the project, I realized that statistics is not just about complicated methods but about simplifying and making sense of data. Sufficient data and in-depth analysis can not only help companies make valuable decisions, but also enable us to understand real life in a more direct and precise way. As the trend of information explosion is inevitable, I am sure that the power of data will become increasingly overwhelming, and the role of data analysis will be indispensable. Therefore, along with my goal to pursue graduate studies in statistics, my desire to pursue a career in the field of data analysis was
Credit analysis is the process of examining the creditworthiness, credit history and financial situation of clients or potential borrowers. Banks and moneylenders use this process to ensure the safety of their money. Credit analysis is also defined as the investigation that bond portfolio managers or investors perform on debt-issuing companies on the basis of their credit rating and financial situation. Credit rating is an important instrument used by lenders when performing credit analysis of a bond-issuing institution or individual (Johnson, 2010).
The recommended decision tree model includes two variables, annual income and loans; both are interval variables and represent the original observations. They were chosen for the final model because, after several trials, they proved to be the key variables in determining the rules within the decision trees.
This paper analyzes data obtained over the history of Lending Club, from 2007 to the most recent year for which data are available, and applies statistical methods to this data as follows. First, it assesses the factors that are most highly correlated with the success or failure of a personal loan approved by Lending Club (e.g., debt to income