INTRODUCTION
The dataset used for the project is the German credit dataset, which consists of customers’ financial and credit information and the resulting classification of each customer as a “good” or “bad” credit risk. This well-known, publicly available dataset contains observations on 20 variables for 1,000 past applicants, of whom 700 are classified as “good” credit risks and 300 as “bad” credit risks.
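Because the two classes are not balanced, a useful reference point for any model is the accuracy it would get by labelling every applicant with the majority class. The sketch below is a small Python calculation that uses only the class counts stated above.

```python
# Class counts reported for the German credit dataset (from the text above).
counts = {"good": 700, "bad": 300}
total = sum(counts.values())

# Prior probability (share) of each class.
priors = {label: n / total for label, n in counts.items()}

# A trivial "model" that labels every applicant with the majority class
# is right exactly as often as the majority class occurs.
majority = max(counts, key=counts.get)
baseline_accuracy = counts[majority] / total

print(priors)                       # → {'good': 0.7, 'bad': 0.3}
print(majority, baseline_accuracy)  # → good 0.7
```

With these counts the baseline is 70%, so a credit scoring model's overall accuracy is informative only to the extent that it exceeds that figure.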
This report details the steps involved in developing a credit scoring model that can determine, from an applicant’s predictor variables, whether a new applicant is a good or a bad credit risk.
Tools Used:
SAS Enterprise Miner 4.3
IBM SPSS Statistics 22
Modeling Techniques Used:
Decision Tree
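A decision tree is grown by repeatedly choosing the predictor and cut-off that best separate good from bad risks. The sketch below shows a single such split (a "stump") chosen with Gini impurity, one common splitting criterion. The predictor (loan duration in months) and its values are made up for illustration and are not taken from the German credit data. This is not how the SAS Enterprise Miner tree node was configured in this project; a full tree typically repeats this search over all predictors at every node and is then pruned.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(values, labels):
    """Find the cut-off t on one numeric predictor that minimises the
    size-weighted Gini impurity of the split (value <= t) vs (value > t)."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_t, best_score = None, float("inf")
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut-off can separate identical values
        t = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbours
        left = [lab for v, lab in pairs if v <= t]
        right = [lab for v, lab in pairs if v > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


def majority(labels):
    """Most frequent class label in a node."""
    return Counter(labels).most_common(1)[0][0]


# Made-up toy data (NOT the German credit data): loan duration in months.
durations = [6, 12, 12, 18, 24, 36, 42, 48, 60, 60]
risk = ["good", "good", "good", "good", "good", "bad", "good", "bad", "bad", "bad"]

t, score = best_threshold(durations, risk)
left_label = majority([r for d, r in zip(durations, risk) if d <= t])
right_label = majority([r for d, r in zip(durations, risk) if d > t])


def classify(duration):
    """Score a new applicant with the one-split tree."""
    return left_label if duration <= t else right_label


print(t, round(score, 3))          # → 30.0 0.16
print(classify(10), classify(50))  # → good bad
```

Each leaf predicts the majority class of the training applicants that fall into it, which is how a fitted tree assigns a new applicant to the good or bad risk group.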
DATA PREPARATION AND EXPLORATION
The modeling process in this project follows the SEMMA methodology of SAS Enterprise Miner, which stands for Sample, Explore, Modify, Model, and Assess. The goal of the project is to develop a credit scoring model that can predict the credit risk of prospective customers. The first step, therefore, was to prepare the collected data.
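The Sample step of SEMMA often includes partitioning the data into a training set used to fit the model and a validation set used to assess it. The report does not state the split proportions, so the sketch below assumes a 70/30 split, stratified so that both sets keep the 70% good / 30% bad mix. The function `stratified_split` and the stand-in applicant records are hypothetical; this is not Enterprise Miner's own partitioning.

```python
import random


def stratified_split(rows, label_of, train_pct=70, seed=42):
    """Split rows into training and validation sets, keeping each class's
    share the same in both sets (stratified sampling)."""
    rng = random.Random(seed)  # fixed seed so the partition is reproducible
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    train, valid = [], []
    for members in by_class.values():
        members = members[:]  # copy before shuffling
        rng.shuffle(members)
        cut = len(members) * train_pct // 100  # integer arithmetic, no rounding surprises
        train.extend(members[:cut])
        valid.extend(members[cut:])
    return train, valid


# Stand-in records: 700 "good" and 300 "bad" applicants, identified only by index.
applicants = [(i, "good") for i in range(700)] + [(i, "bad") for i in range(700, 1000)]
train, valid = stratified_split(applicants, label_of=lambda r: r[1])

print(len(train), len(valid))                  # → 700 300
print(sum(1 for r in train if r[1] == "bad"))  # → 210
```

Stratifying matters here because a purely random 70/30 split could, by chance, leave the validation set with noticeably more or fewer bad risks than the 30% in the full data.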
The German credit dataset was provided in comma-separated values (.csv) format. When the file was opened in MS Excel, the variables were displayed as numeric codes whose meaning could not be determined from the file alone. A screenshot of the data as viewed in Excel is provided in Figure 1.
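Because the raw values are numeric codes, one way to make them readable is to map each code to its label using the data description. The sketch below does this with Python's standard `csv` module. The column names, the CSV excerpt, and the code-to-label mappings are all hypothetical placeholders; the real labels have to come from the separately supplied data description.

```python
import csv
import io

# Hypothetical excerpt of a raw .csv file in which every value is a numeric code.
RAW_CSV = """checking_status,duration,purpose,credit_risk
1,6,3,1
4,48,0,2
2,12,3,1
"""

# Hypothetical data dictionary (column -> code -> label), for illustration only.
CODEBOOK = {
    "checking_status": {"1": "< 0 DM", "2": "0 to < 200 DM",
                        "3": ">= 200 DM", "4": "no checking account"},
    "purpose": {"0": "new car", "3": "radio/television"},
    "credit_risk": {"1": "good", "2": "bad"},
}


def decode(row, codebook):
    """Replace coded values with labels; columns without a codebook entry
    (e.g. the numeric duration) and unknown codes are left unchanged."""
    return {col: codebook.get(col, {}).get(val, val) for col, val in row.items()}


rows = [decode(r, CODEBOOK) for r in csv.DictReader(io.StringIO(RAW_CSV))]
for r in rows:
    print(r)
```

Leaving unknown codes unchanged, rather than raising an error, keeps the sketch simple; in practice it may be safer to flag them so that gaps in the data dictionary are noticed.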
The description of the data was provided separately (See