First of all, models are easier to compare fairly when they are built on the same data, so I tried to use the same variables and the same missing value treatment for all of the models (the decision tree being the exception).
All three models performed at nearly the same level, according to the lift charts presented in later parts of the report.
However, the difference becomes more evident in the % captured response, where the most efficient and useful model turns out to be the logistic regression model.
It is described in greater detail in part 4 of this report.
This ROC plot indicates that the logistic regression model is also efficient in terms of the trade-off between…
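
To make the % captured response and lift comparison above more concrete, here is a minimal Python sketch of how both figures can be computed from a scored data set. It is not the software used for this report; the function name `captured_response_and_lift`, the `depth` parameter, and the sample labels and scores are all hypothetical and only serve as an illustration.

```python
def captured_response_and_lift(y_true, scores, depth=0.1):
    """Return (share of all positives captured, lift) for the top `depth`
    fraction of cases ranked by model score.

    y_true : list of 0/1 labels (1 = the event of interest, e.g. a bad customer)
    scores : list of model scores, higher = more likely to be a 1
    depth  : fraction of the ranked list that is examined (hypothetical parameter)
    """
    if not 0 < depth <= 1:
        raise ValueError("depth must be in (0, 1]")
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("y_true contains no positive cases")

    # Rank cases from the highest score to the lowest.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(len(ranked) * depth))
    top_pos = sum(label for _, label in ranked[:n_top])

    # Share of all positives that fall inside the examined part of the list.
    captured = top_pos / total_pos
    # Lift: that share relative to what random selection would capture.
    lift = captured / (n_top / len(ranked))
    return captured, lift


# Illustrative data only (not taken from the report).
labels = [1, 0, 1, 0, 0, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
captured, lift = captured_response_and_lift(labels, scores, depth=0.3)
print(f"captured: {captured:.1%}, lift: {lift:.2f}")
```

A lift of 1 corresponds to the baseline of selecting customers at random, so two models can be compared by their captured response at the same depth.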

2. Recommended Model - Decision Tree

The recommended decision tree model includes 2 variables: annual income and loans. Both are interval variables and represent the original observations. They were chosen for the final model because, after several trials, they proved to be the key variables in determining the rules within the decision trees. Nothing in particular had to be done about missing values, because decision trees handle them by default. As for the splitting criterion, Gini was chosen after studying each of the criteria and performing numerous trials, because it measures how unevenly the values of the target are distributed within a node.

Presented below is the model assessment graph, which shows the misclassification rate at each number of leaves. As the graph shows, the model reduces the difference between the training and actual sets compared with the other trials, in which different settings and different variables were used. Another indicator of this model's usefulness is the lift value graph. The baseline represents the absence of a prediction model, while the intercept of the red line indicates that with this decision tree we can identify 3.7% more bad customers than we would have without it. The %
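
As an illustration of the Gini splitting criterion discussed above, the following Python sketch computes the Gini impurity of a node and the weighted impurity of a candidate split on an interval variable. It is only a sketch and not the tool used to build the recommended tree; the helper names `gini_impurity` and `split_gini`, the income values, and the good/bad labels are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a node: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(values, labels, threshold):
    """Weighted Gini impurity after splitting on `values < threshold`."""
    left = [lab for val, lab in zip(values, labels) if val < threshold]
    right = [lab for val, lab in zip(values, labels) if val >= threshold]
    n = len(labels)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


# Illustrative data only: annual income (in thousands) and customer status.
income = [20, 25, 60, 80]
status = ["bad", "bad", "good", "good"]

print(gini_impurity(status))           # impurity before any split
print(split_gini(income, status, 40))  # split that separates the classes
print(split_gini(income, status, 22))  # a less useful split
```

A tree grown with this criterion prefers, at each node, the threshold with the lowest weighted impurity, since it produces the purest child nodes.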
