
First of all, it is more reasonable to compare models that are built on the same data, so I tried to use the same variables and the same missing-value treatment for all of the models (excluding the decision tree).

All three models performed at nearly the same level, according to the lift charts presented in later parts of this report.

However, the difference becomes more evident in the % captured response, where the most efficient and useful model turns out to be the logistic regression model.
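To make the % captured response concrete, here is a minimal sketch of how such a figure is commonly computed: rank the cases by predicted score and ask what share of all actual positives falls within the top fraction. The function name, the `depth` parameter, and the toy scores and labels are assumptions made for illustration; they are not the data or the tool used in this report.

```python
def captured_response(scores, labels, depth):
    """Share of all positive cases found in the top `depth` fraction
    of cases when ranked by predicted score (highest first)."""
    if not 0 < depth <= 1:
        raise ValueError("depth must be in (0, 1]")
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("labels contain no positive cases")
    # Rank cases from the highest to the lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    cutoff = max(1, round(depth * len(ranked)))
    found = sum(label for _, label in ranked[:cutoff])
    return found / total_positives


# Hypothetical toy data: 1 marks a "bad" customer, 0 a "good" one.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]

for depth in (0.2, 0.5, 1.0):
    print(depth, captured_response(scores, labels, depth))
```

A model whose curve rises faster at small depths captures more of the positives early, which is the sense in which one model can separate from the others on this measure.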

It is described in greater detail in part 4 of this report.

This ROC plot indicates that the logistic regression is also efficient in terms of the trade-off between …

2. Recommended Model - Decision Tree

The recommended decision tree model includes two variables: annual income and loans. Both are interval variables and represent the original observations. They were chosen for the final model because, after several trials, they proved to be the key variables in determining the rules within the decision trees.

In terms of missing values, nothing in particular had to be done, because decision trees conveniently handle missing values by default.

As for the splitting criterion, Gini was chosen after studying each of the criteria and performing numerous trials, because it measures how mixed the class frequencies are within a node, so a split that lowers it produces purer groups.
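As an illustration of the Gini criterion, the sketch below computes the Gini impurity of a node, 1 minus the sum of squared class proportions, and the impurity reduction achieved by a candidate split. The function names and the "good"/"bad" toy labels are assumptions for this example only; the modelling tool used in this report may implement the criterion differently in its details.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a node: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def gini_gain(parent, left, right):
    """Impurity reduction from splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted_children = (
        (len(left) / n) * gini_impurity(left)
        + (len(right) / n) * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


# Hypothetical node with 6 good and 4 bad customers, and one candidate split.
parent = ["good"] * 6 + ["bad"] * 4
left = ["good"] * 5 + ["bad"] * 1
right = ["good"] * 1 + ["bad"] * 3

print(gini_impurity(parent))
print(gini_gain(parent, left, right))
```

A pure node has impurity 0, and a tree grown with this criterion prefers, at each step, the split with the largest impurity reduction.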

Presented below is the model assessment graph that represents the misclassification rates at each number of leaves.

As can be seen from the graph, the model reduces the difference between the training and actual sets, compared with other trials in which different settings and different variables were used.

Another indicator of this model’s usefulness is the lift value graph. The baseline represents the absence of our prediction model, while the intercept of the red line indicates that with this decision tree we can identify 3.7% more bad customers than we would have without it.
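For readers unfamiliar with lift, the following is a minimal sketch of one common definition: the rate of bad customers among the top-scored cases divided by the overall rate, so that a value of 1.0 corresponds to the baseline of having no model. The function name and the toy data are hypothetical, and the sketch does not reproduce the 3.7% figure above, which comes from the report's own graph.

```python
def lift_at_depth(scores, labels, depth):
    """Response rate in the top `depth` fraction of ranked cases,
    divided by the overall response rate (baseline lift is 1.0)."""
    if not 0 < depth <= 1:
        raise ValueError("depth must be in (0, 1]")
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("labels contain no positive cases")
    # Rank cases from the highest to the lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    cutoff = max(1, round(depth * len(ranked)))
    top_rate = sum(label for _, label in ranked[:cutoff]) / cutoff
    return top_rate / base_rate


# Hypothetical toy data: 1 marks a "bad" customer, 0 a "good" one.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]

for depth in (0.2, 0.5, 1.0):
    print(depth, lift_at_depth(scores, labels, depth))
```

At full depth every case is selected, so the lift always returns to 1.0, which is why the model's curve meets the baseline at the right edge of the chart.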

The %
