The Regression Model Of The United States

Models are most reasonably compared when they are built on the same data, so I used the same variables and the same missing-value treatment approach for all of the models (except for the decision tree).
All three models performed at nearly the same level, according to the lift charts presented in later parts of this report.

However, the difference becomes more evident in the % captured response, where the logistic regression model turns out to be the most efficient and useful.
It is described in greater detail in part 4 of this report.
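
To make the % captured response concrete, here is a minimal sketch of how it is commonly computed: rank the cases by predicted score and measure what share of all target cases falls in the top fraction. The function name, the toy scores, and the labels are hypothetical and are not taken from the report's data; the tool used for the report may bin or round differently.

```python
def captured_response(scores, labels, fraction):
    """Share of all target cases (label 1) found in the top `fraction`
    of cases when ranked by predicted score, highest first."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    cutoff = max(1, round(len(ranked) * fraction))  # number of cases in the top group
    total_targets = sum(labels)
    captured = sum(label for _, label in ranked[:cutoff])
    return captured / total_targets


# Hypothetical toy data: 10 cases, label 1 marks the target event.
toy_scores = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]
toy_labels = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]

top_30 = captured_response(toy_scores, toy_labels, 0.3)
print(f"Captured in top 30%: {top_30:.0%}")
```

A model that ranks the target cases higher captures a larger share of them in the same top fraction, which is why this measure can separate models whose lift charts look alike.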

This ROC plot indicates that the logistic regression is also efficient in terms of the trade-off between …
2. Recommended Model - Decision Tree

The recommended decision tree model includes two variables: annual income and loans. Both are interval variables and represent the original observations. They were chosen for the final model because, after several trials, they proved to be the key variables in determining the rules within the decision trees.
No particular treatment of missing values was needed, because decision trees conveniently handle missing values by default.
As for the splitting criterion, Gini was chosen after studying each of the available criteria and performing numerous trials, because it measures the inequality among the values of a frequency distribution, here the class frequencies within a node.
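
As an illustration of the Gini criterion, the sketch below computes the Gini impurity of a node and the impurity reduction achieved by a candidate split. It uses the standard textbook formula (one minus the sum of squared class proportions); the function names and the "good"/"bad" toy labels are hypothetical, and the exact variant implemented by the modelling tool is an assumption.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a node: 1 minus the sum of squared class proportions.
    It is 0 for a pure node and grows as the classes become more mixed."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_gain(parent, left, right):
    """Reduction in impurity when `parent` is split into `left` and `right`,
    with each child weighted by its share of the parent's cases."""
    n = len(parent)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(parent) - weighted


# Hypothetical node with 6 good and 4 bad customers, and one candidate split.
parent = ["good"] * 6 + ["bad"] * 4
left = ["good"] * 5 + ["bad"] * 1
right = ["good"] * 1 + ["bad"] * 3

gain = gini_gain(parent, left, right)
print(f"Parent impurity: {gini_impurity(parent):.3f}, gain from split: {gain:.3f}")
```

A tree grown with this criterion would, at each node, prefer the candidate split with the largest gain.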
Presented below is the model assessment graph, which shows the misclassification rate for each number of leaves.

As the graph shows, this model reduces the gap in misclassification rate between the training and actual sets compared with the other trials, in which different settings were used and different variables included.

Another indicator of this model’s usefulness is the lift value graph. The baseline represents the situation without a prediction model, while the intercept of the red line shows that with this decision tree we can identify 3.7% more bad customers than we would without it.
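
For readers unfamiliar with lift, the sketch below shows one common definition: the rate of bad customers in the top-scored group divided by the overall rate, so that a value of 1 corresponds to the baseline of having no model. This ratio form is an assumption and may not match the exact lift value plotted by the tool used for this report; the function name and toy data are hypothetical.

```python
def lift_at(scores, labels, fraction):
    """Lift in the top `fraction` of cases ranked by score: the target rate
    in that group divided by the target rate in the whole data set."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    cutoff = max(1, round(len(ranked) * fraction))  # size of the top group
    top_rate = sum(label for _, label in ranked[:cutoff]) / cutoff
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate


# Hypothetical toy data: 10 customers, label 1 marks a bad customer.
example_scores = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]
example_labels = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]

lift_top_20 = lift_at(example_scores, example_labels, 0.2)
print(f"Lift in top 20%: {lift_top_20:.2f}")
```

Taking the whole data set as the "top group" always gives a lift of 1, which is the baseline the chart compares against.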
