HWCase2.docx

School: North Carolina State University
Course: 551
Subject: Information Systems
Date: Feb 20, 2024
Type: docx
Pages: 6

Uploaded by CaptainDiscovery13338

CASE 2

Instructions: Please use the Stolenrecords and crimebystate datasets to complete the following analysis. Please answer each question fully and supply any supporting analysis or results (e.g., screenshots). Each question is worth 10 points unless otherwise noted.

Background: Your company recently had a security breach in which millions of customers' private information was stolen. Your company's reputation is at risk, so you are interested in providing assistance and guidance to these customers about protecting themselves from identity theft (thieves using the information to open other accounts or commit other illegal acts). You would like to identify which customers are more likely to become victims. You have a file from a previous breach that has information on customers and which of them became victims of identity theft (Stolenrecords). You also have a file of crime statistics by state (crimebystate). Use these two files to answer the following questions:

1. Build a classification tree for Identity theft by determining which variables to include as predictors (fit what you think is the best model).

a. Which variables, if any, did you choose not to include in the model? Why?

b. How many splits are in your final tree? (5 points)

The final tree consists of 379 splits.

c. What is the misclassification rate for this model? Is the model better at predicting victims or non-victims? Explain.
Looking at sensitivity and specificity, the model is much better at predicting non-victims (specificity of 95.37%) than victims (sensitivity of 32.50%). In other words, it is far more likely to correctly identify someone who will not become a victim of identity theft than someone who will. The high specificity does not protect against missed victims, however: with a sensitivity of 32.50%, the model labels about 67.5% of actual victims as non-victims. It should therefore be used in conjunction with other risk assessment methods.

Confusion matrix (partial):

Actual    Predicted No
0         230407
1         38796
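As a minimal sketch of how the quantities asked about in 1c relate to a 2x2 confusion matrix, the Python below computes the misclassification rate, sensitivity, and specificity from the four cell counts. The function name `classification_rates` and the counts passed to it are hypothetical and chosen only for illustration; they are not the case data.

```python
def classification_rates(tn, fp, fn, tp):
    """Return misclassification rate, sensitivity, and specificity from 2x2 counts.

    tn: actual non-victims predicted as non-victims
    fp: actual non-victims predicted as victims
    fn: actual victims predicted as non-victims
    tp: actual victims predicted as victims
    """
    total = tn + fp + fn + tp
    return {
        "misclassification_rate": (fp + fn) / total,
        "sensitivity": tp / (tp + fn),  # share of actual victims caught
        "specificity": tn / (tn + fp),  # share of actual non-victims cleared
    }


# Hypothetical counts for illustration only (not taken from the case).
rates = classification_rates(tn=90, fp=10, fn=30, tp=20)
for name, value in rates.items():
    print(f"{name}: {value:.2%}")
```

A model can have a low overall misclassification rate and still miss most victims when non-victims dominate the data, which is why the two class-level rates are reported alongside it.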
d. What is the area under the ROC curve for Victims? Interpret this value. Does the model do a better job of classifying victims than a random model?

The area under the ROC curve (AUC) for victims is 0.8124. This is the probability that the model assigns a randomly chosen victim a higher predicted probability than a randomly chosen non-victim. A random model has an AUC of 0.5 and values closer to 1 indicate better separation between the classes, so at 0.8124 the model does a better job of classifying victims than a random model.

e. What is the lift for the model at portion = 0.1 and at portion = 0.20? Interpret these values.

The lift at portion = 0.1 is 3.5: among the 10% of customers with the highest predicted probability of being a victim, the model identifies 3.5 times as many victims as a random selection of the same size would. At portion = 0.20, the lift is 2.75. Both values are well above 1, so targeting the customers the model ranks highest reaches far more victims than random guessing.
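The AUC and lift figures in 1d and 1e can be illustrated with a short Python sketch. It assumes lift is cumulative, that is, the victim rate among the top-scoring portion divided by the overall victim rate; the source does not state which definition its software uses. The function names and the ten scores and labels are hypothetical and are not the case data.

```python
def auc_by_ranks(scores, labels):
    """AUC as the chance a random victim outscores a random non-victim (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cumulative_lift(scores, labels, portion):
    """Victim rate in the top `portion` of scores divided by the overall victim rate."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = max(1, round(portion * len(ranked)))  # number of customers in the top portion
    top_rate = sum(y for _, y in ranked[:k]) / k
    overall_rate = sum(labels) / len(labels)
    return top_rate / overall_rate


# Hypothetical predicted probabilities and outcomes (1 = victim), for illustration only.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

auc = auc_by_ranks(scores, labels)
lift_20 = cumulative_lift(scores, labels, portion=0.20)
print(f"AUC: {auc:.4f}")
print(f"Lift at portion 0.20: {lift_20:.2f}")
```

Under the cumulative definition, lift multiplied by portion gives the share of all victims captured, so a lift of 3.5 at portion 0.1 would correspond to 35% of victims and a lift of 2.75 at portion 0.20 to 55%.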