DATA ANALYSIS AND REGRESSION
Assignment-6 | Total Points: 20 pts for DSC 323; 26 pts for DSC 423

Note:
• All assignments should be submitted as a single MS Word file; no PDFs or any other file types will be accepted. If you submit any other file type, it will not be graded.
• No extensions will be given except for a documented reason specified in the syllabus. Late assignments will not be accepted, even a couple of minutes past the due date, as you already have an extra day (7 days) to submit your assignments.
• Submitting work that is not yours is grounds for an automatic ‘F’ for the entire course. This includes taking content and ideas from others, or consulting anyone other than your instructor to complete your deliverables.
• The SAS software and virtual server can stall, slow down, and crash, so start early and keep multiple backups in multiple places/mediums. Late submission or inability to do the assignment due to server and/or software issues will not be accepted. For any issues relating to SAS, contact IS using the phone number provided in the syllabus; I won’t be able to help you with DePaul software-related issues.
• Make sure to double check your submissions. After you submit the assignment, log out of D2L, log back in, and click on your submission to confirm that you submitted the right file(s) and the correct version. Wrong submissions will not be graded.
Note: For all questions, include the relevant output regardless of whether the question asks for it to be attached. Also, it is important to include the sign (negative/positive or increase/decrease) and the units of measurement (e.g., $, $99 million, %, etc.); otherwise points will be deducted.

Problem 1 [10 pts] – to be answered by everyone
Given the large number of competitors, cell phone carriers are very interested in analyzing and predicting customer retention and churn. The primary goal of churn analysis is to identify those customers that are most likely to discontinue using your service or product. The dataset churn.csv contains information about a random sample of customers of a cell phone company. For each customer, the company recorded the following variables:
1. CHURN: 1 if customer switched provider, 0 if customer did not switch
2. GENDER: M, F
3. EDUCATION (categorical): code 1 to 6 depending on education level
4. PRICE_PLAN_CHNG: 1 if price plan was changed, 0 otherwise
5. TOT_ACTV_SRV_CNT: Total no. of active services
6. AGE: customer age
7. PCT_CHNG_IB_SMS_CNT: Percent change of latest 2 months incoming SMS with respect to previous 4 months incoming SMS
8. PCT_CHNG_BILL_AMT: Percent change of latest 2 months bill amount with respect to previous 4 months bill amount
9. COMPLAINT: 1 if there was at least one customer complaint in the two months, 0 if there were no complaints
The company is interested in a churn predictive model that identifies the most important predictors affecting the probability of switching to a different mobile phone company (CHURN = 1). Answer the following questions:
a) Create two boxplots to analyze the observed values of AGE and PCT_CHNG_BILL_AMT by churn value. Analyze the boxplots and discuss how customer age and changes in bill amount affect churn probabilities. Include the boxplots.
b) Using a selection method, fit the final logistic regression model to predict the churn probability using the data in the dataset (CHURN is the response variable and the remaining variables are the independent x-variables). Include the SAS output. Write down the expression of the fitted model.
c) Analyze the final logistic regression model and discuss the effect of each variable on the churn probability.
d) Using SAS, compute the predicted churn probability and the confidence interval for a male customer who is 43 years old and has the following information: PRICE_PLAN_CHNG = 0, TOT_ACTV_SRV_CNT = 4, PCT_CHNG_IB_SMS_CNT = 1.04, PCT_CHNG_BILL_AMT = 1.19, and COMPLAINT = 1. Include the output, and interpret and explain the 3 values you obtained.
e) Copy and paste your FULL SAS code into the Word document along with your answers.

Problem 2 [10 pts] – to be answered by everyone
Answer the following strictly based on the week-7 discussions.
1. [4 pts] Did the disease dependent variable have balanced or imbalanced data? Explain your answer based on the data we used.
2. [3 pts] I can use the ‘Year’ variable as is for a regression model. Explain why/why not, using an example discussed in class.
3. [3 pts] How can I use the dependent variable ‘building status’, which has 3 values, for logistic regression? 1 = Good Condition, 2 = Dilapidated, 3 = Poor Condition. Explain your answer.

Problem 3 [6 pts] – For Graduate Students ONLY
1. Can we transform predictors for logistic regression? Explain why.
2. Why can’t we use more than 2 values for the Y-variable in logistic regression?
3. Why shouldn’t we check for linearity when using logistic regression?
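
Problem 1 d) asks for three values: a predicted probability and its lower and upper confidence limits. As a rough illustration only, the Python sketch below shows one common way such values relate to each other: a Wald interval is formed on the log-odds scale and then mapped to probabilities with the logistic function. It is not the SAS code the assignment requires, and it does not claim to reproduce SAS output. The function names and the numbers `eta_hat` and `se_eta` are hypothetical placeholders, not values estimated from churn.csv.

```python
import math


def logistic(eta):
    """Map a log-odds value (linear predictor) to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def predicted_probability_with_ci(eta_hat, se_eta, z=1.96):
    """Return (probability, lower limit, upper limit).

    The interval is built on the log-odds scale as eta_hat +/- z * se_eta
    and then transformed, so both limits always stay between 0 and 1.
    """
    lower = logistic(eta_hat - z * se_eta)
    upper = logistic(eta_hat + z * se_eta)
    return logistic(eta_hat), lower, upper


# Hypothetical inputs for illustration only (assumed, not from the dataset):
eta_hat = 0.4   # estimated log-odds for one customer
se_eta = 0.25   # standard error of that log-odds estimate

p, lo, hi = predicted_probability_with_ci(eta_hat, se_eta)
print(f"predicted probability = {p:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

Because the interval is transformed from the log-odds scale, it is generally not symmetric around the predicted probability.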