Module 3_Part2

School: Georgia Institute of Technology
Course: 6414
Subject: Statistics
Date: Apr 3, 2024

It is important to remember that in this model, we do not have an error term!

Slide 9:
What are the model assumptions? The first assumption is linearity of the link function of the probability of success in the predicting variables; that is, we write the g function of the probability of success as a linear combination of the predicting variables. Although I will still refer to this as the linearity assumption, it differs from the linearity assumption in the regression model we learned in the previous modules, since the link function g is a non-linear transformation of the probability of success, that is, of the expectation of the response variable.

Similar to the standard regression model, we also assume independence of the response data.

The third assumption is specific to the logistic regression model: it assumes that the link function is the so-called logit function, provided here on the slide. The link function g is the log of the ratio of p over one minus p, g(p) = log(p / (1 - p)), where p again is the probability of success. This is an assumption because the logit function is not the only function that yields S-shaped curves. Other S-shaped functions are used in modeling binary responses under a more general framework called the binomial model. We'll learn about these other S-shaped functions in a different lesson.

Slide 10:
I will continue by illustrating logistic regression with a data example that I will use throughout the lessons introducing the Basic Concepts of Logistic Regression. In 1972-1974, a survey was taken in Whickham, a mixed urban and rural district near Newcastle, United Kingdom. Twenty years later, a follow-up study was conducted. Among the information obtained originally was whether a person was a smoker or not. It was found that twenty years later, 76.12% of the 582 smokers were still alive, while only 68.58% of the 732 nonsmokers were still alive.
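As a rough check on these figures, the two survival rates can be placed on the odds and logit scales. This is a minimal Python sketch, not the lecture's R code; the function name `logit` and all variable names are illustrative assumptions, and the only inputs are the percentages quoted above.

```python
import math

def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))

# Survival proportions quoted in the lecture (twenty-year follow-up).
p_smoker = 0.7612      # 76.12% of the 582 smokers
p_nonsmoker = 0.6858   # 68.58% of the 732 nonsmokers

# Odds of survival in each group: p / (1 - p).
odds_smoker = p_smoker / (1.0 - p_smoker)
odds_nonsmoker = p_nonsmoker / (1.0 - p_nonsmoker)

# Odds ratio (smokers vs. nonsmokers) and the difference on the logit scale.
odds_ratio = odds_smoker / odds_nonsmoker
logit_diff = logit(p_smoker) - logit(p_nonsmoker)

print(f"odds of survival, smokers:    {odds_smoker:.3f}")
print(f"odds of survival, nonsmokers: {odds_nonsmoker:.3f}")
print(f"odds ratio:                   {odds_ratio:.3f}")
print(f"difference in logits:         {logit_diff:.3f}")
```

The difference on the logit scale equals the log of the odds ratio, so the two summaries always agree in direction.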
That is, smokers had a higher survival rate than nonsmokers. That would make quite a story for Philip Morris: smoking leads to a longer life span. This example was provided by Dr. Jeffrey Simonoff from New York University.

Slide 11:
This slide includes the R code to get you started with reading the data. It also includes the code for plotting age versus the proportion of those who survived. We want to compare the relationship between age and the proportion of survival for smokers and nonsmokers separately. The plot shows a non-linear relationship between age and survival proportion. In fact, it looks more like an S shape, as I motivated in the previous lesson where I introduced the logistic regression model.

Slide 13:
Next, I transformed the survival proportion using the logit function, that is, the log of the ratio of the proportion of survival to 1 minus the proportion of survival. This is called the logit transformation or link function. Here I'm plotting age versus the logit of the proportion of survival, and contrasting the plot you saw on the previous slide, on the left, with the plot of age versus the logit of the survival rate. The relationship between age and the transformed survival rate is closer to linear than for the untransformed survival proportion, although we still see a slight curvature. I will expand on this when we perform the logistic regression analysis on this example.
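To see why the logit transformation tends to straighten an S-shaped curve, here is a small Python sketch (the lecture's own code is in R). The ages and the coefficients `b0` and `b1` are made up for illustration and are not estimates from the Whickham data; the point is only that probabilities lying on a logistic curve become exactly linear in age after the logit transform.

```python
import math

def logit(p):
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inv_logit(eta):
    """Inverse of the logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical coefficients, chosen only so that survival falls with age.
b0, b1 = 7.0, -0.12

ages = [20, 30, 40, 50, 60, 70, 80]
probs = [inv_logit(b0 + b1 * age) for age in ages]   # S-shaped in age
logits = [logit(p) for p in probs]                   # linear in age

# Change per decade on each scale: uneven for probabilities,
# constant (10 * b1) for logits.
prob_steps = [probs[i + 1] - probs[i] for i in range(len(ages) - 1)]
logit_steps = [logits[i + 1] - logits[i] for i in range(len(ages) - 1)]

for age, p, l in zip(ages, probs, logits):
    print(f"age {age}: survival probability {p:.3f}, logit {l:.2f}")
```

With real proportions, such as survival rates computed within age groups, the transformed points need not fall exactly on a line, which is consistent with the slight curvature mentioned above.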