An Example About Prostate Cancer

Jan 4th, 2016
Consider an example about prostate cancer, using the data presented by Hastie, Tibshirani and Friedman in The Elements of Statistical Learning [2]. In the scatterplot in figure 1.1, we can see that the dependent variable, the log of the prostate specific antigen (lpsa), has a strong positive correlation with lcavol (the log of cancer volume) and lcp (the log of capsular penetration) in particular, and weaker but still appreciable correlations with the other independent variables: log prostate weight (lweight), age, log of the amount of benign prostatic hyperplasia (lbph), and percent of Gleason scores 4 or 5 (pgg45). The exceptions are svi (seminal vesicle invasion) and gleason (Gleason score), as these are categorical variables [2]. Below, figures 1.2 and 1.3 come from the model fit with all variables, while figures 1.4 and 1.5 come from a simplified model, which I obtained by removing variables with high p-values until I felt the model had improved, and then refitting. When we plot the residuals against the fitted values, if the relationship is linear, we should get an even spread around the line at 0. If we look at figures 1.2 and 1.4, for which the R code can be found in the appendix below (section 7), we can see that both appear linear, with figure 1.2 having possible outliers further away from the line and figure 1.4 having a more even spread. Taking a look at figure 1.3, we can see that there is a particularly good fit along the middle but the tails have fairly large variation, suggesting a…
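To make the residuals-against-fitted-values check concrete, here is a minimal sketch in Python. It is not the R code from the appendix, and it does not use the prostate data: the data are synthetic, the model has a single predictor rather than the full set of variables, and the function names (fit_line, binned_residual_means) are hypothetical names chosen for illustration. The idea is to sort the residuals by fitted value, split them into thirds, and compare the average residual in each third. If the relationship is linear, every third should average close to 0; if there is curvature, the middle and outer thirds pull in opposite directions.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def binned_residual_means(x, y, bins=3):
    """Mean residual within equal-sized groups ordered by fitted value."""
    intercept, slope = fit_line(x, y)
    fitted = [intercept + slope * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    ordered = [r for _, r in sorted(zip(fitted, residuals))]
    size = len(ordered) // bins
    means = []
    for i in range(bins):
        # The last group also takes any leftover observations.
        chunk = ordered[i * size:] if i == bins - 1 else ordered[i * size:(i + 1) * size]
        means.append(sum(chunk) / len(chunk))
    return means


# Synthetic stand-in data (an assumption for illustration, NOT the prostate data).
rng = random.Random(0)
x = [rng.uniform(-2.0, 2.0) for _ in range(600)]
y_linear = [1.0 + 0.5 * xi + rng.gauss(0.0, 0.5) for xi in x]
y_curved = [1.0 + 0.5 * xi + xi ** 2 + rng.gauss(0.0, 0.5) for xi in x]

linear_bins = binned_residual_means(x, y_linear)
curved_bins = binned_residual_means(x, y_curved)

print("linear data, mean residual by third:", [round(m, 3) for m in linear_bins])
print("curved data, mean residual by third:", [round(m, 3) for m in curved_bins])
```

For the linear data the three averages should all sit near 0, which is the numerical counterpart of an even spread around the line at 0. For the curved data we would expect the middle third to average clearly below 0 and the outer thirds above it, which is the kind of pattern that would show up as a bend in the plot.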
