Group-1_BAN100 - Logistic Regression

School: Seneca College

Course: 100
Subject: Computer Science
Date: May 24, 2024

4/11/24, 9:00 PM

Program Summary - Program 1

Execution Environment
Author: u63732517
File:
SAS Platform: Linux LIN X64 3.10.0-1062.12.1.el7.x86_64
SAS Host: ODAWS01-USW2-2.ODA.SAS.COM
SAS Version: 9.04.01M7P08062020
SAS Locale: en_US
Submission Time: 4/11/2024, 9:00:19 PM
Browser Host: POOL-173-33-158-189.CPE.NET.CABLE.ROGERS.COM
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/123.0.0.0 Safari/537.36
Application Server: ODAMID00-USW2-2.ODA.SAS.COM

Code: Program 1

libname DATALIB '/home/u63732517/BAN100 STAT/DATALIB/DATALIB/';

/*
Sam Oswald - 138928239
Gelasia Mendonca - 104624234
Abhishek Shah - 138939236
*/

proc freq data=DATALIB.safety;
   tables Region Size Type / nocum nopercent;
run;

/*
2) What is the proportion of cars made in North America?
The proportion of cars made in North America is approximately 63.54%
(the number of North American cars divided by the total number of cars in the dataset).

3) What is the proportion of cars of Size 2?
The proportion of cars of Size 2 is approximately 30.21%
(the number of Size 2 cars divided by the total number of cars).

4) What is the proportion of cars in the Sport/Utility segment?
The proportion of cars in the Sport/Utility segment is approximately 16.67%
(the number of Sport/Utility cars divided by the total number of cars).

5) For the variables Unsafe, Size, Region, and Type, are there any unusual data values that warrant further investigation?
Based on the frequency tables, there are no unusual data values that warrant further investigation for Unsafe, Size, Region, or Type. Every category is a plausible, expected value for its variable.
6) Please provide the syntax and output as well as an interpretation of the output.
Based on the frequency data:

- The dataset covers a range of car sizes and types, so safety can be compared across vehicle categories and characteristics.
- North American cars make up about 64% of the sample. The data are therefore unbalanced by region, and results for Asian cars rest on fewer observations.
*/

/* B */
proc format;
   value safefmt
      0 = 'Average or Above'
      1 = 'Below Average';
run;

proc freq data=DATALIB.safety;
   tables Region * Unsafe / chisq expected oddsratio;
   format Unsafe safefmt.;
run;

* 1) For the cars made in Asia, what percentage had a below-average safety score?;
* Answer: For the cars made in Asia, 42.86% had a below-average safety score.;
* 2) For the cars with an average or above safety score, what percentage was made in North America?;
* Answer: 69.70% of the cars with an average or above safety score were made in North America.;
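As a consistency check, the Region × Unsafe counts can be back-calculated from the reported percentages. This is a sketch under two assumptions. First, Region has only the two levels Asia and N America, as the Part C model implies. Second, the total is the 96 observations read in Part A. Under these assumptions:

- 63.54% of 96 gives 61 North American cars, which leaves 35 Asian cars.
- 42.86% of 35 gives 15 below-average Asian cars, so 20 Asian cars scored average or above.
- North American cars make up 69.70% of the average-or-above group, which gives 46 such cars.

These counts are inferred rather than copied from the PROC FREQ output. Even so, they reproduce the reported odds ratio (0.4348) and the chi-square p-value (0.0631):

```latex
\begin{array}{l|cc|c}
                 & \text{Average or Above } (0) & \text{Below Average } (1) & \text{Total} \\ \hline
\text{Asia}      & a = 20 & b = 15 & 35 \\
\text{N America} & c = 46 & d = 15 & 61 \\ \hline
\text{Total}     & 66     & 30     & 96
\end{array}

\begin{aligned}
P(\text{Below Average} \mid \text{Asia}) &= \tfrac{15}{35} \approx 42.86\% \\
P(\text{N America} \mid \text{Average or Above}) &= \tfrac{46}{66} \approx 69.70\% \\
\widehat{OR} &= \frac{a/b}{c/d} = \frac{ad}{bc} = \frac{20 \cdot 15}{15 \cdot 46} = \frac{300}{690} \approx 0.4348 \\
1/\widehat{OR} &= \tfrac{690}{300} = 2.3 \\
\chi^2 &= \frac{n\,(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)}
        = \frac{96\,(300 - 690)^2}{35 \cdot 61 \cdot 66 \cdot 30} \approx 3.454,
        \qquad p \approx 0.063 \ (1 \text{ df})
\end{aligned}
```

The odds ratio uses the first column (Average or Above) because PROC FREQ orders Unsafe by its unformatted values, 0 before 1. Its reciprocal, 2.3, is therefore the odds ratio for a below-average score, Asia versus North America.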

* 3) Do you see a statistically significant (at the 0.05 level) association between Region and Unsafe?;
* Answer: No. The chi-square test p-value is 0.0631, which is greater than 0.05, so there is no statistically significant association between Region and Unsafe at the 0.05 level.;
* 4) What does the odds ratio compare and what does this one say about the difference in odds between Asian and North American cars?;
* Answer: For a 2x2 table, PROC FREQ computes the odds ratio for the first column (Average or Above) as the odds in the first row (Asia) divided by the odds in the second row (N America). The estimate of 0.4348 means that the odds of an average-or-above safety score for Asian cars are about 0.43 times those for North American cars. Equivalently, Asian cars have about 2.3 times (1/0.4348) the odds of a below-average score, which agrees with the higher below-average rate for Asian cars (42.86%). However, the 95% confidence interval (0.1790 to 1.0562) includes 1, so the difference is not statistically significant at the 0.05 level.;
* 5) Interpretation of the PROC FREQ Output:;
* The analysis of Region by Unsafe examines whether safety scores are associated with the region where a car was made. It includes expected frequencies, the chi-square test, and the odds ratio.;
* The chi-square test gives a p-value of 0.0631, so at the 0.05 significance level there is not sufficient evidence to conclude that manufacturing region and safety score are associated. The observed differences in safety scores between regions could plausibly be due to chance.;
* The odds ratio is 0.4348, with a 95% confidence interval from 0.1790 to 1.0562. The point estimate suggests that cars made in Asia have lower odds of an average-or-above safety score than cars made in North America (equivalently, higher odds of a below-average score). However, the confidence interval includes 1, so this result is not statistically significant at the 0.05 level.
Because the interval includes 1, the data are also consistent with no difference in odds between the two regions. The direction of the association therefore cannot be established with confidence.;
* Overall, the results hint at regional differences in safety scores but do not show a statistically significant difference. Further investigation or additional data would be needed to draw firm conclusions.;

/* C */
libname DATALIB '/home/u63732517/BAN100 STAT/DATALIB/DATALIB/';
title "c. Logistic Regression";

proc logistic data=DATALIB.safety descending;
   class Region (ref='Asia') Size (ref='3') / param=ref;
   model Unsafe (event='1') = Weight Region Size / clodds=pl;
run;

/*
1. Write the logistic regression equation.
Answer:
logit(P(Unsafe=1)) = 0.0500 − 0.6678 × Weight − 0.3775 × Region N America + 2.6783 × Size 1 + 0.6582 × Size 2
where Region N America, Size 1, and Size 2 are 0/1 indicator variables (reference levels: Asia and Size 3).

2. Do you reject or fail to reject the null hypothesis that all regression coefficients of the model are 0? (1 point)
Answer: We reject it. The global null hypothesis states that all regression coefficients are 0. The likelihood ratio, score, and Wald tests all have p-values below .0001, so at least one coefficient differs significantly from zero.

3. If you do reject the global null hypothesis, then which predictors significantly predict safety outcome? (2 points)
Answer: Size is significant (p = 0.0005 in the Type 3 Analysis of Effects). Specifically, the Size 1 indicator is significant (p = 0.0024).

4. Interpret the odds ratio for significant predictors. (2 points)
Answer: For Size 1, the odds ratio is 14.560, with a 95% profile-likelihood confidence interval of 3.018 to 110.732. Holding Weight and Region constant, cars of Size 1 have about 14.56 times the odds of a below-average safety score compared with cars of Size 3.
The confidence interval is wide, so the size of the effect is estimated imprecisely. Because the whole interval lies above 1, however, Size 1 is clearly associated with higher odds of a below-average score.

5. If you only include significant predictors, what would be the logistic regression equation? (1 point)
Answer: Keeping only the significant predictor, with its full-model estimate, gives:
logit(P(Unsafe=1)) = 0.0500 + 2.6783 × Size 1
because Size 1 is the only significant predictor in this model. This equation reuses the full-model estimates. Refitting the model with Size alone (which enters through both the Size 1 and Size 2 indicators) would generally give different coefficients.

6. Interpretation
Answer:
* The model fit statistics and the likelihood ratio test (p < .0001) indicate that the model with predictors fits significantly better than an intercept-only model.
* The weight of the car is not a significant predictor of safety score (p = 0.1456).
* Region is not a significant predictor either (p = 0.5020 for North America vs. Asia).
* Size is a significant predictor of safety, particularly Size 1 (p = 0.0024). Small or sports cars have higher odds of a below-average safety rating than large or sport/utility vehicles.
* The percent concordant of 81.9% and the c statistic of 0.848 indicate that the model discriminates well between cars with below-average and average-or-above safety scores.
*/
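The Part C odds ratios can be recovered from the coefficients in answer 1. This assumes the reference-cell coding requested by param=ref, with Asia and Size 3 as the reference levels. Under that coding, each odds ratio is e raised to a coefficient (or to a difference of two coefficients), and the fitted probability is the inverse logit of the linear predictor η.

The values below are worked from the rounded coefficients in this report. They can therefore differ from the PROC LOGISTIC odds ratio output in the last digit. The profile-likelihood limits requested by clodds=pl cannot be recovered this way.

```latex
\begin{aligned}
\eta &= 0.0500 - 0.6678\,\text{Weight} - 0.3775\,\text{Region}_{\text{N America}}
        + 2.6783\,\text{Size}_1 + 0.6582\,\text{Size}_2 \\
P(\text{Unsafe} = 1) &= \frac{1}{1 + e^{-\eta}} \\[4pt]
OR_{\text{Size 1 vs Size 3}} &= e^{2.6783} \approx 14.56 \\
OR_{\text{Size 2 vs Size 3}} &= e^{0.6582} \approx 1.93 \\
OR_{\text{Size 1 vs Size 2}} &= e^{2.6783 - 0.6582} = e^{2.0201} \approx 7.54 \\
OR_{\text{N America vs Asia}} &= e^{-0.3775} \approx 0.69 \\
OR_{\text{Weight, per unit}} &= e^{-0.6678} \approx 0.51
\end{aligned}
```

The first odds ratio matches the reported Size 1 estimate of 14.560. The Weight and Region odds ratios are shown only for completeness, since neither predictor is significant.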

Log: Program 1

1          OPTIONS NONOTES NOSTIMER NOSOURCE NOSYNTAXCHECK;
68
69         libname DATALIB '/home/u63732517/BAN100 STAT/DATALIB/DATALIB/';
NOTE: Libref DATALIB refers to the same physical library as _TEMP0.
NOTE: Libref DATALIB was successfully assigned as follows:
      Engine:        V9
      Physical Name: /home/u63732517/BAN100 STAT/DATALIB/DATALIB
70         /*
71         Sam Oswald - 138928239
72         Gelasia Mendonca- 104624234
73         Abhishek Shah - 138939236
74         */
75         proc freq data=DATALIB.safety;
NOTE: Data file DATALIB.SAFETY.DATA is in a format that is native to another host, or the file encoding does not match the session encoding. Cross Environment Data Access will be used, which might require additional CPU resources and might reduce performance.
76         tables Region Size Type / nocum nopercent;
77         run;

NOTE: There were 96 observations read from the data set DATALIB.SAFETY.
NOTE: PROCEDURE FREQ used (Total process time):
      real time           0.01 seconds
      user cpu time       0.02 seconds
      system cpu time     0.01 seconds
      memory              1688.21k
      OS Memory           24228.00k
      Timestamp           04/12/2024 01:00:18 AM
      Step Count                        75  Switch Count  2
      Page Faults                       0
      Page Reclaims                     135
      Page Swaps                        0
      Voluntary Context Switches        18
      Involuntary Context Switches      0
      Block Input Operations            0
      Block Output Operations           272

78
79         /*2) What is the proportion of cars made in North America?
80         The proportion of cars made in North America is approximately 63.54%.
81         (divide the number of cars from North America by the total number of cars in the dataset.)
82
83
84         3) What is the proportion of cars of Size 2?
85         The proportion of cars of Size 2 is approximately 30.21%.
86         (by dividing the number of Size 2 cars by the total number of cars.)
87
88         4) What is the proportion of cars in the Sport/Utility segment?
89         The proportion of cars in the Sport/Utility segment is approximately 16.67%.
90         (by dividing the number of Sport/Utility cars by the total number of cars.)
91
92         5)For the variables Unsafe, Size, Region, and Type, are there any unusual
93         data values that warrant further investigation?
94         Based on the frequency tables provided, there are no unusual data values
95         that warrant further investigation for the variables Unsafe, Size, Region,
96         and Type. Each category appears logical and expected within the context of the dataset.
97
98         6)Please provide the syntax and output as well as interpretation of the output
99         Based on the frequency data:
100
101        - The dataset shows a diverse range of car sizes and types, suggesting
102        it's well-suited for analyzing safety across different vehicle categories.
103        - There's a higher representation of North American cars,
104        indicating potential regional biases in the data.
105        - The balance in car types and sizes allows for a comprehensive analysis
106        of safety trends across various vehicle characteristics.
107
108        */
109
110
111        /* B */
112        proc format;
113        value safefmt
114        0 = 'Average or Above'
115        1 = 'Below Average';
NOTE: Format SAFEFMT is already on the library WORK.FORMATS.
NOTE: Format SAFEFMT has been output.
116        run;

NOTE: PROCEDURE FORMAT used (Total process time):
      real time           0.00 seconds
      user cpu time       0.00 seconds
      system cpu time     0.00 seconds
      memory              246.96k
      OS Memory           23968.00k
      Timestamp           04/12/2024 01:00:18 AM
      Step Count                        76  Switch Count  0
      Page Faults                       0
      Page Reclaims                     14
      Page Swaps                        0
      Voluntary Context Switches        0
