Question 3 and 4

School

Western Sydney University

Course

200360

Subject

Statistics

Date

Apr 3, 2024

Type

docx

Pages

11

Uploaded by 19702001CD

APPLIED PROJECT PART B – TERM 3 2021

STUDENT NAME             STUDENT ID    SIGNATURE
Kristina Ubiparipovic    19828972
Chris Doughaim           20132864
Djurdja Saric            20219534

Description of the Data files
With your group, pick one dataset from Project Part A that you will use to answer Part B. If you are not sure of this, talk to your teacher.

Provide a reason for choosing this dataset over the other datasets from Project Part A in your group:

As a group, we chose this dataset because it suits these questions best and should give us accurate results. It contains enough information for us to understand and answer each question correctly. The other datasets we could have chosen did not contain enough information to answer the following questions.

Question 1 (7 marks)

All the working in EXCEL for this question must be submitted with the corresponding datafile.

a. Using Excel, obtain a Descriptive Statistics output for the numerical variable chosen in the dataset (Mean, Mode, Range and Standard Deviation)*. Write one paragraph describing the data set using the information about the mean, mode, range, standard deviation.

Reading score
The standard deviation measures how far the data are spread around the mean; a high standard deviation indicates that the data are more spread out. The range is the difference between the highest and lowest values in the distribution; our range of 66 is fairly wide, which suggests that the scores may have high variability. The mode is 65, the most frequently occurring score, and it is close to the mean, so it represents an average score.

b. For the variable selected in part a), use Excel to construct a histogram (8 classes)*. Write one paragraph describing the data set using the histogram. Comment on the shape of the distribution of the data. [3]

The histogram is bell-shaped, which is the shape of a normal distribution. The peak of the curve marks the mean, mode, and median of the data collected, and the standard deviation determines the relative width of the bell curve around the mean.
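As a rough illustration of the measures discussed in parts a) and b), the Python sketch below computes the mean, median, mode, range and sample standard deviation, and counts values into 8 equal-width classes. The `scores` list and the helper names `describe` and `histogram_counts` are hypothetical and are not the project data or the Excel working; Excel's Histogram tool may treat values that fall exactly on a class boundary differently.

```python
import statistics

# Hypothetical scores for illustration only; these are NOT the project data.
scores = [52, 58, 61, 65, 65, 66, 68, 70, 74, 81]


def describe(values):
    """Return the descriptive measures named in Question 1a."""
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        # Sample standard deviation (divides by n - 1).
        "stdev": statistics.stdev(values),
    }


def histogram_counts(values, classes=8):
    """Count values in equal-width classes spanning min..max.

    Assumes the values are not all identical (class width must be > 0).
    """
    low, high = min(values), max(values)
    width = (high - low) / classes
    counts = [0] * classes
    for v in values:
        # The maximum value is placed in the last class.
        index = min(int((v - low) / width), classes - 1)
        counts[index] += 1
    return counts


summary = describe(scores)
counts = histogram_counts(scores)
print(summary)
print(counts)
```

A mean and median that are close together, with class counts that rise to a single central peak and fall away on both sides, are the pattern the answers above describe as symmetric and bell-shaped.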
c. Compare the median and mean in part a). Is there a link between your finding with your comments in Part b)? [1]

Mean 66.109, Median 66

The distribution of the data is approximately symmetric, as the mean and median are approximately equal to each other.

d. Using an appropriate Excel output, construct a 90% confidence interval for the numerical variable chosen in part a).

Statistic                  writing score   reading score   math score
Mean                       68.054          69.169          66.089
Standard Error             0.480529        0.461699        0.479499
Median                     69              70              66
Mode                       74              72              65
Standard Deviation         15.19566        14.60019        15.16308
Sample Variance            230.908         213.1656        229.919
Kurtosis                   -0.03336        -0.06827        0.274964
Skewness                   -0.28944        -0.2591         -0.27894
Range                      90              83              100
Minimum                    10              17              0
Maximum                    100             100             100
Sum                        68054           69169           66089
Count                      1000            1000            1000
Confidence Level(90.0%)    0.791133        0.760132        0.789437
Upper                      68.84513        69.929132       66.878437
Lower                      67.26287        68.408868       65.299563

Question 2 (6 marks)

All the working in EXCEL for this question must be submitted with the corresponding datafile.

In this task you are to analyze your data and create a confidence interval to estimate the population difference in means between two groups. Using the dataset selected by your group, what two groups are you estimating the population difference in means? Using Excel, obtain summary statistics (mean, standard deviation, and sample size) for each group. Find the critical value from the t-distribution using Excel (use the T.INV.2T function).
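Excel's T.INV.2T(probability, deg_freedom) returns the two-tailed critical value of Student's t distribution. For readers checking the working outside Excel, the Python sketch below approximates the same quantity with the standard library only, by numerically integrating the t density and bisecting for the critical value. The helper names `t_pdf`, `t_cdf` and `t_inv_2t` are hypothetical, and the numerical method is an assumption chosen for illustration, not Excel's algorithm.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0: 0.5 plus Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_inv_2t(probability, df):
    """Two-tailed critical value t such that P(|T| > t) = probability."""
    target = 1 - probability / 2
    lo, hi = 0.0, 100.0  # assumed search bracket for the critical value
    for _ in range(60):  # bisection
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# 90% interval for the math score: margin = critical value * standard error.
critical = t_inv_2t(0.10, 999)   # Count 1000, so 999 degrees of freedom
margin = critical * 0.479499     # Standard Error from the Excel output
print(round(critical, 4), round(margin, 4))
```

The margin computed this way should agree closely with the "Confidence Level(90.0%)" value of 0.789437 that Excel reports for the math score, and adding and subtracting it from the mean gives the upper and lower limits.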
The following formula is used to construct the confidence interval. Using Excel, construct the confidence interval (choose your own confidence level). Assume that the population variances are unknown but are equal. Also assume that these variables come from two populations that are normally distributed.*

                               reading score   math score
Mean                           69.169          66.089
Variance                       213.1656        229.919
Observations                   1000            1000
Pooled Variance                221.5423
Hypothesized Mean Difference   0
df                             1998
t Stat                         4.627084
P(T<=t) one-tail               1.97E-06
t Critical one-tail            1.645617
P(T<=t) two-tail               3.95E-06
t Critical two-tail            1.961152
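Under the equal-variance assumption stated above, the usual interval for the difference in means is (difference in sample means) ± t critical × sqrt(pooled variance × (1/n1 + 1/n2)). The Python sketch below rebuilds that calculation from the summary statistics in the output. The function name `pooled_difference_interval` is hypothetical, and the 90% level is an assumption made for illustration: it uses 1.645617, the "t Critical one-tail" value from the output, because a 5% one-tail critical value is also the two-tailed critical value for a 90% interval.

```python
import math


def pooled_difference_interval(mean1, var1, n1, mean2, var2, n2, t_critical):
    """Confidence interval for mean1 - mean2, assuming equal population variances."""
    df = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    std_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    difference = mean1 - mean2
    margin = t_critical * std_error
    return {
        "df": df,
        "pooled_variance": pooled_variance,
        "t_stat": difference / std_error,  # test statistic for a zero difference
        "difference": difference,
        "lower": difference - margin,
        "upper": difference + margin,
    }


# Summary statistics taken from the Excel output (reading score vs math score).
result = pooled_difference_interval(
    mean1=69.169, var1=213.1656, n1=1000,
    mean2=66.089, var2=229.919, n2=1000,
    t_critical=1.645617,  # assumed 90% level, see the note above
)
print(result)
```

The pooled variance and t statistic returned here can be compared with the "Pooled Variance" and "t Stat" rows of the output as a check on the inputs; a different confidence level only changes the `t_critical` argument.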
Interpret confidence interval in the context of data. [1]

We are 90% confident that the population mean reading score is between 68.408868 and 69.929132, and that the population mean math score is between 65.299563 and 66.878437.

         Reading scores   Math scores
Upper    69.929132        66.878437
Lower    68.408868        65.299563

Question 3

Answer: We will compare the reading and writing scores of the students.

Part (a): Excel Output:

t-Test: Two-Sample Assuming Equal Variances

                               Reading Score   Writing Score
Mean                           69.169          68.054
Variance                       213.1656046     230.907992
Observations                   1000            1000
Pooled Variance                222.0367983
Hypothesized Mean Difference   0
df                             1998
t Stat                         1.67319821
P(T<=t) one-tail               0.047222412
t Critical one-tail            1.64561663
P(T<=t) two-tail               0.094444825
t Critical two-tail            1.961152015

Part (b): Hypothesis test:

Null Hypothesis: There is no difference in the mean scores of reading and writing.
Alternate Hypothesis: There is a difference in the mean scores of reading and writing.

OR

$H_0: \mu_r - \mu_w = 0$
$H_a: \mu_r - \mu_w \neq 0$

Level of Significance: $\alpha = 0.05$

Test Statistic:

$$t = \frac{(\bar{X}_r - \bar{X}_w) - (\mu_r - \mu_w)}{S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

with $n_1 + n_2 - 2$ degrees of freedom.

t test statistic = 1.6732

Decision Rule: t critical (two-tailed) = 1.9612. If $|t \text{ statistic}| \geq t \text{ critical}$, then we reject our null hypothesis.
1.6732 < 1.9612, so we do not reject our null hypothesis.

Conclusion: As the t statistic does not fall in the critical region, we do not reject our null hypothesis and conclude that there is no significant difference in the mean scores of students' reading and writing.

Question 4

Answer: We will use Writing Score as the independent variable and Math Score as the dependent variable in our Regression Analysis.

Part (a): Excel Output

Regression Statistics
Multiple R           0.802642046
R Square             0.644234254
Adjusted R Square    0.643877775
Standard Error       9.048716212
Observations         1000

ANOVA
             df    SS            MS         F          Significance F
Regression   1     147973.5724   147973.6   1807.217   3.376E-226
Residual     998   81715.50656   81.87927
Total        999   229689.079

                 Coefficients   Standard Error   t Stat     P-value    Lower 95%     Upper 95%     Lower 95.0%   Upper 95.0%
Intercept        11.58310067    1.3136912        8.817217   5.14E-18   9.005186827   14.16101451   9.00518683    14.16101451
writing score    0.800921317    0.018840167      42.51137   3.4E-226   0.763950432   0.837892203   0.76395043    0.837892203

Scatter Plot: Math Score vs Writing Score (axes: Writing Score and Math Score)

Part (b): Regression Equation:

Math Score = 11.583 + 0.8009 (Writing Score)

The above regression equation can be used to predict a student's math score from the student's writing score, since the math score is treated as dependent on the writing score.

Part (c): In the above regression equation, the intercept is 11.583 and the slope of the line is 0.8009. The slope means that for every 1-point increase in a student's writing score, the math score increases by 0.8009 points on average. The intercept means that when the writing score is 0, the predicted math score is 11.583.

Part (d): Hypothesis test:

Null Hypothesis: The slope of the regression line is equal to zero.
Alternate Hypothesis: The slope of the regression line is not equal to zero.

OR

$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$

Level of Significance: $\alpha = 0.05$

Test Statistic:

$$t = \frac{b_1}{\text{Standard Error}}$$

with $n - 2$ degrees of freedom.

t test statistic = 42.511

Decision Rule: The p-value is equal to $3.4 \times 10^{-226}$.

Conclusion: As the p-value is smaller than the significance level of 0.05, we reject our null hypothesis and conclude that the slope of the regression line is not equal to zero.
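As a hedged check on the regression answers, the Python sketch below recomputes the slope's t statistic, R Square and F from the values in the Excel output and applies the regression equation to one writing score. The constant and function names are hypothetical, and the writing score of 50 is an arbitrary illustrative input, not a value from the data.

```python
# Values copied from the Excel regression output.
INTERCEPT = 11.58310067
SLOPE = 0.800921317
SLOPE_STD_ERROR = 0.018840167
SS_REGRESSION = 147973.5724
SS_RESIDUAL = 81715.50656
OBSERVATIONS = 1000


def predict_math(writing_score):
    """Predicted math score from the fitted line."""
    return INTERCEPT + SLOPE * writing_score


# Test statistic for H0: slope = 0 (slope divided by its standard error).
t_stat = SLOPE / SLOPE_STD_ERROR

# R Square is the share of total variation explained by the regression.
r_squared = SS_REGRESSION / (SS_REGRESSION + SS_RESIDUAL)

# F = MS regression / MS residual, with 1 and n - 2 degrees of freedom.
f_stat = (SS_REGRESSION / 1) / (SS_RESIDUAL / (OBSERVATIONS - 2))

print(round(t_stat, 3), round(r_squared, 4), round(f_stat, 2))
print(round(predict_math(50), 2))  # 50 is an arbitrary example input
```

With a single predictor, F equals the square of the slope's t statistic, which is a convenient consistency check between the ANOVA table and the coefficients table.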