University of Illinois, Urbana-Champaign
STAT 425 Exam 1 @ 4:30 pm
October 4, 2023, 4:30pm
Name: SOLUTIONS
Netid: _________________________
This is an 80-minute handwritten exam. There are 5 problems, each worth 10 points. Do not
start working until your proctor tells you to start. Your head must be visible to your proctor
on Zoom, along with your screen and the work in front of you. You may use a calculator, but
not a computer, R, or the internet for your work.
To work on the exam you can:
1. Print out the exam and do your handwritten work on the exam itself; or
2. View the exam on your screen and do work on separate sheets of paper. Clearly label
your work with the problem number (1, 2, 3, ...) and part (a, b, c, ...) you are solving; or
3. Do work on a blank file or PDF of the exam on a tablet, and upload your work file from
the tablet. In this case the proctor must be able to see your tablet.
Scanning and uploading your exam: After you finish, scan or photograph each page and
upload the images to Moodle in the same way you upload assignment files.
You are allowed two one-sided 8.5 by 11 inch sheets of notes for yourself. Scan
and upload them after the exam.
Problem 1. (3 parts) Data will be collected in the form $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, where $x_i$ is the $i$th value of a fixed, nonrandom explanatory variable, and $y_i$ is the corresponding random response. Consider the model

$$y_i = \beta_0 + \beta_1 x_i + e_i, \quad i = 1, \ldots, n$$

for unknown parameters $\beta_0$ and $\beta_1$ and random errors $e_1, \ldots, e_n$ that are independent with mean zero and variances equal to an unknown constant $\sigma^2$.
The ordinary least squares estimators for $\beta_0$ and $\beta_1$ are given by

$$\hat\beta_0 = \bar y - \bar x \hat\beta_1 \quad \text{and} \quad \hat\beta_1 = \frac{\sum_{i=1}^n (x_i - \bar x)(y_i - \bar y)}{\sum_{i=1}^n (x_i - \bar x)^2},$$

where $\bar x = \frac{1}{n}\sum_{i=1}^n x_i$ and $\bar y = \frac{1}{n}\sum_{i=1}^n y_i$.
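A minimal Python sketch of these two estimator formulas, for readers who want to see them evaluated. The function name `ols_simple` and the four data points are hypothetical, chosen so that the points lie exactly on $y = 1 + 2x$ and the expected estimates are obvious.

```python
# Minimal sketch: ordinary least squares estimates for simple linear regression,
# following the formulas for beta1_hat and beta0_hat given above.

def ols_simple(x: list[float], y: list[float]) -> tuple[float, float]:
    """Return (beta0_hat, beta1_hat) for the model y_i = beta0 + beta1 * x_i + e_i."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Numerator and denominator of beta1_hat
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    beta1_hat = sxy / sxx
    beta0_hat = y_bar - x_bar * beta1_hat
    return beta0_hat, beta1_hat

# Hypothetical illustrative data lying exactly on y = 1 + 2x
b0, b1 = ols_simple([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(b0, b1)   # 1.0 2.0
```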
(a) (3 pts) If $x_1 = 3.5$ and $x_2 = 5.0$, find $E(y_2) - E(y_1)$ in terms of the model parameters.
$$E(y_2) - E(y_1) = (\beta_0 + 5\beta_1) - (\beta_0 + 3.5\beta_1) = 1.5\beta_1$$
(b) (3 pts) Find an explicit expression for $E(\bar y)$ in terms of the model parameters and predictor variables.
$$E(\bar y) = E\left(\frac{1}{n}\sum_{i=1}^n y_i\right) = \frac{1}{n}\sum_{i=1}^n E(y_i) = \frac{1}{n}\sum_{i=1}^n (\beta_0 + \beta_1 x_i) = \beta_0 + \beta_1 \frac{1}{n}\sum_{i=1}^n x_i = \beta_0 + \beta_1 \bar x$$
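The derivation uses only linearity of expectation and $E(e_i) = 0$. A quick Monte Carlo sketch can illustrate the result; the parameter values, the design points, and the choice of normal errors are hypothetical assumptions made only for this illustration.

```python
import random

# Monte Carlo sketch checking E(y_bar) = beta0 + beta1 * x_bar.
# beta0, beta1, sigma, the design x, and the normal errors are hypothetical;
# the identity itself only requires mean-zero errors.
beta0, beta1, sigma = 2.0, 0.5, 1.0
x = [3.5, 5.0, 6.0, 8.0, 10.0]          # fixed, nonrandom design
x_bar = sum(x) / len(x)

random.seed(425)
reps = 20000
total = 0.0
for _ in range(reps):
    # Simulate one data set from the model and record its sample mean of y
    y = [beta0 + beta1 * xi + random.gauss(0.0, sigma) for xi in x]
    total += sum(y) / len(y)

mc_mean = total / reps                  # average of y_bar over replications
theory = beta0 + beta1 * x_bar          # beta0 + beta1 * x_bar = 5.25 here
print(round(mc_mean, 3), theory)
```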
(c) (4 pts) After the data are collected we find $n = 20$, $\sum_{i=1}^{20}(x_i - \bar x)^2 = 50$, $\sum_{i=1}^{20}(y_i - \bar y)^2 = 33$, and $\sum_{i=1}^{20}(y_i - \hat y_i)^2 = 9.6$, where $\hat y_1, \ldots, \hat y_{20}$ are the fitted values for LS regression of $y$ on $x$.
Based on these results, show how to calculate the standard error for $\hat\beta_1$, plugging in all the relevant numbers. You do not have to complete the calculation.
$$\operatorname{se}(\hat\beta_1) = \frac{\hat\sigma}{\sqrt{\sum_{i=1}^{20}(x_i - \bar x)^2}} = \sqrt{\frac{\sum_{i=1}^{20}(y_i - \hat y_i)^2/(20 - 2)}{\sum_{i=1}^{20}(x_i - \bar x)^2}} = \sqrt{\frac{9.6/18}{50}}$$
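Completing the arithmetic (not required on the exam) gives a concrete number. The sketch below simply evaluates the expression above; note that the total sum of squares, 33, is not needed for the standard error.

```python
import math

# Finish the plug-in calculation for se(beta1_hat) using the numbers from part (c).
n = 20
sxx = 50.0          # sum of (x_i - x_bar)^2
rss = 9.6           # sum of (y_i - yhat_i)^2
sigma2_hat = rss / (n - 2)            # unbiased estimate of sigma^2 on n - 2 df
se_beta1 = math.sqrt(sigma2_hat / sxx)
print(round(se_beta1, 4))   # 0.1033
```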
Problem 2. (4 parts) Data on fuel consumption were collected for each of the 50 states and Washington D.C. for a total sample size of $n = 51$. The variables are gasoline Tax (cents/gallon), Fuel (consumption per 1000 pop. over 16), Dlic (Licensed Drivers per 1000 population over 16), and logMiles ($\log_{10}$ miles of highway in the state). The following linear model was fit to the data.

$$\text{Fuel} = \beta_0 + \beta_1\,\text{Tax} + \beta_2\,\text{Dlic} + \beta_3\,\text{logMiles} + \text{error}$$
The results of fitting a linear model of this form are summarized below.
##
## Call:
## lm(formula = Fuel ~ Tax + Dlic + logMiles, data = df)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -171.13  -48.91    5.34   41.90  193.25
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) -166.926    168.544   -0.99  0.32705
## Tax           -3.999      2.171   -1.84  0.07175
## Dlic           0.536      0.135    3.96  0.00025
## logMiles      79.445     21.972    3.62  0.00073
##
## Residual standard error: 69.4 on 47 degrees of freedom
## Multiple R-squared:  0.427,  Adjusted R-squared:  0.391
## F-statistic: 11.7 on 3 and 47 DF,  p-value: 7.66e-06
(a) (2 pts) Based on the results, what is the proportion of total variance explained by the model?
Multiple R-squared = 0.427
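As a consistency check on the summary, the adjusted R-squared can be recomputed from the multiple R-squared, assuming the usual definition $1 - (1 - R^2)(n - 1)/(n - p)$ with $n = 51$ and $p = 4$ (intercept plus three predictors). The result, about 0.390, matches the printed 0.391 up to the rounding of $R^2$ to three decimals.

```python
# Recompute Adjusted R-squared from Multiple R-squared:
#   adj R^2 = 1 - (1 - R^2) * (n - 1) / (n - p)
n, p = 51, 4        # 51 observations; intercept + Tax + Dlic + logMiles
r2 = 0.427          # Multiple R-squared from the summary (already rounded)
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p)
print(round(adj_r2, 4))   # about 0.3904; the summary shows 0.391 from the unrounded R^2
```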
(b) (2 pts) Based on the fitted model, estimate the expected Fuel consumption per 1000 population for a state with the following profile:
##  Tax  Dlic logMiles
##   20 782.8      5.1
Set up the calculation with all relevant numbers. You do not need to complete the calculation.
-166.926 + (20)(-3.999) + (782.8)(0.536) + (5.1)(79.445)
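For completeness, the sketch below carries out that arithmetic with the coefficient estimates from the summary; the dictionary layout is just one convenient way to pair each estimate with its predictor value.

```python
# Evaluate the fitted regression at the given profile using the summary estimates.
coef = {"(Intercept)": -166.926, "Tax": -3.999, "Dlic": 0.536, "logMiles": 79.445}
profile = {"Tax": 20.0, "Dlic": 782.8, "logMiles": 5.1}

# Intercept plus sum of (estimate * predictor value)
fuel_hat = coef["(Intercept)"] + sum(coef[name] * value for name, value in profile.items())
print(round(fuel_hat, 2))   # 577.84
```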
(c) (3 pts) Consider the F-Statistic test results given on the last line of the summary. State the null hypothesis $H_0$ and alternative hypothesis $H_A$ for this test. Express the hypotheses in terms of the unknown parameters $\beta_0, \beta_1, \beta_2, \beta_3, \sigma^2$.
$H_0$: $\beta_1 = \beta_2 = \beta_3 = 0$
$H_A$: at least one of $\beta_1, \beta_2, \beta_3$ is $\neq 0$
Equivalently, $H_A$: $\beta_1 \neq 0$ or $\beta_2 \neq 0$ or $\beta_3 \neq 0$.
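The overall F statistic is tied to $R^2$: assuming the standard identity $F = \dfrac{R^2/(p-1)}{(1-R^2)/(n-p)}$ with $p - 1 = 3$ and $n - p = 47$, the printed values reproduce the reported 11.7 (up to the rounding of $R^2$).

```python
# Recover the overall F statistic from R^2 for the Fuel model.
n, p = 51, 4
r2 = 0.427
f_stat = (r2 / (p - 1)) / ((1 - r2) / (n - p))   # numerator df 3, denominator df 47
print(round(f_stat, 1))   # 11.7
```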
(d) (3 pts) Based on the model summary and mathematical notation above, provide the t value and p-value for testing the null hypothesis $H_0: \beta_1 = 0$ against the alternative $H_a: \beta_1 \neq 0$. Also give the degrees of freedom for this test.
This is the coefficient t test for Tax. From the model summary we have
t value $= -1.84$, p-value $= 0.07175$.
The test has 47 degrees of freedom, the same as the residual degrees of freedom reported with the residual standard error ($n - p = 51 - 4 = 47$).
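The t value in the summary is simply the estimate divided by its standard error, and the two-sided p-value is $2\,P(T_{47} \le -|t|)$. The sketch below reproduces only the ratio and the degrees of freedom, since the Python standard library has no t distribution function.

```python
# t value for H0: beta1 = 0 is Estimate / Std. Error for Tax.
estimate, std_error = -3.999, 2.171
t_value = estimate / std_error
df = 51 - 4          # residual df: n minus number of coefficients
print(round(t_value, 2), df)   # -1.84 47
```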
Problem 3. (3 parts) Consider a model of the form $y = X\beta + e$, where $X$ is an $n \times p$ full rank matrix (its columns are linearly independent), $y$ and $e$ are $n \times 1$, and $\beta$ is $p \times 1$. Assume $X$ is a fixed (non-random) matrix, $E(e) = 0$, and $\operatorname{cov}(e) = \sigma^2 I$. The least squares estimator of $\beta$ is $\hat\beta = (X^T X)^{-1} X^T y$. The projection matrix or “hat” matrix is $H = X(X^T X)^{-1} X^T$.
(a) (3 pts) Show that $X\hat\beta = Hy$.

$$X\hat\beta = X(X^T X)^{-1} X^T y = Hy$$
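A small numerical sanity check of this identity, written in plain Python so that it needs no external libraries. The design matrix (an intercept column plus one predictor) and the response values are hypothetical; any full-rank $X$ would do, and the 2 x 2 inverse formula is used only because $p = 2$ here.

```python
# Numerical check that X beta_hat = H y, using plain Python lists.
# X and y are hypothetical illustrative values (intercept + one predictor).

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(u * v for u, v in zip(row, col)) for col in bt] for row in a]

def inv2(a):
    # Inverse of a 2 x 2 matrix via the adjugate formula.
    (p, q), (r, s) = a
    det = p * s - q * r
    if det == 0:
        raise ValueError("matrix is singular")
    return [[s / det, -q / det], [-r / det, p / det]]

X = [[1.0, 1.0], [1.0, 2.0], [1.0, 4.0], [1.0, 7.0]]
y = [[2.0], [3.0], [7.0], [8.0]]           # column vector, n x 1

Xt = transpose(X)
XtX_inv = inv2(matmul(Xt, X))
beta_hat = matmul(XtX_inv, matmul(Xt, y))  # (X^T X)^{-1} X^T y
H = matmul(X, matmul(XtX_inv, Xt))         # X (X^T X)^{-1} X^T

fitted_via_beta = matmul(X, beta_hat)
fitted_via_H = matmul(H, y)
print(all(abs(a[0] - b[0]) < 1e-9 for a, b in zip(fitted_via_beta, fitted_via_H)))  # True
```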
(b) (4 pts) If $\hat y$ is the vector of least squares fitted values, show that $\operatorname{Cov}(\hat y) = \sigma^2 H$.
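Method 1:
$$\operatorname{Cov}(\hat y) = \operatorname{Cov}(X\hat\beta) = X \operatorname{Cov}(\hat\beta) X^T = X\left(\sigma^2 (X^T X)^{-1}\right)X^T = \sigma^2 X(X^T X)^{-1} X^T = \sigma^2 H,$$
using $\operatorname{Cov}(\hat\beta) = \sigma^2 (X^T X)^{-1}$.
Method 2:
$$\operatorname{Cov}(\hat y) = \operatorname{Cov}(Hy) = H \operatorname{Cov}(y) H^T = H(\sigma^2 I)H = \sigma^2 HH = \sigma^2 H,$$
using $\operatorname{Cov}(y) = \operatorname{Cov}(e) = \sigma^2 I$ (since $X\beta$ is fixed), $H^T = H$ ($H$ is symmetric), and $HH = H$ ($H$ is idempotent).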
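Method 2 rests on two properties of $H$: symmetry and idempotence. The plain-Python sketch below checks both numerically for a hypothetical full-rank design (intercept plus one predictor), and also prints the diagonal of $\sigma^2 H$ with $\sigma^2 = 1$; those diagonal entries are the variances $\operatorname{var}(\hat y_i)$ discussed in part (c), and they sum to $p = 2$.

```python
# Check that H is symmetric and idempotent for a hypothetical design matrix X.

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(u * v for u, v in zip(row, col)) for col in bt] for row in a]

def inv2(a):
    # Inverse of a 2 x 2 matrix via the adjugate formula (assumes it is nonsingular).
    (p, q), (r, s) = a
    det = p * s - q * r
    return [[s / det, -q / det], [-r / det, p / det]]

def close(a, b, tol=1e-9):
    # Elementwise comparison of two matrices of the same shape.
    return all(abs(u - v) < tol for ra, rb in zip(a, b) for u, v in zip(ra, rb))

X = [[1.0, 1.0], [1.0, 2.0], [1.0, 4.0], [1.0, 7.0]]
Xt = transpose(X)
H = matmul(X, matmul(inv2(matmul(Xt, X)), Xt))

print(close(H, transpose(H)))   # symmetric: True
print(close(matmul(H, H), H))   # idempotent: True

# Diagonal of sigma^2 H gives var(y_hat_i) = sigma^2 h_i (sigma^2 = 1 here).
sigma2 = 1.0
print([round(sigma2 * H[i][i], 4) for i in range(len(H))])
```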
(c) (3 pts) Explain why $\operatorname{var}(\hat y_i) = \sigma^2 h_i$ for $i = 1, 2, \ldots, n$, where $h_i$ is the $(i, i)$ diagonal element of $H$.
The diagonal elements of $\operatorname{Cov}(\hat y) = \sigma^2 H$ are the variances of $\hat y_1, \ldots, \hat y_n$. These diagonal elements are $\sigma^2 h_1, \ldots, \sigma^2 h_n$.
Problem 4. (4 parts) Data were collected on variables x1, x2, x3, x4, x5, and y. Two models were compared using the anova function. Here are the results:
## Analysis of Variance Table
##
## Model 1: y ~ x1 + x5
## Model 2: y ~ x1 + x2 + x3 + x4 + x5
##   Res.Df    RSS Df Sum of Sq      F  Pr(>F)
## 1     47 755.48
## 2     44 641.26  3    114.21 2.6122 0.06315
(a) (2 pts) Find the value for residual sum of squares, $\|y - \hat y\|^2 = \sum_{i=1}^n (y_i - \hat y_i)^2$, for Model 2, and also find the residual degrees of freedom for this model.
$\text{RSS}_2 = 641.26$ with 44 degrees of freedom.
(b) (2 pts) Suppose the errors in Model 2 have expectations equal to zero, are uncorrelated, and have constant variance $\sigma^2$. Calculate an unbiased estimate $\hat\sigma^2$ of $\sigma^2$.
$$\hat\sigma^2 = \frac{\text{RSS}_2}{\text{df}_2} = \frac{641.26}{44} = 14.57$$
(c)
(3 pts) State the null and alternative hypotheses being tested by the F statistic in the
Analysis of Variance table given above.
Several ways to state the hypotheses:
$H_0$: The model including only x1 and x5 is adequate.
$H_A$: At least one of the variables x2, x3, x4 must also be included in the model.
$H_0$: $\beta_{x2} = \beta_{x3} = \beta_{x4} = 0$.
$H_A$: $\beta_{x2} \neq 0$ or $\beta_{x3} \neq 0$ or $\beta_{x4} \neq 0$.
$H_0$: Model y ∼ x1 + x5
$H_A$: Model y ∼ x1 + x2 + x3 + x4 + x5
(d) (3 pts) (i) Give the numerical values for the degrees of freedom of the F-test in the table.
(ii) Is the null hypothesis rejected at level $\alpha = 0.05$? How do you know?
(i) Degrees of freedom (numerator, denominator) = (3, 44).
(ii) $H_0$ is not rejected at level 0.05 because $p = 0.06315 > 0.05$.
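The F statistic in the table can be rebuilt from the two residual sums of squares, assuming the standard nested-model form $F = \dfrac{(\text{RSS}_1 - \text{RSS}_2)/(\text{df}_1 - \text{df}_2)}{\text{RSS}_2/\text{df}_2}$. Its denominator is the estimate $\hat\sigma^2 = 14.57$ from part (b); the small difference from the printed 2.6122 comes from the RSS values being rounded to two decimals.

```python
# Recompute the nested-model F statistic from the ANOVA table.
rss1, df1 = 755.48, 47   # Model 1: y ~ x1 + x5
rss2, df2 = 641.26, 44   # Model 2: y ~ x1 + x2 + x3 + x4 + x5

num_df = df1 - df2                              # number of extra coefficients tested
f_stat = ((rss1 - rss2) / num_df) / (rss2 / df2)
print(num_df, df2, round(f_stat, 3))   # 3 44 2.612
```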
Problem 5. (4 parts) Data on fuel consumption were collected for each of the 50 states and Washington D.C. for a total sample size of $n = 51$. The variables are gasoline Tax (cents/gallon), Fuel (consumption per 1000 pop. over 16), Dlic (Licensed Drivers per 1000 population over 16), and logMiles ($\log_{10}$ miles of highway in the state). The following linear model was fit to the data.

$$\text{Fuel} = \beta_0 + \beta_1\,\text{Tax} + \beta_2\,\text{Dlic} + \beta_3\,\text{logMiles} + \text{error}$$
Below is a plot of standardized residuals versus diagonals of the “hat” matrix after fitting
the above model by ordinary least squares. The points are labeled with the two letter state
abbreviations.
[Figure: standardized residual versus leverage ($h_i$), with each point labeled by its two-letter state abbreviation.]
(a) (3 pts) We know that in general $\sum_{i=1}^n h_i = p$, where $p$ is the number of columns of the design matrix $X$ including the constant column for the intercept. Based on the information given, (i) find the average (sample mean) of the leverages in the data, and (ii) identify the states that have leverage more than twice the average value (identify by their labels).
(i) Here $p = 4$ (intercept, Tax, Dlic, logMiles) and $n = 51$, so the average leverage is $p/n = 4/51 = 0.0784$.
(ii) Twice the average is $8/51 = 0.157$. According to the graph, the states above this level are VT, GA, HI, RI, AK, and DC.
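The arithmetic and the "twice the average" rule can be sketched as follows. The helper name `high_leverage` and the toy leverage values (for a hypothetical data set with $n = 8$, $p = 2$, chosen to sum to 2) are illustrative assumptions; they are not the leverages of the fuel data.

```python
# Average leverage is p / n because the h_i sum to p; the rule of thumb in
# part (a) flags observations with h_i > 2p / n.
n, p = 51, 4
avg_h = p / n
cutoff = 2 * p / n
print(round(avg_h, 4), round(cutoff, 3))   # 0.0784 0.157

def high_leverage(h, p):
    """Return indices i with h[i] > 2 * mean(h), where mean(h) = p / len(h)."""
    threshold = 2 * p / len(h)
    return [i for i, hi in enumerate(h) if hi > threshold]

# Hypothetical leverages for a toy data set with n = 8, p = 2 (they sum to 2).
toy_h = [0.60, 0.15, 0.20, 0.15, 0.20, 0.30, 0.20, 0.20]
print(high_leverage(toy_h, 2))   # [0]
```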
(b)
(3 pts) Based on the above plot, state which of the following would have the largest
Cook’s Distance and explain why: AK, RI, WY.
AK has the largest Cook’s Distance because Cook’s Distance increases with both the absolute standardized residual and the leverage. Specifically:
• AK has a larger Cook’s Distance than RI because it has both higher leverage and a larger absolute standardized residual.
• AK also has a larger Cook’s Distance than WY because, while the absolute standardized residual of AK is very similar to that of WY, the leverage of AK is more than twice that of WY.
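The monotonicity argument can be made concrete with the common form of Cook's distance written in terms of the standardized residual $r_i$ and leverage $h_i$, $D_i = \dfrac{r_i^2}{p}\cdot\dfrac{h_i}{1 - h_i}$. The $(r, h)$ pairs below are hypothetical, chosen only to mimic the AK versus WY comparison (same $|r|$, one leverage more than twice the other); they are not read from the plot.

```python
# Cook's distance in terms of the standardized residual r and leverage h:
#   D = (r^2 / p) * h / (1 - h)
# It increases with |r| and with h, which is the argument used in part (b).

def cooks_distance(r: float, h: float, p: int) -> float:
    if not 0.0 <= h < 1.0:
        raise ValueError("leverage must lie in [0, 1)")
    return (r * r / p) * h / (1.0 - h)

# Hypothetical pairs: same |r|, leverage more than doubled.
low = cooks_distance(2.0, 0.10, p=4)
high = cooks_distance(2.0, 0.22, p=4)
print(round(low, 4), round(high, 4))   # 0.1111 0.2821
```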
(c)
(2 pts) The plot below shows quantiles of the standardized residuals plotted against
quantiles of the standard normal distribution. What potential problem with the linear model
assumptions is this plot meant to diagnose? Does the plot below suggest any problem with
the model? Explain briefly.
[Figure: Normal Q-Q plot of the standardized residuals, sample quantiles versus theoretical quantiles.]
This type of plot is for checking whether the error distribution is non-normal. The plot
suggests the distribution might have heavy tails or outliers, with several values at each end
deviating from the straight line trend of the inner quantiles.
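To show how the points in such a plot are formed, here is a sketch that sorts the residuals and pairs them with standard normal quantiles at plotting positions $(i - a)/(n + 1 - 2a)$. The choice $a = 3/8$ for $n \le 10$ and $a = 1/2$ otherwise follows R's `ppoints`, which `qqnorm` uses; treat that convention as an assumption if you are working in other software. The residual values are hypothetical, with two extreme points to mimic heavy tails.

```python
from statistics import NormalDist

# Pair sorted residuals with standard normal quantiles at plotting positions
# (i - a) / (n + 1 - 2a); a = 3/8 for n <= 10, else 1/2 (R's ppoints convention).

def normal_qq_pairs(residuals):
    n = len(residuals)
    a = 3 / 8 if n <= 10 else 1 / 2
    theo = [NormalDist().inv_cdf((i - a) / (n + 1 - 2 * a)) for i in range(1, n + 1)]
    return list(zip(theo, sorted(residuals)))

# Hypothetical residuals: the end points sit far from the line y = x (heavy tails).
sample = [-3.1, -0.8, -0.4, -0.1, 0.0, 0.2, 0.5, 0.9, 2.9]
for t, s in normal_qq_pairs(sample):
    print(f"{t:6.2f} {s:6.2f}")
```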
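(d) (2 pts) Consider Box-Cox models of the form

$$g_\lambda(\text{Fuel}) = \beta_0 + \beta_1\,\text{Tax} + \beta_2\,\text{Dlic} + \beta_3\,\text{logMiles} + \text{error}$$

where

$$g_\lambda(y) = \begin{cases} \dfrac{y^\lambda - 1}{\lambda}, & \text{if } \lambda \neq 0, \\[4pt] \log_e(y), & \text{if } \lambda = 0. \end{cases}$$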
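A direct transcription of $g_\lambda$ may help. Note that $g_1(y) = y - 1$ is only a shift, so $\lambda = 1$ corresponds to leaving Fuel untransformed, and $g_\lambda(y)$ approaches $\log_e(y)$ as $\lambda \to 0$. The function name `box_cox` is just a label for this sketch.

```python
import math

# Box-Cox transformation g_lambda(y) as defined above (requires y > 0).
def box_cox(y: float, lam: float) -> float:
    if y <= 0:
        raise ValueError("Box-Cox requires y > 0")
    if lam == 0:
        return math.log(y)          # natural log when lambda = 0
    return (y ** lam - 1) / lam

print(box_cox(100.0, 1.0))   # 99.0: lambda = 1 is a shift, i.e. no real transformation
print(box_cox(100.0, 0.0))   # natural log of 100
print(box_cox(100.0, 1e-6))  # close to log(100): g_lambda is continuous at lambda = 0
```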
The plot below shows log-likelihood versus the value of $\lambda$, where for each $\lambda$ we obtain a set of coefficient estimates for the regression parameters specific to that value of $\lambda$ by the method of maximum likelihood. For each $\lambda$ the result is the same as if we performed ordinary least squares regression of $g_\lambda(\text{Fuel})$ on the predictor variables.
[Figure: log-likelihood versus $\lambda$, with the 95% confidence interval for $\lambda$ marked.]
Consider the null hypothesis that $\lambda = 1$. Based on the results given, should we reject or fail to reject this hypothesis at level $\alpha = 0.05$? Why or why not?
The 95% confidence interval for $\lambda$ in the graph includes the value $\lambda = 1$, so we fail to reject the null hypothesis.
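The decision rule behind a 95% interval on a profile log-likelihood plot can be sketched as a likelihood-ratio test: $\lambda_0$ lies inside the interval exactly when twice the drop in log-likelihood from the maximum is at most the 0.95 quantile of a chi-square distribution with 1 degree of freedom (about 3.84). The function name and the two log-likelihood values below are hypothetical and are not read from the plot.

```python
from statistics import NormalDist

# lambda0 is inside the 95% interval (H0: lambda = lambda0 not rejected at 0.05)
# when 2 * (max loglik - loglik(lambda0)) <= chi^2_{1, 0.95}.
chi2_1_95 = NormalDist().inv_cdf(0.975) ** 2   # chi-square(1) quantile equals z_{0.975}^2
print(round(chi2_1_95, 3))   # 3.841

def lr_fail_to_reject(loglik_max: float, loglik_at_null: float, crit: float = chi2_1_95) -> bool:
    return 2 * (loglik_max - loglik_at_null) <= crit

# Hypothetical log-likelihood values, for illustration only:
print(lr_fail_to_reject(loglik_max=11.0, loglik_at_null=10.2))   # True
```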