Lab 7 HW
.docx
keyboard_arrow_up
School
University of South Carolina *
*We aren’t endorsed by this school
Course
832
Subject
Economics
Date
Apr 3, 2024
Type
docx
Pages
10
Uploaded by CountSnow7057
Jonathan Allman
ECON 3720: Introduction to Econometrics
University of Virginia
Ron Michener
LABORATORY ASSIGNMENT 7
Please remember to include your do file with your submission. As before, if your submission does not include a do file you will be docked 5 points.
This lab uses a small subset of data from one of the most famous regression analyses of all time, “Equality of Educational Opportunity,” by James S. Coleman, et al., 1966, commonly referred to as the Coleman Report. The report was commissioned to determine the extent and possible causes of educational inequality in the United States. Its sample included more than 150,000 pupils in schools across the U.S. The
names and definitions of variables drawn from the Coleman Report for 20 schools in the northeast and middle Atlantic states appear below:
Student_score
i
= the mean verbal achievement score for 6
th
graders in school i;
Salary
i = staff salaries per pupil in school i;
Pct_wc_dad i = the percent of 6
th
graders in school i whose fathers have white-collar jobs;
Ses
i = a composite of different measures of the socio-economic status of the families of 6
th
graders at school
i. Higher values of Status correspond to higher levels of socio-economic status.
Teacher_score
i = teachers’ average verbal score at school i;
Mom_ed
i = average years of education of mothers of 6
th
graders at school i.
The underlying theory says that the more and better resources provided by the school, measured by staff salaries and teachers’ test scores, the better the school’s pupil performance will be; and the more affluent and better educated the pupils’ families, measured by fathers’ jobs, mothers’ schooling, and other variables incorporated in Status, the better the school’s pupil performance will be.
STEP I
: Compute the summary statistics for these variables and include that information in your report. F23
1
F23
STEP II
: Fit a regression explaining student_score with pct_wc_dad and teacher_score, that is:
.
Include the result in your report.
i)
How would you interpret
? (Be precise!) For every one unit increase in “pct_wc_dad” we would expect on average an increase in “student_score” of approximately 16.5
. ii) Test the following hypothesis using
:
What is the correct p-value? Do you accept or reject the null hypothesis? What do you conclude? P value = (0.0000/2) = 0.0000 We reject the null hypotheses at the 5% significance level. We can conclude from this that the percentage of fathers with white collar jobs has a positive impact on test score. In this regression model. F23
2
F23
STEP III
: Fit a regression explaining student_score with teacher_score and mom_ed; that is,
.
Include the result in your report.
i) How would you interpret
?(Be precise!) For every one unit increase in “teachers_score” we can expect on average an increase 1.092213 in “student_score”. ii) Test the following hypothesis using :
What is the correct p-value? Do you accept or reject the null hypothesis? What do you conclude?
Mom-ed Pvalue= (0.130/2) = 0.065, We fail to reject the null hypothesis
that beta2 is less than or equal to 0 at the 5% level. We can conclude from this, that a teacher’s average verbale score on average may have little to negative impact on a student’s test score. In this regression model. F23
3
F23
STEP IV
: Fit a regression explaining student_score with teacher_score, mom_ed, and pct_wc_dad; that is,
.
Include the result in your report.
Compare the coefficients of like variables with earlier regressions. Are they the same?
Teachers verbale score had its highest coefficient of 1.3 when regressed with percentage of fathers with white collar jobs and its lowest when regressed with mothers’ average educational level of 1.09. Both coefficients of the variables “Pct_wc_dad” and “mom_ed” decreased when regressed together with “teachers_score”. The variables have changed with each regression. ii) Test the following hypothesis using
:
What is the correct p-value? Do you accept or reject the null hypothesis? What do you conclude?
“Mom_ed” Pvalue (0.828/2) = .414, With a p value of .414 we fail to reject the null hypothesis that Beta
2 is less than or equal to zero. We can conclude from this that the level of a mothers education may have little to potentially a negative impact on a student’s test scores. In this regression model. iii) Test the following hypothesis using
:
What is the correct p-value? Do you accept or reject the null hypothesis? What do you conclude?
“pct_wc_dad” Pvalue (0.118/2) = 0.59, With a p value of 0.59 we fail to reject the null hypothesis that Beta
3
is less than or equal to zero. We can conclude from this that the percentage of fathers with white collar jobs has little to potentially negative effects on a student’s test scores. In this regression model. F23
4
F23
Your preview ends here
Eager to read complete document? Join bartleby learn and gain access to the full version
- Access to all documents
- Unlimited textbook solutions
- 24/7 expert homework help
Related Questions
(2)What would the consequence be for a regression model if theerrors were not homoscedastic?
arrow_forward
The controller of Chittenango Chain Company believes that the identification of the variable and fixed components of the firm’s costs will enable the firm to make better planning and control decisions. Among the costs the controller is concerned about is the behavior of indirect-materials cost. She believes there is a correlation between machine hours and the amount of indirect materials used.A member of the controller’s staff has suggested that least-squares regression be used to determine the cost behavior of indirect materials. The regression equation shown below was developed from 40 pairs of observations.S = $200 + $9H
where
S
=
Total monthly cost of indirect materials
H
=
Machine hours per month
The estimated cost of indirect materials if 900 machine hours are to be used during a month is $8,300 (Assume that 900 falls within the relevant range for this cost equation.)
The high and low activity levels during the past four years, as measured by machine hours, occurred…
arrow_forward
What is the solution?
arrow_forward
An economist uses regression analysis to determine the relationship between used car price (y) and the age of a car (x).
The analysis resulted in the following equation:
Y = 30,000 - 500*X
The above equation implies that an increase of
O 1 year in the age of the car is associated with an increase of $500 in the price of the car
O 1 year in the age of the car is associated with a decrease in $500 in the price of the car
O $500 in the price of the car is associated with an increase of 5 years in the age of the car
5 years in the age of the car is associated with a decrease of $100 in the price of the car
arrow_forward
Suppose there are 2 quantitative free variables and 1 variable
non free category. Non-free variables have 2 categories,
namely 1 for the success category and 1 for the fail category.
The method used to create models that describe relationships
between variables is a binary logistic regression model.
Perform parameter recovery for the model. Explain the stage
until the alleged value is obtained
arrow_forward
In a regression problem with 1 output variable and with a total number of 100 possible input variables, what is the number of all possible models with three input variables?
arrow_forward
Enumerate the 10 assumptions of the classical linear regression model (CLRM) and discuss its importance in econometrics analysis.
arrow_forward
Water is being poured into a large, cone-shaped
cistern. The volume of water, measured in cm³, is
reported at different time intervals, measured in
seconds. A regression analysis was completed and
is displayed in the computer output.
Regression Analysis: cuberoot (Volume) versus Time
Predictor
Coef
SE Coef
Constant
-0.006 0.00017
-35.294
0.000
Time
0.640
0.000018
35512.6
0.000
s=0.030
R-Sq=1.000 R-sq (adj)=1.000
What is the equation of the least-squares regression
line?
Volume = 0.640 - 0.006(Time)
Volume = 0.640 - 0.006(Time)
Volume = -0.006 + 0.640(Time)
Volume = - 0.006 + 0.640(Time?)
arrow_forward
In multiple regression model: what is it means for a variable to be significant? Explain the meaning of the significant variable.
arrow_forward
Consider the following data on production volume (x) and total cost (y) for a particular manufacturing operation.
Excel File: data14-29.xlsx
Production Volume (units)
Total Cost ($)
400
4,000
450
5,000
550
5,400
600
5,900
700
6,400
750
7,000
The estimated regression equation is ŷ = 1246.67 +7.6x.
Use a = 0.05 to test whether the production volume is significantly related to the total cost.
Complete the ANOVA table. Enter all values with nearest whole number, except the F test statistic (to 2 decimals) and the p-value (to 4 decimals).
Source of
Degrees of
Sum
Mean
Variation
Freedom
of Squares
Square
p-value
Regression
Error
Total
What is your conclusion?
- Select your answer -
arrow_forward
Please answer fast
arrow_forward
Define Interpretation of coefficients in polynomial regression models?
arrow_forward
Please provide me with the correct answer, along with the calculations, and do not use any AI tools
arrow_forward
All the regression assumptions lie on the residuals, for both simple and multiple regression.
True or False?
arrow_forward
Consider the following ANOVA table for a multiple regression model relating housing prices (in thousands of dollars) to the number of bedrooms in the house and the
size of the lot on which the house was built (in square feet). There were 90 total observations.
Estimated Price = 24,838.74 + 2339.88 (Bedrooms) + 0.2685 (Lot Size)
ANOVA
df
SS
MS
F
Regression 2 382,993.4921
Residual 87 627,839.9910
Significance F
191,496.7461 26.5358 1.0066E-09
7216.5516
Total
89 1,010,833.4831
Compute the adjusted coefficient of determination for this regression model. Round your answer to four decimal places.
arrow_forward
Introductory Econometris: A Modern Approach
4th edition, Chapter 17 Problem 1CE: What is
the command in R in order to run the "White
heteroskedasticity-consistent standard errors
& covariance"? In other words, I would like to
run the new regression with robust standard
errors in it.
arrow_forward
Investigate what factors determine the number of times a person logs into Facebook per week. It is argued that these four factors are important: number of friends, age in years, whether the person is employed, and whether the student has a Twitter account. The function is:
FACEBOOK LOGIN=f(FRIENDS,AGE,EMPLOYED,TWITTER)
With this functions and variables in mind:
Perform necessary tests and analysis to determine the validity of the regression attached in the below table and function provided
The tests and analysis must be broken down into two sections namely: tests that are possible with the given regression’s information and tests that should be conducted but are not possible with the given information.
For the tests that are possible please conduct them at a 5% significance level and for those that are not possible only mention their names without any further details
arrow_forward
Discuss the FIVE (5) importance of adding error term in the regression model.
arrow_forward
What is difference between regression model, and estimated regression equation?
arrow_forward
N5
arrow_forward
True or False
For a linear regression model including only an intercept, the OLS estimator of that intercept is equal to the sample mean of the independent variable.
arrow_forward
XYZ Company's accountant is estimating next period's total overhead costs (Y). She performed three regression analyses, the first is
based on direct labor hours (DLH), the second is based on machine hours (Mhr), and the third is based on quantity produced (Q). The
results were: [Y=$95,000 + $9×DLH; R-square = 0.85]; [Y= $120,000 + $5xMhr; R-square = 0.15]; [Y=190,000+2Q; R-square=D0.45]. How
much of the variations on the overhead costs is explained by the quantity produced (Q)?
Select one:
O a. 15%
O b. None of the answers given
C. 55%
O d. 85%
e. 45%
arrow_forward
In regression analysis, a common metric used in assessing the quality of the model being used to fit the data is known as the R-squared coefficient.
Explain the R-squared coefficient.
What is the difference between the R-squared and adjusted R-squared coefficients?
arrow_forward
2. Consider a two variable regression model, which satisfies all the Gauss Markov
assumptions except that the error variance is proportional to X² i.e.E(u?) = o²X?
Y₁ = B₁ + B₂X₁ + Ui
How would you obtain the best linear unbiased estimates from the above regression.
arrow_forward
a) The R? should not be used to choose the best econometric model specification in
multiple regression models. True or False
arrow_forward
Suppose the Sherwin-Williams Company has developed the following multiple regression model, with paint sales Y (x 1,000 gallons) as the dependent
variable and promotional expenditures A (x $1,000) and selling price P (dollars per gallon) as the independent variables.
Y=a+B₂A+BpP+ €
Now suppose that the estimate of the model produces following results: a = 344.585, b = 0.102 bp = -13.397, 8 = 0.155, 8bp = 4.312,
R² = 0.813, and F-statistic = 11.361. Note that the sample consists of 10 observations.
According to the estimated model, holding all else constant, a $1,000 increase in promotional expenditures
gallons. Similarly, a $1 increase in the selling price
sales by approximately.
sales by approximately
gallons.
Which of the independent variables (if any) appears to be statistically significant (at the 0.05 level) in explaining paint sales? Check all that apply.
Selling price (P)
Promotional expenditures (A)
What proportion of the total variation in sales is explained by the regression…
arrow_forward
Refrigerator prices are affected by characteristics such as whether or not the refrigerator is on sale, whether or not it is listed as a Sub-
Zero brand, the number of doors (one door or two doors), and the placement of the freezer compartment (top, side, or bottom). The
table below shows the regression output from a regression model using the natural log of price as the dependent variable. The model
was developed by the Bureau of Labor Statistics.
Variable
Coefficient Standard Error t Statistic
5.4841
Intercept
0.13081
41.92
Sale price
-0.0733
0.0234
-3.13
Sub-Zero brand
1.1196
0.1462
7.66
0.06956
0.005351
13.00
Total capacity (in cubic feet)
Two-door, freezer on bottom
Two-door, side freezer
0.04657
0.08085
0.58
Two-door, freezer on top
0.03596
-9.55
Base
-0.3432
-0.7096
-0.8820
One door with freezer
0.1310
-5.42
-5.92
One door, no freezer
0.1491
(a) Write the regression model, being careful to exclude the base indicator variable. (Negative amounts should be indicated by a
minus…
arrow_forward
Conduct a regression analysis
arrow_forward
Conduct a regression analysis in Excel using the following data:
X
Y
12
40
23
50
40
59
33
58
18
45
a) What is the value of b0? Include 1 decimal place in your answer.
b) What is the value of b1? Include 2 decimal places in your answer.
arrow_forward
SEE MORE QUESTIONS
Recommended textbooks for you
Managerial Economics: Applications, Strategies an...
Economics
ISBN:9781305506381
Author:James R. McGuigan, R. Charles Moyer, Frederick H.deB. Harris
Publisher:Cengage Learning
Related Questions
- (2)What would the consequence be for a regression model if theerrors were not homoscedastic?arrow_forwardThe controller of Chittenango Chain Company believes that the identification of the variable and fixed components of the firm’s costs will enable the firm to make better planning and control decisions. Among the costs the controller is concerned about is the behavior of indirect-materials cost. She believes there is a correlation between machine hours and the amount of indirect materials used.A member of the controller’s staff has suggested that least-squares regression be used to determine the cost behavior of indirect materials. The regression equation shown below was developed from 40 pairs of observations.S = $200 + $9H where S = Total monthly cost of indirect materials H = Machine hours per month The estimated cost of indirect materials if 900 machine hours are to be used during a month is $8,300 (Assume that 900 falls within the relevant range for this cost equation.) The high and low activity levels during the past four years, as measured by machine hours, occurred…arrow_forwardWhat is the solution?arrow_forward
- An economist uses regression analysis to determine the relationship between used car price (y) and the age of a car (x). The analysis resulted in the following equation: Y = 30,000 - 500*X The above equation implies that an increase of O 1 year in the age of the car is associated with an increase of $500 in the price of the car O 1 year in the age of the car is associated with a decrease in $500 in the price of the car O $500 in the price of the car is associated with an increase of 5 years in the age of the car 5 years in the age of the car is associated with a decrease of $100 in the price of the cararrow_forwardSuppose there are 2 quantitative free variables and 1 variable non free category. Non-free variables have 2 categories, namely 1 for the success category and 1 for the fail category. The method used to create models that describe relationships between variables is a binary logistic regression model. Perform parameter recovery for the model. Explain the stage until the alleged value is obtainedarrow_forwardIn a regression problem with 1 output variable and with a total number of 100 possible input variables, what is the number of all possible models with three input variables?arrow_forward
- Enumerate the 10 assumptions of the classical linear regression model (CLRM) and discuss its importance in econometrics analysis.arrow_forwardWater is being poured into a large, cone-shaped cistern. The volume of water, measured in cm³, is reported at different time intervals, measured in seconds. A regression analysis was completed and is displayed in the computer output. Regression Analysis: cuberoot (Volume) versus Time Predictor Coef SE Coef Constant -0.006 0.00017 -35.294 0.000 Time 0.640 0.000018 35512.6 0.000 s=0.030 R-Sq=1.000 R-sq (adj)=1.000 What is the equation of the least-squares regression line? Volume = 0.640 - 0.006(Time) Volume = 0.640 - 0.006(Time) Volume = -0.006 + 0.640(Time) Volume = - 0.006 + 0.640(Time?)arrow_forwardIn multiple regression model: what is it means for a variable to be significant? Explain the meaning of the significant variable.arrow_forward
- Consider the following data on production volume (x) and total cost (y) for a particular manufacturing operation. Excel File: data14-29.xlsx Production Volume (units) Total Cost ($) 400 4,000 450 5,000 550 5,400 600 5,900 700 6,400 750 7,000 The estimated regression equation is ŷ = 1246.67 +7.6x. Use a = 0.05 to test whether the production volume is significantly related to the total cost. Complete the ANOVA table. Enter all values with nearest whole number, except the F test statistic (to 2 decimals) and the p-value (to 4 decimals). Source of Degrees of Sum Mean Variation Freedom of Squares Square p-value Regression Error Total What is your conclusion? - Select your answer -arrow_forwardPlease answer fastarrow_forwardDefine Interpretation of coefficients in polynomial regression models?arrow_forward
arrow_back_ios
SEE MORE QUESTIONS
arrow_forward_ios
Recommended textbooks for you
- Managerial Economics: Applications, Strategies an...EconomicsISBN:9781305506381Author:James R. McGuigan, R. Charles Moyer, Frederick H.deB. HarrisPublisher:Cengage Learning
Managerial Economics: Applications, Strategies an...
Economics
ISBN:9781305506381
Author:James R. McGuigan, R. Charles Moyer, Frederick H.deB. Harris
Publisher:Cengage Learning