Lab 7 HW

School

University of South Carolina *

Course

832

Subject

Economics

Date

Apr 3, 2024

Type

docx

Pages

10

Jonathan Allman
ECON 3720: Introduction to Econometrics
University of Virginia
Ron Michener

LABORATORY ASSIGNMENT 7

Please remember to include your do file with your submission. As before, if your submission does not include a do file you will be docked 5 points.

This lab uses a small subset of data from one of the most famous regression analyses of all time, “Equality of Educational Opportunity,” by James S. Coleman, et al., 1966, commonly referred to as the Coleman Report. The report was commissioned to determine the extent and possible causes of educational inequality in the United States. Its sample included more than 150,000 pupils in schools across the U.S.

The names and definitions of variables drawn from the Coleman Report for 20 schools in the northeast and middle Atlantic states appear below:

Student_score_i = the mean verbal achievement score for 6th graders in school i;
Salary_i = staff salaries per pupil in school i;
Pct_wc_dad_i = the percent of 6th graders in school i whose fathers have white-collar jobs;
Ses_i = a composite of different measures of the socio-economic status of the families of 6th graders at school i. Higher values of Ses correspond to higher levels of socio-economic status;
Teacher_score_i = teachers’ average verbal score at school i;
Mom_ed_i = average years of education of mothers of 6th graders at school i.

The underlying theory says that the more and better resources provided by the school, measured by staff salaries and teachers’ test scores, the better the school’s pupil performance will be; and the more affluent and better educated the pupils’ families, measured by fathers’ jobs, mothers’ schooling, and other variables incorporated in Ses, the better the school’s pupil performance will be.

STEP I: Compute the summary statistics for these variables and include that information in your report.

F23 1 F23
STEP II: Fit a regression explaining student_score with pct_wc_dad and teacher_score, that is: [equation missing]. Include the result in your report.

i) How would you interpret [symbol missing]? (Be precise!)

For every one-unit (one percentage point) increase in pct_wc_dad, we would expect student_score to increase by approximately 16.5 on average, holding teacher_score constant.

ii) Test the following hypothesis using [hypothesis missing]: What is the correct p-value? Do you accept or reject the null hypothesis? What do you conclude?

p-value = 0.0000/2 = 0.0000. We reject the null hypothesis at the 5% significance level. We conclude that, in this regression model, the percentage of fathers with white-collar jobs has a positive effect on student test scores.
STEP III: Fit a regression explaining student_score with teacher_score and mom_ed; that is, [equation missing]. Include the result in your report.

i) How would you interpret [symbol missing]? (Be precise!)

For every one-unit increase in teacher_score, we would expect student_score to increase by 1.092213 on average, holding mom_ed constant.

ii) Test the following hypothesis using [hypothesis missing]: What is the correct p-value? Do you accept or reject the null hypothesis? What do you conclude?

mom_ed p-value = 0.130/2 = 0.065. We fail to reject the null hypothesis that beta2 is less than or equal to 0 at the 5% level. We conclude that, in this regression model, there is not enough evidence that mothers’ average education has a positive effect on students’ test scores; its effect may be small or even negative.
STEP IV: Fit a regression explaining student_score with teacher_score, mom_ed, and pct_wc_dad; that is, [equation missing]. Include the result in your report. Compare the coefficients of like variables with earlier regressions. Are they the same?

The coefficient on teacher_score was highest (1.3) when regressed alongside pct_wc_dad and lowest (1.09) when regressed alongside mom_ed. The coefficients on pct_wc_dad and mom_ed both decreased when the two variables were included together with teacher_score. The coefficients are not the same across regressions.

ii) Test the following hypothesis using [hypothesis missing]: What is the correct p-value? Do you accept or reject the null hypothesis? What do you conclude?

mom_ed p-value = 0.828/2 = 0.414. With a p-value of 0.414 we fail to reject the null hypothesis that beta2 is less than or equal to zero. We conclude that, in this regression model, mothers’ education may have little or even a negative effect on students’ test scores.

iii) Test the following hypothesis using [hypothesis missing]: What is the correct p-value? Do you accept or reject the null hypothesis? What do you conclude?

pct_wc_dad p-value = 0.118/2 = 0.059. With a p-value of 0.059 we fail to reject the null hypothesis that beta3 is less than or equal to zero at the 5% level. We conclude that, in this regression model, the percentage of fathers with white-collar jobs may have little or even a negative effect on students’ test scores.
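The halving rule used in parts (ii) and (iii) only applies when the estimated coefficient lies on the side named by the alternative hypothesis. The following is a minimal Python sketch of that rule, not the Stata do file the lab asks for. The function name `one_sided_p` is hypothetical, and the positive coefficient signs are an assumption, since the regression output is not reproduced here.

```python
def one_sided_p(two_sided_p, coef_sign):
    """p-value for H0: beta <= 0 against HA: beta > 0, from a two-sided p-value.

    coef_sign is +1 if the estimated coefficient is positive, -1 if negative.
    """
    if not 0.0 <= two_sided_p <= 1.0:
        raise ValueError("p-value must lie in [0, 1]")
    half = two_sided_p / 2
    # Estimate on the side of the alternative: use half the two-sided p-value.
    # Estimate on the wrong side: the one-sided p-value is 1 minus that half.
    return half if coef_sign > 0 else 1 - half


# Two-sided p-values reported in Step IV; positive signs are ASSUMED here.
p_mom_ed = one_sided_p(0.828, +1)
p_pct_wc_dad = one_sided_p(0.118, +1)
print(f"mom_ed: {p_mom_ed:.3f}, pct_wc_dad: {p_pct_wc_dad:.3f}")
```

If a coefficient came out negative while the alternative is "greater than zero", passing `coef_sign=-1` returns one minus half the two-sided p-value instead.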
iv) Test the following hypothesis using [hypothesis missing]: Do you accept or reject the null hypothesis? What do you conclude?

With a p-value of 0.0005 we reject the null hypothesis. From the F test we conclude that the percentage of fathers with white-collar jobs and mothers’ level of education jointly affect our dependent variable, students’ test scores.

v) Are your answers in Step IV, parts (ii) and (iii) consistent with the answers in Step II and Step III? Compare the standard error of the estimated coefficient of pct_wc_dad obtained in Step IV with the standard error obtained in Step II. Compare the standard error of the estimated coefficient of mom_ed obtained in Step IV with the standard error obtained in Step III. Does adding the other variable make these bigger or smaller? Why? What is going on?

Variable        Step IV std. err.   Step III std. err.   Step II std. err.
teacher_score   0.6640534           0.6868373            0.6315986
mom_ed          1.776376            0.6895506            NA
pct_wc_dad      0.0891847           NA                   0.0320403

No, the answers from Step IV (ii) and (iii) are not consistent with the answers from Steps II and III. The standard errors of the coefficients on mom_ed and pct_wc_dad both increased when the two variables were included together, so adding the other variable makes each standard error larger. The larger standard errors, the smaller t-statistics, and the wider confidence intervals all suggest that the regression in Step IV suffers from multicollinearity; that is, mothers’ education and fathers’ white-collar employment appear to be strongly correlated.
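To put a number on "larger", the standard errors in the table above can be compared as ratios. This is a small Python sketch of that arithmetic only; the dictionary and function names are hypothetical, and the values are copied from the table in part (v).

```python
# Standard errors copied from the table in Step IV (v).
se_joint = {"mom_ed": 1.776376, "pct_wc_dad": 0.0891847}      # Step IV
se_separate = {"mom_ed": 0.6895506, "pct_wc_dad": 0.0320403}  # Step III / Step II


def inflation_ratio(joint, separate):
    """Standard error with the collinear regressor included, relative to without it."""
    return joint / separate


ratios = {
    name: inflation_ratio(se_joint[name], se_separate[name])
    for name in se_joint
}
for name, ratio in ratios.items():
    print(f"{name}: standard error grows by a factor of {ratio:.2f}")
```

Both standard errors more than double once the other variable enters the regression, which is the pattern one would expect when two regressors carry largely the same information.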
vi) Have STATA compute a correlation matrix for pct_wc_dad, teacher_score and mom_ed and also VIFs for the regression in Step IV. Does this confirm your conjecture in (v)? Explain.

As the correlation matrix shows, pct_wc_dad and mom_ed are highly correlated with each other: their sample correlation is 0.9271, where 1 would indicate perfect positive correlation. This is consistent with the multicollinearity conjectured in (v).

STEP V: Fit a regression explaining student score with all the variables: [equation missing]. Include a copy in your report.
i) Inspect the coefficients. Which coefficients have the expected sign? What are the correct p-values in light of your expected signs? Which coefficients are statistically significant at the 5 percent level?

The coefficients on pct_wc_dad, ses, teacher_score, and salary all have the expected sign; the coefficient on mom_ed does not. The correct p-values are as follows:

Coefficient     p-value
pct_wc_dad      0.427/2 = 0.2135
ses             0.0000/2 = 0.0000
teacher_score   0.023/2 = 0.0115
salary          0.168/2 = 0.084
mom_ed          1 - (0.378/2) = 0.811

The coefficients that are statistically significant at the 5% level are those on ses and teacher_score.

ii) Let STATA compute a correlation matrix for the explanatory variables and the VIFs for the regression. Could the presence of severe imperfect multicollinearity explain any of the insignificant results you encountered in the previous step? Explain for each insignificant result.

Imperfect multicollinearity shows up in the VIF output: pct_wc_dad and mom_ed have VIFs of 8.40 and 7.77, respectively. In the regression including all explanatory variables, the standard errors are larger for each variable when they are regressed together than when they are regressed separately.
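The reported VIFs can be cross-checked against the pairwise correlation of 0.9271 between mom_ed and pct_wc_dad. A regressor's VIF is 1/(1 - R²), where R² comes from regressing it on all the other regressors, and that R² can never be smaller than its squared correlation with any single one of them. The Python sketch below computes the resulting lower bound; the function name `vif_lower_bound` is hypothetical, and this is a consistency check rather than a replacement for Stata's VIF output.

```python
def vif_lower_bound(r):
    """Smallest VIF a regressor can have, given its correlation r with one other regressor."""
    if not -1.0 < r < 1.0:
        raise ValueError("correlation must lie strictly between -1 and 1")
    return 1.0 / (1.0 - r ** 2)


r_mom_dad = 0.9271  # sample correlation of mom_ed and pct_wc_dad, Step IV (vi)
bound = vif_lower_bound(r_mom_dad)
reported_vifs = {"pct_wc_dad": 8.40, "mom_ed": 7.77}  # Step V (ii)

print(f"lower bound on VIF implied by r = {r_mom_dad}: {bound:.2f}")
for name, vif in reported_vifs.items():
    print(f"{name}: reported VIF {vif:.2f}, at or above bound: {vif >= bound}")
```

Both reported VIFs sit above the bound, so most of the variance inflation for these two variables is attributable to their correlation with each other.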
STEP VI: Consider the variables that might cause or be affected by multicollinearity in this model. Are they all theoretically necessary? Or can some be dropped from the equation? Note: Unless the variable is involved in the multicollinearity problem, assume it is necessary to include it in the equation for now.

It seems that either mom_ed or pct_wc_dad could be dropped from the regression.

i) Fit a regression omitting any variables that seem redundant on account of multicollinearity and include the output in your report. Include the VIFs for this regression.

ii) Does this seem to have solved the multicollinearity problem? Are the results from this regression better than the results you have obtained in your other specifications? Address each of the four specification criteria detailed by Studenmund (p. 166) in your answer.

Omitting pct_wc_dad has improved adjusted R-squared and lowered the mean VIF. It does seem to have improved the model.

Theory: mom_ed makes sense to keep in the model. Since mothers typically spend the most time around children, it stands to reason that a mother’s education level would matter when helping children with their homework.

t-Test: The estimated coefficient on mom_ed has moved in the expected direction, toward a more positive effect on a student’s score rather than a larger negative one.

Adjusted R-squared: The adjusted R-squared has improved slightly, from 0.8728 in the regression including all explanatory variables to 0.8756.

Bias: The other variables’ coefficients do not change significantly when pct_wc_dad is omitted.
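The adjusted R-squared comparison can be unpacked by backing out the unadjusted R-squared that each figure implies. The Python sketch below does this under stated assumptions: n = 20 schools, an intercept in both regressions, five slope coefficients in the full model and four after dropping pct_wc_dad. The function name `r2_from_adjusted` is hypothetical, and because the reported figures are rounded, the results are approximate.

```python
def r2_from_adjusted(adj_r2, n, k):
    """Back out R-squared from adjusted R-squared.

    n is the number of observations and k the number of slope coefficients,
    using adj = 1 - (1 - R2) * (n - 1) / (n - k - 1).
    """
    return 1.0 - (1.0 - adj_r2) * (n - k - 1) / (n - 1)


n = 20  # schools in the sample (assumed to be the estimation sample)
r2_full = r2_from_adjusted(0.8728, n, k=5)     # all five explanatory variables
r2_reduced = r2_from_adjusted(0.8756, n, k=4)  # pct_wc_dad omitted

print(f"implied R-squared, full model:    {r2_full:.4f}")
print(f"implied R-squared, reduced model: {r2_reduced:.4f}")
```

Under these assumptions the unadjusted fit falls slightly when pct_wc_dad is dropped, while adjusted R-squared rises because the lost fit is smaller than the degrees-of-freedom penalty for keeping the variable.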
STEP VII: If you have not already done so, fit a regression explaining student score with ses, teacher_score, and salary; that is, [equation missing]. A copy of this regression should be included in your report. Consider the variable Salary. What arguments might be made for deleting this variable from the regression? What arguments might be made for retaining it?

An argument for keeping salary is that it may reflect the quality and experience of teachers and the ability of the school or district to retain high-quality teachers, so including it is theoretically sound. An argument for eliminating salary is its statistical insignificance: the p-value is 0.162/2 = 0.081, so the coefficient is not significant at the conventional 5% level.
STEP VIII: Using the regression estimated in STEP VII, compute 99% confidence intervals (a) for the conditional mean and (b) for one new observation of student_score for a school whose 6th grade class is one with ses = 8.6, teacher_score = 25.8, and salary = 2.74.

(A) CI for conditional mean:
UPPER: 38.84518 + 2.9207816*(0.5851438) = 40.554257
LOWER: 38.84518 - 2.9207816*(0.5851438) = 37.136103

(B) CI for forecast (one new observation):
UPPER: 38.84518 + 2.9207816*(2.08137) = 44.924407
LOWER: 38.84518 - 2.9207816*(2.08137) = 32.765953

Input the following numbers into the tests and quizzes app for Lab 7.

1) In the regression estimated in Step II, what is the estimated standard error of Beta1-hat? .1658268
2) In the regression estimated in Step III, what is the estimated standard error of Beta2-hat? 3.122458
3) In Step IV, part (iv), what F value do you get in testing the joint hypothesis about Beta2 and Beta3? 12.67
4) In Step IV, part (vi), what is the sample correlation coefficient of mom_ed and pct_wc_dad? 0.9271
5) In Step V, part (i), if you are doing a greater-than test of the coefficient of salary, what is the correct p value? 0.084
6) In Step V, part (ii), what is the largest VIF for any of the variables? 8.40
7) In Step VIII, what is the upper bound of the 99% confidence interval for the conditional mean? 40.554257
8) In Step VIII, what is the upper bound of the 99% confidence interval for a new observation? 44.924407
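The Step VIII interval arithmetic can be checked with a few lines of Python. This sketch only reproduces the calculation from the figures quoted in Step VIII; the function name `interval` is hypothetical, and the critical value 2.9207816 is taken as given (it corresponds to a two-sided 99% t interval with 16 degrees of freedom, assuming 20 observations and four estimated coefficients).

```python
def interval(center, t_crit, se):
    """Symmetric interval: center plus or minus t_crit standard errors."""
    margin = t_crit * se
    return center - margin, center + margin


y_hat = 38.84518        # predicted student_score at the given regressor values
t_crit = 2.9207816      # critical value quoted in Step VIII
se_mean = 0.5851438     # standard error of the conditional mean
se_forecast = 2.08137   # standard error of the forecast for one new observation

mean_lo, mean_hi = interval(y_hat, t_crit, se_mean)
new_lo, new_hi = interval(y_hat, t_crit, se_forecast)

print(f"99% CI, conditional mean: [{mean_lo:.6f}, {mean_hi:.6f}]")
print(f"99% CI, new observation:  [{new_lo:.6f}, {new_hi:.6f}]")
```

The interval for a new observation is wider because the forecast standard error also includes the variance of the regression error for that single school, not just the uncertainty in the estimated mean.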