SA3_RSolution_Econ140.pdf

School: University of California, Berkeley

Course: 140
Subject: Economics
Date: Feb 20, 2024

ECON 140: Section 3 (OLS)

Question 3: Dummy variables regression

Load required libraries

# Packages used below: dplyr (%>%, mutate, group_by, summarize) and stargazer (LaTeX tables)
library(dplyr)
library(stargazer)

Load dataset

# Set working directory
getwd()
## [1] "/Users/jonathanold/Dropbox/ECON_140_SPRING_2024/Section Assignments"
setwd("/Users/jonathanold/Library/CloudStorage/GoogleDrive-jonathan_old@berkeley.edu/My Drive/_Berkeley

# Load dataset
mexico_data = read.csv("../Datasets/Mexico.csv")
head(mexico_data)
##   year state        sector sex age ind_lang educ_years hrs_wk inc_m USDist_km
## 1 2010     1      Services   1  30        0         12     48  3600  600.6932
## 2 2010     1      Services   0  16        0          9     98  4286  557.6143
## 3 2010     1   Agriculture   0  19        0         10     24  1286  616.0118
## 4 2010     1 Manufacturing   0  29        0         12     45  4286  550.4415
## 5 2010     1      Services   0  38        0          9     40  4600  597.9345
## 6 2010     1      Services   0  50        0         13     30  8000  561.4097

# Create summary statistics table, export as LaTeX
stargazer(data = mexico_data, type = "latex", title = "Summary Statistics",
          summary.stat = c("n", "mean", "sd", "min", "median", "max"),
          out = "Summary Table.tex")
# To display the table in a LaTeX document, use \input{"Summary Table.tex"}

Table 1: Summary Statistics
Statistic       N       Mean    St. Dev.    Min   Median        Max
year        1,000  2,010.000       0.000  2,010    2,010      2,010
state       1,000     17.963       7.961      1       17         32
sex         1,000      0.337       0.473      0        0          1
age         1,000     36.527      13.524     12       35         80
ind_lang    1,000      0.151       0.358      0        0          1
educ_years  1,000      8.295       4.452      0        9         19
hrs_wk      1,000     45.447      20.039      1       48        168
inc_m       1,000  4,890.467  13,662.750      8    3,429    400,000
USDist_km   1,000    702.982     274.254  6.609  738.099  1,348.003
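stargazer is convenient for LaTeX output, but the same six statistics (N, mean, standard deviation, min, median, max) can also be computed in base R for a quick check in the console. This is a minimal sketch under stated assumptions: `toy` is a made-up five-row data frame that borrows two column names from Mexico.csv, and `summary_stats` is a hypothetical helper, not a function from stargazer or any other package.

```r
# Base-R sketch of a summary-statistics table (N, mean, SD, min, median, max).
# Assumption: `toy` is made-up data standing in for mexico_data.
toy = data.frame(
  ind_lang = c(0, 1, 0, 0, 1),
  inc_m    = c(3600, 1286, 4286, 8000, 2000)
)

# Hypothetical helper: one row of statistics per numeric column,
# dropping missing values column by column
summary_stats = function(df) {
  num_cols = df[sapply(df, is.numeric)]
  stats = t(sapply(num_cols, function(x) {
    x = x[!is.na(x)]
    c(N = length(x), Mean = mean(x), St.Dev = sd(x),
      Min = min(x), Median = median(x), Max = max(x))
  }))
  as.data.frame(stats)
}

tab = summary_stats(toy)
print(round(tab, 3))
```

Applied to mexico_data, summary_stats() would skip the character column sector, just as Table 1 does.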

Get means and difference in means

# How to create a dummy variable using an ifelse condition
mexico_data = mexico_data %>%
  mutate(rich = ifelse(inc_m > 3000, 1, 0))

# Using tidyverse (dplyr) syntax to compute mean income by indigenous language status
mexico_data %>%
  group_by(ind_lang) %>%
  summarize(mean(inc_m))
## # A tibble: 2 x 2
##   ind_lang mean(inc_m)
##      <int>       <dbl>
## 1        0       5309.
## 2        1       2537.

# Using base R syntax to compute mean income by indigenous language status
mean1 = mean(mexico_data$inc_m[mexico_data$ind_lang == 1])
mean0 = mean(mexico_data$inc_m[mexico_data$ind_lang == 0])
mean_diff = mean1 - mean0
mean_diff
## [1] -2771.968
mean1
## [1] 2537.066
mean0
## [1] 5309.034

Average monthly income is 2,537 among indigenous-language speakers (ind_lang = 1) and 5,309 among non-speakers (ind_lang = 0), so the difference, speakers minus non-speakers, is -2,772.

Run OLS regression

# Running and outputting linear regression
ols_results = summary(lm(inc_m ~ ind_lang, data = mexico_data))
ols_results
##
## Call:
## lm(formula = inc_m ~ ind_lang, data = mexico_data)
##
## Residuals:
##    Min     1Q Median     3Q    Max
##  -5301  -2738  -1452    291 394691
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   5309.0      467.9  11.347   <2e-16 ***
## ind_lang     -2772.0     1204.1  -2.302   0.0215 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13630 on 998 degrees of freedom
## Multiple R-squared:  0.005282,   Adjusted R-squared:  0.004286
## F-statistic:   5.3 on 1 and 998 DF,  p-value: 0.02153
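A short derivation shows why, in a regression on a constant and a single dummy, the OLS estimates must equal group means. Regress y_i on a constant and a dummy D_i (here y = inc_m and D = ind_lang), let n be the number of observations, p the share of observations with D_i = 1, and ȳ1, ȳ0 the mean of y in the D = 1 and D = 0 groups. The standard one-regressor OLS formulas then reduce as follows:

```latex
\begin{aligned}
\bar D &= p, \qquad \bar y = p\,\bar y_1 + (1-p)\,\bar y_0,\\[4pt]
\sum_i (D_i - p)^2 &= np(1-p)^2 + n(1-p)p^2 = np(1-p),\\
\sum_i (D_i - p)(y_i - \bar y) &= \sum_i (D_i - p)\,y_i
  = (1-p)\,np\,\bar y_1 - p\,n(1-p)\,\bar y_0 = np(1-p)(\bar y_1 - \bar y_0),\\[4pt]
\hat\beta_1 &= \frac{\sum_i (D_i - \bar D)(y_i - \bar y)}{\sum_i (D_i - \bar D)^2} = \bar y_1 - \bar y_0,\\
\hat\beta_0 &= \bar y - \hat\beta_1 \bar D = p\,\bar y_1 + (1-p)\,\bar y_0 - p(\bar y_1 - \bar y_0) = \bar y_0.
\end{aligned}
```

With the group means computed above, this gives β̂0 = 5,309.034 and β̂1 = 2,537.066 - 5,309.034 = -2,771.968, matching the lm() estimates.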

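Regressions on dummy variables reproduce group means even when a categorical variable has more than two levels, provided the model contains a constant and one dummy for every category except an omitted base category. The sketch below checks this numerically on simulated data: the sector labels mirror those visible in Mexico.csv, but the incomes are random draws, and the names `sim`, `avg_income`, `manuf`, and `services` are illustrative assumptions. Agriculture is the base category.

```r
# Sketch: OLS on a constant plus one dummy per non-base category
# reproduces the group means exactly.
# Assumption: simulated data; sector labels mirror Mexico.csv, incomes are random.
set.seed(140)
n = 300
sectors = c("Agriculture", "Manufacturing", "Services")
sim = data.frame(sector = sample(sectors, n, replace = TRUE),
                 stringsAsFactors = FALSE)

# Hypothetical average incomes by sector, plus normal noise
avg_income = c(Agriculture = 2000, Manufacturing = 4500, Services = 5000)
sim$inc_m = unname(avg_income[sim$sector]) + rnorm(n, sd = 1000)

# Group means computed directly
grp_means = tapply(sim$inc_m, sim$sector, mean)

# Explicit 0/1 dummies; Agriculture is the omitted base category
sim$manuf = as.integer(sim$sector == "Manufacturing")
sim$services = as.integer(sim$sector == "Services")
fit = lm(inc_m ~ manuf + services, data = sim)
b = coef(fit)

# Side by side: intercept vs base mean, slopes vs differences in means
check = cbind(
  ols = unname(b),
  from_means = unname(c(grp_means["Agriculture"],
                        grp_means["Manufacturing"] - grp_means["Agriculture"],
                        grp_means["Services"] - grp_means["Agriculture"]))
)
rownames(check) = names(b)
print(check)

# lm() creates the same dummies automatically from the character column
fit_factor = lm(inc_m ~ sector, data = sim)
print(coef(fit_factor))
```

Because "Agriculture" sorts first, lm(inc_m ~ sector) uses it as the base level by default, so fit_factor reports the same three numbers under the names sectorManufacturing and sectorServices.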
# Compare mean difference to OLS coefficient
ols_results$coefficients[2, 1]
## [1] -2771.968
mean_diff
## [1] -2771.968

# Create regression table; include it in the LaTeX write-up with \input{reg1.tex}
reg1 = lm(inc_m ~ ind_lang, data = mexico_data)
stargazer(reg1, title = "Indigenous language and wages",
          omit.stat = c("LL", "ser", "f", "adj.rsq"),
          no.space = TRUE, header = TRUE, align = FALSE,
          type = "latex", out = "reg1.tex")

Table 2: Indigenous language and wages
                 Dependent variable:
                 inc_m
ind_lang         -2,771.968**
                 (1,204.101)
Constant         5,309.034***
                 (467.898)
Observations     1,000
R²               0.005
Note: *p<0.1; **p<0.05; ***p<0.01

The regression reproduces the group means exactly, which makes OLS a compact way to summarize the data. The constant (intercept), 5,309, is the average monthly income in the group with ind_lang = 0, and the coefficient on ind_lang, -2,772, is the difference in mean income between the ind_lang = 1 and ind_lang = 0 groups. This holds whenever the right-hand side consists of a constant and a single dummy variable. It also holds with several dummy variables, as long as the dummies are enough to describe every category (or, with several categorical variables, every combination of categories) present in the data: with a constant and one dummy for each category except an omitted base category, the constant equals the base-category mean and each dummy coefficient equals that category's mean minus the base-category mean. Note that stargazer uses 10%, 5%, and 1% cutoffs for its stars, so the p-value of 0.0215 on ind_lang is marked ** in Table 2 but only * in the summary() output.

Question 4: Wages over the life-cycle

setwd("/Users/jonathanold/Library/CloudStorage/GoogleDrive-jonathan_old@berkeley.edu/My Drive/_Berkeley
dataset = read.csv(file = "../Datasets/wages.csv")
# Scatter plot of yearly wage against age
plot(dataset$age, dataset$wage_yearly)
