Assignment 5 - ML
School: St. John's University
Course: 602
Subject: Statistics
Date: Jan 9, 2024
Type: docx
Pages: 24
Uploaded by lore150
Boston
The following objects are masked from ‘package:ISLR’:
Auto, Credit
> library(MASS)
> library(car)
Error in library(car) : there is no package called ‘car’
> library(boot)
> library(class)
> # import data and clean
> capstr <- na.omit(capstr)
> dim(capstr)
[1] 5634 14
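`na.omit()` performs listwise deletion: a row is dropped if any of its columns is missing, so it can remove many more rows than any single variable is missing. A minimal sketch of counting missing values per column before dropping rows, using a small made-up data frame (`toy` is hypothetical, not the assignment's data):

```r
# Hypothetical toy data standing in for capstr; all values are made up
toy <- data.frame(
  leverage  = c(0.21, NA,   0.48, 0.33),
  logassets = c(8.1,  9.4,  NA,   7.6),
  rdta      = c(0.00, 0.05, 0.02, NA)
)

# Count missing values per column before deleting anything
na_counts <- colSums(is.na(toy))
print(na_counts)

# Listwise deletion: any NA in a row removes the whole row
toy_clean <- na.omit(toy)
print(dim(toy_clean))   # only the first row has no NA, so one row survives
```

Each column here is missing only one value, yet three of the four rows are lost.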
> names(capstr)
 [1] "gvkey"            "year"             "conm"
 [4] "spquality"        "industry"         "leverage"
 [7] "logassets"        "rdta"             "cashta"
[10] "divta"            "taxes"            "capexta"
[13] "roa"              "leverageincrease"
> #inspect your data
> mean(capstr$leverage)
[1] 0.3427896
> median(capstr$leverage)
[1] 0.3278918
> sd(capstr$leverage)
[1] 0.2064969
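The mean (0.343) sits above the median (0.328), which hints at a mild right skew in leverage; the histogram is the direct check. As a rough numeric summary, Pearson's second skewness coefficient, 3 * (mean - median) / sd, can be computed from the three printed values:

```r
# Values copied from the output above
lev_mean   <- 0.3427896
lev_median <- 0.3278918
lev_sd     <- 0.2064969

# Pearson's second (median-based) skewness coefficient
pearson_skew <- 3 * (lev_mean - lev_median) / lev_sd
print(round(pearson_skew, 3))
```

A value near 0.2 indicates only modest asymmetry; it is a crude statistic and does not replace looking at the histogram.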
> #histograms of variables of interest
> hist(capstr$leverage)
> hist(capstr$logassets)
> #linear regression of leverage
> lm.fit1 <- lm(leverage~logassets, data=capstr)
> summary(lm.fit1)
Call:
lm(formula = leverage ~ logassets, data = capstr)
Residuals:
     Min       1Q   Median       3Q      Max
-0.36522 -0.14258 -0.01556  0.11545  1.50374
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.250999   0.019425  12.922  < 2e-16 ***
logassets   0.010638   0.002228   4.773 1.86e-06 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.2061 on 5632 degrees of freedom
Multiple R-squared: 0.004029, Adjusted R-squared: 0.003853
F-statistic: 22.79 on 1 and 5632 DF, p-value: 1.857e-06
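In a regression with a single predictor, the printed numbers are tied together: the t value is Estimate / Std. Error, the F-statistic is the square of that t value, and R-squared can be recovered as F / (F + residual df). A quick arithmetic check using the reported figures (small differences come from the rounded inputs):

```r
# Figures copied from summary(lm.fit1)
est      <- 0.010638
se       <- 0.002228
df_resid <- 5632

t_val <- est / se                   # slope t statistic
f_val <- t_val^2                    # F = t^2 when there is one predictor
r2    <- f_val / (f_val + df_resid) # R^2 implied by F and the residual df

print(c(t = t_val, F = f_val, R2 = r2))
```

An R-squared of about 0.004 means log assets alone explains well under 1% of the variation in leverage, even though the slope is highly significant with n = 5,634.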
> plot(lm.fit1)
Hit <Return> to see next plot: #training test split
Hit <Return> to see next plot: train<-(capstr$year<2018)
Hit <Return> to see next plot: test <- capstr[!train,]
Hit <Return> to see next plot: lm.fit3 <- lm(leverage~logassets, data=capstr, subset=train)
> mean((test$leverage-predict(lm.fit3, test))^2)
[1] 0.06209251
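The split commands above were typed at the `Hit <Return> to see next plot:` prompt. At that prompt R only waits for a keypress and does not evaluate what is typed, so `train`, `test`, and `lm.fit3` were most likely not created by those lines and may be stale objects from an earlier session. A minimal sketch of the intended year-based split, run on simulated data (`sim` and its coefficients are made up for illustration):

```r
set.seed(1)
n <- 300
# Simulated stand-in for capstr: firm-years 2012-2020, made-up coefficients
sim <- data.frame(
  year      = sample(2012:2020, n, replace = TRUE),
  logassets = rnorm(n, mean = 8, sd = 2)
)
sim$leverage <- 0.25 + 0.01 * sim$logassets + rnorm(n, sd = 0.2)

# Drawing the four diagnostic plots on one page avoids the prompt entirely:
# par(mfrow = c(2, 2)); plot(fit)

# Time-based split: fit on years before 2018, evaluate on 2018 onward
train <- sim$year < 2018
test  <- sim[!train, ]

fit <- lm(leverage ~ logassets, data = sim, subset = train)

# Out-of-sample mean squared error
test_mse <- mean((test$leverage - predict(fit, newdata = test))^2)
print(test_mse)
```

Building `train` from the same data frame passed to `lm()` guarantees one logical entry per row.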
> #multiple regression
> lm.fit5 <- lm(leverage~logassets+capexta+rdta+taxes+spquality+divta+cashta, data=capstr, subset=train)
> summary(lm.fit5)
Call:
lm(formula = leverage ~ logassets + capexta + rdta + taxes + spquality + divta + cashta, data = capstr, subset = train)
Residuals:
     Min       1Q   Median       3Q      Max
-0.49552 -0.11788 -0.01290  0.09373  1.30077
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.215996   0.034405   6.278 4.08e-10 ***
logassets    0.009444   0.003231   2.923 0.003496 **
capexta     -0.478974   0.130314  -3.676 0.000243 ***
rdta        -0.872677   0.097419  -8.958  < 2e-16 ***
taxes       -1.175891   0.591416  -1.988 0.046900 *
spqualityA-  0.046032   0.021141   2.177 0.029550 *
spqualityA+ -0.004828   0.028730  -0.168 0.866567
spqualityB   0.095494   0.018510   5.159 2.69e-07 ***
spqualityB-  0.093547   0.018623   5.023 5.47e-07 ***
spqualityB+  0.110354   0.018303   6.029 1.91e-09 ***
spqualityC   0.194454   0.019839   9.801  < 2e-16 ***
spqualityD   0.384456   0.037757  10.182  < 2e-16 ***
divta        0.880130   0.171596   5.129 3.15e-07 ***
cashta      -0.268161   0.034545  -7.763 1.24e-14 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.1842 on 2324 degrees of freedom
  (4662 observations deleted due to missingness)
Multiple R-squared: 0.1952, Adjusted R-squared: 0.1907
F-statistic: 43.35 on 13 and 2324 DF, p-value: < 2.2e-16
> mean((test$leverage-predict(lm.fit5, test))^2)
Error in eval(predvars, data, env) : object 'capexta' not found
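Two symptoms point to a split that does not match the cleaned data. First, `predict()` fails because `test` has no `capexta` column, so it cannot have been built from the current 14-column `capstr`. Second, the fit used 2,324 residual df + 14 coefficients = 2,338 rows, and 2,338 + 4,662 deleted rows = 7,000, which does not match the 5,634 rows left after `na.omit()`. Since `capstr` has no missing values at that point, the "missing" rows are most likely NA rows produced by indexing `capstr` with a `train` vector built for a different data set. Rebuilding both objects from the cleaned data (`train <- capstr$year < 2018; test <- capstr[!train, ]`) should address this. A small hypothetical helper, `check_split()` (not part of any package), can catch both problems before fitting:

```r
# Hypothetical helper: sanity-check a train/test split against a model formula
check_split <- function(data, train, newdata, formula) {
  vars <- all.vars(formula)   # response and predictor variable names
  list(
    train_is_logical   = is.logical(train),
    # a logical subset needs exactly one entry per row of data
    train_matches_rows = length(train) == nrow(data),
    missing_in_data    = setdiff(vars, names(data)),
    missing_in_newdata = setdiff(vars, names(newdata))
  )
}

# Made-up example: the "stale" test set lacks the capexta column
dat   <- data.frame(year = 2015:2020, leverage = runif(6),
                    logassets = rnorm(6), capexta = runif(6))
stale <- dat[, c("year", "leverage", "logassets")]
res <- check_split(dat, train = dat$year < 2018, newdata = stale,
                   formula = leverage ~ logassets + capexta)
print(res)
```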
> vif(lm.fit5)
Error in vif(lm.fit5) : could not find function "vif"
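`vif()` comes from the car package, which failed to load earlier because it is not installed; running `install.packages("car")` and then `library(car)` should resolve that. For numeric predictors, the variance inflation factor is VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors. A base-R sketch of that definition (`vif_manual()` is a hypothetical helper; `car::vif()` reports generalized VIFs for multi-level factors such as `spquality`, which this sketch does not handle):

```r
# Hypothetical helper: VIF for each numeric column of a predictor data frame
vif_manual <- function(X) {
  X <- as.data.frame(X)
  sapply(names(X), function(v) {
    others <- setdiff(names(X), v)
    # R^2 from regressing predictor v on the remaining predictors
    r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
    1 / (1 - r2)
  })
}

# Simulated predictors: x2 is constructed to be highly correlated with x1
set.seed(42)
x1 <- rnorm(500)
x2 <- x1 + rnorm(500, sd = 0.3)
x3 <- rnorm(500)
vifs <- vif_manual(data.frame(x1, x2, x3))
print(round(vifs, 2))
```

A common rule of thumb treats VIFs above 5 or 10 as a sign of problematic collinearity.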
> #Add industry effects
> lm.fit6 <- lm(leverage~logassets+capexta+rdta+taxes+spquality+divta+cashta+factor(industry),
data=capstr,subset=train)
> summary(lm.fit6)
Call:
lm(formula = leverage ~ logassets + capexta + rdta + taxes + spquality + divta + cashta + factor(industry), data = capstr, subset = train)
Residuals:
     Min       1Q   Median       3Q      Max
-0.44866 -0.11462 -0.01504  0.09621  1.33151
Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
(Intercept)             0.2845002  0.0393679   7.227 6.69e-13 ***
logassets               0.0088708  0.0032623   2.719 0.006594 **
capexta                -0.2056597  0.1439220  -1.429 0.153149
rdta                   -0.6576442  0.1036640  -6.344 2.68e-10 ***
taxes                  -0.4173570  0.5933910  -0.703 0.481913
spqualityA-             0.0365232  0.0208753   1.750 0.080323 .
spqualityA+            -0.0006213  0.0282817  -0.022 0.982474
spqualityB              0.0792077  0.0184534   4.292 1.84e-05 ***
spqualityB-             0.0699337  0.0187556   3.729 0.000197 ***
spqualityB+             0.0978148  0.0181447   5.391 7.73e-08 ***
spqualityC              0.1736531  0.0199029   8.725  < 2e-16 ***
spqualityD              0.3608247  0.0373088   9.671  < 2e-16 ***
divta                   0.5490122  0.1776610   3.090 0.002024 **
cashta                 -0.2229544  0.0346485  -6.435 1.50e-10 ***
factor(industry)Bus-eq -0.1053724  0.0189318  -5.566 2.91e-08 ***
factor(industry)Chem   -0.0501774  0.0238176  -2.107 0.035248 *
factor(industry)Durbl  -0.0709785  0.0247872  -2.864 0.004227 **
factor(industry)Enrgy  -0.0899567  0.0386883  -2.325 0.020149 *
factor(industry)Fin     0.0028891  0.0208928   0.138 0.890031
factor(industry)Hlth   -0.0724094  0.0214998  -3.368 0.000770 ***
factor(industry)Manuf  -0.0588759  0.0232521  -2.532 0.011405 *
factor(industry)NoDur  -0.0597369  0.0255610  -2.337 0.019523 *
factor(industry)Shops  -0.0782346  0.0201570  -3.881 0.000107 ***
factor(industry)Telcm   0.0558288  0.0391838   1.425 0.154352
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.181 on 2314 degrees of freedom
  (4662 observations deleted due to missingness)
Multiple R-squared: 0.2262, Adjusted R-squared: 0.2186
F-statistic: 29.42 on 23 and 2314 DF, p-value: < 2.2e-16
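lm.fit5 and lm.fit6 appear to use the same 2,338 rows (both drop the same 4,662), so the ten industry dummies can be tested jointly with a partial F-test, F = [(R2_full - R2_reduced) / q] / [(1 - R2_full) / df_full], where q = 23 - 13 = 10 added terms and df_full = 2314. Using the rounded R-squared values printed above:

```r
# R-squared values and degrees of freedom copied from the two summaries
r2_reduced <- 0.1952   # lm.fit5: firm characteristics only
r2_full    <- 0.2262   # lm.fit6: adds factor(industry)
q       <- 23 - 13     # number of industry dummies added
df_full <- 2314        # residual df of lm.fit6

f_stat <- ((r2_full - r2_reduced) / q) / ((1 - r2_full) / df_full)
p_val  <- pf(f_stat, q, df_full, lower.tail = FALSE)
print(c(F = f_stat, p = p_val))
```

An F of about 9.3 on 10 and 2314 df indicates the industry effects are jointly significant. With both fitted models available, `anova(lm.fit5, lm.fit6)` performs the same test at full precision.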
> mean((test$leverage-predict(lm.fit6, test))^2)
Error in eval(predvars, data, env) : object 'capexta' not found
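The formula for lm.fit6 includes only `factor(industry)`. Year effects would enter as `factor(year)`, but they do not combine well with a split on `year < 2018`: every test year is 2018 or later, none of those years appears in the training data, and `predict()` stops with a "new levels" error. A small simulated demonstration (`d` and its coefficients are made up):

```r
set.seed(3)
# Simulated panel: 20 firm-years per year, 2014-2019
d <- data.frame(year = rep(2014:2019, each = 20), x = rnorm(120))
d$y <- 0.3 + 0.05 * d$x + rnorm(120, sd = 0.1)

train  <- d$year < 2018
fit_fe <- lm(y ~ x + factor(year), data = d, subset = train)

# Years 2018-2019 never occur in training, so their dummies cannot be formed
msg <- tryCatch(
  { predict(fit_fe, newdata = d[!train, ]); "no error" },
  error = function(e) conditionMessage(e)
)
print(msg)
```

With a time-based split, one option is to drop year dummies or to replace them with something that extends to unseen years, such as a numeric trend in `year`.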
> #Logistic Regression
> #simple logistic
> glm.fit <- glm(leverageincrease~logassets, family=binomial, data=capstr)
> summary(glm.fit)
Call:
glm(formula = leverageincrease ~ logassets, family = binomial, data = capstr)
Deviance Residuals:
    Min      1Q  Median      3Q     Max
-0.6956 -0.6610 -0.6528 -0.6454  1.8305
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.65714    0.23668  -7.002 2.53e-12 ***
logassets    0.02735    0.02707   1.010    0.312
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 5552.1 on 5633 degrees of freedom
Residual deviance: 5551.1 on 5632 degrees of freedom
AIC: 5555.1
Number of Fisher Scoring iterations: 4
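On the odds scale, the logassets coefficient of 0.02735 means each one-unit increase in log assets multiplies the odds of a leverage increase by exp(0.02735), about 1.028, and the effect is not significant (p = 0.312). A likelihood-ratio test gives the same conclusion by comparing the drop in deviance (5552.1 - 5551.1 = 1.0 on 1 df) with a chi-squared distribution:

```r
# Figures copied from summary(glm.fit) for the simple logistic model
b_logassets <- 0.02735
odds_ratio  <- exp(b_logassets)

null_dev  <- 5552.1; null_df  <- 5633
resid_dev <- 5551.1; resid_df <- 5632

lr_stat <- null_dev - resid_dev   # likelihood-ratio chi-squared statistic
lr_df   <- null_df - resid_df
p_lr    <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
print(c(odds_ratio = odds_ratio, LR = lr_stat, p = p_lr))
```

The likelihood-ratio p-value (about 0.32) and the Wald p-value (0.312) agree closely here.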
> #multiple logistic
> glm.fit <- glm(leverageincrease~logassets+capexta+rdta+taxes+spquality+divta+cashta, family=binomial, data=capstr)
> summary(glm.fit)
Call:
glm(formula = leverageincrease ~ logassets + capexta + rdta + taxes + spquality + divta + cashta, family = binomial, data = capstr)
Deviance Residuals:
    Min      1Q  Median      3Q     Max
-1.1002 -0.6740 -0.6298 -0.5868  1.9781
Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.973817   0.300742  -6.563 5.27e-11 ***
logassets    0.045184   0.028067   1.610  0.10742
capexta      3.492695   1.102446   3.168  0.00153 **
rdta         1.130905   0.742120   1.524  0.12754
taxes       -0.132474   5.417023  -0.024  0.98049
spqualityA- -0.194172   0.183344  -1.059  0.28957
spqualityA+  0.270778   0.238488   1.135  0.25621
spqualityB  -0.064246   0.154990  -0.415  0.67850
spqualityB-  0.008132   0.155342   0.052  0.95825
spqualityB+ -0.159035   0.154894  -1.027  0.30454
spqualityC   0.143413   0.163208   0.879  0.37956
spqualityD   0.363117   0.294022   1.235  0.21683
divta        1.339934   0.922925   1.452  0.14655
cashta       0.170486   0.286316   0.595  0.55155
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 5552.1 on 5633 degrees of freedom
Residual deviance: 5520.1 on 5620 degrees of freedom
AIC: 5548.1
Number of Fisher Scoring iterations: 4
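Both logistic models were fit on all 5,634 rows and the simple model is nested in the multiple one, so they can be compared through the change in residual deviance: 5551.1 - 5520.1 = 31.0 on 5632 - 5620 = 12 df. Reusing the name `glm.fit` overwrote the simple model; keeping distinct names (for example, hypothetical `glm_simple` and `glm_multi`) would let `anova(glm_simple, glm_multi, test = "Chisq")` run this test directly. From the printed figures:

```r
# Residual deviances and df copied from the two logistic summaries
dev_simple <- 5551.1; df_simple <- 5632
dev_multi  <- 5520.1; df_multi  <- 5620

lr_stat <- dev_simple - dev_multi   # likelihood-ratio statistic for the 12 added terms
lr_df   <- df_simple - df_multi
p_lr    <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
print(c(LR = lr_stat, df = lr_df, p = p_lr))
```

The added predictors jointly improve fit (p between 0.001 and 0.005) even though only capexta is individually significant; the lower AIC (5548.1 versus 5555.1) points the same way.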
> #Get predictions on whether leverage will increase (1) or not (0)
> # First code everything as 0 (no increase)
> glm.pred = rep("0",nrow(capstr))
> # Recode probabilities greater than .5 as 1 (increase)
> glm.probs<-predict(glm.fit,capstr,type="response")
> glm.pred[glm.probs>.5]="1"
> table(glm.pred,capstr$leverageincrease)
glm.pred    0    1
       0 4538 1095
       1    0    1
> mean(glm.pred==capstr$leverageincrease)
[1] 0.8056443
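An accuracy of 80.6% needs a baseline: the trivial rule that always predicts 0 is right for all 4,538 firm-years without an increase, so the model adds exactly one correct prediction. With a 0.5 cutoff it flags only one of the 1,096 actual increases, and the accuracy is measured on the same data used to fit the model. A sketch of these figures from the confusion table:

```r
# Confusion table copied from the output above (rows = predicted, columns = actual)
conf <- matrix(c(4538, 0, 1095, 1), nrow = 2,
               dimnames = list(pred = c("0", "1"), actual = c("0", "1")))

accuracy    <- sum(diag(conf)) / sum(conf)
baseline    <- max(colSums(conf)) / sum(conf)    # always predict the majority class
sensitivity <- conf["1", "1"] / sum(conf[, "1"]) # share of actual increases caught

print(round(c(accuracy = accuracy, baseline = baseline, sensitivity = sensitivity), 4))
```

Lowering the cutoff toward the sample rate of increases (1,096 / 5,634, about 0.19) is one common way to trade some accuracy for higher sensitivity, and evaluating on a held-out test set would give a fairer estimate.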
> #Fit a logistic model on training dataset