PS#5.pdf
School: California Lutheran University
Course: IDS575
Subject: Economics
Date: Apr 3, 2024
Pages: 11
Uploaded by SuperHumanCrabPerson1153
Q1 Model Selection
30 Points
Q1.1
4 Points
You will perform cross-validation on a dataset with 100 examples. Your goal is to robustly measure the validation error to approximate out-of-sample performance of your model.
If using a 10-fold cross-validation, you need to compute the validation error N1 times. To compute individual errors, you should train your model with a training data of size N2, and test the model on a validation data of size N3. Your final validation error will be the average of N1 individual validation errors.
What are the appropriate numbers for N1, N2, N3?
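The bookkeeping behind this question can be sketched in a few lines of Python. This is a minimal illustration, assuming the folds divide the dataset evenly; the function name `kfold_sizes` is hypothetical and not part of the problem set.

```python
from typing import Tuple


def kfold_sizes(n_examples: int, n_folds: int) -> Tuple[int, int, int]:
    """Return (validation rounds, training size, validation size) for k-fold CV."""
    if n_examples % n_folds != 0:
        # Assumption of this sketch: every fold has the same number of examples.
        raise ValueError("this sketch assumes the folds divide the data evenly")
    fold_size = n_examples // n_folds
    n_rounds = n_folds                    # each fold is held out exactly once
    train_size = n_examples - fold_size   # all remaining folds are used for training
    return n_rounds, train_size, fold_size


print(kfold_sizes(100, 10))
```

The final validation error is then the average of the per-round errors, one per held-out fold.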
N1 = 10, N2 = 90, N3 = 10
N1 = 1, N2 = 90, N3 = 10
N1 = 10, N2 = 10, N3 = 90
N1 = 1, N2 = 10, N3 = 90
N1 = 10, N2 = 100, N3 = 100
N1 = 1, N2 = 100, N3 = 100
Q1.2
4 Points
Which of the following cross-validation methods may not be suitable for a very large dataset with hundreds of thousands of examples?
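For the Q1.2 question about very large datasets, one useful quantity is how many times each validation scheme has to fit a model. The sketch below counts those fits; the dataset size of 300,000 and the function name `n_model_fits` are assumptions chosen only for illustration.

```python
def n_model_fits(n_examples: int, scheme: str, k: int = 10) -> int:
    """Number of separate model fits each validation scheme requires."""
    if scheme == "holdout":
        return 1            # a single train/validation split
    if scheme == "kfold":
        return k            # one fit per held-out fold
    if scheme == "loocv":
        return n_examples   # one fit per held-out example
    raise ValueError(f"unknown scheme: {scheme}")


n = 300_000  # hypothetical size, standing in for "hundreds of thousands"
for scheme in ("holdout", "kfold", "loocv"):
    print(scheme, n_model_fits(n, scheme))
```

The number of fits for leave-one-out grows linearly with the dataset, while the other two stay constant.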
k-fold cross-validation
Leave-one-out cross-validation
Holdout validation
All of the above
Q1.3
4 Points
To use holdout validation for your classification problem, you are going to randomly split your supervised dataset into training, validation, and test partitions. Assume your dataset is sufficiently large. Select all correct
Some partitions may consist of substantially more difficult- or easier-to-predict cases.
Some partitions may contain a larger or smaller proportion among different label classes.
Training performance could be decreased due to holding out subsets of data for validation and test.
Measuring out-of-sample performance could become less accurate due to the random split.
Q1.4
4 Points
Now you will run 10-fold cross-validation for training k-NN. For each candidate k value, you train k-NN on all of the dataset except one of the 10 folds, then measure an approximate validation error on the examples in that held-out fold.
When you have 5 different candidate k values to decide the best model, you will train k-NN a total of N1 times. The performance of each individual model (with a specific k value) will be evaluated by the mean of N2 validation errors.
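The counting in this question can be made concrete by enumerating every (k, fold) pair that the procedure visits. This is only a sketch: the five candidate values in `candidate_ks` are hypothetical, since the problem does not list them, and no actual k-NN model is fitted.

```python
def cv_bookkeeping(candidate_ks, n_folds):
    """Enumerate the (k, held-out fold) runs of k-fold CV over several candidates."""
    runs = [(k, fold) for k in candidate_ks for fold in range(n_folds)]
    total_trainings = len(runs)  # one training run per (k, fold) pair
    # Each candidate k collects one validation error per held-out fold.
    errors_per_k = {k: sum(1 for kk, _ in runs if kk == k) for k in candidate_ks}
    return total_trainings, errors_per_k


candidate_ks = [1, 3, 5, 7, 9]  # hypothetical candidate values
total, per_k = cv_bookkeeping(candidate_ks, n_folds=10)
print(total, per_k)
```

Each candidate's score is the mean of its own per-fold errors, so the two counts differ: one is over all runs, the other is per candidate.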
N1=10, N2=5
N1=5, N2=10
N1=45, N2=10
N2=45, N1=50
N1=50, N2=10
N1=10, N2=50
Q1.5
4 Points
To report and launch your prediction system, you now choose the final model (with the best-performing k) given the results from the 10-fold cross-validation in Q1.4. Choose the best convention for arriving at the final validation error and the final decision boundary.
Pick the $k^*$ with the lowest mean validation error as the best model; report that lowest mean validation error as the final validation error; launch the decision boundary trained for $k^*$-NN as it is.
Pick the $k^*$ as the closest candidate to the weighted average of the 5 candidate k values (where the weights are given by each k-NN's accuracy); report the weighted average of the mean validation errors as the final validation error; launch the decision boundary trained for $k^*$-NN as it is.
Pick the $k^*$ with the lowest mean validation error as the best model; report that lowest mean validation error as the final validation error; launch the decision boundary by retraining $k^*$-NN on the entire dataset (including all 10 folds).
Q1.6
5 Points
For spam classification, you have 100 emails in the validation data. When your trained hypothesis $\hat{h}$ classifies 90 emails with their correct labels, what interval does the true error $\mathrm{Err}_p(\hat{h})$ lie in with 95% confidence?
(0.0412, 0.1588)
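The listed interval is consistent with the usual normal approximation to a binomial proportion, with 10 errors out of 100 and z = 1.96 for 95% confidence. The sketch below assumes that approximation is the intended method; the function name `error_confidence_interval` is hypothetical.

```python
import math


def error_confidence_interval(n_errors: int, n_examples: int, z: float = 1.96):
    """Normal-approximation confidence interval for the true error rate."""
    err = n_errors / n_examples                        # observed validation error
    half_width = z * math.sqrt(err * (1 - err) / n_examples)
    return err - half_width, err + half_width


# 100 validation emails, 90 classified correctly, so 10 errors.
low, high = error_confidence_interval(n_errors=10, n_examples=100)
print(f"({low:.4f}, {high:.4f})")  # (0.0412, 0.1588)
```

Here the standard error is sqrt(0.1 × 0.9 / 100) = 0.03, so the half-width is 1.96 × 0.03 = 0.0588 around the observed error of 0.10.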
Q1.7
5 Points
Assume you evenly split the data D into 3 disjoint subsets $D_1$, $D_2$, and $D_3$. You run cross-validation to determine the better model between $M_1$ and $M_2$. You achieve the following test errors: $\mathrm{Err}_{D_1}(\hat{h}_{M_1}) = 0.32$, $\mathrm{Err}_{D_2}(\hat{h}_{M_1}) = 0.41$, $\mathrm{Err}_{D_3}(\hat{h}_{M_1}) = 0.15$, $\mathrm{Err}_{D_1}(\hat{h}_{M_2}) = 0.24$, $\mathrm{Err}_{D_2}(\hat{h}_{M_2}) = 0.38$, $\mathrm{Err}_{D_3}(\hat{h}_{M_2}) = 0.17$. Which model should we pick?
M1
M2
Indifferent
M2 could be better, but not necessarily significantly so.
Q2 Model Assessment
70 Points
Q2.1
5 Points
Choose all correct ones:
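The Q1.7 comparison can be checked numerically by averaging each model's three fold errors and looking at how much the errors vary across folds. This is a rough sketch: comparing the gap between the means with the fold-to-fold standard deviation is an informal heuristic assumed here for illustration, not a formal significance test.

```python
from statistics import mean, stdev

# Per-fold test errors from Q1.7, in the order D1, D2, D3.
errors = {
    "M1": [0.32, 0.41, 0.15],
    "M2": [0.24, 0.38, 0.17],
}

# Mean cross-validation error and sample standard deviation for each model.
summary = {name: (mean(errs), stdev(errs)) for name, errs in errors.items()}
for name, (avg, spread) in summary.items():
    print(f"{name}: mean error = {avg:.4f}, std across folds = {spread:.4f}")

gap = summary["M1"][0] - summary["M2"][0]
print(f"mean error gap (M1 - M2) = {gap:.4f}")
```

With only three folds, the gap between the two means is small relative to the spread within each model, so a lower mean alone does not establish that the difference is meaningful.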