Question

Asked Jan 29, 2020


(Math)

Let $D$ be the distribution over the data points $(x, y)$, and let $H$ be the hypothesis class, in which one would like to find a function $f$ that has a small expected loss $L(f)$ by minimizing the empirical loss $\hat{L}(f)$. A few definitions:

• The best function among all (measurable) functions is called the Bayes hypothesis:

$f^{*} = \arg\inf_{f} L(f)$.

• The best function in the hypothesis class is denoted as

$f_{opt} = \arg\inf_{f \in H} L(f)$.

• The function that minimizes the empirical loss in the hypothesis class is denoted as

$\hat{f}_{opt} = \arg\inf_{f \in H} \hat{L}(f)$.

• The function output by the algorithm is denoted as $\hat{f}$. (It can differ from $\hat{f}_{opt}$, since the optimization may not find the best solution.)

• The difference between the losses of $f_{opt}$ and $f^{*}$ is called the approximation error:

$x_{app} = L(f_{opt}) - L(f^{*})$,

which measures the error introduced in building the model/hypothesis class.

• The difference between the losses of $\hat{f}_{opt}$ and $f_{opt}$ is called the estimation error:

$x_{est} = L(\hat{f}_{opt}) - L(f_{opt})$,

which measures the error introduced by using finite data to approximate the distribution $D$.

• The difference between the losses of $\hat{f}$ and $\hat{f}_{opt}$ is called the optimization error:

$x_{opt} = L(\hat{f}) - L(\hat{f}_{opt})$,

which measures the error introduced in optimization.

• The difference between the losses of $\hat{f}$ and $f^{*}$ is called the excess risk:

$x_{exc} = L(\hat{f}) - L(f^{*})$,

which measures the distance from the output of the algorithm to the best possible solution.

(1) Show that $x_{exc} = x_{app} + x_{est} + x_{opt}$.

**Comments:** This means that to get better performance, one can think of: 1) building a hypothesis class closer to the ground truth; 2) collecting more data; 3) improving the optimization.
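The decomposition in (1) can be checked numerically on a small example. The sketch below is only an illustration under assumed choices that are not part of the question: a toy distribution on ten points with label probabilities `ETA`, 0-1 loss, a hypothesis class of threshold classifiers, and a deliberately imperfect "algorithm" that searches only the even thresholds. All function and variable names are hypothetical.

```python
import random

# Toy setup (all names and numbers are illustrative assumptions).
# x is uniform on {0, ..., 9}; ETA[x] = P(y = 1 | x); the loss is 0-1 loss.
ETA = [0.1, 0.2, 0.7, 0.3, 0.4, 0.6, 0.8, 0.9, 0.35, 0.95]
XS = range(len(ETA))


def threshold(t):
    """Hypothesis f_t(x) = 1 if x >= t, else 0."""
    return lambda x: int(x >= t)


def expected_loss(f):
    """Exact expected 0-1 loss L(f) under the toy distribution."""
    return sum(ETA[x] if f(x) == 0 else 1 - ETA[x] for x in XS) / len(ETA)


def empirical_loss(f, sample):
    """Empirical 0-1 loss of f on a list of (x, y) pairs."""
    return sum(f(x) != y for x, y in sample) / len(sample)


def decompose(n=30, seed=0):
    """Draw n points and return the four error terms as a dict."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        x = rng.randrange(len(ETA))
        sample.append((x, int(rng.random() < ETA[x])))

    H = [threshold(t) for t in range(len(ETA) + 1)]  # hypothesis class
    bayes = lambda x: int(ETA[x] > 0.5)              # f^*: pointwise best
    f_opt = min(H, key=expected_loss)                # best in H (true loss)
    f_hat_opt = min(H, key=lambda f: empirical_loss(f, sample))  # ERM over H
    # Imperfect optimizer: it only tries every other threshold.
    f_hat = min(H[::2], key=lambda f: empirical_loss(f, sample))

    return {
        "x_app": expected_loss(f_opt) - expected_loss(bayes),
        "x_est": expected_loss(f_hat_opt) - expected_loss(f_opt),
        "x_opt": expected_loss(f_hat) - expected_loss(f_hat_opt),
        "x_exc": expected_loss(f_hat) - expected_loss(bayes),
    }


if __name__ == "__main__":
    terms = decompose()
    print(terms)
    print("sum of three terms:", terms["x_app"] + terms["x_est"] + terms["x_opt"])
```

In this sketch the approximation and estimation errors are nonnegative by construction, while the optimization error can have either sign, because the imperfect optimizer may by chance pick a hypothesis whose expected loss is lower than that of the empirical minimizer. The identity holds regardless.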

(2) Typically, when one has enough data, the empirical loss concentrates around the expected loss: there exists $x_{con} > 0$ such that for any $f \in H$, $|\hat{L}(f) - L(f)| \le x_{con}$. Show that in this case, $x_{est} \le 2 x_{con}$.

**Comments:** This means that to get a small estimation error, the number of data points should be large enough so that concentration happens. The number of data points needed to get concentration $x_{con}$ is called the sample complexity, which is an important topic in learning theory and statistics.
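A sketch of why the bound in (2) holds, using only the definitions above and the stated concentration assumption (and assuming the infima defining $f_{opt}$ and $\hat{f}_{opt}$ are attained in $H$): add and subtract the empirical losses of the two hypotheses.

```latex
\begin{aligned}
x_{est} &= L(\hat{f}_{opt}) - L(f_{opt}) \\
        &= \bigl[L(\hat{f}_{opt}) - \hat{L}(\hat{f}_{opt})\bigr]
         + \bigl[\hat{L}(\hat{f}_{opt}) - \hat{L}(f_{opt})\bigr]
         + \bigl[\hat{L}(f_{opt}) - L(f_{opt})\bigr] \\
        &\le x_{con} + 0 + x_{con} \\
        &= 2\,x_{con}.
\end{aligned}
```

The first and third brackets are each at most $x_{con}$ because both $\hat{f}_{opt}$ and $f_{opt}$ lie in $H$, where the concentration bound applies. The middle bracket is at most $0$ because $\hat{f}_{opt}$ minimizes the empirical loss over $H$ and $f_{opt} \in H$.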

Step 1

Hello! As you have posted two different questions, we are answering only the first. If you also need an answer to the second question, kindly re-post it as a separate question.

Step 2

(1)

From the given information,

$f^{*} = \arg\inf_{f} L(f)$

$f_{opt} = \arg\inf_{f \in H} L(f)$

$\hat{f}_{opt} = \arg\inf_{f \in H} \hat{L}(f)$

$x_{app} = L(f_{opt}) - L(f^{*})$

$x_{est} = L(\hat{f}_{opt}) - L(f_{opt})$

$x_{opt} = L(\hat{f}) - L(\hat{f}_{opt})$

$x_{exc} = L(\hat{f}) - L(f^{*})$

Step 3

Consider...
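The step above is cut off in the source. One way the argument for (1) can be completed, offered as a sketch from the definitions listed in Step 2 rather than as the original answer's text, is to add the three error terms and note that the sum telescopes:

```latex
\begin{aligned}
x_{app} + x_{est} + x_{opt}
  &= \bigl[L(f_{opt}) - L(f^{*})\bigr]
   + \bigl[L(\hat{f}_{opt}) - L(f_{opt})\bigr]
   + \bigl[L(\hat{f}) - L(\hat{f}_{opt})\bigr] \\
  &= L(\hat{f}) - L(f^{*}) \\
  &= x_{exc}.
\end{aligned}
```

The terms $L(f_{opt})$ and $L(\hat{f}_{opt})$ each appear once with a plus sign and once with a minus sign, so they cancel, leaving only the excess risk.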
