
Computer Networking: A Top-Down Approach (7th Edition)
ISBN: 9780133594140
Author: James Kurose, Keith Ross
Publisher: Pearson
Question

Keep the answer medium in length, not too long and not too short.

Notice that $R_{01}(h)$ can be interpreted as the misclassification rate. That is, if $R_{01}(h) = 0.7$, then predicting $h$ would result in the wrong answer for 70% of the data points.
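For reference, here is a worked form of that interpretation. It assumes the standard zero-one loss for a constant prediction $h$ against labels $y_1, \dots, y_n$; the original definition of $R_{01}$ is not included in this excerpt, so treat this as an assumption:

```latex
\ell_{01}(h, y) =
\begin{cases}
0 & \text{if } h = y,\\
1 & \text{if } h \neq y,
\end{cases}
\qquad
R_{01}(h) = \frac{1}{n} \sum_{i=1}^{n} \ell_{01}(h, y_i)
          = \frac{\left|\{\, i : y_i \neq h \,\}\right|}{n}.
```

Under this definition, $R_{01}(h) = 0.7$ with $n = 10$ means exactly 7 of the 10 labels differ from $h$.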
{4, 2, 4, 1, 3, 4, 4, 3, 2, 5}, plot the empirical risk $R_{01}(h)$ for $h \in [0, 5]$.
Hint: the function should have point discontinuities.
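A minimal sketch for computing and plotting the risk follows. It assumes the zero-one loss $\ell_{01}(h, y) = 1$ if $h \neq y$ and 0 otherwise, and that matplotlib is available for the figure. The helper names `zero_one_risk`, `risk_at_data_values`, `plot_risk` and the output file `zero_one_risk.png` are hypothetical. Because every data point is an integer, $R_{01}(h) = 1$ for every $h \in [0, 5]$ except the isolated points $h = 1, \dots, 5$, where it dips. Those dips are the point discontinuities the hint mentions.

```python
# Data set from the question.
DATA = [4, 2, 4, 1, 3, 4, 4, 3, 2, 5]


def zero_one_risk(h, data=DATA):
    """Empirical zero-one risk (assumed loss): fraction of points y with y != h."""
    return sum(1 for y in data if y != h) / len(data)


def risk_at_data_values(data=DATA):
    """Risk at each distinct data value; at every other h the risk is 1.0."""
    return {v: zero_one_risk(v, data) for v in sorted(set(data))}


def plot_risk(data=DATA, lo=0.0, hi=5.0):
    """Draw R_01(h) on [lo, hi]: a flat line at 1 with isolated dips at data values."""
    import matplotlib.pyplot as plt  # assumed installed; only needed for the figure

    dips = risk_at_data_values(data)
    xs, ys = list(dips), list(dips.values())
    plt.hlines(1.0, lo, hi, colors="C0", label="R_01(h) = 1 away from data values")
    # Open circles mark the holes in the flat line at each data value...
    plt.scatter(xs, [1.0] * len(xs), facecolors="none", edgecolors="C0", zorder=3)
    # ...and filled points show the actual (lower) risk at those values.
    plt.scatter(xs, ys, color="C1", zorder=3, label="R_01 at data values")
    plt.xlabel("h")
    plt.ylabel(r"$R_{01}(h)$")
    plt.ylim(0.5, 1.05)
    plt.legend()
    plt.savefig("zero_one_risk.png")  # hypothetical output filename
    plt.close()


if __name__ == "__main__":
    for h, r in risk_at_data_values().items():
        print(f"R_01({h}) = {r:.1f}")
    print("R_01(h) = 1.0 for every other h in [0, 5]")
    try:
        plot_risk()
    except ImportError:
        print("matplotlib not installed; skipping the plot")
```

Running it gives $R_{01}(1) = 0.9$, $R_{01}(2) = 0.8$, $R_{01}(3) = 0.8$, $R_{01}(4) = 0.6$ and $R_{01}(5) = 0.9$. The lowest risk on $[0, 5]$ is therefore 0.6, attained only at $h = 4$, the most frequent value.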
c) Is gradient descent useful for minimizing the risk with zero-one loss? Why or why not? Make reference to your plot of the risk in your answer.
Hint: the risk is indeed non-convex, but gradient descent can still be useful for minimizing non-convex functions. Is there some other reason?
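The sketch below illustrates the point behind the hint. It assumes the same zero-one loss ($\ell_{01}(h, y) = 1$ when $h \neq y$) and uses a central finite-difference estimate as a stand-in for an analytic gradient. All helper names are hypothetical. Because $R_{01}(h)$ is piecewise constant, its derivative is 0 wherever it exists and undefined at the isolated dips. The gradient therefore never indicates which direction lowers the risk, so gradient descent does not move.

```python
# Data set from the question; the helper is redefined so this block runs on its own.
DATA = [4, 2, 4, 1, 3, 4, 4, 3, 2, 5]


def zero_one_risk(h, data=DATA):
    """Empirical zero-one risk (assumed loss): fraction of points y with y != h."""
    return sum(1 for y in data if y != h) / len(data)


def numerical_gradient(f, h, eps=1e-6):
    """Central finite-difference estimate of f'(h)."""
    return (f(h + eps) - f(h - eps)) / (2 * eps)


def gradient_descent(f, h0, lr=0.1, steps=100):
    """Plain gradient descent driven by the finite-difference gradient."""
    h = h0
    for _ in range(steps):
        h -= lr * numerical_gradient(f, h)
    return h


if __name__ == "__main__":
    for h in [0.5, 1.7, 3.3, 4.9]:
        # The estimate is 0.0 at every point that is not a data value.
        print(h, numerical_gradient(zero_one_risk, h))
    h_final = gradient_descent(zero_one_risk, h0=3.3)
    # The iterate never moves: it stays at 3.3, where the risk is 1.0.
    print(h_final, zero_one_risk(h_final))
```

Starting exactly at the minimizer $h = 4$ also gives a zero estimate, because both neighbors $4 \pm \varepsilon$ have risk 1. Gradient information alone cannot even recognize the minimum.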