Homework 7
ECE 461/661: Introduction to Machine Learning
Prof. Yuejie Chi and Prof. Beidi Chen
Due: Sunday Dec. 3rd, 2023 at 8:59 PM PT / 11:59PM ET
Please remember to show your work for all problems and to write down the names of any students
that you collaborate with. The full collaboration and grading policies are available on the course website:
https://18661.github.io/
Your solutions should be uploaded to Gradescope (https://www.gradescope.com/) in PDF format by
the deadline. We will not accept hardcopies. If you choose to hand-write your solutions, please make sure
the uploaded copies are legible. Gradescope will ask you to identify which page(s) contain your solutions to
which problems, so make sure you leave enough time to finish this before the deadline. We will give you a
30-minute grace period to upload your solutions in case of technical problems.
1 Gaussian Multi-Armed Bandit [33 points]
Consider the following multi-armed bandit problem. We have three very unique slot machines, $k_1, k_2, k_3$,
which provide rewards drawn from univariate Gaussian distributions. Each distribution has an unknown (to
us) mean, $\mu_1, \mu_2, \mu_3$, and unknown variance. Every time we pull a lever, a reward is observed. We have
observed the rewards listed below for each slot machine. Now it is up to us to understand the
exploration-exploitation tradeoff and determine which slot machine provides the highest reward on average in the fewest
pulls. Assume the highest expected reward over all arms is 1 (i.e., $\max\{\mu_1, \mu_2, \mu_3\} = 1$).
$k_1$ rewards: -3.4246, 4.5886, -0.4250, -0.8251, 1.4727, 0.1228, 3.5182
$k_2$ rewards: -4.8422, -9.2154, -3.8178, 3.7586, -3.9574
$k_3$ rewards: 0.2795, 1.4759
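A greedy rule compares the per-arm sample means of the rewards listed above. The following is a minimal sketch of that computation, not an official solution; the names `rewards`, `sample_means`, and `greedy_choice` are hypothetical and introduced only for illustration.

```python
# Observed rewards copied from the problem statement.
rewards = {
    "k1": [-3.4246, 4.5886, -0.4250, -0.8251, 1.4727, 0.1228, 3.5182],
    "k2": [-4.8422, -9.2154, -3.8178, 3.7586, -3.9574],
    "k3": [0.2795, 1.4759],
}


def sample_means(rewards):
    """Return the empirical mean reward of each arm."""
    return {arm: sum(r) / len(r) for arm, r in rewards.items()}


def greedy_choice(rewards):
    """Return the arm with the highest empirical mean (pure exploitation)."""
    means = sample_means(rewards)
    return max(means, key=means.get)


means = sample_means(rewards)    # k1 ≈ 0.7182, k2 ≈ -3.6148, k3 ≈ 0.8777
choice = greedy_choice(rewards)  # "k3"
```

Note that the estimate for $k_3$ rests on only two pulls, which is exactly the situation in which the exploration-exploitation tradeoff matters.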
(a) (4pts) Suppose on the next pull we choose the slot machine according to a greedy approach. Which
slot machine should we pick for the next pull if we choose to exploit? Show your work and explain
why.
(b) (5pts) Now we would like to understand the difference between our observed rewards over the 14 pulls
and the rewards collected from an optimal strategy. Provide a numerical answer.
(c) (9pts) Suppose we chose to explore either of the other two slot machines instead of exploiting at the
next pull. We receive a reward of 5. How does this change our choice for the next pull (still assuming
we are greedy) and our regret? Explain why the regret increases, decreases, or stays the same.
(d) (6pts) UCB1 Approach. Suppose that, instead of the greedy approach, we use the UCB1 algorithm before
our next draw. Calculate the upper confidence bounds for each
slot machine (use the natural logarithm). What is our optimal choice using this algorithm?
(e) (9pts) Suppose we choose the optimal choice we found in the previous question. We receive a reward
of -5. How does this change our optimal choice for the next pull? What happens to our regret after
this observation and why?
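The quantities in parts (b) and (d) can be sketched numerically as follows. Two assumptions are made here that the problem statement does not fix: regret is taken as $T \cdot \max_i \mu_i$ minus the total observed reward over $T$ pulls, and the UCB1 bound is taken in the common form $\bar{x}_i + \sqrt{2 \ln t / n_i}$, where $n_i$ is the number of pulls of arm $i$ and $t$ is the total number of pulls. The course may use a different exploration constant, so treat the function names and the default `c=2.0` as hypothetical.

```python
import math

# Observed rewards copied from the problem statement.
rewards = {
    "k1": [-3.4246, 4.5886, -0.4250, -0.8251, 1.4727, 0.1228, 3.5182],
    "k2": [-4.8422, -9.2154, -3.8178, 3.7586, -3.9574],
    "k3": [0.2795, 1.4759],
}


def empirical_regret(rewards, best_mean=1.0):
    """Assumed definition: T * best_mean minus the total observed reward."""
    pulls = sum(len(r) for r in rewards.values())
    total = sum(sum(r) for r in rewards.values())
    return best_mean * pulls - total


def ucb1_bounds(rewards, c=2.0):
    """Sample mean plus sqrt(c * ln(t) / n_i) for each arm (natural log)."""
    t = sum(len(r) for r in rewards.values())
    return {
        arm: sum(r) / len(r) + math.sqrt(c * math.log(t) / len(r))
        for arm, r in rewards.items()
    }


regret = empirical_regret(rewards)         # ≈ 25.2912 over 14 pulls
bounds = ucb1_bounds(rewards)              # k1 ≈ 1.587, k2 ≈ -2.587, k3 ≈ 2.502
ucb_choice = max(bounds, key=bounds.get)   # "k3"
```

Under these assumptions the least-pulled arm receives the largest exploration bonus, which is why its bound sits well above its sample mean.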
2 Gridworld [34 points]
Consider the following grid environment. Starting from any unshaded square, you can move up, down,
left, or right. Actions are deterministic and always succeed (e.g. going left from state 16 goes to state
15) unless they will cause the agent to run into a wall. The thicker edges indicate walls, and attempting
to move in the direction of a wall results in staying in the same square (e.g. going in any direction
other than left from state 16 stays in 16). Taking any action from the target square with cheese (no.
11) earns a reward of $r_g$ (so $r(11, a) = r_g \ \forall a$) and ends the episode. Taking any action from the square
of cat (no. 6) earns a reward of $r_r$ (so $r(6, a) = r_r \ \forall a$) and ends the episode. Otherwise, from every
other square, taking any action is associated with a reward $r_s \in \{-1, 0, +1\}$ (even if the action results
in the agent staying in the same square). Assume the discount factor $\gamma = 1$, $r_g = +10$, and $r_r = -1000$
unless otherwise specified.
[Figure: grid of squares numbered 1 through 16, with the cat in square 6 and the cheese in square 11; thicker edges mark walls.]
(a) (12pts) Let $r_s = 0$. Evaluate the following policy and show the corresponding value for every
state (square).
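Evaluating a deterministic policy in this setting amounts to repeatedly applying $V(s) = r_s + \gamma V(s')$ for non-terminal squares and $V(s) = r(s, a)$ for the two terminal squares. The sketch below illustrates this on a hypothetical layout, because the actual wall positions and the policy are not reproduced here: the state labels `A` to `D` and the transition table `next_state` are assumptions, while $r_g = +10$, $r_r = -1000$, $r_s = 0$, and $\gamma = 1$ come from the problem statement.

```python
def evaluate_policy(next_state, terminal_reward, step_reward=0.0,
                    gamma=1.0, sweeps=100):
    """Iterative policy evaluation for a deterministic policy.

    next_state maps each non-terminal state to the state the policy's
    action leads to; terminal_reward maps each terminal state to the
    reward earned by acting there (after which the episode ends).
    """
    states = list(next_state) + list(terminal_reward)
    values = {s: 0.0 for s in states}
    for _ in range(sweeps):
        for s in states:
            if s in terminal_reward:
                values[s] = terminal_reward[s]
            else:
                values[s] = step_reward + gamma * values[next_state[s]]
    return values


# Hypothetical layout: A -> B -> cheese, C -> cat, D walks into a wall forever.
next_state = {"A": "B", "B": "cheese", "C": "cat", "D": "D"}
terminal_reward = {"cheese": 10.0, "cat": -1000.0}

values = evaluate_policy(next_state, terminal_reward)
# With r_s = 0 and gamma = 1: A and B are worth 10, C is worth -1000, D is worth 0.
```

With $\gamma = 1$ and a nonzero step reward, a state whose policy never reaches a terminal square (such as `D` above) has an unbounded value, so the fixed number of sweeps in this sketch would not converge in that case.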