Operations Research: Applications and Algorithms
4th Edition
ISBN: 9780534380588
Author: Wayne L. Winston
Publisher: Brooks Cole
Chapter 17: Markov Chains
Section 17.4: Classification of States in a Markov Chain
Problem 5P
Question

What are the possible policies in this MDP?

5. Markov Decision Process
Consider the following scenario. You are reading email, and you get an offer from the CEO of Marsomania
Ltd., asking you to consider investing in an expedition that plans to dig for gold on Mars. You can either
choose to invest, with the prospect of either getting money or being fooled, or you can instead ignore
your emails and go to a party. Of course, your first thought is to model this as a Markov Decision Process,
and you come up with the following MDP.
[Figure: MDP diagram. States and rewards: Read Emails (E, R = 0), Get money (M, R = 10,000), Be fooled (F, R = -100), Have fun (H, R = -1). Actions: Invest and Go to party from E; Stay at M and at H; Go back from F with probability 1. A transition probability of .2 is annotated on the Invest action near F.]
Your MDP has four states: Read emails (E), Get money (M), Be fooled (F), and Have fun (H). The actions are
denoted by thick arrows; the (probabilistic) transitions are indicated by thin arrows, annotated with their
transition probabilities. The rewards depend only on the state: for example, the reward in state E is 0, and
in state M it is 10,000.
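To make the question concrete, here is a minimal Python sketch that encodes this MDP and enumerates its deterministic stationary policies. The per-state action sets and the 0.8/0.2 split on Invest are assumptions read off the figure (only the .2 annotation is legible in the transcription), not facts confirmed by the problem text.

```python
from itertools import product

# Per-state action sets, as read off the figure (assumption:
# M and H only allow "Stay", F only allows "Go back").
actions = {
    "E": ["Invest", "Go to party"],
    "M": ["Stay"],
    "F": ["Go back"],
    "H": ["Stay"],
}

# Rewards depend only on the state, as stated in the problem.
rewards = {"E": 0, "M": 10_000, "F": -100, "H": -1}

# Transition model: P[(state, action)] = {next state: probability}.
# Assumption: Invest reaches M with prob. 0.8 and F with prob. 0.2;
# all other transitions are deterministic.
P = {
    ("E", "Invest"): {"M": 0.8, "F": 0.2},
    ("E", "Go to party"): {"H": 1.0},
    ("M", "Stay"): {"M": 1.0},
    ("F", "Go back"): {"E": 1.0},
    ("H", "Stay"): {"H": 1.0},
}

# Sanity check: each action's transition distribution sums to 1.
for (s, a), dist in P.items():
    assert abs(sum(dist.values()) - 1.0) < 1e-9

# A deterministic policy assigns one action to every state, so the
# number of policies is the product of the action-set sizes.
states = list(actions)
policies = [dict(zip(states, choice))
            for choice in product(*(actions[s] for s in states))]

for i, pi in enumerate(policies, 1):
    print(f"Policy {i}: {pi}")
```

Under these assumptions, E is the only state with more than one action, so the enumeration prints exactly two policies: one that invests and one that goes to the party.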