Question
Let's consider the following 3-state MDP (Markov Decision Process) for a robot trying to walk, the three states being 'Fallen', 'Standing' and 'Moving', as shown in the following figure.
[Figure: state-transition diagram of the 3-state MDP. Each edge is labelled "transition probability, reward"; the labels shown are 1, +1; 1, +1; 0.6, +2; 0.4, +1; 0.4, -1; 0.2, -1; 0.8, +2; 0.6, -1. Black edges denote the slow action, green edges the fast action.]
Use the MDP formulation to code this problem and find the optimal values using the value iteration algorithm. Then use the policy iteration method to find the optimal policy for a discount factor γ = 0.1. Try the method with a different discount factor, for example a much larger one such as 0.9 or 0.99, or a much smaller one such as 0.01. Does the optimal policy change? Comment on it.
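
Below is a minimal Python sketch of both algorithms. The transition table is only an assumed reading of the figure: each edge label is taken as "transition probability, reward", black edges are taken as the slow action and green edges as the fast action, and only the slow action is assumed to be available from 'Fallen'. These assignments should be checked against the original diagram before relying on the numbers; the value iteration, policy evaluation, and policy iteration routines themselves are standard.

```python
# Value iteration and policy iteration for the 3-state walking-robot MDP.
# NOTE: the transition table below is an ASSUMED reading of the figure's
# edge labels ("probability, reward"); verify it against the original diagram.

STATES = ["Fallen", "Standing", "Moving"]

# (state, action) -> list of (probability, next_state, reward)
MDP = {
    ("Fallen",   "slow"): [(0.4, "Standing", +1), (0.6, "Fallen", -1)],
    ("Standing", "slow"): [(1.0, "Moving",   +1)],
    ("Standing", "fast"): [(0.6, "Moving",   +2), (0.4, "Fallen", -1)],
    ("Moving",   "slow"): [(1.0, "Moving",   +1)],
    ("Moving",   "fast"): [(0.8, "Moving",   +2), (0.2, "Fallen", -1)],
}

def actions(state):
    """Actions available in a state (only 'slow' is assumed from 'Fallen')."""
    return [a for (s, a) in MDP if s == state]

def q_value(state, action, V, gamma):
    """Expected discounted return of `action` in `state` under value estimates V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in MDP[(state, action)])

def value_iteration(gamma, theta=1e-8):
    """Apply the Bellman optimality backup until the values stop changing."""
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            best = max(q_value(s, a, V, gamma) for a in actions(s))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:
            return V

def policy_evaluation(policy, gamma, theta=1e-8):
    """Compute V^pi for a fixed deterministic policy by iterative sweeps."""
    V = {s: 0.0 for s in STATES}
    while True:
        delta = 0.0
        for s in STATES:
            v = q_value(s, policy[s], V, gamma)
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < theta:
            return V

def policy_iteration(gamma):
    """Alternate policy evaluation and greedy improvement until the policy is stable."""
    policy = {s: actions(s)[0] for s in STATES}   # arbitrary initial policy
    while True:
        V = policy_evaluation(policy, gamma)
        new_policy = {s: max(actions(s), key=lambda a: q_value(s, a, V, gamma))
                      for s in STATES}
        if new_policy == policy:
            return V, policy
        policy = new_policy

if __name__ == "__main__":
    for gamma in (0.01, 0.1, 0.9, 0.99):
        V_vi = value_iteration(gamma)
        V_pi, policy = policy_iteration(gamma)
        print(f"gamma = {gamma}")
        print("  value iteration V*: ", {s: round(v, 3) for s, v in V_vi.items()})
        print("  policy iteration V*:", {s: round(v, 3) for s, v in V_pi.items()})
        print("  optimal policy:     ", policy)
```

As a general observation for the write-up: with a very small discount factor the greedy choice in each state is driven almost entirely by the immediate expected reward of each action, while a large discount factor weighs the long-run consequences (such as the risk of ending up in 'Fallen'), so the preferred action in 'Standing' and 'Moving' may differ between the two regimes. Whether the optimal policy actually changes here depends on the true transitions in the figure, so the claim should be confirmed by running the code against the verified diagram.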