Operations Research: Applications and Algorithms
4th Edition
ISBN: 9780534380588
Author: Wayne L. Winston
Publisher: Brooks Cole
Chapter 20: Queuing Theory
Section 20.8: The M/G/1/GD/∞/∞ Queuing System
Problem 5P

Algorithm: Mean-payoff learning for black-box MDP
Input: MDP M, imprecision ε_MP > 0, MP-inconfidence δ_MP > 0, lower bound p_min on transition probabilities in M
Parameters: revisit threshold k ≥ 2, episode length n ≥ 1
Output: upon termination, an ε_MP-precise estimate of the maximum mean payoff for M with confidence 1 − δ_MP, i.e. an (ε_MP, 1 − δ_MP)-PAC estimate
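
To make the output guarantee concrete, here is a minimal sketch of what an (ε, 1 − δ)-PAC estimate means in the simplest setting: estimating the mean of a reward bounded in [0, 1] from independent samples. It is not the black-box MDP algorithm stated above. It uses only the standard Hoeffding bound, and the names `hoeffding_sample_size` and `pac_mean_estimate` are hypothetical, introduced purely for illustration.

```python
import math
import random


def hoeffding_sample_size(eps: float, delta: float) -> int:
    """Number of i.i.d. samples of a [0, 1]-valued quantity that suffices for
    the empirical mean to be within eps of the true mean with probability at
    least 1 - delta (two-sided Hoeffding bound)."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    if not 0 < delta < 1:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))


def pac_mean_estimate(sample_reward, eps: float, delta: float) -> float:
    """Return an (eps, 1 - delta)-PAC estimate of the mean reward.

    sample_reward is an assumed black-box sampler: a zero-argument callable
    returning one reward in [0, 1] per call.
    """
    n = hoeffding_sample_size(eps, delta)
    total = sum(sample_reward() for _ in range(n))
    return total / n


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical black box: reward 1 with probability 0.3, else 0.
    estimate = pac_mean_estimate(lambda: float(rng.random() < 0.3), 0.1, 0.05)
    print(hoeffding_sample_size(0.1, 0.05), estimate)
```

The mean-payoff setting is harder than this sketch suggests, because the learner must also discover the transition structure of M. That is where the lower bound p_min and the parameters k and n in the statement above come in.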
