Recent work on follow-the-perturbed-leader (FTPL) algorithms for the adversarial multi-armed bandit problem has highlighted the role of the hazard rate of the distribution generating the perturbations.
Assuming that the hazard rate is bounded allows one to provide regret analyses for a variety of FTPL algorithms for the multi-armed bandit problem. This paper pushes the inquiry into regret bounds for FTPL algorithms beyond the bounded hazard rate condition. There are good reasons to do so: natural distributions such as the uniform and the Gaussian violate the condition. We give regret bounds for both bounded-support and unbounded-support distributions without assuming the hazard rate condition. We also disprove a conjecture that the Gaussian
The former corresponds to infimal convolution smoothing and the latter corresponds to stochastic (or integral convolution) smoothing (Abernethy et al., 2014). Having a generic framework for understanding perturbations allows one to study a wide variety of online linear optimization games and a number of interesting perturbations.
There has also been some work on understanding perturbation approaches in bandit problems.
Kujala and Elomaa (2005) and Poland (2005) both showed that using the exponential (actually double exponential/Laplace) distribution in an FTPL algorithm, coupled with standard unbiased estimation techniques, yields near-optimal O(√(NT log N)) regret in T rounds with N arms. Unbiased estimation needs access to arm probabilities that are not explicitly available when using an FTPL algorithm. Neu and Bartók (2013) introduced the geometric resampling scheme to approximate these probabilities while still guaranteeing low regret. Recently, Abernethy et al. (2015) analyzed FTPL for adversarial multi-armed bandits and provided regret bounds under the condition that the hazard rate of the perturbation distribution is bounded. This condition allowed them to consider a variety of perturbation distributions beyond the exponential, such as the gamma, Gumbel, Fréchet, Pareto, and Weibull.
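The FTPL-with-geometric-resampling idea can be sketched as follows. This is a minimal illustration, not the algorithm of any one cited paper: the function name, the Laplace perturbation scale, the step size eta, and the resampling truncation level are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def ftpl_bandit(loss_matrix, eta=0.1, max_resamples=100):
    """Minimal FTPL for adversarial bandits with geometric resampling
    (in the spirit of the scheme described above).  eta and
    max_resamples are illustrative choices, not tuned constants."""
    T, N = loss_matrix.shape
    est_loss = np.zeros(N)              # cumulative loss estimates
    total = 0.0
    for t in range(T):
        # Draw fresh Laplace perturbations and follow the perturbed leader.
        z = rng.laplace(size=N)
        arm = int(np.argmin(eta * est_loss - z))
        loss = loss_matrix[t, arm]
        total += loss
        # Geometric resampling: redraw perturbations until `arm` wins
        # again; the number of trials K estimates 1/p_t(arm)
        # (truncated at max_resamples to bound computation).
        for k in range(1, max_resamples + 1):
            if int(np.argmin(eta * est_loss - rng.laplace(size=N))) == arm:
                break
        est_loss[arm] += k * loss       # importance-weighted loss estimate
    return total
```

The inner loop replaces the explicit arm-probability computation that an FTPL algorithm does not have access to; truncating it introduces a small, controllable bias.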
Unfortunately, the bounded hazard rate condition is violated by two of the most widely known distributions: namely, the uniform and the Gaussian distributions. As a result, the
People all over the world should remember the Go match between Lee Se-dol and AlphaGo. The Lee Se-dol and AlphaGo matches raised the issue of artificial intelligence. Most people expected that Lee Se-dol would win, but AlphaGo won four of the five games against the former Go champion. In addition, AlphaGo beat Ke Jie, the number-one Go player in China, without a handicap. People need to know there were no advantages in the match; Lee won only one game, the fourth, out of five against AlphaGo. What if a computer did not make a mistake in Go games? Nobody can predict the results. The development of technologies such as artificial intelligence, robots, and the smartphone can make our society convenient,
One of the main features that ensures protection and just outcomes for all people is the consistency of the adversarial system. This means that each case will have the same features with respect to the examination of evidence and witnesses, as well as the ability for the
Harry Markowitz (1991) developed a theory of “portfolio choice” that allows investors to examine risk relative to expected returns. Today this theory is known as Modern Portfolio Theory (MPT). It attempts to attain the best expected portfolio return for a predefined level of portfolio risk, or to minimise risk for a predefined expected return, through a careful choice of assets. Though it is a widely used theory, it has also been widely challenged. The critics question the feasibility of the theory as a strategy for
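The mean-variance trade-off at the heart of MPT can be sketched numerically. The closed form below (via Lagrange multipliers, with shorting allowed) is a standard textbook simplification, not a production asset-allocation tool; the function name and the example inputs are illustrative.

```python
import numpy as np

def min_variance_weights(mu, cov, target_return):
    """Markowitz mean-variance weights: minimize w' cov w subject to
    w' mu = target_return and sum(w) = 1.  Closed form via Lagrange
    multipliers; shorting is allowed (an illustrative simplification)."""
    mu = np.asarray(mu, dtype=float)
    ones = np.ones_like(mu)
    inv = np.linalg.inv(cov)
    B = np.column_stack([mu, ones])      # stacked constraint vectors
    A = B.T @ inv @ B                    # 2x2 Gram matrix of constraints
    lam = np.linalg.solve(A, np.array([target_return, 1.0]))
    return inv @ B @ lam
```

For example, with three hypothetical assets, `min_variance_weights([0.08, 0.12, 0.05], cov, 0.09)` returns the lowest-variance portfolio whose expected return is 9 percent.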
One risk of artificial intelligence is that machines can malfunction: they may not know when to stop advancing on the enemy, or may fail to distinguish between an enemy and a citizen, creating a risk of unnecessary carnage. Modern warfare is fast-paced, mobile, and technologically advanced. It has been stated that “today’s sophisticated weapons can malfunction, be too lethal, and their speed and effective range reduces reaction time and decreases the ability to distinguish
Along these lines, in statistical decision theory, the risk function of an estimator δ(x) for a parameter θ, computed from some observables x, is defined as the expected value of the loss
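In standard notation, the risk function described here is the loss averaged over the sampling distribution of the observables:

```latex
R(\theta, \delta)
  = \mathbb{E}_{x \sim p(\cdot \mid \theta)}\!\left[ L\big(\theta, \delta(x)\big) \right]
  = \int L\big(\theta, \delta(x)\big)\, p(x \mid \theta)\, dx .
```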
At the same time, he gives a broad overview of machine learning and its potential role in the future. He describes a world where programs
Redmill (2002) stated that “It is often claimed that the greatest value of risk analysis lies not in
AI needs RL ideas (i.e., ideas regarding actions, long-term predictions, and decision-making), perhaps even more than it needed deep learning ideas (i.e., ideas regarding recognition that disregard the agent's behavior and its long-term consequences). There is a serious shortage of RL experts. We should take advantage of that by utilizing as much RL expertise as possible from our in-house experts and outside advisors.
decreasing with episodes, as in DQN (Mnih et al., 2013)) until an action-length time-out, which takes roughly 10 seconds. The robot performs a “test” episode every 50 episodes by following the learned policy exactly, without random exploration. Once each episode finishes, the cached image frames are fed into the CNN-LSTM to obtain the terminal reward, and the robot is reset to the neutral position. All the joint and end-effector states, values, and reward tuples (s,a,r,s',a')
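The episode caching described here can be sketched minimally: transitions are stored as (s, a, r, s', a') tuples, and once the CNN-LSTM produces the terminal reward at episode end, it is propagated back through the episode. The class name, the discounting scheme, and the capacity are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import deque

class EpisodeBuffer:
    """Sketch of a cache of (s, a, r, s_next, a_next) tuples where the
    terminal reward, known only at episode end, is discounted back
    through the episode with factor gamma (an illustrative choice)."""
    def __init__(self, capacity=10000, gamma=0.99):
        self.buffer = deque(maxlen=capacity)   # old tuples evicted first
        self.episode = []
        self.gamma = gamma

    def store(self, s, a, r, s_next, a_next):
        self.episode.append((s, a, r, s_next, a_next))

    def end_episode(self, terminal_reward):
        # Add the discounted terminal reward to each step, newest first.
        g = terminal_reward
        for s, a, r, s_next, a_next in reversed(self.episode):
            self.buffer.append((s, a, r + g, s_next, a_next))
            g *= self.gamma
        self.episode = []

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```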
Najla Akram AL-Saati et al. [65] estimated parameters based on the available failure data. Cuckoo Search outperformed both PSO and ACO in finding better parameters when tested on identical datasets, yet more
The main impact on the analysis, when data are left-truncated, is that the investigator must use a conditional distribution in constructing the likelihood. Formally, survival times Ti are left-truncated given the times (random or non-random) Li which
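The conditional-likelihood construction can be made concrete: a left-truncated observation contributes f(t_i)/S(L_i) rather than f(t_i), since T_i is only seen because T_i ≥ L_i. The sketch below uses the exponential distribution as an illustrative assumption, where the term simplifies in closed form; the function name is hypothetical.

```python
import math

def exp_left_truncated_loglik(rate, times, trunc_times):
    """Log-likelihood of left-truncated exponential survival times.
    Each observation contributes the conditional density f(t)/S(L):
    for the exponential with hazard `rate`, log f(t) = log(rate) - rate*t
    and log S(L) = -rate*L, so each term is log(rate) - rate*(t - L)."""
    ll = 0.0
    for t, L in zip(times, trunc_times):
        if t < L:
            raise ValueError("observed time below its truncation time")
        ll += math.log(rate) - rate * (t - L)
    return ll
```

Dividing by S(L_i) is exactly the conditioning step the passage describes: without it, the likelihood would overweight long survival times that were only observable because of the truncation.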
You are choosing between two possible outcomes, A and B. There is an event X that may or may not occur in the future. If you originally prefer outcome A to outcome B, then you should continue to prefer A to B regardless of whether event X occurs.
Evaluating risk – Insurance determines the probable amount of risk by assessing the various factors that give rise to risk. Risk is also the basis for ascertaining the premium rate.
During this time period, stock prices increase substantially, accordingly reducing the risk premium demanded by traders. Also, shares should amount to 30 to 55 percent of the entire investment portfolio to optimize the investor’s expected profitability. The proximity of the estimated results to reality reveals the strength of the myopic loss aversion model (Siegel and Thaler, 1997).