Recent Work on Follow the Perturbed Leader (FTPL) Algorithms for the Adversarial Multi-Armed Bandit Problem

Recent work on follow the perturbed leader (FTPL) algorithms for the adversarial multi-armed bandit problem has highlighted the role of the hazard rate of the distribution generating the perturbations. Assuming that the hazard rate is bounded, one can provide regret analyses for a variety of FTPL algorithms for the multi-armed bandit problem. This paper pushes the inquiry into regret bounds for FTPL algorithms beyond the bounded hazard rate condition. There are good reasons to do so: natural distributions such as the uniform and the Gaussian violate the condition. We give regret bounds for both bounded-support and unbounded-support distributions without assuming the hazard rate condition. We also disprove a conjecture that the Gaussian
…show more content…
The former corresponds to infimal convolution smoothing and the latter corresponds to stochastic (or integral convolution) smoothing (Abernethy et al., 2014). Having a generic framework for understanding perturbations allows one to study a wide variety of online linear optimization games and a number of interesting perturbations.
There has also been some work on understanding perturbation approaches in bandit problems.
Kujala and Elomaa (2005) and Poland (2005) both showed that using the exponential (actually double exponential, i.e. Laplace) distribution in an FTPL algorithm, coupled with standard unbiased estimation techniques, yields near-optimal O(√(NT log N)) regret in T rounds with N arms. Unbiased estimation needs access to arm probabilities that are not explicitly available when using an FTPL algorithm. Neu and Bartók (2013) introduced the geometric resampling scheme to approximate these probabilities while still guaranteeing low regret. Recently, Abernethy et al. (2015) analyzed FTPL for adversarial multi-armed bandits and provided regret bounds under the condition that the hazard rate of the perturbation distribution is bounded. This condition allowed them to consider a variety of perturbation distributions beyond the exponential, such as the gamma, Gumbel, Fréchet, Pareto, and Weibull.
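The two ideas in this paragraph, estimating losses by geometric resampling and measuring a perturbation's hazard rate, can be illustrated with a minimal sketch. It is not the algorithm of any of the cited papers as published: it assumes a loss (rather than gain) formulation, Exp(1) perturbations, and a cap on the number of resamples, and the names `perturbed_leader`, `ftpl_geometric_resampling`, `gaussian_hazard_rate`, `eta`, and `max_resamples` are hypothetical. The hazard rate is taken here to be the density divided by the survival function, f(x) / (1 - F(x)).

```python
import math
import random


def gaussian_hazard_rate(x):
    """Hazard rate f(x) / (1 - F(x)) of the standard Gaussian."""
    density = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    survival = 0.5 * math.erfc(x / math.sqrt(2.0))  # 1 - F(x)
    return density / survival


def perturbed_leader(cum_loss_est, eta, rng):
    """Arm minimizing eta * estimated loss minus a fresh Exp(1) perturbation."""
    scores = [eta * loss - rng.expovariate(1.0) for loss in cum_loss_est]
    return min(range(len(scores)), key=scores.__getitem__)


def ftpl_geometric_resampling(loss_matrix, eta, max_resamples, seed=0):
    """Run FTPL on a T x N loss matrix, seeing only the played arm's loss.

    Returns the total loss incurred and the final loss estimates.
    """
    rng = random.Random(seed)
    n_arms = len(loss_matrix[0])
    cum_loss_est = [0.0] * n_arms
    total_loss = 0.0
    for losses in loss_matrix:
        arm = perturbed_leader(cum_loss_est, eta, rng)
        observed = losses[arm]  # bandit feedback: one entry only
        total_loss += observed
        # Geometric resampling: count fresh perturbation draws until the
        # played arm is the leader again. The count k stands in for
        # 1 / P(arm is played), which FTPL never computes explicitly.
        k = 1
        while k < max_resamples and perturbed_leader(cum_loss_est, eta, rng) != arm:
            k += 1
        cum_loss_est[arm] += k * observed
    return total_loss, cum_loss_est


if __name__ == "__main__":
    # Toy instance (an assumption for illustration): arm 1 never incurs loss.
    toy_losses = [[1.0, 0.0, 1.0] for _ in range(2000)]
    total, estimates = ftpl_geometric_resampling(toy_losses, eta=0.1, max_resamples=50)
    print("total loss:", total)
    # The Gaussian hazard rate keeps growing with x instead of staying bounded.
    print([round(gaussian_hazard_rate(x), 3) for x in (0.0, 2.0, 5.0)])
```

Capping the resampling count keeps the per-round cost bounded at the price of a small bias in the estimate; the uncapped count has expectation exactly 1 / P(arm is played).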
Unfortunately, the bounded hazard rate condition is violated by two of the most widely known distributions, namely the uniform and the Gaussian. As a result, the