This post is part of the lecture notes of my class “Introduction to Online Learning” at Boston University, Fall 2019. I will publish two lectures per week.
You can find the lectures I published till now here.
In the last lecture, we have shown a very simple and parameter-free algorithm for Online Convex Optimization (OCO) in -dimensions, based on a reduction to a coin-betting problem. Now, we will see how to reduce Learning with Expert Advice (LEA) to betting on coins, obtaining again parameter-free and optimal algorithms.
1. Reduction to Learning with Experts
First, remember that the regret we got from Online Mirror Descent (OMD), and similarly for Follow-The-Regularized-Leader (FTRL), is
where is the prior distribution on the experts and is the KL-divergence. As we reasoned in the OCO case, in order to set the learning rate we should know the value of . If we could set to , we would obtain a regret of . However, given the adversarial nature of the game, this is impossible. So, as we did in the OCO case, we will show that even this problem can be reduced to betting on a coin, obtaining optimal guarantees with a parameter-free algorithm.
First, let’s introduce some notation. Let be the number of experts and be the -dimensional probability simplex. Let be any prior distribution. Let be a coin-betting algorithm. We will instantiate copies of .
The construction above defines a LEA algorithm defined by the predictions , based on the algorithm . We can prove the following regret bound for it.
Theorem 1 (Regret Bound for Experts) Let be a coin-betting algorithm that guarantees a wealth after rounds with initial money equal to 1 of for any sequence of continuous coin outcomes . Then, the regret of the LEA algorithm with prior that predicts at each round with in (2) satisfies
for any concave and non-decreasing such that .
Proof: We first prove that . Indeed,
The first equality follows from definition of . To see the second equality, consider two cases: If for all then and therefore both and are trivially zero. If then for all .
So, inequality and (4) imply
Now, for any competitor ,
Now, we could think to use the Krichevsky–Trofimov (KT) bettor with this theorem. However, we would obtain a sub-optimal regret guarantee. In fact, remembering the lower bound on the wealth of KT and setting where is a universal constant, we have
We might think that the is the price we have to pay to adapt to the unknown competitor . However, it turns out it can be removed. In the next section, we see how to change the KT strategy to obtain the optimal guarantee.
2. A Betting Strategy that Looses at Most a Constant Fraction of Money
In the reduction before, if we use the KT betting strategy we would have a term under the square root. It turns out that we can avoid that term if we know the number of rounds beforehand. Then, in case is unknown we can just use a doubling trick, paying only a constant multiplicative factor in the regret.
The logarithmic term in the regret comes from the fact that the lower bound on the wealth is
Note that in the case in which the number of heads in the sequence is equal to the number of heads, so that , the guaranteed wealth becomes proportional to . So, for that goes to infinity the bettor will lose all of its money.
Instead, we need a more conservative strategy that guarantees
for small enough and independent of . In this case, the betting strategy has to pace its betting, possibly with the knowledge of the duration of the game, so that even in the case that the number of heads is equal to the number of tails it will only lose a fraction of its money. At the same time, it will still gain an exponential amount of money when the coin outcomes are biased towards one side.
We will prove that this is possible, designing a new betting strategy.
then, by induction, . In fact, we have
Hence, we have to prove that (8) is true in order to guarantee a minimum wealth of our betting strategy.
First, given that is a concave function of , we have
Also, our choice of makes the two quantities above equal with , that is
For other choices of , the two alternatives would be different and the minimum one could always be the one picked by the adversary. Instead, making the two choices worst outcomes equivalent, we minimize the damage of the adversarial choice of the outcomes of the coin. So, we have that
where in the second equality we used the definition of and in the second inequality we used the fact that .
Hence, given that (8) is true, this strategy guarantees
We can now use this betting strategy in the expert reduction in Theorem 1, setting , to have
Note that this betting strategy could also be used in the OCO reduction. Given that we removed the logarithmic term in the exponent, in the 1-dimensional case, we would obtain a regret of
where we gained in the term inside the logarithmic, instead of the term of the KT algorithm. This implies that now we can set to and obtain an asymptotic rate of rather than .
3. History Bits
The first parameter-free algorithm for experts is from (Chaudhuri, K. and Freund, Y. and Hsu, D. J., 2009), named NormalHedge, where they obtained a bound similar to the one in (9) but with an additional term. Then, (Chernov, A. and Vovk, V., 2010) removed the log factors with an update without a closed form. (Orabona, F. and Pal, D., 2016) showed that this guarantee can be efficiently obtained through the novel reduction to coin-betting in Theorem 1. Later, these kind of regret guarantees were improved to depend on the sum of the squared losses rather than on time, but with an additional factor, in the Squint algorithm (Koolen, W. M. and van Erven, T., 2015). It is worth noting that the Squint algorithm can be interpreted exactly as a coin-betting algorithm plus the reduction in Theorem 1.
The betting strategy in (6) and (7) are new, and derived from the shifted-KT potentials in (Orabona, F. and Pal, D., 2016). The guarantee is the same obtained by the shifted-KT potentials, but the analysis can be done without knowing the properties of the gamma function.
Exercise 1 Using the same proof technique in the lecture, find a betting strategy whose wealth depends on rather than on .