We will continue with black-box reductions, this time solving the problem of sleeping experts.
1. Sleeping Experts
Consider now the setting of learning with experts where only a subset of the experts is active in each round. In particular, we have that $s_{t,i}=1$ if expert $i$ is active at time $t$, and $s_{t,i}=0$ if the expert is inactive, that is, sleeping. This setting is useful in the case that some experts might become unavailable in some rounds, but also to model the case that the set of experts is growing over time.
The online algorithm receives at the beginning of each round the information about which experts will be active, and it can pick only among the active experts. So, we constrain the learning with experts algorithm to produce a probability distribution $p_t$ supported only on the active experts, that is, with $p_{t,i}=0$ whenever $s_{t,i}=0$. Hence, we will use a different notion of regret: our regret with respect to expert $i$ is defined as
$$\text{Regret}_T(e_i) := \sum_{t=1}^T s_{t,i} \left(\langle g_t, p_t\rangle - g_{t,i}\right)~.$$
In other words, we measure the regret against expert $i$ only on rounds where the expert was active. This notion can be easily generalized to an arbitrary convex combination $u \in \Delta^{d-1}$ of experts as
$$\text{Regret}_T(u) := \sum_{i=1}^d u_i \sum_{t=1}^T s_{t,i} \left(\langle g_t, p_t\rangle - g_{t,i}\right)~.$$
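To make the definition concrete, here is a minimal Python sketch (the function name and the list-of-lists encoding of $s_t$, $p_t$, $g_t$ are my own, not part of the setting above) that computes the sleeping regret against a single expert:

```python
def sleeping_regret(S, P, G, i):
    """Regret against expert i, counted only on the rounds where it is active:
    sum over t of s_{t,i} * (<g_t, p_t> - g_{t,i}).
    S, P, G are lists of T rows: activity flags, the algorithm's
    distributions, and the linear losses, each of dimension d."""
    total = 0.0
    for s_t, p_t, g_t in zip(S, P, G):
        alg_loss = sum(g * p for g, p in zip(g_t, p_t))  # <g_t, p_t>
        total += s_t[i] * (alg_loss - g_t[i])
    return total
```

Note that deactivating an expert in a round simply removes that round from its regret sum, regardless of what the algorithm played.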
In the case that all the experts are active on all rounds, this notion recovers the usual one. However, in the general case, this is a different notion than the one we used in Online Linear Optimization.
For the reduction, we need a way to transform a vector $x_t \in \mathbb{R}^d_{\geq 0}$ into a vector in the simplex supported on the active experts, and we also need appropriate surrogate losses. The reduction will be similar to the one we used last time. For the first part, we construct a probability distribution as
$$p_{t,i} = \frac{s_{t,i}\, x_{t,i}}{\sum_{j=1}^d s_{t,j}\, x_{t,j}},$$
choosing any distribution over the active experts when the denominator is zero. For the second part, from the original linear losses $g_t$ we construct the modified losses $\tilde{g}_t$, where $\tilde{g}_{t,i} = s_{t,i}\left(g_{t,i} - \langle g_t, p_t\rangle\right)$. The overall algorithm is in Algorithm 1.
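The two pieces of the reduction can be sketched in a few lines of Python (helper names are mine; this is a sketch of the construction above, not Algorithm 1 verbatim):

```python
def active_distribution(x, s):
    """Normalize the OLO iterate x in R^d_{>=0} over the active experts:
    p_{t,i} proportional to s_{t,i} * x_{t,i}."""
    w = [s_i * x_i for s_i, x_i in zip(s, x)]
    z = sum(w)
    if z == 0.0:
        # Degenerate case: any distribution over the active experts works;
        # here we pick the uniform one.
        n_active = sum(s)
        return [s_i / n_active for s_i in s]
    return [w_i / z for w_i in w]

def surrogate_losses(g, p, s):
    """Modified losses: tilde-g_{t,i} = s_{t,i} * (g_{t,i} - <g_t, p_t>)."""
    alg_loss = sum(g_i * p_i for g_i, p_i in zip(g, p))
    return [s_i * (g_i - alg_loss) for s_i, g_i in zip(s, g)]
```

By construction, $\langle \tilde{g}_t, x_t\rangle = 0$ whenever the normalizer is nonzero, which is the property that drives the reduction.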
With the above definitions, we have
$$\langle \tilde{g}_t, x_t\rangle = \sum_{i=1}^d s_{t,i}\, x_{t,i} \left(g_{t,i} - \langle g_t, p_t\rangle\right) = \left(\sum_{j=1}^d s_{t,j}\, x_{t,j}\right)\left(\langle g_t, p_t\rangle - \langle g_t, p_t\rangle\right) = 0~.$$
In turn, this implies that
$$\text{Regret}_T(u) = \sum_{i=1}^d u_i \sum_{t=1}^T s_{t,i}\left(\langle g_t, p_t\rangle - g_{t,i}\right) = -\sum_{t=1}^T \langle \tilde{g}_t, u\rangle = \sum_{t=1}^T \langle \tilde{g}_t, x_t - u\rangle~.$$
In words, we can construct surrogate losses to transform the sleeping expert problem into an OLO problem over $\mathbb{R}^d_{\geq 0}$, obtaining that
$$\text{Regret}_T(u) = \text{Regret}^{\text{OLO}}_T(u)~.$$
The norm of the surrogate losses is controlled because $\|\tilde{g}_t\|_\infty \leq 2\|g_t\|_\infty$.
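As a numerical sanity check of this equality, one can simulate arbitrary rounds and verify that the sleeping-experts regret coincides with the OLO regret on the surrogate losses (a self-contained sketch; all names and the simulation setup are invented here, and the arbitrary positive iterates stand in for any OLO algorithm):

```python
import random

def compare_regrets(T=200, d=4, seed=0):
    """Feed arbitrary positive iterates x_t through the reduction and
    accumulate both the sleeping regret and the OLO regret on the
    surrogate losses; by the identity above they must be equal."""
    rng = random.Random(seed)
    u = [0.1, 0.2, 0.3, 0.4]  # fixed comparator in the simplex, d = 4
    sleeping, olo = 0.0, 0.0
    for _ in range(T):
        s = [1.0 if rng.random() < 0.7 else 0.0 for _ in range(d)]
        x = [rng.random() + 0.01 for _ in range(d)]  # arbitrary iterate
        g = [rng.random() for _ in range(d)]          # arbitrary losses
        z = sum(s_i * x_i for s_i, x_i in zip(s, x))
        if z == 0.0:
            continue  # no active expert in this round
        p = [s_i * x_i / z for s_i, x_i in zip(s, x)]
        alg_loss = sum(g_i * p_i for g_i, p_i in zip(g, p))
        gt = [s_i * (g_i - alg_loss) for s_i, g_i in zip(s, g)]
        sleeping += sum(u_i * s_i * (alg_loss - g_i)
                        for u_i, s_i, g_i in zip(u, s, g))
        olo += sum(gt_i * (x_i - u_i) for gt_i, x_i, u_i in zip(gt, x, u))
    return sleeping, olo
```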
Remark 1. Clearly, if we have an algorithm for learning with experts, we can use it in the reduction because $\Delta^{d-1} \subset \mathbb{R}^d_{\geq 0}$.
Remark 2. The above definitions and reduction generalize to the setting that $s_{t,i} \in [0,1]$, denoting the confidence that the expert has in its prediction, where $s_{t,i}=0$ means that the expert has no confidence and abstains from producing a prediction.
Example 1. Consider the learning with sleeping experts setting with linear losses $g_t$ such that $\|g_t\|_\infty \leq 1$. Let's design a series of reductions to easily solve this problem: this is a good exercise to show how easy it is to combine online learning algorithms and reductions like LEGO blocks.
Consider running a 1d coin-betting algorithm to solve online linear optimization in $\mathbb{R}$. For example, we can use the KT algorithm, where we ignore the rounds where the gradients are zero:
$$x_t = \frac{-\sum_{\tau=1}^{t-1} g_\tau}{N_{t-1}+1}\left(\epsilon - \sum_{\tau=1}^{t-1} g_\tau x_\tau\right), \quad \text{where } N_{t-1} = |\{\tau \leq t-1 : g_\tau \neq 0\}|~.$$
Use a black-box reduction from last time to constrain it to $\mathbb{R}_{\geq 0}$ and obtain a regret of
$$\text{Regret}_T(u) = O\left(\epsilon + u\sqrt{N_T \ln\left(1 + \frac{u N_T}{\epsilon}\right)}\right), \quad \forall u \in \mathbb{R}_{\geq 0},$$
where $N_T$ is the number of times that $g_t \neq 0$. Use it to produce an algorithm over $\mathbb{R}^d_{\geq 0}$, using a 1d algorithm on each coordinate, obtaining
$$\text{Regret}_T(u) = \sum_{i=1}^d \text{Regret}_{T,i}(u_i), \quad \forall u \in \mathbb{R}^d_{\geq 0}~.$$
Finally, set $u = e_i$ and use the sleeping expert reduction from $\mathbb{R}^d_{\geq 0}$:
$$\text{Regret}_T(e_i) = \text{Regret}^{\text{OLO}}_T(e_i)~.$$
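A minimal sketch of the 1d KT variant that ignores zero gradients might look as follows (the function name and interface are my own; the constraint and per-coordinate reductions are omitted):

```python
def kt_skip_zeros(gradients, eps=1.0):
    """1d KT coin-betting iterates, skipping rounds with zero gradient.
    eps is the initial wealth; returns the sequence of predictions x_t."""
    wealth = eps
    grad_sum = 0.0  # sum of past nonzero gradients
    count = 0       # number of past rounds with nonzero gradient
    xs = []
    for g in gradients:
        x = -grad_sum / (count + 1) * wealth  # KT betting fraction times wealth
        xs.append(x)
        if g != 0.0:
            wealth -= g * x   # charge the incurred linear loss to the wealth
            grad_sum += g
            count += 1
    return xs
```

Rounds with $g_t = 0$ leave the internal state untouched, so the predictions on the nonzero rounds match what plain KT would produce on the subsequence of nonzero gradients.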
Moreover, given that $s_{t,i} = 0$ implies $\tilde{g}_{t,i} = 0$, the upper bound on the regret of the final algorithm against any expert $i$ is $O\left(\sqrt{T_i \ln(1+T_i)}\right)$ with $T_i = \sum_{t=1}^T s_{t,i}$, that is, it depends on the number of rounds that the expert $i$ was active, rather than on the total number of rounds.
2. History Bits
The setting of sleeping experts was proposed by Blum (1997) and Freund, Schapire, Singer, and Warmuth (1997). The reduction above is an extension of the one from Gaillard, Stoltz, and Van Erven (2014), which was designed to reduce the sleeping experts problem to the learning with experts problem rather than to OLO in $\mathbb{R}^d_{\geq 0}$. I also added the minor improvement of removing the constant term from the surrogate losses. Such a reduction is implicitly used by Luo and Schapire (2015) and by Jun, Orabona, Wright, and Willett (2017). The variant of KT in Example 1 that updates only when $g_t \neq 0$ appeared for the first time in Chen, Langford, and Orabona (2022).
3. Exercises
Exercise 1. Prove that the variant of KT that does not update on rounds where $g_t = 0$ has the regret stated in Example 1.
