where $C$ is some constant that depends on the bound on the gradients $g_t$ and the bound on the prices. Alternatively, we can write this as saying that, on average, the longer the game continues, the better we perform:
$$
\frac{1}{T}\left(\sum_{t=1}^T f_t(x_t) - \sum_{t=1}^T f_t(x^\star)\right) \le \frac{C}{\sqrt{T}}.
$$
(Again, it's worth reiterating: this is done even in the presence of adversarially chosen $f_t$.) Indeed, after a long enough time scale, we see that our strategy and the best fixed strategy, with complete knowledge of the future, are, on average, about the same. Algorithms where the average regret vanishes have a bit of a silly name, but I'll put it here for anthropological reasons: they are called no-regret algorithms.
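If you want to see this behavior concretely, here is a small numerical sketch. To be clear, this is not the setting of this post; the quadratic losses, the dimension, and the step size are all assumptions I'm making purely for illustration: we run online gradient descent on losses $f_t(x) = \|x - a_t\|_2^2/2$ with randomly drawn targets $a_t$, and print the average regret for a few horizons $T$.

```python
import numpy as np

# Toy illustration (not this post's setting): online gradient descent on
# losses f_t(x) = ||x - a_t||^2 / 2, where a_t is revealed after we commit to x_t.
# The best fixed strategy in hindsight is the mean of the a_t.
rng = np.random.default_rng(0)
n = 5  # dimension, an arbitrary choice for the demo

def average_regret(T, eta):
    a = rng.uniform(-1.0, 1.0, size=(T, n))  # the "adversary" here is just random
    x = np.zeros(n)                          # initial strategy x_1
    loss_alg = 0.0
    for t in range(T):
        loss_alg += 0.5 * np.sum((x - a[t]) ** 2)
        x = x - eta * (x - a[t])             # gradient step: g_t = x_t - a_t
    x_star = a.mean(axis=0)                  # best fixed strategy in hindsight
    loss_best = 0.5 * np.sum((a - x_star) ** 2)
    return (loss_alg - loss_best) / T

for T in (100, 1_000, 10_000, 100_000):
    print(T, average_regret(T, eta=1.0 / np.sqrt(T)))
```

With the step size scaling like $1/\sqrt{T}$, the printed averages should shrink roughly like $1/\sqrt{T}$, which is exactly the no-regret behavior described above.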
Anyways, the proof is fairly easy. The first order of events is to note that, since $f_t$ is convex, then, by definition, we have
$$
f_t(y) \ge f_t(x_t) + \nabla f_t(x_t)^T(y - x_t),
$$
for any $y$. (The $x_t$ are chosen in the same way as the previous section.) This means, letting $g_t = \nabla f_t(x_t)$,
$$
f_t(x_t) - f_t(y) \le g_t^T(x_t - y).
$$
Summing this over $t$ and noting it is true for any $y$, then, certainly, it is true for the optimal strategy, $x^\star$, so
$$
\sum_{t=1}^T f_t(x_t) - \sum_{t=1}^T f_t(x^\star) \le \sum_{t=1}^T g_t^T(x_t - x^\star).
$$
We will focus our attention on trying to show this last term, which we call the linear bound, is 'small'.
The inequality is a one-liner and follows easily from the fact that
$$
\sum_{t=1}^T g_t^T(x_t - x^\star) = \frac{1}{2\eta}\|x_1 - x^\star\|_2^2 - \frac{1}{2\eta}\|x_{T+1} - x^\star\|_2^2 + \frac{\eta}{2}\sum_{t=1}^T \|g_t\|_2^2,
$$
where $x_{t+1} = x_t - \eta g_t$, using the definition of the iterates. (To see this, expand the right hand side and cancel terms.) Finally, the middle term is nonpositive, as it is a negative square, so we get the bound
$$
\sum_{t=1}^T g_t^T(x_t - x^\star) \le \frac{1}{2\eta}\|x_1 - x^\star\|_2^2 + \frac{\eta}{2}\sum_{t=1}^T \|g_t\|_2^2.
$$
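If you don't feel like expanding it by hand, here is a quick numerical check of this identity; the gradients below are just random stand-ins, since the only thing the identity uses is that the iterates satisfy $x_{t+1} = x_t - \eta g_t$.

```python
import numpy as np

# Numerical check: for any g_t and x_star, if x_{t+1} = x_t - eta * g_t, then
#   sum_t g_t' (x_t - x_star)
#     = (||x_1 - x_star||^2 - ||x_{T+1} - x_star||^2) / (2 * eta)
#       + (eta / 2) * sum_t ||g_t||^2.
rng = np.random.default_rng(1)
T, n, eta = 50, 4, 0.1               # arbitrary sizes for the check
g = rng.normal(size=(T, n))          # random stand-ins for the gradients
x1 = rng.normal(size=n)
x_star = rng.normal(size=n)

x = x1.copy()
lhs = 0.0
for t in range(T):
    lhs += g[t] @ (x - x_star)
    x = x - eta * g[t]               # after the loop, x is x_{T+1}

rhs = (np.sum((x1 - x_star) ** 2) - np.sum((x - x_star) ** 2)) / (2 * eta) \
      + eta / 2 * np.sum(g ** 2)
print(lhs, rhs)                      # these agree up to rounding error
```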
Since we know that $\|x_1 - x^\star\|_2 \le D$ and $\|g_t\|_2 \le G$ by assumption, then
$$
\sum_{t=1}^T g_t^T(x_t - x^\star) \le \frac{D^2}{2\eta} + \frac{\eta G^2 T}{2}.
$$
Finally, choosing $\eta = D/(G\sqrt{T})$, which minimizes the right hand side, gives
$$
\sum_{t=1}^T f_t(x_t) - \sum_{t=1}^T f_t(x^\star) \le DG\sqrt{T},
$$
as required.
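As a sanity check on the final bound, here is a small sketch using the same toy quadratic losses as in the earlier snippet (again, an illustration, not the post's setting), with $D$ and $G$ computed for that toy setup. The realized regret should come out well below $DG\sqrt{T}$, since random losses are far from adversarial.

```python
import numpy as np

# Sanity check: regret <= D * G * sqrt(T) with eta = D / (G * sqrt(T)),
# for toy losses f_t(x) = ||x - a_t||^2 / 2 with a_t in [-1, 1]^n and x_1 = 0.
rng = np.random.default_rng(2)
T, n = 10_000, 5
a = rng.uniform(-1.0, 1.0, size=(T, n))

D = np.sqrt(n)      # bounds ||x_1 - x_star||, since x_star (the mean) lies in [-1, 1]^n
G = 2 * np.sqrt(n)  # bounds ||g_t||, since the iterates also stay in [-1, 1]^n here
eta = D / (G * np.sqrt(T))

x = np.zeros(n)
loss_alg = 0.0
for t in range(T):
    loss_alg += 0.5 * np.sum((x - a[t]) ** 2)
    x = x - eta * (x - a[t])

x_star = a.mean(axis=0)
loss_best = 0.5 * np.sum((a - x_star) ** 2)
print(loss_alg - loss_best, "<=", D * G * np.sqrt(T))
```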
Ok, fine, I'll explain it.
From before, we can write $x_t$ in terms of only the gradients $g_\tau$:
$$
x_t = x_1 - \eta\sum_{\tau=1}^{t-1} g_\tau,
$$
so the first term of the linear bound can be written
$$
\sum_{t=1}^T g_t^T x_t = \left(\sum_{t=1}^T g_t\right)^T x_1 - \eta\sum_{t=1}^T\sum_{\tau=1}^{t-1} g_t^T g_\tau.
$$
This last double sum should look familiar if you've ever expanded the squared norm of a sum before, but if you have not, then:
$$
\left\|\sum_{t=1}^T g_t\right\|_2^2 = \sum_{t=1}^T \|g_t\|_2^2 + 2\sum_{t=1}^T\sum_{\tau=1}^{t-1} g_t^T g_\tau,
$$
so, rearranging,
$$
\sum_{t=1}^T\sum_{\tau=1}^{t-1} g_t^T g_\tau = \frac{1}{2}\left\|\sum_{t=1}^T g_t\right\|_2^2 - \frac{1}{2}\sum_{t=1}^T \|g_t\|_2^2.
$$
Plugging this back into the linear bound, we have
$$
\sum_{t=1}^T g_t^T(x_t - x^\star) = \frac{\eta}{2}\sum_{t=1}^T \|g_t\|_2^2 + \left(\sum_{t=1}^T g_t\right)^T(x_1 - x^\star) - \frac{\eta}{2}\left\|\sum_{t=1}^T g_t\right\|_2^2.
$$
Here comes the only interesting part of the proof. (Though, honestly, it's not even that interesting.) Note that we can pull a cute sleight of hand: write $g = \sum_{t=1}^T g_t$; then, we can write the above as
$$
\frac{\eta}{2}\sum_{t=1}^T \|g_t\|_2^2 + g^T(x_1 - x^\star) - \frac{\eta}{2}\|g\|_2^2.
$$
We can rewrite the last two terms in the following way:
$$
\sum_{t=1}^T g_t^T(x_t - x^\star) = \frac{1}{2\eta}\|x_1 - x^\star\|_2^2 - \frac{1}{2\eta}\|x_1 - \eta g - x^\star\|_2^2 + \frac{\eta}{2}\sum_{t=1}^T \|g_t\|_2^2.
$$
(To see this, expand the middle term!) Since $x_1 - \eta g = x_{T+1}$, this is exactly the term we found in the previous part!
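And, for completeness, here's a quick numerical check of this completing-the-square step, again with random stand-ins for the quantities involved.

```python
import numpy as np

# Check: with g standing in for sum_t g_t, and x_last = x_1 - eta * g (i.e., x_{T+1}),
#   g' (x_1 - x_star) - (eta / 2) ||g||^2
#     = (||x_1 - x_star||^2 - ||x_last - x_star||^2) / (2 * eta).
rng = np.random.default_rng(4)
n, eta = 4, 0.1
g = rng.normal(size=n)            # stands in for the sum of the gradients
x1 = rng.normal(size=n)
x_star = rng.normal(size=n)

lhs = g @ (x1 - x_star) - eta / 2 * (g @ g)
x_last = x1 - eta * g             # the last iterate, x_{T+1}
rhs = (np.sum((x1 - x_star) ** 2) - np.sum((x_last - x_star) ** 2)) / (2 * eta)
print(lhs, rhs)                   # these agree up to rounding error
```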
I mean, there isn't much interesting here and really not very much magic. The one thing this brings up is: what does a 'clean but general' version of this proof look like? In particular, the only important part of the proof is recognizing that the interaction between the $x_t$ and the $g_t$ is, in some sense, bounded by the norm of $x_1 - x^\star$ and the individual norms of the $g_t$. It feels like there should be some simple construction which allows this interaction to be bounded in a more natural way. (Of course, this will play in a variety of ways with how we choose the $x_t$ based on the gradients and such.)
I should note that this bound is tight for any algorithm in that only the constant can be improved. (In particular, it is possible to construct a stochastic adversary that always achieves at least $c\sqrt{T}$ regret, for some constant $c > 0$, no matter how the $x_t$ are chosen.) Indeed, this question is deeply related to the previous post on finding a lower bound for a random walk, but I won't go into it here.