or that the curvature of the KL-divergence around θ∗ is the Fisher information! In this case, it shouldn't be entirely surprising that there is some connection between how well we can measure a parameter's value and the Fisher information of that parameter, since the fluctuations of the likelihood around that parameter are governed by the Fisher information.
On the other hand, I haven't been able to find a direct proof of the above bound (or, even, any other nice bounds) given only the above observation. So, while the connection might make sense, it turns out the proof of the Cramér-Rao bound uses a slightly different technique, which I will present later (along with some other fun results!).
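To make the curvature claim concrete, here is a quick one-dimensional sketch, assuming the usual regularity conditions that let us differentiate under the expectation (and writing I(θ∗) for the Fisher information, a notational choice on my part). Writing the KL-divergence as a function of the hypothesis θ,

$$
D(\theta) = E_{x \sim p(\cdot \mid \theta^\star)}\left[\log\frac{p(x \mid \theta^\star)}{p(x \mid \theta)}\right],
$$

differentiating twice in θ gives

$$
D''(\theta) = -E_{x \sim p(\cdot \mid \theta^\star)}\left[\partial_\theta^2 \log p(x \mid \theta)\right],
$$

and evaluating at θ = θ∗ gives the expected negative second derivative of the log-likelihood at θ∗, which is exactly I(θ∗). Since D(θ∗) = 0 and D′(θ∗) = 0 (the expected score vanishes at θ∗), the KL-divergence is, to second order, the quadratic (1/2) I(θ∗) (θ − θ∗)².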
In probability. I.e., the probability that the empirical mean differs from the expectation by some amount, ∣(1/n)∑i Yi − E[Y1]∣ > ε, goes to zero as n↑∞. A simple proof in the finite-variance case follows from Chebyshev's inequality (exercise for the reader!).
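If you want to watch this convergence happen, here is a quick numerical sanity check, entirely a sketch of my own: the exponential samples (with E[Y1] = 1), the threshold ε = 0.05, and the trial counts are all arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 0.05       # deviation threshold
trials = 10_000  # independent empirical means per sample size

# Estimate P(|(1/n) sum_i Yi - E[Y1]| > eps) for increasing n,
# using exponential samples with E[Y1] = 1.
for n in [10, 100, 1_000, 10_000]:
    means = rng.exponential(scale=1.0, size=(trials, n)).mean(axis=1)
    print(n, np.mean(np.abs(means - 1.0) > eps))
```

The printed fractions shrink toward zero as n grows, which is exactly the convergence in probability above.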
In fact, the logarithm is strictly increasing, so it preserves the locations of minima and maxima. In other words, for any function ϕ:S→R>0, we have ϕ(x) ≤ ϕ(y) if and only if log ϕ(x) ≤ log ϕ(y), so ϕ and log∘ϕ have minima and maxima at exactly the same points.
I am, of course, being sneaky: the subtraction works out because it happens to yield the KL-divergence in expectation—but that's how it goes. Additionally, the requirement really is not that θ=θ∗, but rather that p(x∣θ∗)=p(x∣θ), just in case there happen to be multiple hypotheses with equivalent distributions. Since you're reading this, just assume throughout that p(⋅∣θ)≠p(⋅∣θ∗) on some set with nonzero probability (in the base distribution) whenever θ≠θ∗.
While it may seem that there should be easy bounds to give immediately based on this proof, the problem is that we do not have good control of the second moment of log(p(⋅∣θ∗)/p(⋅∣θ)) (this quantity may not even be finite). This makes giving any kind of convergence rate quite difficult, since the proof of the weak law under only a first-moment assumption relies on the dominated convergence theorem, which gives non-constructive bounds.
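To see how badly the second moment can fail to exist, here is a toy example of my own (not from the post): a pair of distributions on the positive integers whose KL-divergence is finite but whose log-likelihood ratio has infinite variance. Take

$$
p(k \mid \theta^\star) = \frac{c}{k^3}, \qquad p(k \mid \theta) = 2^{-k}, \qquad k = 1, 2, \dots,
$$

where c = 1/ζ(3) is the normalizing constant, so that

$$
\log\frac{p(k \mid \theta^\star)}{p(k \mid \theta)} = k\log 2 - 3\log k + \log c.
$$

Under p(⋅∣θ∗), the first moment of this quantity is finite (the leading term of the summand behaves like (log 2)/k²), but the second moment diverges (the corresponding summand behaves like (log 2)²/k). So the weak law still applies, but a Chebyshev-style rate is unavailable.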