Large deviations

12 minute read

Published: July 15, 2019

In the previous entries we have studied the asymptotic behavior of the sums of iid random variables. The law of large numbers showed that almost surely, the averages converge to the mean, while the central limit theorem gave us a second order approximation, that is, it provided information of the size of the fluctuations around the expected value. In this entry, we discuss a third order result, namely, Cramer’s theorem on large deviations. Intuitively, Cramer’s theorem establishes asymptotic rates for the decay of the probability of observing very unlikely events.

Let $\{X_n \}$ be a sequence of iid random variables with mean $\mu$, and suppose that the variables have finite log-moment generating function

\[\varphi(\lambda) = \log \mathbb{E}e^{\lambda X_1} \lt \infty\]

for all $\lambda\in\mathbb{R}$.

Theorem (Cramer): Let $S_n = X_1 + \dots + X_n$ and $x > \mu$. Then

\[\lim_{n\to\infty}\dfrac{1}{n} \log \mathbb{P}\{ S_n \geq nx\} = -\varphi^*(x),\]

where $\varphi^*$ is the Legendre transform of $\varphi$.

Recall that the Legendre transform of $\varphi$ is given by

\[\varphi^*(x) = \sup_{\lambda\in\mathbb{R}} \{\lambda x -\varphi(\lambda) \}.\]

We will prove the theorem in two parts: first the upper bound, which uses a standard Chernoff bound technique, then the lower bound, which uses a measure tilting argument.

Proof: (Upper bound). For $\lambda \geq 0$, we have by Markov’s inequality

\[\mathbb{P}\{ S_n \geq nx \} = \mathbb{P}\{ e^{\lambda S_n} \geq e^{\lambda nx} \} \leq e^{-\lambda nx} \mathbb{E}(e^{\lambda S_n}).\]

This implies that

\[\dfrac{1}{n} \log \mathbb{P}\{\frac{1}{n} S_{n} \geq x\} \leq -\lambda x + \dfrac{1}{n} \log \mathbb{E}(e^{\lambda S_n}) = -\lambda x + \varphi(\lambda)\]

and consequently,

\[\limsup_{n \to \infty} \dfrac{1}{n} \log \mathbb{P}\{\frac{1}{n} S_{n} \geq x\} \leq - \sup_{\lambda\geq 0}\{ \lambda x -\varphi(\lambda) \}.\]

By Hölder’s inequality, $\varphi$ is convex with $\varphi’(0) = \mu < x$. Then the expression in the brackets is negative for for $\lambda < 0$, and so we can take the supremum over all real $\lambda$.

(Lower bound). Fix $\epsilon > 0$. We will change the law of the random variables $X$ so that the event $\{ S_n \geq nx \}$ is not rare under the new law, and we can use the old limit laws on it. If $P$ is the law of $X$, consider the new law $Q$ given by

\[dQ(X) = e^{-\varphi(\lambda)+ \lambda X} dP(X).\]

Then we have that for iid $X_1,\dots,X_n$,

\[d\mathbb{Q} = dQ\otimes \dots \otimes dQ = e^{-n\varphi(X)+\lambda S_n}d\mathbb{P}(X_1,\dots,X_n).\]

Note now that

\[\varphi'(\lambda) = e^{-\varphi(\lambda)}\mathbb{E}(Xe^{\lambda X}) = \mathbb{E}_Q(X).\]

If $\mu < x < \operatorname{esssup}X $, we can find $\lambda_*$ such that $\varphi’(\lambda_*) = x + \frac{\epsilon}{2}$. This implies that

\[\mathbb{Q}\{ x + \epsilon > \dfrac{1}{n}S_n \geq x \} \to 1,\]

as $\mathbb{E}_Q(X) = x + \frac{\epsilon}{2}$. With this we can see that

\[\begin{align} \liminf_{n\to\infty} \dfrac{1}{n}\log \mathbb{P}\{\frac{1}{n} S_{n} \geq x\} \geq \liminf_{n\to\infty} \dfrac{1}{n}\log \mathbb{P}\{x + \frac{\epsilon}{2} \frac{1}{n} S_{n} \geq x\}\nonumber \\ = \liminf_{n\to\infty} \dfrac{1}{n}\log \mathbb{E}_{\mathbb{Q}}( e^{-n\varphi(\lambda) - \lambda S_n} 1\{ x + \frac{\epsilon}{2} \frac{1}{n} S_{n} \geq x \} ) \nonumber \\ \geq \varphi(\lambda) - \lambda(x+\epsilon) + \liminf_{n\to\infty} \dfrac{1}{n}\log\mathbb{Q}\{ x + \epsilon > \dfrac{1}{n}S_n \geq x \}\nonumber \\ = \varphi(\lambda) - \lambda(x+\epsilon) \nonumber \\ \geq -\varphi^* (x+\epsilon) \end{align}\]

from which the result follows from letting $\epsilon \to 0$.

Share on

Twitter Facebook LinkedIn

Felipe Pérez

Large deviations

Share on

Leave a Comment

You May Also Enjoy

Your SQL is probably WRONG

Is overfitting… good?

PCA and supervised learning

Martingales 0