Limit laws: weak law of large numbers

Published:

This is the first of a series of entries where we will explore several limit laws for sequences of random variables. The setting is going to be the standard for probability theory: fix a measure space $\Omega$, a sigma-algebra $\mathcal{B}$ and a probability measure $\mathbb{P}$. By random variables we mean measurable functions $X\colon \Omega \to \mathbb{R}$. The distribution of $X$ is defined as $F(x) = \mathbb{P}(\omega: X(\omega) \leq x )$, so $\mathbb{P}( a \leq X \leq b ) = F(b) - F(a)$. If the measure $X^{-1}\mathbb{P}(A) = \mathbb{P}(X^{-1}(A))$ is absolutely continuous with respect to $\mathbb{P}$, we denote the Radon-Nikodym derivative by $f = \frac{\mathrm{d} X^{-1}\mathbb{P}}{\mathrm{d} \mathbb{P}}$ and hence we can compute probabilities as integrals of this function with respect to the measure $\mathbb{P}$:

$\mathbb{P}(X \in A) = \int_A f(x) \mathrm{d}\operatorname{Leb}(x).$

We call $f$ the density of the random variable $X$. The expectation (mean) of $X$ is defined by $\mu = \mathbb{E}(X) = \int_\Omega X \mathrm{d}\mathbb{P} = \int_{\mathbb{R}} x \mathrm{d} F(x)$ where the second integral is in the sense of Lebesgue-Stieltjes, and the variance by $\sigma^2 = \int_\omega X^2 \mathrm{d}\mathbb{P} - (\mathbb{E}(X))^2 = \mathbb{E}(X-\mu)^2$.

Now we suppose we have a sequence of iid random variables $X_1,X_2,\dots$ with distribution $F$. We are interested in describing the limit behavior of the averages

$\dfrac{S_n}{n} = \dfrac{X_1+\dots + X_n}{n},$

where $S_n = X_1+\dots +X_n$. Intuitively, the averaging of the independent observations of the same distribution should smooth out and regularize the random fluctuations of the random variables around the mean. This is the content of the weak law of large numbers:

Theorem (Weak law of large numbers): Assume that the iid sequence $(X_n)$ has finite variance. Then $(X_n)$ converges in distribution to $\mu$.

Recall that convergence in distribution means that for any $\varepsilon > 0$,

$\mathbb{P}\left(\left|\frac{S_n}{n}-\mu\right| > \varepsilon\right) \to 0$

as $n\to \infty$.

This result can be interpreted in the following way: we fix a precision threshold $\varepsilon$, and as we take more measurements of the same distributions, the probability that the average is close to the mean within $\varepsilon$ precision becomes arbitrarily small.

We have formulated here the result in a slightly more restricted formulation, as it hold under more general conditions where the variance may not necessarily exist, but in that case the proof is more involved. When the variance is finite, the result can be proven using a concentration inequality:

Theorem (Chebyshev’s inequality): Assume that the random variable $X$ has finite variance $\sigma^2$ and mean $\mu$. Then for every positive number $\varepsilon > 0$ we have that

$\mathbb{P}(|X-\mu|\geq \varepsilon) \leq \dfrac{\sigma^2}{\varepsilon^2}.$

This inequality follows immediately from another concentration inequality:

Theorem (Markov’s inequality): Assume that the non-negative random variable $X$ has finite mean $\mu$. Then for every positive number $a>0$ we have that

$\mathbb{P}(X\geq a) \leq \dfrac{\mu}{a}.$

Proof:

Let $I_{(X\geq a)}$ the indicator function of the set $(X\geq a)$. Then

$a\cdot I_{(X\geq a)} \leq X$

almost surely. Taking expectations we get

$\mathbb{P}(X\geq a) \leq \dfrac{\mu}{a}$

as we wanted.$\square$

The proof of Chebyshev’s inequality follows immediately by using the random variable $(X-\mu)^2$ and the threshold $\varepsilon^2$. Now the weak law of large numbers follows easily from Chebyshev’s inequality if we note that the variance of $S_n/n$ is equal to $\sigma^2/n$. Then, given $\delta > 0$, take $n_0$ such that $n_0\geq \sigma^2/(\delta\varepsilon^2)$, then

$\mathbb{P}\left(\left|\frac{S_n}{n}-\mu\right| > \varepsilon\right) \leq \dfrac{\sigma^2}{n\varepsilon^2} \leq \delta$

for all $n\geq n_0$, which proves the statement of the WLLN. $\square$

Although this version of the WLLN is not the most general, it is interesting because it shows how concentration inequalities can be used to derive limit laws for sums.