Central limit theorem


In the previous entries, we have explored the behavior of the sums $S_n = X_1+\dots+X_n$ for an iid sequence $\{X_n\}$. The weak and strong laws of large numbers show that, asymptotically, $S_n \sim n\cdot \mathbb{E}(X_1)$ whenever $\mathbb{E}(|X_1|) < \infty$. In this entry, we study the fluctuations of the sums around their expected limit. More precisely, we prove the Central Limit Theorem (CLT):

Theorem (CLT): let $\{X_n\}$ be an iid sequence of random variables with finite mean $\mu$ and finite non-zero variance $\sigma^2$, and denote $S_n = X_1 + \dots + X_n$. Then

\[\dfrac{S_n - n\mu}{\sqrt{n\sigma^2}} \Rightarrow X_0,\]

where $\Rightarrow$ means convergence in distribution and $X_0 \sim \mathcal{N}(0,1)$. Recall that the density for a normally distributed random variable with mean $\mu$ and variance $\sigma^2$ is given by

\[f_\mathcal{N}(s) = \dfrac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(s-\mu)^2/(2\sigma^2)}.\]

A few words about the interpretation of this result: the strong law of large numbers says that almost surely, we have the asymptotic relation $S_n \approx n\mu$. The CLT refines this approximation further: if we subtract $n\mu$ from $S_n$, then the remaining term is distributed approximately like $\sqrt{n}\,\mathcal{N}(0,\sigma^2)$. This gives a precise description of the random fluctuations of $S_n$ around its expected asymptotic behavior.
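
To make this concrete, here is a minimal numerical sketch (assuming NumPy and Matplotlib are available; the uniform summands are an illustrative choice, not part of the theorem): it standardizes sums of iid uniform variables and compares their histogram with the standard normal density.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

# Illustrative choice of iid summands: X_i ~ Uniform(0, 1),
# so mu = 1/2 and sigma^2 = 1/12.
n, trials = 200, 20_000
mu, sigma2 = 0.5, 1.0 / 12.0

# Each row is one realization of (X_1, ..., X_n); summing rows gives S_n.
s_n = rng.uniform(0.0, 1.0, size=(trials, n)).sum(axis=1)
z = (s_n - n * mu) / np.sqrt(n * sigma2)  # (S_n - n*mu) / sqrt(n*sigma^2)

# The histogram of the standardized sums should track the N(0,1) density.
grid = np.linspace(-4.0, 4.0, 200)
plt.hist(z, bins=80, density=True, alpha=0.5, label="standardized sums")
plt.plot(grid, np.exp(-grid**2 / 2) / np.sqrt(2 * np.pi), label="N(0,1) density")
plt.legend()
plt.show()
```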

Our main tool to prove the CLT is the characteristic function of a random variable. At first it may seem like merely another way of encoding the same information, but it turns out to let us bring powerful tools from analysis to bear on conclusions at the level of probability theory.

Let $X$ be a random variable on $\Omega$. Its characteristic function $\varphi_X\colon \mathbb{R}\to \mathbb{C}$ is defined by

\[\varphi_X(t) = \mathbb{E}(e^{itX}) = \int_\mathbb{R} e^{its} dF_X(s),\]

where $F_X(s) = \mathbb{P}(X\leq s)$. If $X$ admits a density $f_X$, then its characteristic function can be expressed as

\[\varphi_X(t) = \int_\mathbb{R} e^{its} f_X(s) ds.\]

Note that, up to sign and normalization conventions, this corresponds to the inverse Fourier transform of the density.

Example: the characteristic function of a normally distributed random variable $X\sim \mathcal{N}(\mu,\sigma^2)$ is given by $\varphi_X(t) = e^{i t \mu-\frac{1}{2} \sigma^{2} t^{2}}$.
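
As a sketch of where this formula comes from, consider first the standard case $Z\sim \mathcal{N}(0,1)$. Completing the square in the exponent gives

\[\varphi_Z(t) = \dfrac{1}{\sqrt{2\pi}}\int_\mathbb{R} e^{its - s^2/2}\, ds = e^{-t^2/2}\cdot \dfrac{1}{\sqrt{2\pi}}\int_\mathbb{R} e^{-(s-it)^2/2}\, ds = e^{-t^2/2},\]

where the last integral equals $\sqrt{2\pi}$ after shifting the contour (justified by Cauchy's theorem). The general case follows by writing $X = \mu + \sigma Z$, so that $\varphi_X(t) = \mathbb{E}(e^{it(\mu+\sigma Z)}) = e^{it\mu}\varphi_Z(\sigma t)$.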

We state now some properties of characteristic functions:

Properties:

  1. $\varphi_X$ exists for all real random variables $X$,
  2. $\varphi_X (0) = 1$,
  3. $\varphi_X$ is uniformly continuous.

The properties follow easily from the definitions. We state the next property separately as it is of great importance for the proof of the CLT.
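
For instance, since $|e^{itX}| = 1$, the expectation defining $\varphi_X$ always exists; moreover

\[|\varphi_X(t)| \leq \mathbb{E}(|e^{itX}|) = 1 \qquad \text{and} \qquad \varphi_X(0) = \mathbb{E}(1) = 1,\]

while for the uniform continuity we can estimate

\[|\varphi_X(t+h) - \varphi_X(t)| = \left|\mathbb{E}\left(e^{itX}\left(e^{ihX}-1\right)\right)\right| \leq \mathbb{E}\left(|e^{ihX}-1|\right),\]

which tends to $0$ as $h\to 0$ by dominated convergence (the integrand is bounded by $2$) and is independent of $t$.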

Proposition: suppose that $X$ has $n$ finite moments, that is, $\mathbb{E}(|X|^n) < \infty$. Then $\varphi_X$ is $n$ times differentiable, and its derivatives are given by

\[\varphi_X^{(k)}(t) = i^k \int_{\mathbb{R}} s^k e^{its}\, dF_X(s)\]

for $k = 1,\dots, n$.

Remark: a partial converse also holds: if $\varphi_X$ is $2n$ times differentiable at $0$, then $X$ has $2n$ finite moments (for odd orders of differentiability this can fail).
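
In particular, evaluating the derivatives at $t = 0$ recovers the moments of $X$; this is exactly how the proposition will be used in the proof of the CLT:

\[\varphi_X^{(k)}(0) = i^k \int_\mathbb{R} s^k\, dF_X(s) = i^k\, \mathbb{E}(X^k), \qquad k = 1,\dots,n.\]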

Proof: Note that

\[\dfrac{\varphi_X(t+h) - \varphi_X(t)}{h} = \int_\mathbb{R} e^{its} \cdot \dfrac{e^{ihs}-1}{h} dF_X(s).\]

The inequality $|e^{ihs} - 1| \leq |hs|$ shows that the integrand is dominated by the integrable function $|s|$ (here we use $\mathbb{E}(|X|) < \infty$), so the dominated convergence theorem allows us to take the limit $h \to 0$ inside the integral, from which we obtain

\[\varphi_X'(t) = i \int_\mathbb{R} s\, e^{its}\, dF_X(s).\]

The general formula follows by induction.$\square$

The next property establishes a relationship between sums of independent random variables and their characteristic functions.

Proposition: let $X_1,\dots, X_n$ be independent random variables with characteristic functions $\varphi_1,\dots, \varphi_n$. Then the characteristic function of $S_n = X_1 + \dots + X_n$ is given by $\varphi_1\cdot \dots \cdot \varphi_n$.

Proof: this follows from the fact that the expectation of a product of independent random variables factors; the computation is spelled out below.$\square$
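
Explicitly, independence allows the expectation to factor over the product:

\[\varphi_{S_n}(t) = \mathbb{E}\left(e^{it(X_1+\dots+X_n)}\right) = \mathbb{E}\left(\prod_{k=1}^n e^{itX_k}\right) = \prod_{k=1}^n \mathbb{E}\left(e^{itX_k}\right) = \varphi_1(t)\cdots\varphi_n(t).\]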

Finally, we state the main ingredient of the proof of the CLT: Lévy's continuity theorem. It is here that the full power of analysis comes into play.

Theorem: suppose $\{X_n\}$ is a sequence of random variables whose characteristic functions $\{\varphi_n\}$ converge pointwise to the characteristic function $\varphi_0$ of a random variable $X_0$. Then $\{X_n\}$ converges in distribution to $X_0$.

Remark: there are far more general versions of the previous theorem, but this will suffice to prove the CLT.

Now we are ready to prove the CLT. First, by centering and normalizing (that is, replacing each $X_n$ by $(X_n - \mu)/\sigma$), we may assume that $\mu = 0$ and $\sigma^2 = 1$. Let $\varphi$ be the characteristic function common to all the variables $X_n$. By the previous proposition, the characteristic function of $S_n$ is given by $\varphi_{S_n} = \varphi^n$. Since we assumed that the random variables have finite second moments, $\varphi$ is twice differentiable with $\varphi(0) = 1$, $\varphi'(0) = i\mu = 0$ and $\varphi''(0) = -\mathbb{E}(X_1^2) = -1$. Hence by Taylor's theorem, we have

\[\varphi(t) = 1 - \dfrac{t^2}{2} + o(t^2),\]

for $t$ close to $0$. Using the expansion $\log(1+x) = x + o(x)$ as $x \to 0$, this implies that

\[\log \varphi_{S_n}\left(\dfrac{t}{\sqrt{n}}\right) = n\log\left(1 - \dfrac{t^2}{2n} + o\left(\dfrac{t^2}{n} \right) \right) = n\left(- \dfrac{t^2}{2n} + o\left(\dfrac{t^2}{n} \right) \right) \to -\dfrac{t^2}{2}\]

and consequently, $\varphi_{S_n}\left(\dfrac{t}{\sqrt{n}}\right) \to e^{-t^2/2}$, which is the characteristic function of a normal random variable with zero mean and unit variance. Since $\varphi_{S_n}(t/\sqrt{n})$ is precisely the characteristic function of $S_n/\sqrt{n}$, Lévy's continuity theorem gives $S_n/\sqrt{n} \Rightarrow \mathcal{N}(0,1)$, finishing the proof.$\square$
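
As a numerical sanity check on the mechanism of the proof, one can estimate $\varphi_{S_n}(t/\sqrt{n})$ empirically and watch it approach $e^{-t^2/2}$. A minimal sketch, again assuming NumPy, with standardized exponential summands as an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)

def standardized_sums(n, trials):
    # Illustrative summands: X_i ~ Exponential(1) - 1, so mean 0, variance 1.
    x = rng.exponential(1.0, size=(trials, n)) - 1.0
    return x.sum(axis=1) / np.sqrt(n)  # S_n / sqrt(n)

t = np.linspace(-3.0, 3.0, 7)
for n in (1, 10, 100):
    s = standardized_sums(n, 50_000)
    # Empirical characteristic function: average of exp(i*t*S_n/sqrt(n)).
    emp = np.exp(1j * np.outer(t, s)).mean(axis=1)
    err = np.abs(emp - np.exp(-t**2 / 2)).max()
    print(f"n = {n:4d}: max deviation from exp(-t^2/2) is {err:.4f}")
```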
