Statistical properties in hyperbolic dynamics

Published:

This is the first post in a series of 4 posts based on the lectures at the Houston Summer School on Dynamical Systems 2019 on Statistical properties in hyperbolic dynamics, given by Matthew Nicol, Andrew Török and William Ott. These notes are heavily edited, with added comments, examples and explanations. Any mistakes are my responsibility. Some of the notation has been changed for consistency.

Lecture 1

The goal of these lectures is to understand statistical properties of dynamical systems by using some tools from probability theory. For this, we will first introduce some language and notation.

Random variables

Let $(\Omega,P)$ be a probability space. A random variable is a measurable function $Z\colon\Omega\to\mathbb{R}$ (in the context of dynamical systems, we will call random variables observables). A stochastic process is a sequence of random variables $(Z_k)$ on the same probability space. The distribution function of a random variable $Z$ is the function $F_Z\colon(-\infty,\infty)\to[0,1]$ given by

$F_Z(t) = P(Z\leq t).$

Example: a random variable $Z$ has normal distribution (denoted by $Z\sim \mathcal{N}(\mu,\sigma^2)$) if

$F_Z(t) = \dfrac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^t \exp\left(\frac{-(x-\mu)^2}{2\sigma^2}\right)dx.$
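For numerical work, the normal distribution function can be written in terms of the error function, $F_Z(t)=\frac{1}{2}\left(1+\operatorname{erf}\left(\frac{t-\mu}{\sigma\sqrt{2}}\right)\right)$. The following is a minimal Python sketch comparing this closed form with a direct quadrature of the integral above. The function names `normal_cdf` and `normal_cdf_quadrature` are illustrative choices, not part of the lectures, and the quadrature assumes the lower limit can be truncated at $\mu-10\sigma$.

```python
import math

def normal_cdf(t, mu=0.0, sigma=1.0):
    """F_Z(t) for Z ~ N(mu, sigma^2), via the error function."""
    return 0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))

def normal_cdf_quadrature(t, mu=0.0, sigma=1.0, steps=20000):
    """Midpoint-rule approximation of the integral defining F_Z(t).

    Assumption: the lower limit -infinity is truncated to mu - 10*sigma,
    where the density is negligible.
    """
    a = mu - 10.0 * sigma
    if t <= a:
        return 0.0
    h = (t - a) / steps
    norm = 1.0 / math.sqrt(2.0 * math.pi * sigma ** 2)
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h  # midpoint of the i-th cell
        total += math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))
    return norm * total * h

if __name__ == "__main__":
    for t in (-1.0, 0.0, 1.96):
        print(t, normal_cdf(t), normal_cdf_quadrature(t))
```

The two functions should agree to several decimal places; the closed form is the one to use in practice.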

A stochastic process $(Z_k)$ is stationary if for every $n\geq 0$, every $k\geq 1$ and every sequence $A_1,\dots,A_k$ of measurable sets, we have

$P(Z_1\in A_1,\dots,Z_k\in A_k) = P(Z_{1+n}\in A_1,\dots,Z_{k+n}\in A_k).$

This means that the joint distribution of the random variables does not change over time.

Dynamical systems

We turn now to our main interest: dynamical systems. Suppose $T\colon X\to X$ is a measure-preserving map of the probability space $(X,\mu)$, that is, $\mu(T^{-1}(A))=\mu(A)$ for all measurable sets $A$.

Let $\phi$ be an observable. If we iterate the map, we can generate a time series ${X_n= \phi\circ T^n}$. If $T$ is measure preserving, then this time series is a stationary stochastic process. In particular, the $X_n$ are identically distributed, but they are not necessarily independent. Recall $X_1,X_2$ are independent if

$P(X_1\in A_1 , X_2\in A_2) = P(X_1\in A_1)P(X_2\in A_2).$

What statistical properties does the time series $(X_n)$ possess?

One of the main results of ergodic theory is the celebrated Birkhoff’s Ergodic Theorem, which generalizes the probabilistic Law of Large Numbers (LLN). Recall that the system $T$ is ergodic with respect to the measure $\mu$ if $T^{-1}A = A$ implies that $\mu(A) = 0$ or $\mu(A) = 1$.

Birkhoff’s Ergodic theorem: if $T$ is ergodic and $\phi$ is integrable, then for $\mu$ almost every $x$

$\dfrac{1}{n}\sum_{j=0}^{n-1}\phi\circ T^j(x) \to \int_X \phi d\mu.$
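The convergence of time averages can be observed numerically. The sketch below is only an illustration under stated assumptions: it uses the irrational rotation $T(x) = x+\alpha \pmod 1$ with $\alpha$ the golden mean (an example not taken from the lectures, which is ergodic for Lebesgue measure), because naive floating-point iteration of the doubling map $2x \pmod 1$ shifts binary digits out of $x$ and collapses to $0$ after finitely many steps. The names `rotation` and `birkhoff_average` are hypothetical.

```python
import math

# Assumption: golden-mean rotation angle, chosen only for illustration.
ALPHA = (math.sqrt(5.0) - 1.0) / 2.0

def rotation(x):
    """Irrational rotation T(x) = x + ALPHA (mod 1); it preserves Lebesgue measure."""
    return (x + ALPHA) % 1.0

def birkhoff_average(phi, x, n, T=rotation):
    """Return (1/n) * sum_{j=0}^{n-1} phi(T^j(x))."""
    total = 0.0
    for _ in range(n):
        total += phi(x)
        x = T(x)
    return total / n

def square(x):
    """Observable phi(x) = x^2, whose space average on [0,1] is 1/3."""
    return x * x

if __name__ == "__main__":
    for n in (10, 1000, 100000):
        print(n, birkhoff_average(square, 0.1, n))
```

As $n$ grows, the printed time averages should approach the space average $\int_0^1 x^2\,dx = 1/3$.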

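This generalizes the LLN: if the random variables $X_j = \phi\circ T^j$ happen to be iid, the statement is exactly the strong LLN, but the theorem only requires ergodicity, not independence.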

Once we know the almost sure behavior of the averages of the time series, we can ask for a second order limit law, namely, a Central Limit Theorem. Recall its formulation in the probabilistic setting:

Central limit theorem: if $(X_n)$ are iid with $\mathbb{E}(X_1^2)<\infty$ and positive variance, then, in distribution as $n\to\infty$,

$\DeclareMathOperator{\var}{var} \dfrac{\sum_{j=1}^n X_j -n\mathbb{E}(X_1)}{\sqrt{n\var(X_1)}} \Longrightarrow \mathcal{N}(0,1).$

The content of the CLT is to give a precise description of the distribution and size of the fluctuations of the averages around their almost sure limit (given by the LLN).
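A small simulation makes the statement concrete. This is a sketch under assumptions not made in the lectures: the $X_j$ are taken uniform on $[0,1]$ (so $\mathbb{E}(X_1)=1/2$ and $\operatorname{var}(X_1)=1/12$), the sample sizes and the seed are arbitrary, and the function names are hypothetical.

```python
import math
import random

def normalized_sum(n, rng):
    """(sum_{j=1}^n X_j - n*E(X_1)) / sqrt(n*var(X_1)) for X_j iid uniform on [0,1]."""
    total = sum(rng.random() for _ in range(n))
    return (total - n * 0.5) / math.sqrt(n / 12.0)

def simulate(n=200, trials=2000, seed=0):
    """Draw `trials` independent copies of the normalized sum."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [normalized_sum(n, rng) for _ in range(trials)]

def summarize(samples):
    """Return the sample mean, the sample variance and the fraction of samples <= 1."""
    count = len(samples)
    mean = sum(samples) / count
    variance = sum((s - mean) ** 2 for s in samples) / count
    fraction = sum(1 for s in samples if s <= 1.0) / count
    return mean, variance, fraction

if __name__ == "__main__":
    mean, variance, fraction = summarize(simulate())
    print("mean:", mean, "variance:", variance, "P(S <= 1):", fraction)
```

If the normalized sums are close to $\mathcal{N}(0,1)$, the sample mean should be near $0$, the sample variance near $1$, and the fraction of samples below $1$ near $F_Z(1)\approx 0.841$.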

If $T$ is ‘chaotic’ (hyperbolic) we expect $\phi\circ T^k$ to be roughly independent of $\phi$ for large $k$. We quantify this using correlations:

For two observables $\phi,\psi: X\to \mathbb{R}$, we define their correlation by

$C_{\phi,\psi}(n) = \int_X \phi\cdot \psi\circ T^n d\mu - \int_X \phi d\mu \int_X\psi d\mu.$

A special case of interest in the area of Time Series is the autocorrelation: take $\psi = \phi$ and we obtain $C_{\phi,\phi}(n)$.
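To see what the autocorrelation looks like in a concrete case, the sketch below approximates $C_{\phi,\phi}(n)$ for the doubling map $T(x)=2x \pmod 1$ with Lebesgue measure, which is invariant for this map, and the observable $\phi(x)=x-\frac{1}{2}$. These choices, the midpoint-rule quadrature and the function names are assumptions made for illustration only. To avoid floating-point collapse of the orbit, $T^n(x)$ is computed directly as $2^n x \pmod 1$.

```python
def doubling_iterate(x, n):
    """T^n(x) for T(x) = 2x (mod 1), computed directly as 2^n * x (mod 1)."""
    return (2 ** n * x) % 1.0

def centered_identity(x):
    """Observable phi(x) = x - 1/2, which has zero Lebesgue mean."""
    return x - 0.5

def correlation(phi, psi, n, cells=2 ** 14):
    """Midpoint-rule approximation of C_{phi,psi}(n) with respect to Lebesgue measure.

    The number of cells is a power of two so that the grid is aligned with
    the discontinuities of T^n.
    """
    h = 1.0 / cells
    sum_product = sum_phi = sum_psi = 0.0
    for i in range(cells):
        x = (i + 0.5) * h
        sum_product += phi(x) * psi(doubling_iterate(x, n))
        sum_phi += phi(x)
        sum_psi += psi(x)
    return h * sum_product - (h * sum_phi) * (h * sum_psi)

if __name__ == "__main__":
    for n in range(6):
        print(n, correlation(centered_identity, centered_identity, n))
```

For this particular observable a direct computation gives $C_{\phi,\phi}(n) = 2^{-n}/12$, so the printed values should halve at each step.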

We can measure the decay of correlations with respect to any measure, not necessarily an invariant measure: if $\nu$ is any other measure, the correlation function of $\phi$ and $\psi$ with respect to $\nu$ is defined by

$C^{\nu}_{\phi,\psi}(n) = \int_X \phi\cdot \psi\circ T^n d\nu - \int_X \phi d\nu \int_X\psi d\nu.$

In general, if we want to get results on the decay of correlations, we need to assume some regularity of the observables, usually $\phi,\psi$ in some ‘regular’ Banach space: $\phi\in\mathcal{B}_{\alpha},\psi\in\mathcal{B}_{\beta}$, where we can take $\mathcal{B}_{\alpha} = L^1$, $\mathcal{B}_{\beta} = L^\infty$, etc.

The three main techniques used to prove decay of correlations are the following:

1. Transfer operator
2. Cone method, using Hilbert metric
3. Coupling

In this lecture and the next we will explore the first one, namely, the transfer operator method.

Transfer operator

This technique is inspired by the Perron-Frobenius operator theory in Markov chains. In fact, some of the theorems from this theory can be generalized if we think of symbolic dynamics as an infinite-range interaction Markovian theory (see David Ruelle’s Thermodynamic Formalism for the finite alphabet case and Omri Sarig’s Thermodynamic Formalism for Countable Markov Shifts for the infinite alphabet case).

Suppose $T\colon X\to X$ is non-singular with respect to a probability measure $m$, that is, $m(T^{-1}(A)) = 0$ iff $m(A) = 0$ for any measurable set $A\subset X$. The Koopman operator $U\colon L^\infty\to L^\infty$ is defined by $U\phi = \phi\circ T$. By the Riesz representation theorem, there exists a linear operator $P\colon L^1\to L^1$, called the transfer operator, satisfying

$\int_X \phi\cdot \psi\circ T\, dm = \int_X P\phi\cdot \psi\, dm$

for all $\phi \in L^1(m),\psi\in L^\infty(m)$.

Example: if $T$ is a piecewise $\mathcal{C}^2$ expanding map of $[0,1]$, then we can explicitly calculate the transfer operator with respect to the Lebesgue measure:

$P\phi(x) = \sum_{y\in T^{-1}(x)} \dfrac{\phi(y)}{|T'(y)|}.$

If $T(x) = 2x \pmod 1$ and $\phi$ is Lipschitz,

$P\phi(x) =\dfrac{1}{2}(\phi(x_1)+\phi(x_2))$

where $x_1 = \frac{x}{2}$ and $x_2 = \frac{x+1}{2}$ are the two points of $T^{-1}(x)$. We can look at the iterates of the transfer operator:

$P^2\phi(x) = \dfrac{1}{4}(\phi(x_1)+\dots+\phi(x_4))$

where $x_1,\dots,x_4$ are now the four points of $T^{-2}(x)$. Thus, $P$ is averaging over preimages. This, together with the uniform hyperbolicity of the doubling map (or in general, uniformly hyperbolic systems) will be crucial in what follows.
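The averaging formula and the duality relation can be checked numerically for the doubling map. The following is a minimal sketch, assuming Lebesgue measure as the reference measure $m$; the test functions $\phi(x)=x^2$ and $\psi(x)=e^x$, the midpoint-rule quadrature and all function names are illustrative choices, not taken from the lectures.

```python
import math

def doubling(x):
    """The doubling map T(x) = 2x (mod 1)."""
    return (2.0 * x) % 1.0

def transfer(phi):
    """Return the function P(phi): the average of phi over the two preimages of x."""
    return lambda x: 0.5 * (phi(0.5 * x) + phi(0.5 * (x + 1.0)))

def integrate(f, cells=4096):
    """Midpoint rule on [0,1]; an even cell count keeps x = 1/2 on a cell boundary."""
    h = 1.0 / cells
    return h * sum(f((i + 0.5) * h) for i in range(cells))

def phi(x):
    return x * x

psi = math.exp

# Both sides of the duality relation: int phi * (psi o T) dm = int (P phi) * psi dm.
lhs = integrate(lambda x: phi(x) * psi(doubling(x)))
rhs = integrate(lambda x: transfer(phi)(x) * psi(x))

# P^2 phi at a sample point versus the average over the four points of T^{-2}(x).
x0 = 0.3
p2_value = transfer(transfer(phi))(x0)
four_point_average = 0.25 * sum(phi((x0 + j) / 4.0) for j in range(4))

if __name__ == "__main__":
    print("duality:", lhs, rhs)
    print("P^2:", p2_value, four_point_average)
```

For these test functions both integrals can also be computed by hand and equal $3(e-1)/8$, which gives an independent check on the quadrature.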

Some general properties of the transfer operator:

1. $P$ is linear: $P(\alpha\phi+\psi)=\alpha P(\phi) + P(\psi)$.
2. $P$ is positive: $P\phi\geq 0$ for $\phi\geq 0$.
3. $P$ is bounded in $L^1$: $|P\phi |_{L^1(m)} \leq | \phi|_{L^1(m)}$.

Proposition: if $P$ is the transfer operator with respect to a reference measure $m$, then $d\mu = hdm$ (or equivalently $h = \frac{d\mu}{dm}$) is $T$-invariant if and only if $Ph = h$.

Proof: observe that for a measurable set $A\subset X$,

$\mu(A) = \int_X 1_A d\mu$

and similarly

$\int_X 1_A\circ T d\mu = \mu(T^{-1}(A)),$

since $1_A\circ T = 1_{T^{-1}A}$. Then, if $Ph = h$, by duality

$\mu(T^{-1}(A)) = \int_X 1_A\circ T\cdot h\, dm = \int_X 1_A\cdot Ph\, dm = \int_X 1_A\cdot h\, dm = \int_X 1_A\, d\mu = \mu(A).$

The other direction follows from the same calculation, using uniqueness of the dual operator $\square$.
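As a quick sanity check of the proposition (a worked example using the formula for the doubling map $T(x) = 2x \pmod 1$ from above, with $m$ the Lebesgue measure), take the constant density $h\equiv 1$:

$Ph(x) = \dfrac{1}{2}\left(h\left(\tfrac{x}{2}\right)+h\left(\tfrac{x+1}{2}\right)\right) = \dfrac{1}{2}(1+1) = 1 = h(x),$

so the Lebesgue measure is $T$-invariant. This agrees with a direct computation: for an interval $A=[a,b]\subset[0,1]$ we have $T^{-1}(A) = [\tfrac{a}{2},\tfrac{b}{2}]\cup[\tfrac{a+1}{2},\tfrac{b+1}{2}]$, whose Lebesgue measure is $b-a = m(A)$. By contrast, the density $h(x) = 2x$ gives $Ph(x) = x+\tfrac{1}{2}\neq h(x)$, so $2x\,dm$ is not invariant.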

In what follows, we will assume properties of the transfer operator and derive statistical properties. Suppose $\phi\in\mathcal{B}_{\alpha}$ and, for simplicity, $\int \phi d\mu = 0$.

Theorem: suppose that the norm of the Banach space $\mathcal{B}_{\alpha}$ is such that $|\phi |_{L^1}\leq | \phi|_{\mathcal{B}_{\alpha}}$ (for instance the Lipschitz functions), and suppose that we also have $| P^n\phi |_{\mathcal{B}_{\alpha}}\leq r(n)| \phi |_{\mathcal{B}_{\alpha}}$, where the sequence $r(n)$ converges to zero. If $\psi\in L^\infty$ and $\phi\in\mathcal{B}_{\alpha}$ with $\int_X \phi\, d\mu = 0$, then $|C_{\phi,\psi}(n)|\leq r(n)| \phi |_{\mathcal{B}_{\alpha}} |\psi |_{L^\infty}$; that is, we have decay of correlations with rate $r(n)$.

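Proof: since $\int_X \phi\, d\mu = 0$, the correlation is $C_{\phi,\psi}(n) = \int_X \phi\cdot \psi\circ T^n d\mu$. Now observe that $\left| \int_X \phi\cdot \psi\circ T^n d\mu \right| = \left| \int_X P^n\phi \cdot \psi d\mu\right| \leq |P^n\phi |_{L^1} | \psi |_{L^\infty}\leq |P^n\phi |_{\mathcal{B}_\alpha} | \psi |_{L^\infty}\leq r(n)|\psi |_{L^\infty} | \phi |_{\mathcal{B}_\alpha}.\square$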

Example: for the doubling map, take $\mathcal{B}_{\alpha} = \text{Lipschitz functions}$, and the semi-norm

$|\phi|_{Lip} = \sup_{x\neq y}\dfrac{|\phi(x)-\phi(y)|}{|x-y|}.$

This semi-norm can easily be turned into a norm by adding the $\mathcal{C}^0$ norm to it, but for the estimates this is not relevant. Then for $\phi$ Lipschitz with $\int_X \phi dm =0$, one has $|\phi |_{L^1} \leq |\phi|_{Lip}$: since $\phi$ is continuous and $\int_X \phi dm =0$, there is a point $x_0$ such that $\phi(x_0) = 0$. This implies that

$|\phi|_{L^1} = \int_0^1 |\phi(x) - \phi(x_0)| dx \leq \int_0^1 |\phi|_{Lip}|x-x_0| dx \leq |\phi|_{Lip}.$

The Lipschitz semi-norm of the action of the transfer operator can be bounded by

$\dfrac{|P\phi(x)-P\phi(y)|}{|x-y|}\leq \dfrac{\frac{1}{2}|\phi(\frac{x}{2})-\phi(\frac{y}{2})|+\frac{1}{2}|\phi(\frac{x+1}{2}) - \phi(\frac{y+1}{2})|}{|x-y|}$ $= \dfrac{1}{4} \left( \frac{|\phi(\frac{x}{2})-\phi(\frac{y}{2})|}{|\frac{x}{2}-\frac{y}{2}|} + \frac{|\phi(\frac{x+1}{2})-\phi(\frac{y+1}{2})|}{|\frac{x+1}{2}-\frac{y+1}{2}|} \right) \leq \dfrac{1}{2}|\phi |_{Lip},$

which implies that $| P\phi|_{Lip}\leq \frac{1}{2}|\phi |_{Lip}$. Iterating this estimate gives $| P^n\phi|_{Lip}\leq 2^{-n}|\phi |_{Lip}$, which is the bound required by the previous theorem, with exponential rate of decay $r(n) = 2^{-n}$.
