Martingales 0

37 minute read

Published: February 20, 2020

I want to talk about martingales, but unfortunately in order to do that properly, we need to talk first about sigma-algebras and conditional expectations, subjects which can be a bit harsh at first. These concepts are essential, and while we could just work with them just as formal objects with certain properties, it is fundamental to have a deeper understanding of them so we do not get lost in formalism and we are able to capture the intuition behind this theory.

If you like my content, consider following my linkedIn page to stay updated.

Sigma-algebras and information

In this section, I will try to give some intuition for what the measure-theoretic conditional expectation is, as it is one of the central objects of probability theory, and in particular, martingale theory.

Let $(\Omega,\mathcal{F},\mathbb{P})$ be a probability space, and $X$ a random variable on $\Omega$. We can think of the sigma-algebra $\mathcal{F}$ as all the events that are somehow accesible for us to measure. For instance, the question did the rv $X$ take a positive value? can be represented with the event $\{ \omega\in \Omega : X(\omega) > 0\}$, which by definition of random variables, is an element of $\mathcal{F}$. In general most reasonable questions about what our random variable is doing can be represented by an element of $\mathcal{F}$. To each of these, we assign them a probability using $\mathbb{P}$. We can ask then for questions such as what is the probability that $X$ takes a positive value? by computing $\mathbb{P}\{ \omega\in \Omega : X(\omega) > 0\}$.

One caveat of the previous discussion, is that normally we do not have all the information of $\mathcal{F}$ available for us, we only have a subset of it. In other words, we normally have a sub sigma-algebra $\mathcal{G}$ of $\mathcal{F}$. If we have no information at all, then $\mathcal{G} = \{ \Omega,\emptyset\}$. In order to gain more information about the events of our space, we can use random variable (or observables of our space). If $X$ is a random variable that we can measure (let us say for instance, the energy of our space), then we can use it to extract information by looking at events of the form $X^{-1}(A)$ for $A\in\mathcal{F}$. The set of all these events is a sigma-algebra which we denote $\sigma(X)$. This sigma-algebra contains essentially all the information that can be extracted using measurements of $X$.

Normally, we do not really have all the values of $X$ so making any kind of inference is not a trivial task. Let us say for instance that our space $\Omega$ corresponds to the box $\{ (x,y) : -1\leq x,y\leq y \}$ and $X$ is the temperature at each point of the box (we will not discuss in too much details the sigma-algebra here). If we do not know the exact values of $X$ at every point but we still need to report some information about the temperature of the box, then we can look at the average of $X$ over the box, or

\[\mathbb{E}(X) = \int_\Omega X d\mathbb{P}.\]

While this number does not provide detailed information about the distribution of the temperature (as knowing $X$ would do), it is still a decent summary about what is going on in the box. If we take a random point $p$ of the box, then our best guess for the temperature at that point would be $\mathbb{E}(X)$ (of course there are multiple situations where this is not true, but let us keep it simple for now). Suppose now that we partition the box into two different sides $A = \{ (x,y) \in \Omega : x \geq 0\},B =A^c= \{ (x,y) \in \Omega : x < 0\}$, and measure the average temperature in each side, that is,

\[\mathbb{E}_A(X) = \int_{A} X d\mathbb{P} \quad , \quad \mathbb{E}_B(X) = \int_{B} X d\mathbb{P} .\]

If we are given a point $p$ of the box $\Omega$, what temperature would we the best guess for this new point? We can now give a more informed guess, by saying $\mathbb{E}_A(X)$ if $p\in A$ and $\mathbb{E}_B(X)$ if $p\in B$.

This improvement comes from the fact that we were able to make two measurements using $X$: the average temperature in $A$ and in $B$. In general, we gained information from performing such measurements. Technically, we count with the information of the sigma-algebra $\mathcal{G} = \{ \emptyset,A,B,\Omega\}\subset \mathcal{F}$. If we obtain information of bigger sigma-algebras, our estimate for $X$ will become more precise.

It is important to remark the following measurability issues: the fact that $X$ is a random variable means that $X$ is $\mathcal{F}$-measurable, that is, for any Borel set $E\subset \mathbb{R}$, the set $X^{-1}(E) = \{ \omega : X(\omega) \in E\}$ is an element of $\mathcal{F}$. If we only have at our disposition the information of the sigma-algebra $\mathcal{G}$, we will not be able to measure the events $X^{-1}(E)$ for arbitrary choices of $E$. In other words, $X$ is not necessarily $\mathcal{G}$-measurable.

Conditional expectation

The process we described in the previous section is a particular instance of a more general construction: the conditional expectation. Suppose that $\mathcal{G}$ is a sub sigma-algebra of $\mathcal{F}$ and that $X$ is a random variable with finite expectation. In the last paragraph of the previous section, we discussed the issue of measurability of our random variables. In practical terms, this means that we cannot answer questions such as what is the probability that $X$ is between $4$ and $5$? or other more general questions, using only the information of $\mathcal{G}$. What we can do to solve this problem, is to come up with a simplified version $X_\mathcal{G}$ of $X$ which is adapted to $\mathcal{G}$, in the sense that we can answer most questions about it using our knowledge of the sub sigma-algebra $\mathcal{G}$.

We say that a $\mathcal{G}$-measurable function $X_\mathcal{G}$ is a conditional expectation of $X$ with respect to the sub sigma-algebra $\mathcal{G}$ if

\[\int_G X_\mathcal{G} d\mathbb{P} = \int_G X d\mathbb{P}\]

for all $G\in\mathcal{G}$. We denote such function by $\mathbb{E}(X|\mathcal{G})$.

Note that the property of conditional expectations essentially says that $\mathbb{E}(X|\mathcal{G})$ integrates the same as $X$ wherever it can be integrated. For our example where $\mathcal{G} = \{ \emptyset,A,B,\Omega\}\subset \mathcal{F}$, note that the function

\[X_\mathcal{G} = \dfrac{1}{\mathbb{P}(A)}\mathbb{E}(X 1_A)1_A+\dfrac{1}{\mathbb{P}(B)}\mathbb{E}(X 1_B)1_B\]

is a conditional expectation of $X$ with respect to $\mathcal{G}$:

$X_\mathcal{G}$ is clearly $\mathcal{G}$-measurable,
The integral condition can be easily checked for all elements of the sigma-algebra, for instance for $G=A$

\[\int_A X_\mathcal{G} d\mathbb{P} = \int_A \dfrac{1}{\mathbb{P}(A)}\mathbb{E}(X 1_A)1_A d\mathbb{P} = \int_A Xd\mathbb{P}.\]

The existence of such functions $\mathbb{E}(X|\mathcal{G})$ can be proved using Radon-Nikodym’s theorem, and uniqueness can be proved almost-surely using the definition.

We can think of $\mathbb{E}(X|\mathcal{G})$ as the best estimate of $X$ given the information that we have in $\mathcal{G}$.

Independence

We will have now a little detour to talk about information and independence. Suppose that $X$ is a random variable, and we want to perform measurements using such $X$. Our original sigma-algebra $\mathcal{F}$ contains all the information to be ever available to us. Normally, to measure $X$ we do not really need all such information. The information that we do need, is by definition, the information that defines all the events related to $X$ (tautological statement!). As $X$ is a random variable, that means that all such information is contained in a sub sigma-algebra of $\mathcal{F}$, which we previously called $\sigma(X)$. Now, suppose that we have a sub sigma-algebra $\mathcal{H}$ of $\mathcal{F}$ that we believe has nothing to do with $X$, or in other words, that it shares no information with $X$. We say that $\mathcal{H}$ and $X$ are independent if

\[\mathbb{P}(H\cap E) = \mathbb{P}(H)\mathbb{P}(E)\]

for all $H\in\mathcal{H}$ and $E\in\sigma(X)$. Of course this definition has nothing special about the sigma-algebra $\sigma(X)$ and can be extended to arbitrary sigma-algebras. We can think of them as having independent information.

Properties of $\mathbb{E}(\cdot|\mathcal{G})$

$\mathbb{E}(\cdot|\mathcal{G})$ is linear: $\mathbb{E}(aX+Y|\mathcal{G}) = a\mathbb{E}(X|\mathcal{G}) +\mathbb{E}(Y|\mathcal{G})$;
If $X\geq 0$ then $\mathbb{E}(X|\mathcal{G})\geq 0$;
If $\mathcal{G}$ is independent of $X$, then $\mathbb{E}(X|\mathcal{G}) = \mathbb{E}(X)$;
If $X$ is $\mathcal{G}$-measurable, then $\mathbb{E}(X|\mathcal{G}) = X$;
$\mathbb{E}(\mathbb{E}(X|\mathcal{G})) = \mathbb{E}(X)$;
If $\mathcal{H}\subset\mathcal{G}\subset\mathcal{F}$ are sigma-algebras, then $\mathbb{E}(\mathbb{E}(X|\mathcal{G})|\mathcal{H}) = \mathbb{E}(X|\mathcal{H})$.

We give a few words about each of these properties:

self-explainatory,
self-explainatory,
This property is saying that if our sigma-algebra $\mathcal{G}$ does not know anything about $X$ then our best estimate for $X$ given our available information is the same as if we did not have any information, that is, $\mathbb{E}(X)$;
If $\mathcal{G}$ contains all the information that defines $X$, then our best estimate of $X$ given $\mathcal{G}$ is precisely $X$;
If we look at the average of our simplified estimate of $X$, then we obtain our no-information estimate $\mathbb{E}(X)$;
If we give an estimate of $X$ with respect to a sigma-algebra, and then estimate that resulting function with respect to a smaller sigma-algebra, we obtain what we would have obtained if we estimated with respect to the smaller sigma-algebra from the start.

With this we finish to set up all the basic ingredients to start talking about martingales.

Share on

Twitter Facebook LinkedIn

Felipe Pérez

Martingales 0

Sigma-algebras and information

Conditional expectation

Independence

Properties of $\mathbb{E}(\cdot|\mathcal{G})$

Share on

Leave a Comment

You May Also Enjoy

Your SQL is probably WRONG

Is overfitting… good?

PCA and supervised learning

Extreme value theory III