Posts by Tags

Algorithms

Problem 3: rearrange of lists

1 minute read

Published:

The sequence [0, 1, ..., N] has been jumbled, and the only clue you have for its order is an array representing whether each number is larger or smaller than the last. Given this information, reconstruct an array that is consistent with it. For example, given [None, +, +, -, +], you could return [1, 2, 3, 0, 4].
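One greedy solution (a sketch of my own, not necessarily the approach in the post): if the pattern contains d minus signs, start at d, then hand out ever-larger values for each `+` and ever-smaller values for each `-`. The helper name `reconstruct` is illustrative.

```python
def reconstruct(signs):
    """Rebuild a permutation of [0..N] consistent with the +/- pattern.

    signs[0] is None; signs[i] == '+' means result[i] > result[i-1],
    and '-' means result[i] < result[i-1].
    """
    d = signs.count('-')      # number of descents we must fit below the start
    high, low = d, d - 1
    result = [d]
    for s in signs[1:]:
        if s == '+':
            high += 1         # next unused value above everything so far
            result.append(high)
        else:
            result.append(low)  # next unused value below everything so far
            low -= 1
    return result

print(reconstruct([None, '+', '+', '-', '+']))  # → [1, 2, 3, 0, 4]
```

Every `-` receives a value smaller than all values used so far, and every `+` a value larger, so the pattern is satisfied and the values 0..N are each used exactly once.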

Big data

Is overfitting… good?

44 minute read

Published:

Conventional wisdom in Data Science/Statistical Learning tells us that when we try to fit a model that learns from our data and generalizes to unseen data, we must keep in mind the bias/variance trade-off. This means that as we increase the complexity of our models (say, the number of learnable parameters), it becomes more likely that they will simply memorize the data and fail to generalize to unseen data. On the other hand, if we keep the complexity low, our models will not be able to learn much from our data and will not do well either. We are told to find the sweet spot in the middle. But is this paradigm about to change? In this article we review new developments that suggest this might be the case.

Problem 5: Unbiased and consistent estimators

10 minute read

Published:

What does it mean for an estimator to be unbiased? What about consistent? Give examples of an unbiased but not consistent estimator, as well as a biased but consistent estimator.
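As a hedged sketch of one possible answer (my own illustrative choice, not necessarily the one in the post): for $X_1,\dots,X_n \sim \mathcal{N}(\mu, 1)$, the estimator $T_1 = X_1$ is unbiased for $\mu$ but not consistent, while $T_2 = \frac{1}{n}\sum (X_i - \bar{X})^2$ is biased for the variance (its expectation is $(n-1)/n$) but consistent.

```python
import random

random.seed(42)

# Illustrative examples (one possible answer, not the only one):
#  - T1 = X_1 is unbiased for the mean mu, but not consistent: its
#    distribution does not concentrate as n grows.
#  - T2 = (1/n) * sum((X_i - Xbar)^2) has expectation (n-1)/n * sigma^2,
#    so it is biased, yet it converges to sigma^2 (consistent).
mu, sigma = 5.0, 1.0

# T1: averaging many independent realizations recovers mu (unbiasedness),
# but each single realization has variance sigma^2 regardless of n.
t1_draws = [random.gauss(mu, sigma) for _ in range(4000)]
t1_mean = sum(t1_draws) / len(t1_draws)

# T2: for a single large sample, the estimate is close to sigma^2 = 1
# despite the finite-sample bias.
xs = [random.gauss(mu, sigma) for _ in range(100_000)]
xbar = sum(xs) / len(xs)
t2 = sum((x - xbar) ** 2 for x in xs) / len(xs)

print(round(t1_mean, 1), round(t2, 2))
```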

The Law of Anomalous Numbers

39 minute read

Published:

We have seen in previous entries how multiple statistical results allow us to find patterns in randomness. Today we talk about a strange law of numbers: Benford’s law. This is an empirical law that describes the distribution of the leading digit of data observed in real-life situations. We will then draw a parallel with the law of leading digits for the powers of 2, and see how ergodic theory can help us understand these phenomena.
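Concretely, Benford’s law predicts that the leading digit equals d with probability log10(1 + 1/d). A minimal sketch comparing this prediction with the leading digits of the first thousand powers of 2:

```python
import math
from collections import Counter

# Benford's law: P(leading digit = d) = log10(1 + 1/d).
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Empirical leading-digit frequencies for 2^1, ..., 2^1000.
counts = Counter(int(str(2 ** n)[0]) for n in range(1, 1001))
empirical = {d: counts[d] / 1000 for d in range(1, 10)}

for d in range(1, 10):
    print(d, round(benford[d], 3), round(empirical[d], 3))
```

The two columns match closely: for instance, about 30.1% of the powers of 2 start with the digit 1, as log10(2) ≈ 0.301 predicts.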

Central limit theorem

Statistical laws in Dynamical Systems

51 minute read

Published:

In this entry I will attempt to introduce some of the fundamental concepts in my research. In particular, I want to discuss the topic of limit laws in dynamical systems, and the end goal is to explain our latest results with Matthew Nicol (University of Houston), Large deviations and central limit theorems for sequential and random systems of intermittent maps. This is a huge topic, and as such I will divide the exposition into multiple articles. In this particular one, I will focus on central limit theorems and large deviation estimates for one-dimensional maps of the unit interval. The exposition will use the formal notation of probability theory, but the ideas do not require deep knowledge of the formalism behind it. The code for all the graphics and functions can be found here.

Computer science

Problem 3: rearrange of lists

1 minute read

Published:

The sequence [0, 1, ..., N] has been jumbled, and the only clue you have for its order is an array representing whether each number is larger or smaller than the last. Given this information, reconstruct an array that is consistent with it. For example, given [None, +, +, -, +], you could return [1, 2, 3, 0, 4].

Confidence intervals

Confidence intervals

33 minute read

Published:

Confidence intervals are one of the most powerful tools used by statisticians and data scientists. They allow us to quantify the uncertainty of our predictions, which proves crucial when making important decisions. In this entry we take a first dive into this topic, finding confidence intervals for means of i.i.d. processes.
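As a taste of what such an interval looks like, here is a minimal sketch of the usual normal-approximation interval x̄ ± 1.96·s/√n on simulated data (all parameter values are illustrative):

```python
import math
import random

random.seed(0)

# 95% confidence interval for the mean of an i.i.d. sample via the normal
# approximation: xbar +/- 1.96 * s / sqrt(n). Illustrative data: N(10, 2^2).
xs = [random.gauss(10, 2) for _ in range(400)]
n = len(xs)
xbar = sum(xs) / n
s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))  # sample std dev
half_width = 1.96 * s / math.sqrt(n)
print(round(xbar - half_width, 2), round(xbar + half_width, 2))
```

With n = 400 and s ≈ 2, the interval has total width of roughly 0.4 around the sample mean.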

Cramér's theorem

Large deviations

12 minute read

Published:

In the previous entries we studied the asymptotic behavior of sums of i.i.d. random variables. The law of large numbers showed that the averages converge almost surely to the mean, while the central limit theorem gave us a second-order approximation, that is, it provided information about the size of the fluctuations around the expected value. In this entry we discuss a third-order result, namely Cramér’s theorem on large deviations. Intuitively, Cramér’s theorem establishes asymptotic rates for the decay of the probability of observing very unlikely events.
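For fair coin flips, for example, the rate function has a closed form: for a > 1/2, P(S_n/n ≥ a) decays like exp(−n·I(a)) with I(a) = a·log(2a) + (1−a)·log(2(1−a)). The sketch below (my own illustration, not from the post) compares the exact binomial tail with this exponent:

```python
import math

# Cramer's theorem for fair coin flips X_i in {0, 1}: for a > 1/2,
#   P(S_n/n >= a) decays like exp(-n * I(a)),
#   with rate function I(a) = a*log(2a) + (1-a)*log(2*(1-a)).
# Compare the exact binomial tail at n = 100, threshold k = 70 (a = 0.7).
n, k = 100, 70
a = k / n
I = a * math.log(2 * a) + (1 - a) * math.log(2 * (1 - a))

tail = sum(math.comb(n, j) for j in range(k, n + 1)) / 2 ** n
print(round(I, 4), round(-math.log(tail) / n, 4))
```

The empirical exponent −log(tail)/n slightly exceeds I(a) at finite n (the Chernoff bound gives tail ≤ exp(−n·I(a))) and approaches it as n grows.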

Dynamical systems

The ergodic theorem

36 minute read

Published:

In previous entries (see here and here) we have seen how the weak and the strong laws of large numbers give us asymptotics for the averages of i.i.d. sequences. While the assumption of independence allows us to apply these results in many different contexts, it is still quite a strong assumption. The ergodic theorem constitutes a generalization in which we allow a certain degree of dependence between the random variables being considered.

Statistical laws in Dynamical Systems

51 minute read

Published:

In this entry I will attempt to introduce some of the fundamental concepts in my research. In particular, I want to discuss the topic of limit laws in dynamical systems, and the end goal is to explain our latest results with Matthew Nicol (University of Houston), Large deviations and central limit theorems for sequential and random systems of intermittent maps. This is a huge topic, and as such I will divide the exposition into multiple articles. In this particular one, I will focus on central limit theorems and large deviation estimates for one-dimensional maps of the unit interval. The exposition will use the formal notation of probability theory, but the ideas do not require deep knowledge of the formalism behind it. The code for all the graphics and functions can be found here.

Ergodic theorem

The ergodic theorem

36 minute read

Published:

In previous entries (see here and here) we have seen how the weak and the strong laws of large numbers give us asymptotics for the averages of i.i.d. sequences. While the assumption of independence allows us to apply these results in many different contexts, it is still quite a strong assumption. The ergodic theorem constitutes a generalization in which we allow a certain degree of dependence between the random variables being considered.

Ergodic theory

Problem 5: Unbiased and consistent estimators

10 minute read

Published:

What does it mean for an estimator to be unbiased? What about consistent? Give examples of an unbiased but not consistent estimator, as well as a biased but consistent estimator.

The Law of Anomalous Numbers

39 minute read

Published:

We have seen in previous entries how multiple statistical results allow us to find patterns in randomness. Today we talk about a strange law of numbers: Benford’s law. This is an empirical law that describes the distribution of the leading digit of data observed in real-life situations. We will then draw a parallel with the law of leading digits for the powers of 2, and see how ergodic theory can help us understand these phenomena.

Extreme value theory

Extreme value theory III

47 minute read

Published:

In previous entries (here and here) we introduced and discussed the basic elements of Extreme Value Theory (EVT), such as the extreme value distributions and the generalized extreme value distribution; we also saw examples of such distributions, as well as simulated data and their corresponding fits. In this entry we get our hands on real data and see how we can make some inferences using EVT. In particular, we focus on maximum likelihood methods for parameter estimation on a temperature dataset from my home city, Santiago de Chile.

Extreme value theory II

24 minute read

Published:

In a previous entry we introduced the basics of Extreme Value Theory (EVT), such as the degeneracy of the maxima distribution, the extremal types theorem, and the Gumbel, Fréchet, Weibull, and GEV distributions. In this entry we will see a few examples of random variables and their respective maxima distributions, both theoretically and through simulations.

Extreme value theory I

27 minute read

Published:

In this entry I will discuss some of the introductory concepts of Extreme Value Theory (EVT). This theory is concerned with the asymptotic behavior of the extreme events of a stochastic process, in particular the distributional characteristics of the maximum order statistics, which will be the focus of this entry. We will first look at i.i.d. processes and then move on to processes with non-trivial dependence. The exposition is based on the book An Introduction to Statistical Modeling of Extreme Values by Stuart Coles. I will also try to use this as a way to showcase the different Python libraries that allow us to work with EVT.
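As a small taste of the theory, here is a minimal simulation sketch (my own, not from the post): for i.i.d. Exp(1) variables, the rescaled maximum M_n − log n is approximately Gumbel-distributed, with CDF G(x) = exp(−exp(−x)).

```python
import math
import random

random.seed(0)

# For i.i.d. Exp(1) variables, M_n - log(n) converges to the Gumbel law,
# whose CDF is G(x) = exp(-exp(-x)). Quick empirical check at x = 0:
n, reps = 200, 5000
hits = sum(
    1 for _ in range(reps)
    if max(random.expovariate(1.0) for _ in range(n)) - math.log(n) <= 0
)
print(round(hits / reps, 3))  # should be close to exp(-1) ≈ 0.368
```

The exact probability here is (1 − 1/n)^n, which already sits within about 0.001 of e^(−1) for n = 200.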

Intermittent maps

Statistical laws in Dynamical Systems

51 minute read

Published:

In this entry I will attempt to introduce some of the fundamental concepts in my research. In particular, I want to discuss the topic of limit laws in dynamical systems, and the end goal is to explain our latest results with Matthew Nicol (University of Houston), Large deviations and central limit theorems for sequential and random systems of intermittent maps. This is a huge topic, and as such I will divide the exposition into multiple articles. In this particular one, I will focus on central limit theorems and large deviation estimates for one-dimensional maps of the unit interval. The exposition will use the formal notation of probability theory, but the ideas do not require deep knowledge of the formalism behind it. The code for all the graphics and functions can be found here.

Large deviations

Statistical laws in Dynamical Systems

51 minute read

Published:

In this entry I will attempt to introduce some of the fundamental concepts in my research. In particular, I want to discuss the topic of limit laws in dynamical systems, and the end goal is to explain our latest results with Matthew Nicol (University of Houston), Large deviations and central limit theorems for sequential and random systems of intermittent maps. This is a huge topic, and as such I will divide the exposition into multiple articles. In this particular one, I will focus on central limit theorems and large deviation estimates for one-dimensional maps of the unit interval. The exposition will use the formal notation of probability theory, but the ideas do not require deep knowledge of the formalism behind it. The code for all the graphics and functions can be found here.

LaTeX

Law of large numbers

The ergodic theorem

36 minute read

Published:

In previous entries (see here and here) we have seen how the weak and the strong laws of large numbers give us asymptotics for the averages of i.i.d. sequences. While the assumption of independence allows us to apply these results in many different contexts, it is still quite a strong assumption. The ergodic theorem constitutes a generalization in which we allow a certain degree of dependence between the random variables being considered.

Machine learning

Is overfitting… good?

44 minute read

Published:

Conventional wisdom in Data Science/Statistical Learning tells us that when we try to fit a model that learns from our data and generalizes to unseen data, we must keep in mind the bias/variance trade-off. This means that as we increase the complexity of our models (say, the number of learnable parameters), it becomes more likely that they will simply memorize the data and fail to generalize to unseen data. On the other hand, if we keep the complexity low, our models will not be able to learn much from our data and will not do well either. We are told to find the sweet spot in the middle. But is this paradigm about to change? In this article we review new developments that suggest this might be the case.

PCA and supervised learning

20 minute read

Published:

The situation is this: you have been given data with several variables $x_1,\dots,x_d$ and a response $y$ that you want to predict using those variables. You perform some basic statistical analysis on your variables, looking at their averages, ranges, and distributions. Then you look at the correlations between these variables and find that some of them are strongly correlated. You decide to perform principal components analysis (PCA) to reduce the dimension of your features to $w_1,\dots,w_m$, with $m < d$. Now you fit your model, and you find that it gives terrible results, even though your PCA variables are capable of explaining most of the variance of the features. What went wrong?

Empirical error

20 minute read

Published:

In a previous entry we studied the concepts of bias and variance in an additive context. In this entry we dive deeper into the analysis of the mean squared error and how to assess it using actual data.

Problem 5: Unbiased and consistent estimators

10 minute read

Published:

What does it mean for an estimator to be unbiased? What about consistent? Give examples of an unbiased but not consistent estimator, as well as a biased but consistent estimator.

The Law of Anomalous Numbers

39 minute read

Published:

We have seen in previous entries how multiple statistical results allow us to find patterns in randomness. Today we talk about a strange law of numbers: Benford’s law. This is an empirical law that describes the distribution of the leading digit of data observed in real-life situations. We will then draw a parallel with the law of leading digits for the powers of 2, and see how ergodic theory can help us understand these phenomena.

Understanding bias and variance

35 minute read

Published:

In this notebook we will study the concepts of bias and variance and how we can use them to fit models to our datasets. We will base our exposition on both theory and examples.
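A minimal numerical sketch of the key identity MSE = bias² + variance, using a deliberately biased shrunk-mean estimator T = 0.9·X̄ (all parameter values here are my own illustrative choices):

```python
import random

random.seed(0)

# Check MSE = bias^2 + variance for the shrunk-mean estimator T = 0.9 * Xbar,
# with X_i ~ N(2, 1) and n = 25 (illustrative values).
mu, sigma, n, lam, reps = 2.0, 1.0, 25, 0.9, 20_000

ts = []
for _ in range(reps):
    xs = [random.gauss(mu, sigma) for _ in range(n)]
    ts.append(lam * sum(xs) / n)

mean_t = sum(ts) / reps
bias = mean_t - mu                                # ≈ (lam - 1) * mu = -0.2
var = sum((t - mean_t) ** 2 for t in ts) / reps   # ≈ lam^2 * sigma^2 / n
mse = sum((t - mu) ** 2 for t in ts) / reps
print(round(bias ** 2 + var, 4), round(mse, 4))   # the two sides agree
```

Over the same set of simulated estimates, the decomposition holds exactly (the cross term vanishes by construction), which makes it a handy sanity check for any simulation study of estimators.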

Martingales

Martingales 0

37 minute read

Published:

I want to talk about martingales, but in order to do that properly, we first need to talk about sigma-algebras and conditional expectations, subjects which can be a bit daunting at first. These concepts are essential, and while we could treat them merely as formal objects with certain properties, it is fundamental to have a deeper understanding of them, so that we do not get lost in formalism and can capture the intuition behind the theory.

MathJax

Measure theory

Is overfitting… good?

44 minute read

Published:

Conventional wisdom in Data Science/Statistical Learning tells us that when we try to fit a model that learns from our data and generalizes to unseen data, we must keep in mind the bias/variance trade-off. This means that as we increase the complexity of our models (say, the number of learnable parameters), it becomes more likely that they will simply memorize the data and fail to generalize to unseen data. On the other hand, if we keep the complexity low, our models will not be able to learn much from our data and will not do well either. We are told to find the sweet spot in the middle. But is this paradigm about to change? In this article we review new developments that suggest this might be the case.

Martingales 0

37 minute read

Published:

I want to talk about martingales, but in order to do that properly, we first need to talk about sigma-algebras and conditional expectations, subjects which can be a bit daunting at first. These concepts are essential, and while we could treat them merely as formal objects with certain properties, it is fundamental to have a deeper understanding of them, so that we do not get lost in formalism and can capture the intuition behind the theory.

Extreme value theory III

47 minute read

Published:

In previous entries (here and here) we introduced and discussed the basic elements of Extreme Value Theory (EVT), such as the extreme value distributions and the generalized extreme value distribution; we also saw examples of such distributions, as well as simulated data and their corresponding fits. In this entry we get our hands on real data and see how we can make some inferences using EVT. In particular, we focus on maximum likelihood methods for parameter estimation on a temperature dataset from my home city, Santiago de Chile.

Empirical error

20 minute read

Published:

In a previous entry we studied the concepts of bias and variance in an additive context. In this entry we dive deeper into the analysis of the mean squared error and how to assess it using actual data.

Extreme value theory II

24 minute read

Published:

In a previous entry we introduced the basics of Extreme Value Theory (EVT), such as the degeneracy of the maxima distribution, the extremal types theorem, and the Gumbel, Fréchet, Weibull, and GEV distributions. In this entry we will see a few examples of random variables and their respective maxima distributions, both theoretically and through simulations.

Confidence intervals

33 minute read

Published:

Confidence intervals are one of the most powerful tools used by statisticians and data scientists. They allow us to quantify the uncertainty of our predictions, which proves crucial when making important decisions. In this entry we take a first dive into this topic, finding confidence intervals for means of i.i.d. processes.

Problem 5: Unbiased and consistent estimators

10 minute read

Published:

What does it mean for an estimator to be unbiased? What about consistent? Give examples of an unbiased but not consistent estimator, as well as a biased but consistent estimator.

The Law of Anomalous Numbers

39 minute read

Published:

We have seen in previous entries how multiple statistical results allow us to find patterns in randomness. Today we talk about a strange law of numbers: Benford’s law. This is an empirical law that describes the distribution of the leading digit of data observed in real-life situations. We will then draw a parallel with the law of leading digits for the powers of 2, and see how ergodic theory can help us understand these phenomena.

The ergodic theorem

36 minute read

Published:

In previous entries (see here and here) we have seen how the weak and the strong laws of large numbers give us asymptotics for the averages of i.i.d. sequences. While the assumption of independence allows us to apply these results in many different contexts, it is still quite a strong assumption. The ergodic theorem constitutes a generalization in which we allow a certain degree of dependence between the random variables being considered.

Problem 4: Producing normal vectors

1 minute read

Published:

The idea is to produce multivariate normal random vectors from univariate standard normal random numbers. For this, let $Z$ be a $\mathcal{N}(0,I_n)$ random vector (here $I_n$ denotes the $n\times n$ identity matrix). Given a vector $\mu$ and a symmetric positive-definite real matrix $\Sigma$ (to be the mean and covariance parameters of the multivariate normal vector), consider the Cholesky decomposition of $\Sigma$, given by $\Sigma = LL^T$. Prove that $X = \mu + LZ$ is distributed as $\mathcal{N}(\mu,\Sigma)$.
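A minimal sketch of the construction for the 2×2 case, with illustrative values of μ and Σ of my own choosing (in practice one would use numpy.linalg.cholesky rather than factoring by hand):

```python
import math
import random

random.seed(1)

# Sample X = mu + L Z with Z ~ N(0, I_2), for the illustrative parameters
# mu = (1, -2) and Sigma = [[4, 2], [2, 3]] (symmetric positive definite).
mu1, mu2 = 1.0, -2.0
s11, s21, s22 = 4.0, 2.0, 3.0

# Cholesky factor by hand for the 2x2 case: Sigma = L L^T, L lower triangular.
l11 = math.sqrt(s11)
l21 = s21 / l11
l22 = math.sqrt(s22 - l21 ** 2)

def sample():
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)   # Z ~ N(0, I_2)
    return (mu1 + l11 * z1,                           # X = mu + L Z
            mu2 + l21 * z1 + l22 * z2)

draws = [sample() for _ in range(50_000)]
m1 = sum(x for x, _ in draws) / len(draws)
m2 = sum(y for _, y in draws) / len(draws)
cov = sum((x - m1) * (y - m2) for x, y in draws) / len(draws)
print(round(m1, 2), round(m2, 2), round(cov, 2))
# empirical mean and cross-covariance should be near 1, -2, and 2
```

The empirical mean vector and cross-covariance of the simulated draws match μ and the off-diagonal entry of Σ, as the statement predicts.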

Problem 2: expectation of minimum

less than 1 minute read

Published:

Say we have X ~ Uniform(0, 1) and Y ~ Uniform(0, 1). What is the expected value of the minimum of X and Y?
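Analytically, P(min(X, Y) > t) = (1 − t)² for independent U(0, 1) variables, so E[min(X, Y)] = ∫₀¹ (1 − t)² dt = 1/3. A quick Monte Carlo check:

```python
import random

random.seed(0)

# Monte Carlo check: for independent X, Y ~ U(0, 1),
# P(min(X, Y) > t) = (1 - t)^2, so E[min(X, Y)] = 1/3.
n = 100_000
estimate = sum(min(random.random(), random.random()) for _ in range(n)) / n
print(round(estimate, 3))  # should be close to 1/3 ≈ 0.333
```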

Problem 1: random unfair coin

less than 1 minute read

Published:

There is a fair coin (one side heads, one side tails) and an unfair coin (both sides tails). You pick one at random, flip it 5 times, and observe that it comes up as tails all five times. What is the chance that you are flipping the unfair coin?
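By Bayes’ rule the answer is (1/2 · 1) / (1/2 · 1 + 1/2 · (1/2)⁵) = 32/33 ≈ 0.97; a minimal check:

```python
# Bayes' rule: prior 1/2 for each coin; the likelihood of five tails is
# (1/2)^5 for the fair coin and 1 for the two-tailed coin.
p_fair = 0.5 * 0.5 ** 5      # prior * likelihood, fair coin
p_unfair = 0.5 * 1.0         # prior * likelihood, two-tailed coin
posterior = p_unfair / (p_unfair + p_fair)
print(posterior)  # 32/33 ≈ 0.9697
```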

Statistical laws in Dynamical Systems

51 minute read

Published:

In this entry I will attempt to introduce some of the fundamental concepts in my research. In particular, I want to discuss the topic of limit laws in dynamical systems, and the end goal is to explain our latest results with Matthew Nicol (University of Houston), Large deviations and central limit theorems for sequential and random systems of intermittent maps. This is a huge topic, and as such I will divide the exposition into multiple articles. In this particular one, I will focus on central limit theorems and large deviation estimates for one-dimensional maps of the unit interval. The exposition will use the formal notation of probability theory, but the ideas do not require deep knowledge of the formalism behind it. The code for all the graphics and functions can be found here.

Principal components analysis

PCA and supervised learning

20 minute read

Published:

The situation is this: you have been given data with several variables $x_1,\dots,x_d$ and a response $y$ that you want to predict using those variables. You perform some basic statistical analysis on your variables, looking at their averages, ranges, and distributions. Then you look at the correlations between these variables and find that some of them are strongly correlated. You decide to perform principal components analysis (PCA) to reduce the dimension of your features to $w_1,\dots,w_m$, with $m < d$. Now you fit your model, and you find that it gives terrible results, even though your PCA variables are capable of explaining most of the variance of the features. What went wrong?

Probability

Understanding bias and variance

35 minute read

Published:

In this notebook we will study the concepts of bias and variance and how we can use them to fit models to our datasets. We will base our exposition on both theory and examples.

Probability theory

Is overfitting… good?

44 minute read

Published:

Conventional wisdom in Data Science/Statistical Learning tells us that when we try to fit a model that learns from our data and generalizes to unseen data, we must keep in mind the bias/variance trade-off. This means that as we increase the complexity of our models (say, the number of learnable parameters), it becomes more likely that they will simply memorize the data and fail to generalize to unseen data. On the other hand, if we keep the complexity low, our models will not be able to learn much from our data and will not do well either. We are told to find the sweet spot in the middle. But is this paradigm about to change? In this article we review new developments that suggest this might be the case.

PCA and supervised learning

20 minute read

Published:

The situation is this: you have been given data with several variables $x_1,\dots,x_d$ and a response $y$ that you want to predict using those variables. You perform some basic statistical analysis on your variables, looking at their averages, ranges, and distributions. Then you look at the correlations between these variables and find that some of them are strongly correlated. You decide to perform principal components analysis (PCA) to reduce the dimension of your features to $w_1,\dots,w_m$, with $m < d$. Now you fit your model, and you find that it gives terrible results, even though your PCA variables are capable of explaining most of the variance of the features. What went wrong?

Martingales 0

37 minute read

Published:

I want to talk about martingales, but in order to do that properly, we first need to talk about sigma-algebras and conditional expectations, subjects which can be a bit daunting at first. These concepts are essential, and while we could treat them merely as formal objects with certain properties, it is fundamental to have a deeper understanding of them, so that we do not get lost in formalism and can capture the intuition behind the theory.

Extreme value theory III

47 minute read

Published:

In previous entries (here and here) we introduced and discussed the basic elements of Extreme Value Theory (EVT), such as the extreme value distributions and the generalized extreme value distribution; we also saw examples of such distributions, as well as simulated data and their corresponding fits. In this entry we get our hands on real data and see how we can make some inferences using EVT. In particular, we focus on maximum likelihood methods for parameter estimation on a temperature dataset from my home city, Santiago de Chile.

Empirical error

20 minute read

Published:

In a previous entry we studied the concepts of bias and variance in an additive context. In this entry we dive deeper into the analysis of the mean squared error and how to assess it using actual data.

Extreme value theory II

24 minute read

Published:

In a previous entry we introduced the basics of Extreme Value Theory (EVT), such as the degeneracy of the maxima distribution, the extremal types theorem, and the Gumbel, Fréchet, Weibull, and GEV distributions. In this entry we will see a few examples of random variables and their respective maxima distributions, both theoretically and through simulations.

Confidence intervals

33 minute read

Published:

Confidence intervals are one of the most powerful tools used by statisticians and data scientists. They allow us to quantify the uncertainty of our predictions, which proves crucial when making important decisions. In this entry we take a first dive into this topic, finding confidence intervals for means of i.i.d. processes.

Problem 5: Unbiased and consistent estimators

10 minute read

Published:

What does it mean for an estimator to be unbiased? What about consistent? Give examples of an unbiased but not consistent estimator, as well as a biased but consistent estimator.

The Law of Anomalous Numbers

39 minute read

Published:

We have seen in previous entries how multiple statistical results allow us to find patterns in randomness. Today we talk about a strange law of numbers: Benford’s law. This is an empirical law that describes the distribution of the leading digit of data observed in real-life situations. We will then draw a parallel with the law of leading digits for the powers of 2, and see how ergodic theory can help us understand these phenomena.

The ergodic theorem

36 minute read

Published:

In previous entries (see here and here) we have seen how the weak and the strong laws of large numbers give us asymptotics for the averages of i.i.d. sequences. While the assumption of independence allows us to apply these results in many different contexts, it is still quite a strong assumption. The ergodic theorem constitutes a generalization in which we allow a certain degree of dependence between the random variables being considered.

Problem 4: Producing normal vectors

1 minute read

Published:

The idea is to produce multivariate normal random vectors from univariate standard normal random numbers. For this, let $Z$ be a $\mathcal{N}(0,I_n)$ random vector (here $I_n$ denotes the $n\times n$ identity matrix). Given a vector $\mu$ and a symmetric positive-definite real matrix $\Sigma$ (to be the mean and covariance parameters of the multivariate normal vector), consider the Cholesky decomposition of $\Sigma$, given by $\Sigma = LL^T$. Prove that $X = \mu + LZ$ is distributed as $\mathcal{N}(\mu,\Sigma)$.

Extreme value theory I

27 minute read

Published:

In this entry I will discuss some of the introductory concepts of Extreme Value Theory (EVT). This theory is concerned with the asymptotic behavior of the extreme events of a stochastic process, in particular the distributional characteristics of the maximum order statistics, which will be the focus of this entry. We will first look at i.i.d. processes and then move on to processes with non-trivial dependence. The exposition is based on the book An Introduction to Statistical Modeling of Extreme Values by Stuart Coles. I will also try to use this as a way to showcase the different Python libraries that allow us to work with EVT.

Problem 2: expectation of minimum

less than 1 minute read

Published:

Say we have X ~ Uniform(0, 1) and Y ~ Uniform(0, 1). What is the expected value of the minimum of X and Y?

Problem 1: random unfair coin

less than 1 minute read

Published:

There is a fair coin (one side heads, one side tails) and an unfair coin (both sides tails). You pick one at random, flip it 5 times, and observe that it comes up as tails all five times. What is the chance that you are flipping the unfair coin?

Statistical laws in Dynamical Systems

51 minute read

Published:

In this entry I will attempt to introduce some of the fundamental concepts in my research. In particular, I want to discuss the topic of limit laws in dynamical systems, and the end goal is to explain our latest results with Matthew Nicol (University of Houston), Large deviations and central limit theorems for sequential and random systems of intermittent maps. This is a huge topic, and as such I will divide the exposition into multiple articles. In this particular one, I will focus on central limit theorems and large deviation estimates for one-dimensional maps of the unit interval. The exposition will use the formal notation of probability theory, but the ideas do not require deep knowledge of the formalism behind it. The code for all the graphics and functions can be found here.

Statistical laws

Martingales 0

37 minute read

Published:

I want to talk about martingales, but in order to do that properly, we first need to talk about sigma-algebras and conditional expectations, subjects which can be a bit daunting at first. These concepts are essential, and while we could treat them merely as formal objects with certain properties, it is fundamental to have a deeper understanding of them, so that we do not get lost in formalism and can capture the intuition behind the theory.

Extreme value theory III

47 minute read

Published:

In previous entries (here and here) we introduced and discussed the basic elements of Extreme Value Theory (EVT), such as the extreme value distributions and the generalized extreme value distribution; we also saw examples of such distributions, as well as simulated data and their corresponding fits. In this entry we get our hands on real data and see how we can make some inferences using EVT. In particular, we focus on maximum likelihood methods for parameter estimation on a temperature dataset from my home city, Santiago de Chile.

Extreme value theory II

24 minute read

Published:

In a previous entry we introduced the basics of Extreme Value Theory (EVT), such as the degeneracy of the maxima distribution, the extremal types theorem, and the Gumbel, Fréchet, Weibull, and GEV distributions. In this entry we will see a few examples of random variables and their respective maxima distributions, both theoretically and through simulations.

The ergodic theorem

36 minute read

Published:

In previous entries (see here and here) we have seen how the weak and the strong laws of large numbers give us asymptotics for the averages of i.i.d. sequences. While the assumption of independence allows us to apply these results in many different contexts, it is still quite a strong assumption. The ergodic theorem constitutes a generalization in which we allow a certain degree of dependence between the random variables being considered.

Statistical laws in Dynamical Systems

51 minute read

Published:

In this entry I will attempt to introduce some of the fundamental concepts in my research. In particular, I want to discuss the topic of limit laws in dynamical systems, and the end goal is to explain our latest results with Matthew Nicol (University of Houston), Large deviations and central limit theorems for sequential and random systems of intermittent maps. This is a huge topic, and as such I will divide the exposition into multiple articles. In this particular one, I will focus on central limit theorems and large deviation estimates for one-dimensional maps of the unit interval. The exposition will use the formal notation of probability theory, but the ideas do not require deep knowledge of the formalism behind it. The code for all the graphics and functions can be found here.

Statistical learning

Is overfitting… good?

44 minute read

Published:

Conventional wisdom in Data Science/Statistical Learning tells us that when we try to fit a model that is able to learn from our data and generalize what it learned to unseen data, we must keep in mind the bias/variance trade-off. This means that as we increase the complexity of our models (let us say, the number of learnable parameters), it is more likely that they will just memorize the data and will not be able to generalize well to unseen data. On the other hand, if we keep the complexity low, our models will not be able to learn too much from our data and will not do well either. We are told to find the sweet spot in the middle. But is this paradigm about to change? In this article we review new developments that suggest that this might be the case.

Empirical error

20 minute read

Published:

In a previous entry we studied the concepts of bias and variance in an additive context. In this entry we dive deeper into the analysis of the mean squared error and how to assess it using actual data.

Understanding bias and variance

35 minute read

Published:

In this notebook we will study the concepts of bias and variance and how we can use them to fit models to our datasets. We will base our exposition on both theory and examples.

Statistics

Confidence intervals

33 minute read

Published:

Confidence intervals represent one of the most powerful tools used by statisticians and data scientists. They allow us to quantify the uncertainty of our predictions, which proves crucial when making important decisions. In this entry we will take a first dive into this topic, finding confidence intervals for means of i.i.d. processes.
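The basic construction for the mean, $\bar{x} \pm 1.96\, s/\sqrt{n}$ at the 95% level, can be sanity-checked by simulating many samples and counting how often the interval covers the true mean (the distribution, sample size and seed below are arbitrary choices for illustration):

```python
import numpy as np

# Coverage check for the classical 95% confidence interval
# xbar +/- 1.96 * s / sqrt(n) for the mean of an i.i.d. sample.
rng = np.random.default_rng(2)
n, trials, mu = 100, 2000, 1.0
covered = 0
for _ in range(trials):
    sample = rng.normal(mu, 2.0, size=n)  # true mean is mu
    half = 1.96 * sample.std(ddof=1) / np.sqrt(n)
    if abs(sample.mean() - mu) <= half:
        covered += 1
print(covered / trials)  # should be close to 0.95
```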

Problem 4: Producing normal vectors

1 minute read

Published:

The idea is to produce multivariate normal random vectors from univariate standard normal random numbers. For this, let $Z$ be a $\mathcal{N}(0,I_n)$ random vector (here $I_n$ denotes the $n\times n$ identity matrix). Given a vector $\mu$ and a symmetric positive-definite real matrix $\Sigma$ (to be the mean and covariance parameters of the multivariate normal vector), consider the Cholesky decomposition of $\Sigma$, given by $\Sigma = LL^T$. Prove that $X = \mu + LZ$ is distributed as $\mathcal{N}(\mu,\Sigma)$.
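The construction is easy to verify empirically; here is a minimal numpy sketch (the particular $\mu$ and $\Sigma$ are arbitrary examples):

```python
import numpy as np

# The construction in the problem: X = mu + L Z with Sigma = L L^T.
rng = np.random.default_rng(3)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

L = np.linalg.cholesky(Sigma)          # Sigma = L @ L.T
Z = rng.standard_normal((2, 100_000))  # columns are i.i.d. N(0, I_2)
X = mu[:, None] + L @ Z                # each column should be ~ N(mu, Sigma)

print(X.mean(axis=1), np.cov(X))
```

The key identity behind the proof is $\operatorname{Cov}(X) = L \operatorname{Cov}(Z) L^T = L I_n L^T = \Sigma$.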

Understanding bias and variance

35 minute read

Published:

In this notebook we will study the concepts of bias and variance and how can we use them to fit models to our datasets. We will base our exposition in both theory and examples.

Extreme value theory I

27 minute read

Published:

In this entry I will discuss some of the introductory concepts of Extreme Value Theory (EVT). This theory is concerned with the asymptotic behavior of the extreme events of a stochastic process, in particular, the distributional characteristics of the maximum order statistics, which will be the focus of this entry. We will first look at i.i.d. processes and then move on to processes with non-trivial dependence. The exposition is based on the book An Introduction to Statistical Modeling of Extreme Values by Stuart Coles. I will also try to use this as a way to showcase the different Python libraries that allow us to work with EVT.

Problem 2: expectation of minimum

less than 1 minute read

Published:

Say we have X ~ Uniform(0, 1) and Y ~ Uniform(0, 1). What is the expected value of the minimum of X and Y?
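Assuming $X$ and $Y$ are independent, the answer is $1/3$: $P(\min > t) = (1-t)^2$, so $E[\min] = \int_0^1 (1-t)^2\,\mathrm{d}t = 1/3$. A quick Monte Carlo check:

```python
import numpy as np

# Monte Carlo check: for independent X, Y ~ Uniform(0, 1),
# P(min > t) = (1 - t)^2, hence E[min] = 1/3.
rng = np.random.default_rng(4)
x = rng.uniform(size=1_000_000)
y = rng.uniform(size=1_000_000)
print(np.minimum(x, y).mean())  # should be close to 1/3
```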

Supervised learning

PCA and supervised learning

20 minute read

Published:

The situation is this: you have been given data, with several variables $x_1,\dots,x_d$ and a response $y$ that we want to predict using such variables. You perform some basic statistical analysis on your variables, see their averages, ranges, distribution. Then you look at the correlation between these variables, and find that there is some strong correlation between some of them. You decide to perform principal components analysis (PCA) to reduce the dimension of your features to $w_1,\dots,w_m$, with $m < d$. Now you fit your model, and you find that it gives terrible results, even though your PCA variables are capable of explaining most of the variance of the features. What went wrong?
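One way this can go wrong: PCA is unsupervised, so it can discard a low-variance direction that happens to carry all the predictive signal. A toy illustration of that failure mode (the features and coefficients below are made up for the example):

```python
import numpy as np

# The response depends only on a low-variance feature, so projecting
# onto the first principal component throws away exactly the
# direction that predicts y.
rng = np.random.default_rng(5)
n = 2000
x1 = rng.normal(0.0, 10.0, n)   # high variance, irrelevant to y
x2 = rng.normal(0.0, 0.1, n)    # low variance, drives y
y = 5.0 * x2 + rng.normal(0.0, 0.01, n)

X = np.column_stack([x1, x2])
Xc = X - X.mean(axis=0)
# First principal component via SVD: it aligns with x1 (largest variance).
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Xc @ Vt[0]

def r_squared(feature, target):
    # R^2 of a simple linear regression of target on feature.
    beta, alpha = np.polyfit(feature, target, 1)
    resid = target - (beta * feature + alpha)
    return 1 - resid.var() / target.var()

print(r_squared(pc1, y), r_squared(x2, y))  # PC1 predicts poorly; x2 predicts well
```

Explaining most of the variance of the features says nothing about explaining the variance of the response.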

borel-cantelli

Law of large numbers

12 minute read

Published:

In the last entry, we discussed the Borel-Cantelli lemma, a zero-one law for limsups of sets. With it we can deduce a first law of large numbers under some assumptions on the higher moments of our random variables. Later on we will prove a version which does not assume the existence of the higher-order moments.

Borel-Cantelli lemma

1 minute read

Published:

In this entry we will discuss the Borel-Cantelli lemma. Despite usually being called just a lemma, it is without any doubt one of the most important and foundational results of probability theory: it is one of the essential zero-one laws, and it allows us to prove a variety of almost-sure results.

central limit theorem

Central limit theorem

19 minute read

Published:

In the previous entries, we have explored the behavior of the sums $S_n = X_1+\dots+X_n$ for an iid sequence ${ X_n }$. The weak and strong laws of large numbers show that asymptotically, we have $S_n \sim n\cdot \mathbb{E}(X_1)$ if $\mathbb{E}(X_1) \lt \infty$. In this entry, we explore the behavior of the fluctuations of the sums around their expected limit. More precisely, we prove the Central Limit Theorem (CLT):
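The statement is easy to illustrate numerically before proving it: standardized sums of Uniform(0, 1) variables should look standard normal. A minimal simulation (the choices of $n$, the number of trials, and the seed are arbitrary):

```python
import numpy as np

# Standardized sums of Uniform(0, 1) variables: by the CLT,
# (S_n - n/2) / sqrt(n/12) should be approximately standard normal,
# since E(X_1) = 1/2 and Var(X_1) = 1/12.
rng = np.random.default_rng(6)
n, trials = 1000, 10_000
sums = rng.uniform(size=(trials, n)).sum(axis=1)
z = (sums - n / 2) / np.sqrt(n / 12)

# About 95% of a standard normal lies within 1.96 of the mean.
print(z.mean(), z.std(), (np.abs(z) < 1.96).mean())
```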

characteristic function

Central limit theorem

19 minute read

Published:

In the previous entries, we have explored the behavior of the sums $S_n = X_1+\dots+X_n$ for an iid sequence ${ X_n }$. The weak and strong laws of large numbers show that asymptotically, we have $S_n \sim n\cdot \mathbb{E}(X_1)$ if $\mathbb{E}(X_1) \lt \infty$. In this entry, we explore the behavior of the fluctuations of the sums around their expected limit. More precisely, we prove the Central Limit Theorem (CLT):

concentration inequalities

Limit laws: weak law of large numbers

14 minute read

Published:

This is the first of a series of entries where we will explore several limit laws for sequences of random variables. The setting is the standard one for probability theory: fix a measure space $\Omega$, a sigma-algebra $\mathcal{B}$ and a probability measure $\mathbb{P}$. By random variables we mean measurable functions $X\colon \Omega \to \mathbb{R}$. The distribution of $X$ is defined as $F(x) = \mathbb{P}(\omega: X(\omega) \leq x )$, so $\mathbb{P}( a < X \leq b ) = F(b) - F(a)$. If the measure $X^{-1}\mathbb{P}(A) = \mathbb{P}(X^{-1}(A))$ is absolutely continuous with respect to the Lebesgue measure $\lambda$, we denote the Radon-Nikodym derivative by $f = \frac{\mathrm{d} X^{-1}\mathbb{P}}{\mathrm{d} \lambda}$ and hence we can compute probabilities as integrals of this function with respect to $\lambda$:

dynamical systems

A probabilistic approach to intermittency

20 minute read

Published:

In this entry, we will discuss some of Carlangelo Liverani’s work. Without any doubt, his paper Decay of Correlations, Annals of Mathematics, 142, pp. 239-301, (1995) is one of the most influential of his publications, but this time we will focus on his work with Sandro Vaienti and Benoit Saussol, A Probabilistic Approach to Intermittency, Ergodic Theory and Dynamical Systems, 19, pp. 671–685 (1999). We will refer to this paper as LSV99.

Statistical properties in hyperbolic dynamics, part 4

31 minute read

Published:

This is the fourth and last post of a series of 4 posts based on the lectures at the Houston Summer School on Dynamical Systems 2019 on Statistical properties in hyperbolic dynamics, given by Matthew Nicol, Andrew Török and William Ott. These notes are heavily edited, with added comments, examples and explanations. Any mistake is my responsibility. Some of the notation has been changed for consistency.

Statistical properties in hyperbolic dynamics

32 minute read

Published:

This is the first post of a series of 4 posts based on the lectures at the Houston Summer School on Dynamical Systems 2019 on Statistical properties in hyperbolic dynamics, given by Matthew Nicol, Andrew Török and William Ott. These notes are heavily edited, with added comments, examples and explanations. Any mistake is my responsibility. Some of the notation has been changed for consistency.

fluctuations

Central limit theorem

19 minute read

Published:

In the previous entries, we have explored the behavior of the sums $S_n = X_1+\dots+X_n$ for an iid sequence ${ X_n }$. The weak and strong laws of large numbers show that asymptotically, we have $S_n \sim n\cdot \mathbb{E}(X_1)$ if $\mathbb{E}(X_1) \lt \infty$. In this entry, we explore the behavior of the fluctuations of the sums around their expected limit. More precisely, we prove the Central Limit Theorem (CLT):

hello world

Second test

less than 1 minute read

Published:

Second entry to see how this goes.

Hello world

less than 1 minute read

Published:

This is a blog entry to test this feature.

hyperbolic dynamics

Statistical properties in hyperbolic dynamics, part 4

31 minute read

Published:

This is the fourth and last post of a series of 4 posts based on the lectures at the Houston Summer School on Dynamical Systems 2019 on Statistical properties in hyperbolic dynamics, given by Matthew Nicol, Andrew Török and William Ott. These notes are heavily edited, with added comments, examples and explanations. Any mistake is my responsibility. Some of the notation has been changed for consistency.

Statistical properties in hyperbolic dynamics

32 minute read

Published:

This is the first post of a series of 4 posts based on the lectures at the Houston Summer School on Dynamical Systems 2019 on Statistical properties in hyperbolic dynamics, given by Matthew Nicol, Andrew Török and William Ott. These notes are heavily edited, with added comments, examples and explanations. Any mistake is my responsibility. Some of the notation has been changed for consistency.

large deviations

Large deviations

12 minute read

Published:

In the previous entries we have studied the asymptotic behavior of sums of iid random variables. The law of large numbers showed that, almost surely, the averages converge to the mean, while the central limit theorem gave us a second-order approximation, that is, it provided information about the size of the fluctuations around the expected value. In this entry, we discuss a third-order result, namely Cramér's theorem on large deviations. Intuitively, Cramér's theorem establishes asymptotic rates for the decay of the probability of observing very unlikely events.
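For fair coin flips this decay can be computed exactly. With $X_i \sim \text{Bernoulli}(1/2)$, Cramér's theorem says $P(S_n/n \geq a)$ decays like $e^{-nI(a)}$, where the rate function is $I(a) = a\log(2a) + (1-a)\log(2(1-a))$. A sketch comparing the exact binomial tail with the predicted rate (the threshold $a = 0.6$ is an arbitrary choice):

```python
import math

# Cramer's theorem for fair coin flips: P(S_n / n >= a) ~ exp(-n I(a))
# with rate function I(a) = a log(2a) + (1-a) log(2(1-a)).
a = 0.6
I = a * math.log(2 * a) + (1 - a) * math.log(2 * (1 - a))

def empirical_rate(n):
    # Exact binomial tail P(S_n >= a*n), computed with big integers
    # to avoid underflow, then turned into a per-step decay rate.
    tail = sum(math.comb(n, k) for k in range(math.ceil(a * n), n + 1))
    return -(math.log(tail) - n * math.log(2)) / n

for n in (500, 1000, 2000):
    print(n, empirical_rate(n), I)
```

The empirical rates decrease toward $I(a)$ as $n$ grows; the residual gap is the subexponential prefactor that the theorem does not capture.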

law of large numbers

Law of large numbers

12 minute read

Published:

In the last entry, we discussed the Borel-Cantelli lemma, a zero-one law for limsups of sets. With it we can deduce a first law of large numbers under some assumptions on the higher moments of our random variables. Later on we will prove a version which does not assume the existence of the higher-order moments.

Limit laws: weak law of large numbers

14 minute read

Published:

This is the first of a series of entries where we will explore several limit laws for sequences of random variables. The setting is the standard one for probability theory: fix a measure space $\Omega$, a sigma-algebra $\mathcal{B}$ and a probability measure $\mathbb{P}$. By random variables we mean measurable functions $X\colon \Omega \to \mathbb{R}$. The distribution of $X$ is defined as $F(x) = \mathbb{P}(\omega: X(\omega) \leq x )$, so $\mathbb{P}( a < X \leq b ) = F(b) - F(a)$. If the measure $X^{-1}\mathbb{P}(A) = \mathbb{P}(X^{-1}(A))$ is absolutely continuous with respect to the Lebesgue measure $\lambda$, we denote the Radon-Nikodym derivative by $f = \frac{\mathrm{d} X^{-1}\mathbb{P}}{\mathrm{d} \lambda}$ and hence we can compute probabilities as integrals of this function with respect to $\lambda$:

limit laws

A probabilistic approach to intermittency

20 minute read

Published:

In this entry, we will discuss some of Carlangelo Liverani’s work. Without any doubt, his paper Decay of Correlations, Annals of Mathematics, 142, pp. 239-301, (1995) is one of the most influential of his publications, but this time we will focus on his work with Sandro Vaienti and Benoit Saussol, A Probabilistic Approach to Intermittency, Ergodic Theory and Dynamical Systems, 19, pp. 671–685 (1999). We will refer to this paper as LSV99.

Large deviations

12 minute read

Published:

In the previous entries we have studied the asymptotic behavior of sums of iid random variables. The law of large numbers showed that, almost surely, the averages converge to the mean, while the central limit theorem gave us a second-order approximation, that is, it provided information about the size of the fluctuations around the expected value. In this entry, we discuss a third-order result, namely Cramér's theorem on large deviations. Intuitively, Cramér's theorem establishes asymptotic rates for the decay of the probability of observing very unlikely events.

Central limit theorem

19 minute read

Published:

In the previous entries, we have explored the behavior of the sums $S_n = X_1+\dots+X_n$ for an iid sequence ${ X_n }$. The weak and strong laws of large numbers show that asymptotically, we have $S_n \sim n\cdot \mathbb{E}(X_1)$ if $\mathbb{E}(X_1) \lt \infty$. In this entry, we explore the behavior of the fluctuations of the sums around their expected limit. More precisely, we prove the Central Limit Theorem (CLT):

Law of large numbers

12 minute read

Published:

In the last entry, we discussed the Borel-Cantelli lemma, a zero-one law for limsups of sets. With it we can deduce a first law of large numbers under some assumptions on the higher moments of our random variables. Later on we will prove a version which does not assume the existence of the higher-order moments.

Statistical properties in hyperbolic dynamics, part 4

31 minute read

Published:

This is the fourth and last post of a series of 4 posts based on the lectures at the Houston Summer School on Dynamical Systems 2019 on Statistical properties in hyperbolic dynamics, given by Matthew Nicol, Andrew Török and William Ott. These notes are heavily edited, with added comments, examples and explanations. Any mistake is my responsibility. Some of the notation has been changed for consistency.

Statistical properties in hyperbolic dynamics

32 minute read

Published:

This is the first post of a series of 4 posts based on the lectures at the Houston Summer School on Dynamical Systems 2019 on Statistical properties in hyperbolic dynamics, given by Matthew Nicol, Andrew Török and William Ott. These notes are heavily edited, with added comments, examples and explanations. Any mistake is my responsibility. Some of the notation has been changed for consistency.

Borel-Cantelli lemma

1 minute read

Published:

In this entry we will discuss the Borel-Cantelli lemma. Despite usually being called just a lemma, it is without any doubt one of the most important and foundational results of probability theory: it is one of the essential zero-one laws, and it allows us to prove a variety of almost-sure results.

Limit laws: weak law of large numbers

14 minute read

Published:

This is the first of a series of entries where we will explore several limit laws for sequences of random variables. The setting is the standard one for probability theory: fix a measure space $\Omega$, a sigma-algebra $\mathcal{B}$ and a probability measure $\mathbb{P}$. By random variables we mean measurable functions $X\colon \Omega \to \mathbb{R}$. The distribution of $X$ is defined as $F(x) = \mathbb{P}(\omega: X(\omega) \leq x )$, so $\mathbb{P}( a < X \leq b ) = F(b) - F(a)$. If the measure $X^{-1}\mathbb{P}(A) = \mathbb{P}(X^{-1}(A))$ is absolutely continuous with respect to the Lebesgue measure $\lambda$, we denote the Radon-Nikodym derivative by $f = \frac{\mathrm{d} X^{-1}\mathbb{P}}{\mathrm{d} \lambda}$ and hence we can compute probabilities as integrals of this function with respect to $\lambda$:

measure theory

A probabilistic approach to intermittency

20 minute read

Published:

In this entry, we will discuss some of Carlangelo Liverani’s work. Without any doubt, his paper Decay of Correlations, Annals of Mathematics, 142, pp. 239-301, (1995) is one of the most influential of his publications, but this time we will focus on his work with Sandro Vaienti and Benoit Saussol, A Probabilistic Approach to Intermittency, Ergodic Theory and Dynamical Systems, 19, pp. 671–685 (1999). We will refer to this paper as LSV99.

Large deviations

12 minute read

Published:

In the previous entries we have studied the asymptotic behavior of sums of iid random variables. The law of large numbers showed that, almost surely, the averages converge to the mean, while the central limit theorem gave us a second-order approximation, that is, it provided information about the size of the fluctuations around the expected value. In this entry, we discuss a third-order result, namely Cramér's theorem on large deviations. Intuitively, Cramér's theorem establishes asymptotic rates for the decay of the probability of observing very unlikely events.

Central limit theorem

19 minute read

Published:

In the previous entries, we have explored the behavior of the sums $S_n = X_1+\dots+X_n$ for an iid sequence ${ X_n }$. The weak and strong laws of large numbers show that asymptotically, we have $S_n \sim n\cdot \mathbb{E}(X_1)$ if $\mathbb{E}(X_1) \lt \infty$. In this entry, we explore the behavior of the fluctuations of the sums around their expected limit. More precisely, we prove the Central Limit Theorem (CLT):

Law of large numbers

12 minute read

Published:

In the last entry, we discussed the Borel-Cantelli lemma, a zero-one law for limsups of sets. With it we can deduce a first law of large numbers under some assumptions on the higher moments of our random variables. Later on we will prove a version which does not assume the existence of the higher-order moments.

Statistical properties in hyperbolic dynamics, part 4

31 minute read

Published:

This is the fourth and last post of a series of 4 posts based on the lectures at the Houston Summer School on Dynamical Systems 2019 on Statistical properties in hyperbolic dynamics, given by Matthew Nicol, Andrew Török and William Ott. These notes are heavily edited, with added comments, examples and explanations. Any mistake is my responsibility. Some of the notation has been changed for consistency.

Statistical properties in hyperbolic dynamics

32 minute read

Published:

This is the first post of a series of 4 posts based on the lectures at the Houston Summer School on Dynamical Systems 2019 on Statistical properties in hyperbolic dynamics, given by Matthew Nicol, Andrew Török and William Ott. These notes are heavily edited, with added comments, examples and explanations. Any mistake is my responsibility. Some of the notation has been changed for consistency.

Borel-Cantelli lemma

1 minute read

Published:

In this entry we will discuss the Borel-Cantelli lemma. Despite usually being called just a lemma, it is without any doubt one of the most important and foundational results of probability theory: it is one of the essential zero-one laws, and it allows us to prove a variety of almost-sure results.

Limit laws: weak law of large numbers

14 minute read

Published:

This is the first of a series of entries where we will explore several limit laws for sequences of random variables. The setting is the standard one for probability theory: fix a measure space $\Omega$, a sigma-algebra $\mathcal{B}$ and a probability measure $\mathbb{P}$. By random variables we mean measurable functions $X\colon \Omega \to \mathbb{R}$. The distribution of $X$ is defined as $F(x) = \mathbb{P}(\omega: X(\omega) \leq x )$, so $\mathbb{P}( a < X \leq b ) = F(b) - F(a)$. If the measure $X^{-1}\mathbb{P}(A) = \mathbb{P}(X^{-1}(A))$ is absolutely continuous with respect to the Lebesgue measure $\lambda$, we denote the Radon-Nikodym derivative by $f = \frac{\mathrm{d} X^{-1}\mathbb{P}}{\mathrm{d} \lambda}$ and hence we can compute probabilities as integrals of this function with respect to $\lambda$:

probability theory

A probabilistic approach to intermittency

20 minute read

Published:

In this entry, we will discuss some of Carlangelo Liverani’s work. Without any doubt, his paper Decay of Correlations, Annals of Mathematics, 142, pp. 239-301, (1995) is one of the most influential of his publications, but this time we will focus on his work with Sandro Vaienti and Benoit Saussol, A Probabilistic Approach to Intermittency, Ergodic Theory and Dynamical Systems, 19, pp. 671–685 (1999). We will refer to this paper as LSV99.

Large deviations

12 minute read

Published:

In the previous entries we have studied the asymptotic behavior of sums of iid random variables. The law of large numbers showed that, almost surely, the averages converge to the mean, while the central limit theorem gave us a second-order approximation, that is, it provided information about the size of the fluctuations around the expected value. In this entry, we discuss a third-order result, namely Cramér's theorem on large deviations. Intuitively, Cramér's theorem establishes asymptotic rates for the decay of the probability of observing very unlikely events.

Central limit theorem

19 minute read

Published:

In the previous entries, we have explored the behavior of the sums $S_n = X_1+\dots+X_n$ for an iid sequence ${ X_n }$. The weak and strong laws of large numbers show that asymptotically, we have $S_n \sim n\cdot \mathbb{E}(X_1)$ if $\mathbb{E}(X_1) \lt \infty$. In this entry, we explore the behavior of the fluctuations of the sums around their expected limit. More precisely, we prove the Central Limit Theorem (CLT):

Law of large numbers

12 minute read

Published:

In the last entry, we discussed the Borel-Cantelli lemma, a zero-one law for limsups of sets. With it we can deduce a first law of large numbers under some assumptions on the higher moments of our random variables. Later on we will prove a version which does not assume the existence of the higher-order moments.

Statistical properties in hyperbolic dynamics, part 4

31 minute read

Published:

This is the fourth and last post of a series of 4 posts based on the lectures at the Houston Summer School on Dynamical Systems 2019 on Statistical properties in hyperbolic dynamics, given by Matthew Nicol, Andrew Török and William Ott. These notes are heavily edited, with added comments, examples and explanations. Any mistake is my responsibility. Some of the notation has been changed for consistency.

Statistical properties in hyperbolic dynamics

32 minute read

Published:

This is the first post of a series of 4 posts based on the lectures at the Houston Summer School on Dynamical Systems 2019 on Statistical properties in hyperbolic dynamics, given by Matthew Nicol, Andrew Török and William Ott. These notes are heavily edited, with added comments, examples and explanations. Any mistake is my responsibility. Some of the notation has been changed for consistency.

Borel-Cantelli lemma

1 minute read

Published:

In this entry we will discuss the Borel-Cantelli lemma. Despite usually being called just a lemma, it is without any doubt one of the most important and foundational results of probability theory: it is one of the essential zero-one laws, and it allows us to prove a variety of almost-sure results.

Limit laws: weak law of large numbers

14 minute read

Published:

This is the first of a series of entries where we will explore several limit laws for sequences of random variables. The setting is the standard one for probability theory: fix a measure space $\Omega$, a sigma-algebra $\mathcal{B}$ and a probability measure $\mathbb{P}$. By random variables we mean measurable functions $X\colon \Omega \to \mathbb{R}$. The distribution of $X$ is defined as $F(x) = \mathbb{P}(\omega: X(\omega) \leq x )$, so $\mathbb{P}( a < X \leq b ) = F(b) - F(a)$. If the measure $X^{-1}\mathbb{P}(A) = \mathbb{P}(X^{-1}(A))$ is absolutely continuous with respect to the Lebesgue measure $\lambda$, we denote the Radon-Nikodym derivative by $f = \frac{\mathrm{d} X^{-1}\mathbb{P}}{\mathrm{d} \lambda}$ and hence we can compute probabilities as integrals of this function with respect to $\lambda$:

transfer operator

Statistical properties in hyperbolic dynamics, part 4

31 minute read

Published:

This is the fourth and last post of a series of 4 posts based on the lectures at the Houston Summer School on Dynamical Systems 2019 on Statistical properties in hyperbolic dynamics, given by Matthew Nicol, Andrew Török and William Ott. These notes are heavily edited with added comments, examples and explanations. Any mistake is of my responsibility. Some of the notation has been changed for consistency.

Statistical properties in hyperbolic dynamics

32 minute read

Published:

This is the first post of a series of 4 posts based on the lectures at the Houston Summer School on Dynamical Systems 2019 on Statistical properties in hyperbolic dynamics, given by Matthew Nicol, Andrew Török and William Ott. These notes are heavily edited with added comments, examples and explanations. Any mistake is of my responsibility. Some of the notation has been changed for consistency.