Sitemap

A list of all the posts and pages found on the site. For you robots out there is an XML version available for digesting as well.

Pages

Posts

Is overfitting… good?

44 minute read

Published:

Conventional wisdom in Data Science/Statistical Learning tells us that when we try to fit a model that is able to learn from our data and generalize what it learned to unseen data, we must keep in mind the bias/variance trade-off. This means that as we increase the complexity of our models (let us say, the number of learnable parameters), it is more likely that they will just memorize the data and will not be able to generalize well to unseen data. On the other hand, if we keep the complexity low, our models will not be able to learn too much from our data and will not do well either. We are told to find the sweet spot in the middle. But is this paradigm about to change? In this article we review new developments that suggest that this might be the case.

PCA and supervised learning

20 minute read

Published:

The situation is this: you have been given data, with several variables $x_1,\dots,x_d$ and a response $y$ that we want to predict using such variables. You perform some basic statistical analysis on your variables, see their averages, ranges, distribution. Then you look at the correlation between these variables, and find that there is some strong correlation between some of them. You decide to perform principal components analysis (PCA) to reduce the dimension of your features to $w_1,\dots,w_m$, with $m < d$. Now you fit your model, and you find that it gives terrible results, even though your PCA variables are capable of explaining most of the variance of the features. What went wrong?

Martingales 0

37 minute read

Published:

I want to talk about martingales, but unfortunately in order to do that properly, we need to talk first about sigma-algebras and conditional expectations, subjects which can be a bit harsh at first. These concepts are essential, and while we could just work with them just as formal objects with certain properties, it is fundamental to have a deeper understanding of them so we do not get lost in formalism and we are able to capture the intuition behind this theory.

Extreme value theory III

47 minute read

Published:

In previous entries (here and here we introduced and discussed the basic elements of Extreme Value Theory (EVT), such as the extreme value distributions, the generalized extreme value distribution, saw examples of such distribution, as well as simulated data and their corresponding fits. In this entry we get our hands on real data and see how we can make some inference using EVT. In particular, we focus on Maximum Likelihood methods for parameter estimation of a temperature dataset from my home city, Santiago de Chile.

Empirical error

20 minute read

Published:

In a previous entry we studied the concepts of bias and variance in an additive context. In this entry we dive deeper into the analysis of the mean squared error and how to asses it using actual data.

Extreme value theory II

24 minute read

Published:

In a previous entry we introduced the basics of Extreme Value Theory (EVT), such as the degeneracy of the maxima distribution, the extremal types theorem, as well as the Gumbel, Frechet, Weibull and GEV distributions. In this entry we will see a few examples of random variables and their respective maxima distribution, both theoretical and by performing simulations.

Confidence intervals

33 minute read

Published:

Confidence intervals represent one of the most powerful tools used by statisticians/data scientists. The allow us to quantify the uncertainty of our predictions, which proves crucial when making important decisions. In this entry we will take a first dive into this topic, finding confidence intervals for means of i.i.d. processes.

Problem 5: Unbiased and consistent estimators

10 minute read

Published:

What does it mean for an estimator to be unbiased? What about consistent? Give examples of an unbiased but not consistent estimator, as well as a biased but consistent estimator.

The Law of Anomalous Numbers

39 minute read

Published:

We have seen in previous entries how multiple statistical results allow us to find patterns in randomness. Today we talk about a strange law of numbers, Benford’s law. This is an empirical law that explains the distribution of the leading digit of observed data in real life situations. We will then draw a parallel with the law of leading digits for the powers of 2 and how can ergodic theory help us understand these phenomena.

The ergodic theorem

36 minute read

Published:

In previous entries (see here and here) we have seen how the weak and the strong law of large numbers gives us asymptotics for the averages of i.i.d. sequences. While the assumption of independence allows us to apply this result in many different contexts, it is still a quite strong assumption. The ergodic theorem constitues a generalization where we allow a certain degree of dependence between the random variables being considered.

Problem 4: Producing normal vectors

1 minute read

Published:

The idea is to produce multivariate normal random vectors from univariate standard normal random numbers. For this, let $Z$ be a $\mathcal{N}(0,I_n)$ a random vector (here $I_n$ denotes the $n\times n$ identity matrix). Given a vector $\mu$ and a symmetric real matrix $\Sigma$ (to be the mean and covariance parameters of the multivariate normal vector), consider the Cholesky decomposition of $\Sigma$, given by $\Sigma = LL^T$. Prove that $X = \mu +LZ$ is distributes as $\mathcal{N}(\mu,\Sigma)$.

Understanding bias and variance

35 minute read

Published:

In this notebook we will study the concepts of bias and variance and how can we use them to fit models to our datasets. We will base our exposition in both theory and examples.

Problem 3: rearrange of lists

1 minute read

Published:

The sequence [0, 1, ..., N] has been jumbled, and the only clue you have for its order is an array representing whether each number is larger or smaller than the last. Given this information, reconstruct an array that is consistent with it. For example, given [None, +, +, -, +], you could return [1, 2, 3, 0, 4].

Extreme value theory I

27 minute read

Published:

In this entry I will discuss some of the introductory concepts of Extreme Value Theory (EVT). This theory is concerned with the asymptotic behavior of the extremes events of a stochastic process, in particular, the distributional characteristics of the maximum order statistics, which will be the focus of this entry. We will first look at i.i.d. processes and then move on to processes with non-trivial dependence. The exposition is based on the book An Introduction to Statistical Modeling of Extreme Values by Stuart Coles. I will also try to use this as a way to showcase the different libraries of python that allow us to work with EVT.

Problem 2: expectation of minimum

less than 1 minute read

Published:

Say we have X ~ Uniform(0, 1) and Y ~ Uniform(0, 1). What is the expected value of the minimum of X and Y?

Problem 1: random unfair coin

less than 1 minute read

Published:

There is a fair coin (one side heads, one side tails) and an unfair coin (both sides tails). You pick one at random, flip it 5 times, and observe that it comes up as tails all five times. What is the chance that you are flipping the unfair coin?

Statistical laws in Dynamical Systems

51 minute read

Published:

In this entry I will attempt to introduce some of the fundamental concepts in my research. In particular, I want to discuss the topic of Limit laws in dynamical systems, and the endgoal is to explain our latests results together with Matthew Nicol (University of Houston), Large deviations and central limit theorems for sequential and random systems of intermittent maps. This is a huge topic, and as such I will divide the exposition in multiple articles. In this particular one, I will focus in Central limit theorems and Large deviations estimates for one dimensional maps of the unit interval. The exposition will use the formal notation of probability theory, but the ideas do not require a deep knowledge of the formalism behind probability theory. The code for all the graphics and functions can be found here.

A probabilistic approach to intermittency

20 minute read

Published:

In this entry, we will discuss some of Carlangerlo Liverani’s work. Without any doubt, his work Decay of Correlations, Annals of Mathematics, 142, pp. 239-301, (1995) is one of the most influential of his publications, but this time we will focus on his work with Sandro Vaienti and Benoit Saussol, A Probabilistic Approach to Intermittency, Ergodic Theory and Dynamical Systems, 19, pp. 671–685 (1999). We will refer to this paper as LSV99.

Large deviations

12 minute read

Published:

In the previous entries we have studied the asymptotic behavior of the sums of iid random variables. The law of large numbers showed that almost surely, the averages converge to the mean, while the central limit theorem gave us a second order approximation, that is, it provided information of the size of the fluctuations around the expected value. In this entry, we discuss a third order result, namely, Cramer’s theorem on large deviations. Intuitively, Cramer’s theorem establishes asymptotic rates for the decay of the probability of observing very unlikely events.

Central limit theorem

19 minute read

Published:

In the previous entries, we have explored the behavior of the sums $S_n = X_1+\dots+X_n$ for an iid sequence ${ X_n }$. The weak and strong laws of large numbers show that asymptotically, we have $S_n \sim n\cdot \mathbb{E}(X_1)$ if $\mathbb{E}(X_1) \lt \infty$. In this entry, we explore the behavior of the fluctuations of the sums around their expected limit. More precisely, we prove the Central Limit Theorem (CLT):

Law of large numbers

12 minute read

Published:

In the last entry, we discussed the Borel-Cantelli lemma, a zero-one law that for limpsups of sets. With this we can deduce a first law of large numbers under some assumptions for the higher moments of our random variables. Later on we will prove a version which does not assume the existence of the higher order moments.

Statistical properties in hyperbolic dynamics, part 4

31 minute read

Published:

This is the fourth and last post of a series of 4 posts based on the lectures at the Houston Summer School on Dynamical Systems 2019 on Statistical properties in hyperbolic dynamics, given by Matthew Nicol, Andrew Török and William Ott. These notes are heavily edited with added comments, examples and explanations. Any mistake is of my responsibility. Some of the notation has been changed for consistency.

Statistical properties in hyperbolic dynamics

32 minute read

Published:

This is the first post of a series of 4 posts based on the lectures at the Houston Summer School on Dynamical Systems 2019 on Statistical properties in hyperbolic dynamics, given by Matthew Nicol, Andrew Török and William Ott. These notes are heavily edited with added comments, examples and explanations. Any mistake is of my responsibility. Some of the notation has been changed for consistency.

Borel-Cantelli lemma

1 minute read

Published:

In this entry we will discuss the Borel-Cantelli lemma. Despite it being usually called just a lemma, it is without any doubts one of the most important and foundational results of probability theory: it is one of the essential zero-one laws, and it allows us to prove a variety of almost-sure results.

Limit laws: weak law of large numbers

14 minute read

Published:

This is the first of a series of entries where we will explore several limit laws for sequences of random variables. The setting is going to be the standard for probability theory: fix a measure space $\Omega$, a sigma-algebra $\mathcal{B}$ and a probability measure $\mathbb{P}$. By random variables we mean measurable functions $X\colon \Omega \to \mathbb{R}$. The distribution of $X$ is defined as $F(x) = \mathbb{P}(\omega: X(\omega) \leq x )$, so $\mathbb{P}( a \leq X \leq b ) = F(b) - F(a)$. If the measure $X^{-1}\mathbb{P}(A) = \mathbb{P}(X^{-1}(A))$ is absolutely continuous with respect to $\mathbb{P}$, we denote the Radon-Nikodym derivative by $f = \frac{\mathrm{d} X^{-1}\mathbb{P}}{\mathrm{d} \mathbb{P}}$ and hence we can compute probabilities as integrals of this function with respect to the measure $\mathbb{P}$:

Second test

less than 1 minute read

Published:

Second entry to see how this goes.

Hello world

less than 1 minute read

Published:

This is a blog entry to test this feature.

contact

links

machine

notes

research

teaching