## Page Not Found

Page not found. Your pixels are in another canvas.

A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.

This is a page not in the main menu.

** Published:**

Conventional wisdom in Data Science/Statistical Learning tells us that when we try to fit a model that is able to learn from our data and generalize what it learned to unseen data, we must keep in mind the bias/variance trade-off. This means that as we increase the complexity of our models (let us say, the number of learnable parameters), it is more likely that they will just *memorize* the data and will not be able to generalize well to unseen data. On the other hand, if we keep the complexity low, our models will not be able to *learn* too much from our data and will not do well either. We are told to find the *sweet spot* in the middle. But is this paradigm about to change? In this article we review new developments that suggest that this might be the case.

** Published:**

The situation is this: you have been given data, with several variables $x_1,\dots,x_d$ and a response $y$ that you want to predict using those variables. You perform some basic statistical analysis on your variables and look at their averages, ranges, and distributions. Then you look at the correlation between these variables, and find that some of them are strongly correlated. You decide to perform principal component analysis (PCA) to reduce the dimension of your features to $w_1,\dots,w_m$, with $m < d$. Now you fit your model, and you find that it gives terrible results, even though your PCA variables are capable of explaining most of the variance of the features. What went wrong?
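One way this can happen is sketched below, as a toy illustration rather than the example used in the post. The data are hypothetical: two strongly correlated features `x1` and `x2` share a common signal `t`, and the response `y = x1 - x2` lives entirely in the low-variance direction that PCA would discard. In two dimensions the principal axes can be computed in closed form, so only the standard library is needed.

```python
import math
import random

def pearson(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

def pca_2d(x1, x2):
    """Scores on both principal axes and the variance share of the first axis."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    c1 = [u - m1 for u in x1]
    c2 = [v - m2 for v in x2]
    s11 = sum(u * u for u in c1) / (n - 1)
    s22 = sum(v * v for v in c2) / (n - 1)
    s12 = sum(u * v for u, v in zip(c1, c2)) / (n - 1)
    # Angle of the leading eigenvector of the 2x2 sample covariance matrix.
    theta = 0.5 * math.atan2(2 * s12, s11 - s22)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    w1 = [cos_t * u + sin_t * v for u, v in zip(c1, c2)]
    w2 = [-sin_t * u + cos_t * v for u, v in zip(c1, c2)]
    var1 = sum(u * u for u in w1) / (n - 1)
    var2 = sum(v * v for v in w2) / (n - 1)
    return w1, w2, var1 / (var1 + var2)

# Hypothetical data: a shared signal t plus small independent noise.
rng = random.Random(0)
t = [rng.gauss(0, 1) for _ in range(2000)]
x1 = [u + rng.gauss(0, 0.1) for u in t]
x2 = [u + rng.gauss(0, 0.1) for u in t]
y = [a - b for a, b in zip(x1, x2)]  # response depends only on the difference

w1, w2, share = pca_2d(x1, x2)
corr_w1 = pearson(w1, y)
corr_w2 = pearson(w2, y)
print(f"variance explained by w1: {share:.3f}")
print(f"corr(w1, y) = {corr_w1:+.3f}, corr(w2, y) = {corr_w2:+.3f}")
```

In this construction the first component carries almost all of the feature variance yet is nearly uncorrelated with `y`, while the discarded second component determines `y` almost exactly. PCA never looks at the response, so explaining the variance of the features says nothing about predicting `y`.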

** Published:**

I want to talk about martingales, but unfortunately, in order to do that properly, we first need to talk about sigma-algebras and conditional expectations, subjects which can be a bit harsh at first. These concepts are essential, and while we could work with them as formal objects with certain properties, it is fundamental to have a deeper understanding of them so that we do not get lost in formalism and are able to capture the intuition behind this theory.

** Published:**

In previous entries (here and here) we introduced and discussed the basic elements of Extreme Value Theory (EVT), such as the extreme value distributions and the generalized extreme value distribution, and we saw examples of such distributions, as well as simulated data and their corresponding fits. In this entry we get our hands on real data and see how we can make some inference using EVT. In particular, we focus on Maximum Likelihood methods for parameter estimation of a temperature dataset from my home city, Santiago de Chile.

** Published:**

In a previous entry we studied the concepts of bias and variance in an additive context. In this entry we dive deeper into the analysis of the mean squared error and how to assess it using actual data.

** Published:**

In a previous entry we introduced the basics of Extreme Value Theory (EVT), such as the degeneracy of the maxima distribution, the extremal types theorem, as well as the Gumbel, Frechet, Weibull and GEV distributions. In this entry we will see a few examples of random variables and their respective maxima distribution, both theoretical and by performing simulations.
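As a small illustration of the kind of simulation described here, the sketch below uses one standard example chosen for this note (it may not be the one used in the post): for i.i.d. Exponential(1) variables, the maximum $M_n$ shifted by $\log n$ converges in distribution to the Gumbel law with distribution function $\exp(-e^{-x})$. The sample size `n`, the number of repetitions `reps`, and the seed are arbitrary choices.

```python
import math
import random

def normalized_max(n, rng):
    """Maximum of n i.i.d. Exponential(1) draws, shifted by log(n)."""
    return max(rng.expovariate(1.0) for _ in range(n)) - math.log(n)

def gumbel_cdf(x):
    """Distribution function of the standard Gumbel law."""
    return math.exp(-math.exp(-x))

rng = random.Random(123)
n, reps = 200, 2000
maxima = [normalized_max(n, rng) for _ in range(reps)]

def empirical_cdf(x):
    """Fraction of simulated normalized maxima that are <= x."""
    return sum(m <= x for m in maxima) / reps

for x in (-1.0, 0.0, 1.0, 2.0):
    print(f"x={x:+.1f}  empirical={empirical_cdf(x):.3f}  Gumbel={gumbel_cdf(x):.3f}")
```

The empirical and limiting distribution functions should agree up to sampling error, which gives a quick sanity check before fitting a GEV distribution to real data.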

** Published:**

Confidence intervals are one of the most powerful tools used by statisticians and data scientists. They allow us to quantify the uncertainty of our predictions, which proves crucial when making important decisions. In this entry we will take a first dive into this topic, finding confidence intervals for means of i.i.d. processes.
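A minimal sketch of such an interval, assuming the usual large-sample normal approximation (the post may instead use the Student $t$ distribution or another construction). The function name `mean_confidence_interval` and the exponential test data are hypothetical choices made for illustration; the loop checks empirically how often the interval covers the true mean.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, level=0.95):
    """Normal-approximation confidence interval for the mean of an i.i.d. sample."""
    n = len(sample)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    centre = mean(sample)
    half_width = z * stdev(sample) / math.sqrt(n)
    return centre - half_width, centre + half_width

rng = random.Random(1)
true_mean = 2.0  # mean of an Exponential distribution with rate 1/2
trials = 1000
hits = 0
for _ in range(trials):
    sample = [rng.expovariate(0.5) for _ in range(200)]
    lo, hi = mean_confidence_interval(sample)
    hits += lo <= true_mean <= hi  # True counts as 1

coverage = hits / trials
print(f"empirical coverage: {coverage:.3f}")
```

The empirical coverage should land close to the nominal 95%, with a small shortfall possible because the data are skewed and the interval relies on an asymptotic approximation.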

** Published:**

What does it mean for an estimator to be unbiased? What about consistent? Give examples of an unbiased but not consistent estimator, as well as a biased but consistent estimator.
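One possible pair of examples, offered as a sketch and not necessarily the answer given in the post: for i.i.d. data with mean $\mu$, the first observation $X_1$ is unbiased but not consistent, while $\bar{X}_n + 1/n$ is biased but consistent. The simulation below uses hypothetical Gaussian data and compares the average absolute error of both estimators for a small and a large sample size.

```python
import random
from statistics import mean

rng = random.Random(7)
mu = 3.0  # true mean of the hypothetical N(mu, 1) data

def first_observation(sample):
    """Unbiased for mu, but its error does not shrink as n grows."""
    return sample[0]

def shifted_mean(sample):
    """Biased by exactly 1/n, but converges to mu as n grows."""
    return mean(sample) + 1 / len(sample)

def average_error(estimator, n, reps=500):
    """Average absolute error of the estimator over repeated samples of size n."""
    errors = []
    for _ in range(reps):
        sample = [rng.gauss(mu, 1.0) for _ in range(n)]
        errors.append(abs(estimator(sample) - mu))
    return mean(errors)

for n in (10, 1000):
    e_first = average_error(first_observation, n)
    e_shift = average_error(shifted_mean, n)
    print(f"n={n:5d}  first observation: {e_first:.3f}  shifted mean: {e_shift:.3f}")
```

The error of the first estimator stays roughly constant in `n`, while the error of the second shrinks towards zero even though its expectation is never exactly $\mu$.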

** Published:**

We have seen in previous entries how multiple statistical results allow us to find patterns in randomness. Today we talk about a *strange* law of numbers, **Benford’s law**. This is an empirical law that explains the distribution of the **leading digit** of observed data in real-life situations. We will then draw a parallel with the **law of leading digits for the powers of 2** and see how ergodic theory can help us understand these phenomena.
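The law of leading digits for the powers of 2 is easy to check numerically. The sketch below compares the empirical frequencies of the leading digit of $2^k$, for $k = 1, \dots, N$, with the Benford probabilities $\log_{10}(1 + 1/d)$; the cutoff `N = 5000` is an arbitrary choice.

```python
import math
from collections import Counter

def leading_digit(n):
    """First decimal digit of a positive integer."""
    return int(str(n)[0])

N = 5000
counts = Counter(leading_digit(2 ** k) for k in range(1, N + 1))

empirical = {d: counts[d] / N for d in range(1, 10)}
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

for d in range(1, 10):
    print(f"digit {d}: empirical={empirical[d]:.4f}  Benford={benford[d]:.4f}")
```

The agreement reflects the fact that the fractional parts of $k \log_{10} 2$ equidistribute on the unit interval, which is where ergodic theory enters.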

** Published:**

In previous entries (see here and here) we have seen how the weak and the strong laws of large numbers give us asymptotics for the averages of i.i.d. sequences. While the assumption of independence allows us to apply these results in many different contexts, it is still quite a strong assumption. The ergodic theorem constitutes a generalization where we allow a certain degree of dependence between the random variables being considered.

** Published:**

The idea is to produce multivariate normal random vectors from univariate standard normal random numbers. For this, let $Z$ be a $\mathcal{N}(0,I_n)$ random vector (here $I_n$ denotes the $n\times n$ identity matrix). Given a vector $\mu$ and a symmetric positive-definite real matrix $\Sigma$ (to be the mean and covariance parameters of the multivariate normal vector), consider the Cholesky decomposition of $\Sigma$, given by $\Sigma = LL^T$. Prove that $X = \mu +LZ$ is distributed as $\mathcal{N}(\mu,\Sigma)$.
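The construction can be checked numerically before proving it. The following is a minimal pure-Python sketch: `cholesky` and `sample_mvn` are helper names introduced here, and the values of `mu` and `sigma` are arbitrary illustrative choices. It draws samples of $X = \mu + LZ$ and compares the sample mean and covariance with $\mu$ and $\Sigma$.

```python
import math
import random

def cholesky(sigma):
    """Lower-triangular L with L L^T = sigma, for a symmetric positive-definite sigma."""
    n = len(sigma)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                L[i][j] = (sigma[i][j] - s) / L[j][j]
    return L

def sample_mvn(mu, L, rng):
    """Draw one sample X = mu + L Z, where Z has i.i.d. standard normal entries."""
    z = [rng.gauss(0.0, 1.0) for _ in mu]
    return [m + sum(L[i][k] * z[k] for k in range(i + 1)) for i, m in enumerate(mu)]

mu = [1.0, -2.0]                  # illustrative mean vector
sigma = [[4.0, 1.2], [1.2, 1.0]]  # illustrative covariance matrix
L = cholesky(sigma)

rng = random.Random(42)
samples = [sample_mvn(mu, L, rng) for _ in range(20000)]

n = len(samples)
mean = [sum(s[i] for s in samples) / n for i in range(2)]
cov = [[sum((s[i] - mean[i]) * (s[j] - mean[j]) for s in samples) / (n - 1)
        for j in range(2)] for i in range(2)]
print("sample mean:", mean)
print("sample covariance:", cov)
```

The check mirrors the two computations behind the proof: $\mathbb{E}(X) = \mu + L\,\mathbb{E}(Z) = \mu$ and $\operatorname{Cov}(X) = L\operatorname{Cov}(Z)L^T = LL^T = \Sigma$, together with the fact that affine images of Gaussian vectors are Gaussian.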

** Published:**

In this notebook we will study the concepts of bias and variance and how we can use them to fit models to our datasets. We will base our exposition on both theory and examples.

** Published:**

The sequence `[0, 1, ..., N]` has been jumbled, and the only clue you have for its order is an array representing whether each number is larger or smaller than the last. Given this information, reconstruct an array that is consistent with it. For example, given `[None, +, +, -, +]`, you could return `[1, 2, 3, 0, 4]`.
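One possible approach, sketched here and not necessarily the solution given in the post: start at the value `k`, where `k` is the number of `-` clues, then hand out the next unused value above for every `+` and the next unused value below for every `-`. Every `+` step exceeds everything seen so far, every `-` step is below everything seen so far, and all of `0, ..., N` get used exactly once. The clues are represented as the strings `'+'` and `'-'`, which is an assumption about the input encoding.

```python
def reconstruct(signs):
    """Return a permutation of 0..N consistent with the +/- clues.

    signs[0] is None; signs[i] is '+' if the i-th number is larger than the
    previous one and '-' if it is smaller.
    """
    # Start at the number of '-' clues so that there is room to go down.
    low = high = sum(1 for s in signs[1:] if s == '-')
    result = [low]
    for s in signs[1:]:
        if s == '+':
            high += 1
            result.append(high)
        else:
            low -= 1
            result.append(low)
    return result

print(reconstruct([None, '+', '+', '-', '+']))
```

For the example clues this returns `[1, 2, 3, 0, 4]`, the same array given above; the method runs in linear time and needs no backtracking.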

** Published:**

In this entry I will discuss some of the introductory concepts of Extreme Value Theory (EVT). This theory is concerned with the asymptotic behavior of the extreme events of a stochastic process, in particular, the distributional characteristics of the **maximum order statistics**, which will be the focus of this entry. We will first look at i.i.d. processes and then move on to processes with non-trivial dependence. The exposition is based on the book *An Introduction to Statistical Modeling of Extreme Values* by Stuart Coles. I will also try to use this as a way to showcase the different Python libraries that allow us to work with EVT.

** Published:**

Say we have X ~ Uniform(0, 1) and Y ~ Uniform(0, 1). What is the expected value of the minimum of X and Y?
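Assuming $X$ and $Y$ are independent (the statement does not say so explicitly), one way to work this out is through the survival function of the minimum:

$$
\mathbb{P}(\min(X,Y) > t) = \mathbb{P}(X > t)\,\mathbb{P}(Y > t) = (1-t)^2, \qquad 0 \leq t \leq 1,
$$

$$
\mathbb{E}(\min(X,Y)) = \int_0^1 \mathbb{P}(\min(X,Y) > t)\,\mathrm{d}t = \int_0^1 (1-t)^2\,\mathrm{d}t = \frac{1}{3}.
$$

Without independence the answer can differ: if $Y = X$, for instance, the minimum is $X$ itself and its expected value is $1/2$.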

** Published:**

There is a fair coin (one side heads, one side tails) and an unfair coin (both sides tails). You pick one at random, flip it 5 times, and observe that it comes up as tails all five times. What is the chance that you are flipping the unfair coin?
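This is a direct application of Bayes' rule, and the arithmetic can be sketched with exact fractions. The variable names below are illustrative; the only inputs are the ones stated in the problem (a 50/50 choice between the coins and 5 tails in a row).

```python
from fractions import Fraction

flips = 5

# Prior: each coin is picked with probability 1/2.
prior_fair = Fraction(1, 2)
prior_unfair = Fraction(1, 2)

# Likelihood of observing tails on every flip.
likelihood_fair = Fraction(1, 2) ** flips   # fair coin: (1/2)^5
likelihood_unfair = Fraction(1) ** flips    # two-tailed coin: always tails

# Bayes' rule: P(unfair | all tails).
evidence = prior_fair * likelihood_fair + prior_unfair * likelihood_unfair
posterior_unfair = prior_unfair * likelihood_unfair / evidence

print(posterior_unfair, float(posterior_unfair))
```

The posterior probability is $32/33$, roughly $0.97$: five tails in a row is strong but not conclusive evidence for the two-tailed coin.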

** Published:**

In this entry I will attempt to introduce some of the fundamental concepts in my research. In particular, I want to discuss the topic of **Limit laws in dynamical systems**, and the end goal is to explain our latest results together with Matthew Nicol (University of Houston), *Large deviations and central limit theorems for sequential and random systems of intermittent maps*. This is a huge topic, and as such I will divide the exposition into multiple articles. In this particular one, I will focus on **Central limit theorems** and **Large deviations estimates** for one-dimensional maps of the unit interval. The exposition will use the formal notation of probability theory, but the ideas do not require a deep knowledge of the formalism behind probability theory. The code for all the graphics and functions can be found here.

** Published:**

In this entry, we will discuss some of Carlangelo Liverani’s work. Without any doubt, his work *Decay of Correlations, Annals of Mathematics, 142, pp. 239-301, (1995)* is one of the most influential of his publications, but this time we will focus on his work with Sandro Vaienti and Benoit Saussol, *A Probabilistic Approach to Intermittency, Ergodic Theory and Dynamical Systems, 19, pp. 671–685 (1999)*. We will refer to this paper as LSV99.

** Published:**

In the previous entries we have studied the asymptotic behavior of the sums of iid random variables. The law of large numbers showed that, almost surely, the averages converge to the mean, while the central limit theorem gave us a second order approximation, that is, it provided information about the size of the fluctuations around the expected value. In this entry, we discuss a third order result, namely, Cramer’s theorem on large deviations. Intuitively, Cramer’s theorem establishes asymptotic rates for the decay of the probability of observing very unlikely events.

** Published:**

In the previous entries, we have explored the behavior of the sums $S_n = X_1+\dots+X_n$ for an iid sequence $\{ X_n \}$. The weak and strong laws of large numbers show that asymptotically, we have $S_n \sim n\cdot \mathbb{E}(X_1)$ if $\mathbb{E}(X_1) \lt \infty$. In this entry, we explore the behavior of the *fluctuations* of the sums around their expected limit. More precisely, we prove the Central Limit Theorem (CLT):

** Published:**

In this entry we discuss a stronger version of the Borel-Cantelli lemma. Recall the second Borel-Cantelli lemma:

** Published:**

In the last entry, we discussed the Borel-Cantelli lemma, a zero-one law for limsups of sets. With this we can deduce a first law of large numbers under some assumptions on the higher moments of our random variables. Later on we will prove a version which does not assume the existence of the higher order moments.

** Published:**

This is the fourth and last post of a series of 4 posts based on the lectures at the Houston Summer School on Dynamical Systems 2019 on Statistical properties in hyperbolic dynamics, given by Matthew Nicol, Andrew Török and William Ott. These notes are heavily edited with added comments, examples and explanations. Any mistakes are my responsibility. Some of the notation has been changed for consistency.

** Published:**

This is the third post of a series of 4 posts based on the lectures at the Houston Summer School on Dynamical Systems 2019 on Statistical properties in hyperbolic dynamics, given by Matthew Nicol, Andrew Török and William Ott. These notes are heavily edited with added comments, examples and explanations. Any mistakes are my responsibility. Some of the notation has been changed for consistency.

** Published:**

Here I will compile some known issues in the interaction between MathJax and Markdown.

** Published:**

This is the second post of a series of 4 posts based on the lectures at the Houston Summer School on Dynamical Systems 2019 on Statistical properties in hyperbolic dynamics, given by Matthew Nicol, Andrew Török and William Ott. These notes are heavily edited with added comments, examples and explanations. Any mistakes are my responsibility. Some of the notation has been changed for consistency.

** Published:**

This is the first post of a series of 4 posts based on the lectures at the Houston Summer School on Dynamical Systems 2019 on Statistical properties in hyperbolic dynamics, given by Matthew Nicol, Andrew Török and William Ott. These notes are heavily edited with added comments, examples and explanations. Any mistakes are my responsibility. Some of the notation has been changed for consistency.

** Published:**

In this entry we will discuss the Borel-Cantelli lemma. Despite usually being called *just a lemma*, it is without any doubt one of the most important and foundational results of probability theory: it is one of the essential zero-one laws, and it allows us to prove a variety of almost-sure results.

** Published:**

This is the first of a series of entries where we will explore several limit laws for sequences of random variables. The setting is going to be the standard one for probability theory: fix a measure space $\Omega$, a sigma-algebra $\mathcal{B}$ and a probability measure $\mathbb{P}$. By random variables we mean measurable functions $X\colon \Omega \to \mathbb{R}$. The distribution of $X$ is defined as $F(x) = \mathbb{P}(\omega: X(\omega) \leq x )$, so $\mathbb{P}( a < X \leq b ) = F(b) - F(a)$. If the measure $X^{-1}\mathbb{P}(A) = \mathbb{P}(X^{-1}(A))$ is absolutely continuous with respect to $\mathbb{P}$, we denote the Radon-Nikodym derivative by $f = \frac{\mathrm{d} X^{-1}\mathbb{P}}{\mathrm{d} \mathbb{P}}$ and hence we can compute probabilities as integrals of this function with respect to the measure $\mathbb{P}$:

** Published:**

Second entry to see how this goes.

** Published:**

This is a blog entry to test this feature.