Hello world

Your SQL is probably WRONG

22 minute read

Published: September 26, 2023

Yes, the title is a bit click-baity, but I hope it catches your attention, because it is a warning about a very subtle behavior of most SQL dialects. The main idea is that when you perform a left (or right) join that has clauses on the right-hand (left-hand) table, it is very easy to break the integrity of the join and end up with results that are very different from what you expect. This behavior is easy to miss, and many people fall into the trap without even noticing, because it is essentially a silent failure.
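As a rough illustration of the kind of trap described above, here is a minimal sketch using Python's built-in `sqlite3` module. The tables `customers` and `orders` and their contents are made up for this example, and it assumes the problematic clause is a `WHERE` filter on the right-hand table. It is one common way this failure shows up, not necessarily the only one.

```python
import sqlite3

# Hypothetical toy data: three customers, only two of whom have orders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT);
INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cleo');
INSERT INTO orders VALUES (10, 1, 'shipped'), (11, 2, 'cancelled');
""")

# Filter on the right-hand table in WHERE: rows where o.status is NULL
# (customers without a matching order) fail the test and are dropped,
# so the LEFT JOIN silently behaves like an INNER JOIN.
where_rows = conn.execute("""
SELECT c.name, o.id
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
WHERE o.status = 'shipped'
ORDER BY c.id
""").fetchall()

# Same filter moved into the ON clause: every customer is kept, and
# customers without a shipped order get NULL for the order columns.
on_rows = conn.execute("""
SELECT c.name, o.id
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id AND o.status = 'shipped'
ORDER BY c.id
""").fetchall()

print(where_rows)  # [('Ana', 10)]
print(on_rows)     # [('Ana', 10), ('Ben', None), ('Cleo', None)]
```

Neither query raises an error. The first one simply returns fewer rows, which is why the problem is so easy to miss.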

Is overfitting… good?

44 minute read

Published: March 06, 2020

Conventional wisdom in Data Science/Statistical Learning tells us that when we fit a model that should learn from our data and generalize to unseen data, we must keep in mind the bias/variance trade-off. This means that as we increase the complexity of our models (let us say, the number of learnable parameters), it becomes more likely that they will just memorize the data and will not generalize well to unseen data. On the other hand, if we keep the complexity low, our models will not be able to learn much from our data and will not do well either. We are told to find the sweet spot in the middle. But is this paradigm about to change? In this article we review new developments that suggest that this might be the case.

PCA and supervised learning

21 minute read

Published: February 28, 2020

The situation is this: you have been given data with several variables $x_1,\dots,x_d$ and a response $y$ that you want to predict using those variables. You perform some basic statistical analysis on your variables and look at their averages, ranges, and distributions. Then you look at the correlations between these variables and find that some of them are strongly correlated. You decide to perform principal component analysis (PCA) to reduce the dimension of your features to $w_1,\dots,w_m$, with $m < d$. Now you fit your model, and you find that it gives terrible results, even though your PCA variables explain most of the variance of the features. What went wrong?
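One way this situation can arise is sketched below with NumPy. The data is synthetic and constructed for this example: two strongly correlated features whose difference, a low-variance direction, is what actually drives the response. The helper `r_squared` is a hypothetical name introduced here. This is only an illustration under those assumptions, not a general diagnosis.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Two strongly correlated features: a shared signal z plus a small
# opposite-signed perturbation e.
z = rng.normal(size=n)
e = 0.1 * rng.normal(size=n)
x1 = z + e
x2 = z - e
X = np.column_stack([x1, x2])

# Assumption for this sketch: the response depends only on the
# low-variance direction x1 - x2.
y = (x1 - x2) + 0.01 * rng.normal(size=n)

# PCA via SVD of the centered feature matrix.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
explained = S**2 / np.sum(S**2)  # fraction of variance per component
w1 = Xc @ Vt[0]                  # scores on the first principal component


def r_squared(features, target):
    """R^2 of an ordinary least-squares fit with an intercept."""
    A = np.column_stack([np.ones(len(target)), features])
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    resid = target - A @ coef
    return 1 - resid.var() / target.var()


r2_pca = r_squared(w1, y)   # regression on the first component only
r2_full = r_squared(X, y)   # regression on the original features

print(f"variance explained by first component: {explained[0]:.3f}")
print(f"R^2 using first component only: {r2_pca:.3f}")
print(f"R^2 using original features: {r2_full:.3f}")
```

In this construction the first component captures almost all of the feature variance yet carries almost no information about $y$, because PCA never looks at the response when choosing directions.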

Martingales 0

37 minute read

Published: February 20, 2020

I want to talk about martingales, but unfortunately, in order to do that properly, we first need to talk about sigma-algebras and conditional expectations, subjects which can be a bit harsh at first. These concepts are essential, and while we could work with them purely as formal objects with certain properties, it is fundamental to understand them more deeply so that we do not get lost in formalism and can capture the intuition behind the theory.

Felipe Pérez
