Code

When I am not working on mathematics, I am most likely coding. On my GitHub I have uploaded a number of projects I have worked on:

Machine Learning Library: Access repository here. This project is a library of implementations of the most popular machine learning algorithms, together with detailed mathematical explanations of how they work. It covers linear, multiple and polynomial regression, neural networks, k-nearest neighbors, k-means, linear and quadratic discriminant analysis, naive Bayes and support vector machines.

Kafka application: Access repository here. This is a minimal application that sets up a Kafka cluster and a Postgres database, and runs several Python services alongside a Kafka Streams Java application. Together they form a data pipeline that ingests records from an API, processes them, and stores the aggregated results in Postgres.

Kubernetes primer: Access repository here. To learn how to use the Kubernetes APIs properly, I am building a project that exercises the essential components and resources of K8s with minimal code.

Image Recognition: Handwriting (Kaggle competition): Access repository here. This is a state-of-the-art implementation of convolutional neural networks for image recognition. We use the well-known MNIST dataset, on which we achieved a test accuracy of around 0.994. The implementation uses TensorFlow with Keras, and features techniques like data augmentation to increase the accuracy of the model.

NYC taxi fare prediction (Kaggle competition): Access repository here. In this project we analyzed a dataset of taxi trips, with the goal of building a model that predicts the fare of a trip from information such as the start and end coordinates, the date and the time. We used feature engineering, univariate and bivariate analysis, linear regression and random forests. This work is joint with Alexis Moraga.

Sentiment analysis: Twitter: Access repository here. In this project, we built a sentiment analysis model using a Twitter dataset of tweets addressed to airline accounts. Each tweet is labeled with positive, negative or neutral sentiment according to the text written by the user. We constructed the model using techniques such as word2vec and LSTM cells for recurrent neural networks. This work is joint with Alexis Moraga.

Breast cancer detection: Access repository here. In this project we built a model for recognizing whether a breast tumor is malignant or not, based on metric features associated with it. We made a multi-layer model, using algorithms such as k-nearest neighbors, neural networks, gradient-boosted decision trees, and random forests. In the upper layers, we used stacking to get a combined prediction from the models in the base layers.
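To make the stacking idea concrete, here is a minimal sketch using scikit-learn. It is not the code from the repository: the dataset (scikit-learn's bundled Wisconsin breast cancer data), the hyperparameters, the logistic-regression combiner, and the function names `build_stack` and `evaluate_stack` are all assumptions chosen for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_stack(seed=0):
    """Base layer of four models, combined by a logistic regression (assumed)."""
    base_models = [
        # Distance- and gradient-based models get standardized inputs.
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
        ("mlp", make_pipeline(
            StandardScaler(),
            MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=seed),
        )),
        ("gbt", GradientBoostingClassifier(random_state=seed)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=seed)),
    ]
    # cv=5: the combiner is trained on out-of-fold predictions of the base
    # models, so it never sees predictions made on a model's own training data.
    return StackingClassifier(
        estimators=base_models,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    )


def evaluate_stack(seed=0):
    """Fit the stack on a train split and return accuracy on the held-out split."""
    X, y = load_breast_cancer(return_X_y=True)  # stand-in dataset (assumption)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    model = build_stack(seed).fit(X_train, y_train)
    return model.score(X_test, y_test)


if __name__ == "__main__":
    print(f"held-out accuracy: {evaluate_stack():.3f}")
```

The point of the upper layer is that it learns how much to trust each base model, rather than averaging their votes with fixed weights.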

Convolution and filters: a detailed study: Access notebook here. In this project we go through the nitty-gritty details of convolutions and how they can be used to extract information from images.
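As a small illustration of how a convolution extracts information from an image, the sketch below implements a plain 2D convolution in NumPy and applies a Sobel kernel to a toy image with a single vertical edge. The function name `convolve2d` and the toy image are illustrative assumptions, not taken from the notebook.

```python
import numpy as np


def convolve2d(image, kernel):
    """'Valid' 2D convolution: no padding, stride 1.

    The kernel is flipped in both axes, which is what distinguishes a true
    convolution from the cross-correlation used in most deep learning libraries.
    """
    image = np.asarray(image, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    flipped = kernel[::-1, ::-1]
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the window whose top-left corner is (i, j).
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * flipped)
    return out


# Toy image: dark left half (0), bright right half (1).
image = np.array([[0, 0, 0, 1, 1, 1]] * 4)

# Sobel kernel that responds to horizontal changes in intensity.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])

edges = convolve2d(image, sobel_x)

if __name__ == "__main__":
    print(edges)
```

The response is zero wherever the window covers a flat region and non-zero only in the two columns whose window straddles the edge, which is exactly the sense in which the filter "detects" vertical edges.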

Spectral clustering

Figure: Concentric annuli dataset

Figure: Spectral information of the Laplacian of the graph associated to the dataset
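The following is a minimal NumPy sketch of spectral clustering on two concentric annuli. It is not the code behind the figures: the ring radii, the noise level, the Gaussian affinity with width `sigma`, the use of the unnormalized Laplacian, and the function names `make_annuli` and `spectral_bipartition` are all assumptions made for illustration. Angles are evenly spaced so that the example is reproducible.

```python
import numpy as np


def make_annuli(n_per_ring=100, radii=(1.0, 4.0), noise=0.05, seed=0):
    """Points on concentric rings with Gaussian radial noise (assumed setup)."""
    rng = np.random.default_rng(seed)
    rings = []
    for r in radii:
        theta = np.linspace(0.0, 2.0 * np.pi, n_per_ring, endpoint=False)
        rad = r + rng.normal(0.0, noise, n_per_ring)
        rings.append(np.column_stack([rad * np.cos(theta), rad * np.sin(theta)]))
    return np.vstack(rings)


def spectral_bipartition(X, sigma=0.5):
    """Split X into two clusters using the graph Laplacian's Fiedler vector."""
    # Fully connected similarity graph with Gaussian edge weights.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Unnormalized graph Laplacian L = D - W.
    L = np.diag(W.sum(axis=1)) - W

    # eigh returns eigenvalues in ascending order; the eigenvector of the
    # second-smallest one (the Fiedler vector) is nearly constant on each
    # weakly connected group, with opposite signs on the two groups.
    eigvals, eigvecs = np.linalg.eigh(L)
    fiedler = eigvecs[:, 1]
    return (fiedler > 0).astype(int), eigvals


X = make_annuli()
labels, eigvals = spectral_bipartition(X)

if __name__ == "__main__":
    print("smallest eigenvalues:", eigvals[:4])
    print("inner ring labels:", set(labels[:100]), "outer ring labels:", set(labels[100:]))
```

k-means on the raw coordinates cannot separate the rings because they share a center. In the spectral embedding, points on the same ring collapse to nearly the same value, so a simple sign threshold is enough.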

Felipe Pérez
