Python

Full Stack Web App: Stocks Dashboard

Posted on September 3, 2023

Building and deploying a full-stack web application with a Python backend, JavaScript frontend, and containerized cloud deployment.

api fastapi async cloud container docker javascript python backend frontend deployment

A multi-container ML app (1/3): Docker

Posted on August 11, 2022

Building a translation app by putting together three containerized microservices: a Flask frontend, a FastAPI backend and a MySQL database. Let’s skim through the development process and the containerization. Also covered: Docker registries and CI/CD with GitHub Actions.

docker container api nlp database flask fastapi python sql ci/cd registry

An asymmetric loss for regression models

Posted on January 9, 2022

Use the linear-exponential (LINEX) loss to drive regression models towards underestimation or overestimation while keeping their outputs accurate.

loss custom asymmetric underestimation overestimation regression python
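To make the asymmetry concrete, here is a minimal sketch assuming the common LINEX parameterization L(d) = b(exp(a·d) − a·d − 1), with d = ŷ − y, a ≠ 0 and b > 0; the post's exact form may differ. The helper names `linex_loss` and `linex_grad_hess` are hypothetical. The gradient and Hessian are included because gradient boosting libraries usually expect both from a custom objective.

```python
import math


def linex_loss(y_true, y_pred, a=1.0, b=1.0):
    """LINEX loss b * (exp(a * d) - a * d - 1) with d = y_pred - y_true.

    Assumed parameterization. With a > 0, overestimates (d > 0) are
    penalised exponentially and underestimates roughly linearly, which
    pushes predictions downwards; a < 0 does the opposite.
    """
    d = y_pred - y_true
    return b * (math.exp(a * d) - a * d - 1.0)


def linex_grad_hess(y_true, y_pred, a=1.0, b=1.0):
    """First and second derivatives of the loss with respect to y_pred."""
    d = y_pred - y_true
    grad = b * a * (math.exp(a * d) - 1.0)
    hess = b * a * a * math.exp(a * d)
    return grad, hess


# Same absolute error of 1, different direction
over = linex_loss(10.0, 11.0)   # overestimate
under = linex_loss(10.0, 9.0)   # underestimate
print(over > under)  # True
```

Since the Hessian b·a²·exp(a·d) is always positive, the loss is convex in the prediction, which keeps Newton-style boosting updates well behaved.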

Interpretable machine learning with SHAP

Posted on January 24, 2021

In this post, we predict health insurance costs with an efficient black-box model, namely a random forest. Then we interpret individual predictions as well as the global behavior of the estimator using SHapley Additive exPlanations (SHAP).

interpretability explainability Shapley SHAP correlation multicollinearity python black box insurance

Image recognition with PyTorch and fastai

Posted on December 22, 2020

Computer vision is one of the most fascinating domains in machine learning. Libraries like PyTorch and, more recently, fastai have made these kinds of models remarkably accessible. In this post, we build an aircraft classifier, from data gathering to training and deployment.

computer vision transfer learning pre-trained models deployment pytorch torchvision fastai fast.ai python

Gradient tree boosting in the cloud

Posted on November 13, 2020

A cloud computing experiment with two slightly different implementations of gradient boosted trees: LightGBM and XGBoost. Let us evaluate how these two libraries perform on a moderately large dataset, in terms of both accuracy and speed.

python xgboost lightgbm gradient boosted trees cloud computing machine learning superconductors paperspace

Adding totals and subtotals rows with pandas or the tidyverse

Posted on October 18, 2020

When dealing with a dataframe, generating aggregate data is a very common task. In my experience, presenting the summary statistics for the whole population or for subgroups directly in the dataframe can be useful, if not necessary. Today, I present my recipe to achieve this with the pandas and tidyverse packages.

python r pandas tidyverse total row subtotals aggregate groupby
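As an illustration of the idea on the pandas side, here is one possible approach (not necessarily the recipe from the post): compute each group's subtotal with `groupby`, interleave it after the group's rows, and finish with a grand-total row via `pd.concat`. The function `add_subtotals`, the column names and the toy data are all hypothetical.

```python
import pandas as pd


def add_subtotals(df, group_col, label_col, value_col):
    """Append a subtotal row after each group and a grand-total row at the end."""
    parts = []
    for key, group in df.groupby(group_col, sort=True):
        parts.append(group)
        # One summary row per group, flagged in the detail column
        parts.append(pd.DataFrame([{group_col: key, label_col: "Subtotal",
                                    value_col: group[value_col].sum()}]))
    # Grand total over the whole dataframe
    parts.append(pd.DataFrame([{group_col: "Total", label_col: "",
                                value_col: df[value_col].sum()}]))
    return pd.concat(parts, ignore_index=True)


# Toy data, made up for illustration only
sales = pd.DataFrame({
    "region": ["North", "North", "South", "South"],
    "product": ["A", "B", "A", "B"],
    "amount": [10, 20, 30, 40],
})
print(add_subtotals(sales, "region", "product", "amount"))
```

Keeping the subtotal rows inside the same dataframe means they can be exported or displayed directly, at the cost of mixing detail and summary rows, so any further aggregation should filter them out first.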

Back to basics: Scaling train and test samples

Posted on October 12, 2020

Splitting and scaling a dataset seems easy. Well, it is admittedly not that hard, but it can be tricky. Today we will see how to properly split and scale a dataset, as this step is often necessary before any ML wizardry. Let us do this with a few R & Python packages/modules.

scaling normalize standardize spark pyspark python r dplyr caret
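The key principle can be sketched with the Python standard library alone (the post itself uses packages such as caret, dplyr and PySpark): split first, learn the scaling statistics on the training sample only, then apply those same statistics to the test sample so no information leaks from the test data. The helper names below are hypothetical.

```python
import random
import statistics


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the data and split it into train and test parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_standardizer(values):
    """Learn mean and (population) standard deviation from training values only."""
    return statistics.fmean(values), statistics.pstdev(values)


def transform(values, mean, std):
    """Standardize any sample (train or test) with the training statistics."""
    return [(v - mean) / std for v in values]


data = [float(x) for x in range(1, 101)]
train, test = train_test_split(data)
mean, std = fit_standardizer(train)
train_scaled = transform(train, mean, std)
test_scaled = transform(test, mean, std)  # not re-fitted on test
```

The scaled training sample has mean 0 and standard deviation 1 by construction, while the scaled test sample generally does not; that is expected, since the test data must be treated as unseen.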

Weighted Random Forest with Spark 3

Posted on September 6, 2020

Spark 3, the third major version of the leading distributed computing framework, was released in June 2020. It adds sample weight support to tree-based algorithms: decision trees, gradient boosted trees and random forests. Today we experiment with this new feature on an imbalanced credit card fraud dataset.

spark pyspark python weight fraud random forest
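Sample weights for an imbalanced dataset have to come from somewhere. A minimal sketch, assuming the common "balanced" heuristic w_c = n / (n_classes · n_c) (the post may weight differently), computes one weight per row so that each class carries the same total weight. In Spark 3 such values would live in a column passed to the estimator's `weightCol` parameter; the plain-Python helper `balanced_weights` below is hypothetical and only shows the arithmetic.

```python
from collections import Counter


def balanced_weights(labels):
    """Per-row weights n / (n_classes * n_c), so every class sums to n / n_classes."""
    counts = Counter(labels)
    n, n_classes = len(labels), len(counts)
    per_class = {c: n / (n_classes * cnt) for c, cnt in counts.items()}
    return [per_class[y] for y in labels]


# Hypothetical labels: 98 legitimate transactions (0), 2 frauds (1)
labels = [0] * 98 + [1] * 2
weights = balanced_weights(labels)
```

Here each fraud row gets weight 25 and each legitimate row about 0.51, so both classes contribute a total weight of 50 to the impurity computations.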

Outlier detection

Posted on August 31, 2020

In this post, I try to define what an outlier is and present several ways to approach the problem of anomaly detection. Then, I present the Local Outlier Factor (LOF) algorithm and apply it to a specific dataset to show its power, using both Python and R. I also compare its performance with the Isolation Forest method.

outlier r python anomaly isolation lof
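For readers curious about what LOF computes, here is a simplified from-scratch sketch in plain Python (the post relies on existing Python and R implementations instead). It follows the usual steps: k-distance, reachability distance, local reachability density (lrd), then the ratio of the neighbours' average lrd to the point's own lrd. Simplifying assumptions: points are distinct, and ties at the k-distance are ignored. Function names and data are hypothetical.

```python
import math


def _knn(points, i, k):
    """Indices of the k nearest neighbours of point i, and its k-distance."""
    dists = sorted(
        (math.dist(points[i], points[j]), j) for j in range(len(points)) if j != i
    )
    return [j for _, j in dists[:k]], dists[k - 1][0]


def local_outlier_factor(points, k):
    """LOF score per point: about 1 inside a cluster, clearly above 1 for outliers."""
    n = len(points)
    neighbours, k_dist = [], []
    for i in range(n):
        nbrs, kd = _knn(points, i, k)
        neighbours.append(nbrs)
        k_dist.append(kd)

    def reach_dist(i, j):
        # Reachability distance of i from j: max(k-distance of j, d(i, j))
        return max(k_dist[j], math.dist(points[i], points[j]))

    # Local reachability density: inverse of the mean reachability distance
    lrd = []
    for i in range(n):
        mean_reach = sum(reach_dist(i, j) for j in neighbours[i]) / k
        lrd.append(1.0 / mean_reach)

    # LOF: average lrd of the neighbours relative to the point's own lrd
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]


# Toy data: a small square cluster plus one isolated point
pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
scores = local_outlier_factor(pts, k=3)
```

Because the score compares densities locally, a point can be flagged even when it sits at a moderate absolute distance from the rest, as long as its own neighbourhood is much sparser than its neighbours' neighbourhoods.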