Using Python, learn how to use the Sportsreference API to obtain data for NCAAB analysis

Photo by Ben Hershey on Unsplash


A little elbow grease goes a long way. This blog uses the Sportsreference API to build a data set for your own analysis of NCAA Men’s Basketball. When I look over the sports analytics landscape, I predominantly see companies (PaaS) offering canned reports while hiding their data and blocking raw exports. Raw data matters because it lets you check whether the formulas are correct and gives you the flexibility to create your own features. The restriction is understandable, since these companies or websites most likely pay for that data. However, after…

This blog will teach you how to pull the necessary data and create a model that forecasts the price of the S&P 500!

Photo by Markus Spiske on Unsplash


If you are reading this, it’s most likely because you love to solve puzzles. I’m a very competitive person by nature. The Mt. Everest of puzzles, in my opinion, is trying to find excess returns through active trading in the stock market. This blog is the first of many posts in my attempt to (hopefully) summit the intimidating Mt. Everest of algorithmic trading and emerge profitable.

First off, I must atone for my sins. I am a failed day trader. I attempted day trading over a summer during a hiatus from work. Everything I read told me to have…

Burnt out on learning intricate machine learning concepts and complicated jargon? Reignite your passion by building a simple image classifier to detect pneumonia in an X-ray!

Photo by Vadim Sadovski on Unsplash


If you are beginning to feel burnt out on learning a subject, it is often beneficial to take a step out of the weeds. I like to build something fun and easy to regain positive momentum on my learning journey. This is one project that helped get my creative juices flowing again. Load the notebook from my GitHub repository into Google Colab and upload the Kaggle data set to learn how to build an image classifier using the fastai library! Don’t forget to set the hardware accelerator to GPU!

P.S. Upload the notebook directly to Colab by going to File → Upload…

A gentle introduction to Business Intelligence and the types of data you will encounter

Photo By: Vladislav Babienko via Unsplash

Datafication refers to the measurement and storage of everything in our lives. Every day, more ordinary appliances become “smart”, furthering the growth of the Internet of Things (IoT) universe. Don’t believe me? Take a look at the newest smart refrigerator and think about the amount of data that it could collect from you. Thanks to more affordable smart devices, the rate at which we generate information is increasing by the day. It has never been more critical for a business to be data-literate in order to navigate today’s fast-paced business climate.

Business Intelligence (BI) is a blanket term that encompasses generating, measuring, storing…

This lesser-known metric can help you better evaluate how models perform on imbalanced data

To me, many of the most intriguing use cases for classification involve identifying outliers. The outlier may be a spam message in your inbox, a diagnosis of an extremely rare disease, or an equity portfolio with extraordinary returns. Because these instances are outliers, it is hard to gather enough data to train a model to spot them. Some people dedicate their entire careers to creating strategies to combat imbalanced data. I’ll table those strategies for another blog another day.

Everyone has a strong intuition of what accuracy and error are. This is the…

Overview of the differences in 3 common regularization techniques: Ridge, Lasso, and Elastic Net.


You have trained a regression model, and the R² comes back looking good, almost too good. Of course, the next logical step is to see how the model fares on the test data set. Spoiler alert: it will come nowhere near the success it had on the training set.

This quite common phenomenon is referred to as “overfitting”, and its polar opposite is called underfitting. In technical terms, overfitting means the model you built has more parameters than the data can justify. Click this link to view the GitHub repository containing the notebook for this blog post.
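The gap between training and test performance described above can be reproduced in a few lines. This is a minimal sketch, not the code from the linked notebook: the synthetic data (a straight line plus noise), the helper names `r_squared` and `make_data`, and the choice of polynomial degrees are all assumptions made for illustration. A degree-9 polynomial fit to 10 training points has as many parameters as there are observations, so it can pass through every training point while generalizing poorly.

```python
import numpy as np
from numpy.polynomial import Polynomial


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


def make_data(rng, n):
    """Hypothetical data: a straight line plus Gaussian noise."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = 2.0 * x + rng.normal(scale=0.5, size=n)
    return x, y


rng = np.random.default_rng(0)
x_train, y_train = make_data(rng, 10)    # small training set
x_test, y_test = make_data(rng, 200)     # fresh data from the same process

results = {}
for degree in (1, 9):
    # Least-squares polynomial fit; degree 9 has 10 coefficients for 10 points.
    model = Polynomial.fit(x_train, y_train, deg=degree)
    train_r2 = r_squared(y_train, model(x_train))
    test_r2 = r_squared(y_test, model(x_test))
    results[degree] = (train_r2, test_r2)
    print(f"degree={degree}: train R2={train_r2:.3f}, test R2={test_r2:.3f}")
```

The degree-9 model should report a training R² very close to 1 and a noticeably worse test R², which is the signature of overfitting. Regularization techniques such as Ridge, Lasso, and Elastic Net address this by penalizing large coefficients.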


A guide, using Python, to understanding the limitations of an Ordinary Least Squares regression model


If you wish to view this with more context, please check out the Jupyter notebooks in my GitHub repository.

The Assumptions

1. The relationship between the dependent variable and the independent variables is linear.
2. The regression residuals are normally distributed with a mean of 0.
3. The residuals are homoscedastic: their variance is constant, so a residual plot looks approximately rectangular.
4. There is no multicollinearity, meaning that the independent variables are not too highly correlated with one another.
5. There is no autocorrelation of the residuals.
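Several of the assumptions listed above can be checked numerically once a model is fit. The following is a rough sketch using only NumPy on synthetic data; the data, the variable names, and the specific diagnostics are assumptions chosen for illustration, not the checks used in the author's notebooks. Linearity (assumption 1) is usually judged from a plot of residuals against fitted values and is not covered here, and skewness is only a partial check of normality.

```python
import numpy as np

# Hypothetical data: two independent predictors, a linear signal,
# and constant-variance Gaussian noise.
rng = np.random.default_rng(42)
n = 500
X = rng.normal(size=(n, 2))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(scale=0.5, size=n)

# Fit OLS with an intercept via least squares.
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
fitted = A @ beta
residuals = y - fitted

# Assumption 2: residual mean near 0 and roughly symmetric (skewness near 0).
residual_mean = residuals.mean()
residual_skew = np.mean((residuals - residual_mean) ** 3) / residuals.std() ** 3

# Assumption 3: similar residual variance for low and high fitted values.
low = fitted < np.median(fitted)
variance_ratio = residuals[~low].var() / residuals[low].var()

# Assumption 4: predictors should not be too highly correlated.
predictor_corr = np.corrcoef(X, rowvar=False)[0, 1]

# Assumption 5: Durbin-Watson statistic; values near 2 suggest
# no first-order autocorrelation in the residuals.
durbin_watson = np.sum(np.diff(residuals) ** 2) / np.sum(residuals ** 2)

print(f"residual mean:       {residual_mean:.2e}")
print(f"residual skewness:   {residual_skew:.3f}")
print(f"variance ratio:      {variance_ratio:.3f}")
print(f"predictor corr:      {predictor_corr:.3f}")
print(f"Durbin-Watson:       {durbin_watson:.3f}")
```

Because the synthetic data is built to satisfy the assumptions, every diagnostic should land near its ideal value (0 for the mean, skewness, and correlation; 1 for the variance ratio; 2 for Durbin-Watson). On real data, large departures from those values are a prompt to look at residual plots more closely.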


Ordinary Least Squares (OLS) regressions are also often just called regressions. It is important to note that there…

Blake Samaha

I am a data junkie working to kick my addiction to MS Excel with Python.
