A little elbow grease goes a long way. This post uses the Sportsreference API to create a data set for your own analysis, specifically for NCAA Men’s Basketball. When I look over the sports analytics landscape, I mostly see canned reports from PaaS companies that hide their data and don’t let you export the raw numbers. The raw data is important because it lets you check whether the formulas are correct and gives you the flexibility to create your own features. Their reluctance is understandable, since these companies and websites most likely pay for that data. However, after…
If you are reading this, it’s most likely because you love to solve puzzles. I’m a very competitive person by nature. In my opinion, the Mt. Everest of puzzles is trying to find excess returns through active trading in the stock market. This post is the first of many documenting my attempt to summit the intimidating Mt. Everest of algorithmic trading and, hopefully, emerge profitable.
First off, I must atone for my sins: I am a failed day trader. I attempted day trading over a summer during a hiatus from work. Everything I read told me to have…
If you are beginning to feel burnt out on learning a subject, it often helps to step out of the weeds. I like to build something fun and easy to regain positive momentum on my learning journey, and this project got my creative juices flowing again. Load my notebook from my GitHub repository into Google Colab and upload the Kaggle data set to learn how to build an image classifier with the fastai library! Don’t forget to set the hardware accelerator to GPU!
P.S. You can upload the notebook directly to Colab by going to File → Upload…
Datafication refers to the measurement and storage of everything in our lives. Every day, more ordinary appliances become “smart,” furthering the growth of the Internet of Things (IoT). Don’t believe me? Take a look at the newest smart refrigerator and think about how much data it could collect about you. Thanks to more affordable smart devices, the rate at which we generate information increases by the day. It has never been more critical for a business to be data-literate in order to navigate today’s fast-paced business climate.
Business Intelligence (BI) is a blanket term that encompasses generating, measuring, storing…
Many of the classification use cases I find most intriguing involve identifying outliers. The outlier may be a spam message in your inbox, a diagnosis of an extremely rare disease, or an equity portfolio with extraordinary returns. Because these instances are outliers, it is hard to gather enough data to train a model to spot them. Some people dedicate their entire careers to developing strategies for combating imbalanced data; I’ll save those strategies for another post on another day.
Everyone has a strong intuition of what accuracy and error are. This is the…
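Because outliers are rare, accuracy and error rate can look excellent for a model that never finds a single outlier. Here is a minimal sketch using hypothetical toy labels (5 positives out of 100) and a naive model that always predicts the majority class. Precision and recall are standard metrics added here for contrast; they are an assumption about where the full post goes, not a quote from it.

```python
# Hypothetical toy labels: 1 marks the rare "outlier" class (e.g. spam), 0 the common class.
y_true = [1] * 5 + [0] * 95
# A naive "model" that always predicts the majority class.
y_pred = [0] * len(y_true)


def confusion_counts(y_true, y_pred, positive=1):
    """Return (true positives, false positives, false negatives, true negatives)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
error_rate = 1 - accuracy
# Precision is undefined when nothing is predicted positive; treat it as 0.0 by convention.
precision = tp / (tp + fp) if (tp + fp) else 0.0
recall = tp / (tp + fn) if (tp + fn) else 0.0

print(f"accuracy={accuracy:.2f} error={error_rate:.2f} "
      f"precision={precision:.2f} recall={recall:.2f}")
```

The model scores 95% accuracy while catching none of the outliers, which is exactly why imbalanced problems need metrics that focus on the rare class.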
You have trained a regression model, and your R² comes back looking good, almost too good. Naturally, the next step is to see how the test data set fares. Spoiler alert: it will not come close to the success of the training set.
This is a common phenomenon known as “overfitting.” Its polar opposite is called underfitting. In technical terms, overfitting means the model you built has more parameters than the data can justify. Click this link to view the GitHub repository containing the notebook for this blog post.
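To make the train/test gap concrete, here is a minimal NumPy sketch on hypothetical synthetic data (not the data from the author’s notebook). The true relationship is linear, so a straight line is the justified model, while a degree-9 polynomial fitted to 10 noisy points has one coefficient per data point and can chase the noise.

```python
import numpy as np

rng = np.random.default_rng(0)


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


# Hypothetical data: the true relationship is y = 2x + 1 plus Gaussian noise.
x_train = np.linspace(0.0, 1.0, 10)
y_train = 2.0 * x_train + 1.0 + rng.normal(scale=0.2, size=x_train.size)
x_test = rng.uniform(0.0, 1.0, size=50)
y_test = 2.0 * x_test + 1.0 + rng.normal(scale=0.2, size=x_test.size)

results = {}
for degree in (1, 9):
    # A degree-9 polynomial has 10 coefficients, one per training point,
    # so it can pass through every noisy training observation.
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train_r2 = r_squared(y_train, np.polyval(coeffs, x_train))
    test_r2 = r_squared(y_test, np.polyval(coeffs, x_test))
    results[degree] = (train_r2, test_r2)
    print(f"degree {degree}: train R^2 = {train_r2:.3f}, test R^2 = {test_r2:.3f}")
```

The degree-9 fit reaches a near-perfect training R² because it memorizes the noise, but that noise does not repeat in the test set, so its test R² falls back to earth.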
If you would like more context, please check out the Jupyter notebooks in my GitHub repository.
— 1. A linear relationship is assumed between the dependent variable and the independent variables.
— 2. The regression residuals are normally distributed with a mean of 0.
— 3. The residuals are homoscedastic (constant variance), so a plot of residuals against fitted values forms an approximately rectangular band.
— 4. There is no multicollinearity in the model, meaning the independent variables are not too highly correlated with one another.
— 5. There is no autocorrelation in the residuals.
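The assumptions above can be spot-checked after fitting. Here is a minimal NumPy sketch on hypothetical synthetic data that satisfies them by construction. The checks are rough diagnostics chosen for illustration (residual mean, variance inflation factors, the Durbin-Watson statistic, a low-vs-high residual spread ratio, and skewness), not formal hypothesis tests. The linearity assumption is built into the simulated data rather than tested. Libraries such as statsmodels and SciPy provide formal versions of these checks.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
# Hypothetical synthetic data: linear in x1 and x2 with independent normal noise.
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 3.0 + 1.5 * x1 - 2.0 * x2 + rng.normal(scale=1.0, size=n)

# Fit OLS with an intercept column via least squares.
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
resid = y - fitted


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)


def vif(predictors, j):
    """Variance inflation factor: regress column j on the other predictors plus an intercept."""
    target = predictors[:, j]
    others = np.delete(predictors, j, axis=1)
    Z = np.column_stack([np.ones(len(target)), others])
    coef, *_ = np.linalg.lstsq(Z, target, rcond=None)
    return 1.0 / (1.0 - r_squared(target, Z @ coef))


def durbin_watson(e):
    """Ranges from 0 to 4; values near 2 suggest no first-order autocorrelation."""
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)


# Assumption 2 (partly): with an intercept, OLS residuals average to zero.
resid_mean = resid.mean()
# Assumption 2 (roughly): normally distributed residuals have skewness near 0.
skew = np.mean((resid - resid_mean) ** 3) / resid.std() ** 3
# Assumption 3 (roughly): similar residual variance for low and high fitted values.
order = np.argsort(fitted)
half = n // 2
spread_ratio = resid[order[half:]].var() / resid[order[:half]].var()
# Assumption 4: VIFs well below about 5 to 10 suggest little multicollinearity.
predictors = np.column_stack([x1, x2])
vifs = [vif(predictors, j) for j in range(predictors.shape[1])]
# Assumption 5: Durbin-Watson statistic of the residuals.
dw = durbin_watson(resid)

print(f"coefficients: {np.round(beta, 2)}")
print(f"residual mean: {resid_mean:.2e}, skewness: {skew:.2f}")
print(f"spread ratio (high/low fitted): {spread_ratio:.2f}")
print(f"VIFs: {np.round(vifs, 2)}")
print(f"Durbin-Watson: {dw:.2f}")
```

On real data, a VIF far above 1, a Durbin-Watson value far from 2, or a spread ratio far from 1 would be the cue to take a closer look at the corresponding assumption.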
Ordinary Least Squares (OLS) regressions are often simply called regressions. It is important to note that there…
I am a data junkie working to kick my addiction to MS Excel with Python.