Do you want to improve your Machine Learning skills, and create a kick ass project portfolio?
Well, good news!
Rather than bombard you with hundreds of random projects that cover the exact same principles, I’ve put together 10 ML projects (from Kaggle and other sources) that I think you should work on if you want to up your game and have a standout resume.
And better still? If you’re short on time and can only do a few of these, I’ve flagged 3 projects on this list that you can’t miss if you want to be ahead of the curve.
So let’s dive in…
The first of my ‘can’t miss’ projects.
The goal is simple. You build a machine learning model that classifies photos into 120 different dog breeds, and then you can even try it on photos of your own dog. (Because who doesn’t love dogs?!)
This project is a fun way to get started with computer vision models (aka convolutional neural networks, or CNNs), and teaches you multiple skills, such as:
You can check out the project here.
Sidenote: This is such a valuable project to build and learn from that I actually use it as a follow-along project inside my Machine Learning and Data Science Bootcamp course!
You can even check out a preview of the course here for free.
The 2nd of my ‘can’t miss’ projects to complete.
Deploying a model is just as important as training a model, because by deploying a model you're making it available to people other than yourself.
This means they can also try it and see where it has errors or goes wrong, which helps you to further improve the model later on.
Crowdsourcing improvements for the win!
To run this project, you’ll be using a tool called Gradio:
Gradio is a Python framework for creating machine learning interfaces where people can upload their own data and try out your model.
You can grab the project code here, and I’ll walk you through the process.
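To give you a feel for it, here’s a minimal sketch of what a Gradio app can look like. It assumes the `gradio` package is installed, and the `predict` function and its labels are hypothetical placeholders that return fixed scores; in the real project you’d swap in your own trained model.

```python
def predict(image):
    """Hypothetical stand-in for a trained model.

    A real project would run model inference on `image` here. This stub
    returns fixed label -> confidence scores so the interface can be
    tried end to end before a model exists.
    """
    return {"dog": 0.7, "cat": 0.3}


def build_demo():
    """Wrap `predict` in a simple upload-an-image web interface."""
    import gradio as gr  # imported lazily so `predict` works without Gradio

    return gr.Interface(
        fn=predict,
        inputs=gr.Image(type="pil"),          # users upload their own image
        outputs=gr.Label(num_top_classes=2),  # shows labels with confidences
        title="Image classifier demo (placeholder model)",
    )
```

Calling `build_demo().launch()` starts a local web server where anyone with the link can upload an image and see the (placeholder) prediction.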
The 3rd and final of my ‘can’t miss’ projects.
OpenAI has a great transcription model called Whisper.
There are a lot of podcasts out there, and many of them don’t have transcriptions. This sucks, because having a podcast in text form lets someone search the text and jump around in the audio to find the parts most relevant to them.
This project will help you learn how to:
If you have a favorite podcast, you could even make a Python app that transcribes the podcast for you and turns the audio into searchable text.
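Here’s a rough sketch of how that could look. It assumes the open-source `openai-whisper` package (plus `ffmpeg`) is installed; the helper names `search_segments` and `format_timestamp`, and the default `"base"` model size, are my own illustrative choices rather than anything from the project itself.

```python
def transcribe(audio_path, model_name="base"):
    """Transcribe an audio file and return Whisper's timestamped segments."""
    import whisper  # lazy import: the search helpers below work without it

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    # Each segment is a dict that includes "start", "end" and "text".
    return result["segments"]


def search_segments(segments, query):
    """Return (start_seconds, text) for every segment mentioning `query`."""
    query = query.lower()
    return [
        (seg["start"], seg["text"].strip())
        for seg in segments
        if query in seg["text"].lower()
    ]


def format_timestamp(seconds):
    """Turn a number of seconds into HH:MM:SS for jumping around the audio."""
    minutes, secs = divmod(int(seconds), 60)
    hours, minutes = divmod(minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"
```

With this in place, something like `search_segments(transcribe("episode.mp3"), "neural networks")` would list every moment in a (hypothetical) `episode.mp3` where the phrase comes up.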
Editor’s note: This is a great way to get on the radar of particular influencers, if you want to build relationships for mentoring, advice, help getting hired, etc. Transcribe a few episodes, reach out, and boom! New best friend.
You can check out the course and project here.
So those are the top 3 projects that I recommend to help you get a broad understanding and practice with Machine Learning.
If you work on those projects alone, you’ll pick up some vital skills, while also having some decent portfolio work to add to your resume and impress prospective employers.
However, if you want to go a little deeper and really make your resume stand out, then here are a few extra projects that you can try...
Now, this might seem like a pretty grim project, but it's actually the kind of thing you might do if you’re building models for insurance companies or similar industries.
Why? Well, insurance companies base their premiums (how much they make you pay) mainly on:
Negative event = insurance company will have to pay some money to you. This means they don't make a profit. They don't want this 😉.
Of course, the likelihood of a negative event (ex: injury or death) will be dependent on the type of activities someone will be doing (ex: driving a car vs. driving a motorcycle or relaxing at an all-inclusive resort vs. skydiving).
So these machine learning models can be used to run different scenarios and make these predictions more accurate. The more accurate the prediction, the more competitive the insurance company can be with pricing their premiums which means they make more money!
As you can guess, being able to build models that predict these kinds of outcomes is a valuable skill for companies. So this makes it a perfect project to put on your resume and to talk about in your interviews.
In this project, you’ll be running a model on the 1912 Titanic disaster. If you don’t know the history, during its maiden voyage, the British passenger liner RMS Titanic sank after colliding with an iceberg in the North Atlantic Ocean.
Possibly due to negligence or poor planning, there were not enough lifeboats on the ship for everyone on board, which resulted in the death of 1,502 out of 2,224 passengers and crew 🙁.
This machine learning project focuses on building a model that can predict the survival probability of passengers on the Titanic, based on factors like age, name, economic class, etc.
This project helps you get a good understanding of classification problems. You can find the sample dataset from our friends over at Kaggle here.
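To make "predicting a survival probability" a bit more concrete, here’s a minimal sketch of logistic regression written in plain Python. The handful of rows below are invented for illustration (they are not taken from the Kaggle dataset), and the two features, `is_female` and `is_first_class`, are a simplified encoding I’m assuming here. In practice you’d more likely reach for a library such as scikit-learn.

```python
import math


def sigmoid(z):
    """Squash any real number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, weights, bias):
    """Predicted probability of survival for one passenger's features."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_logistic_regression(rows, labels, lr=0.5, epochs=2000):
    """Fit weights with batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = predict_proba(x, weights, bias) - y
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


# Invented toy rows: [is_female, is_first_class] -> survived (1) or not (0).
rows = [[1, 1], [1, 1], [1, 0], [1, 0], [0, 1],
        [0, 1], [0, 0], [0, 0], [0, 0], [1, 0]]
labels = [1, 1, 1, 0, 1, 0, 0, 0, 0, 1]

weights, bias = train_logistic_regression(rows, labels)
```

After training, `predict_proba([1, 1], weights, bias)` gives the model’s estimated survival probability for a first-class female passenger in this toy data, which you can compare against `predict_proba([0, 0], weights, bias)`.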
Linear regression is a core aspect of machine learning algorithms, and every machine learning engineer or enthusiast is expected to have a thorough understanding of how to use it.
However, if you’ve not come across this topic before, linear regression is applied to predict continuous variables from a set of features.
Why care, and what does that mean? Well, understanding how to use linear regression helps you solve a range of ML problems, such as predicting housing prices.
This project will help you understand ML basics such as data manipulation.
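Here’s a small sketch of the idea with a single feature, using the standard least-squares formulas for the slope and intercept. The house sizes and prices are made-up numbers purely for illustration, and a real project would use several features and a library rather than hand-rolled math.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Slope = covariance(x, y) / variance(x)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x, slope, intercept):
    """Predict a continuous value (e.g. a price) from one feature."""
    return slope * x + intercept


# Made-up data: house size in square meters vs. price in thousands.
sizes = [50, 70, 90, 110]
prices = [150, 200, 260, 310]

slope, intercept = fit_line(sizes, prices)
estimate = predict(100, slope, intercept)  # price estimate for 100 m²
```

The fitted slope tells you how much the predicted price changes for each extra square meter, which is exactly the "predict a continuous variable from features" idea described above.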
You can get the sample dataset to use here.
MNIST (short for the Modified National Institute of Standards and Technology database) is one of the most basic datasets for computer vision.
It was introduced in 1999 and has since served as a standard benchmark for classification algorithms.
This project is similar to the first doggo project from earlier in this list, but a little simpler. You are going to train a Machine Learning algorithm to accurately recognize images of handwritten digits ranging from 0 to 9.
ML skills you’ll learn include:
Check the sample dataset here.
Audio has proven to be fairly difficult for ML algorithms to learn from. However, we can help them get better!
How?
By categorizing music based on how it sounds, we can develop a model for classifying audio recordings into different musical genres, such as pop, rock, romance, and so forth.
Get the sample dataset here.
Data science and statistical analysis became widely known to the general public, partly because of the story behind the ‘Moneyball’ movie.
Spoilers for a 12-year-old film, but it's the tale of how an underdog baseball team with very little money won its division and set an American League record with 20 consecutive wins!
How did they achieve this? Sweet, sexy, statistics baby! Instead of trying to compete by purchasing the most expensive players who hit sensational home runs, they instead performed statistical analysis to find the consistent performers amongst the cheaper players.
The stats were calculated using parameters such as on-base percentage (OBP) and slugging percentage (SLG), which are very important when scoring runs but are often undervalued by most scouts and old school baseball executives.
This way, they hoped to build a team that would win, based on the math in their model vs. outlier conditions. More runs per game = more wins over time.
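If you haven’t met these two stats before, here’s a quick sketch of how they’re calculated using the standard baseball definitions. The function and parameter names are my own, and summing the two into OPS (on-base plus slugging) is just one common way to combine them, not necessarily what the project does.

```python
def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sacrifice_flies):
    """OBP: how often a batter reaches base per plate appearance counted."""
    denominator = at_bats + walks + hit_by_pitch + sacrifice_flies
    if denominator == 0:
        return 0.0
    return (hits + walks + hit_by_pitch) / denominator


def slugging_percentage(singles, doubles, triples, home_runs, at_bats):
    """SLG: total bases earned per at-bat, so extra-base hits count more."""
    if at_bats == 0:
        return 0.0
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats


def ops(obp, slg):
    """OPS: a simple combined measure, on-base plus slugging."""
    return obp + slg
```

Once you can compute these per player from historical data, you can start comparing them against salary to look for the kind of undervalued, consistent performers the story is about.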
The goal of this project is to help you build a similar winning team. You’ll be using machine learning to extract insights from historical data for baseball, football, and basketball.
You sports fans out there will love this one. Maybe this can be your first step to working for a sports team!
Grab the sample dataset here.
Kind of similar to the housing prices project from earlier, but applied slightly differently.
The goal of this project is to develop a regression model to forecast sales of each product at a supermarket in the upcoming year. This will then help in identifying sales trends and implementing practical business tactics to generate revenue - a key skill to have for almost every e-commerce or physical product company.
(Otherwise you might not order enough bananas… 😱)
You can check the sample dataset here.
Credit card fraud cost over an estimated $30 billion in the US alone back in 2021, so as you can guess, being able to detect it and stop it from happening is fairly important.
This project aims to build a fraud detection model for credit card transactions, which looks for anomalous patterns in the data so that scams and unauthorized transactions can be flagged.
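As a taste of what "looking for anomalies" can mean, here’s a deliberately simple sketch that flags transaction amounts far from the typical value, using a modified z-score based on the median. This is a stand-in I’ve chosen for illustration, not the project’s actual method: a real fraud model would look at many features at once, and the amounts below are invented.

```python
from statistics import median


def flag_anomalies(amounts, threshold=3.5):
    """Return the indexes of amounts that sit unusually far from the median.

    Uses the modified z-score: 0.6745 * (x - median) / MAD, where MAD is
    the median absolute deviation. A cut-off of 3.5 is a commonly used
    rule of thumb.
    """
    med = median(amounts)
    mad = median([abs(a - med) for a in amounts])
    if mad == 0:
        return []  # no spread at all, so nothing can be scored
    return [
        i for i, a in enumerate(amounts)
        if abs(0.6745 * (a - med) / mad) > threshold
    ]


# Invented transaction amounts; one is wildly out of pattern.
amounts = [12.5, 40.0, 23.1, 18.9, 35.0, 27.4, 2500.0, 31.2]
suspicious = flag_anomalies(amounts)
```

The median-based score is used here instead of a mean-based one because a single huge transaction would otherwise drag the average (and the spread) toward itself and hide the very thing you’re trying to find.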
You can check out the sample dataset here.
So there you have it. My top 10 recommended beginner to advanced Machine Learning projects for you to work on, up your game, and make your resume pop.
Remember though, if you’re stuck for time then I recommend you work on the Top 3 projects first as these can have some of the biggest benefits for your skill development, while also covering a lot of what you need to know and practice.
If you’re just starting out or find yourself stuck on any of these projects, then be sure to check out my Complete Machine Learning and Data Science Bootcamp.
The best part? You get to learn alongside 1,000s of fellow students in our private Discord community, with direct access to me, so you can ask any questions and learn faster and easier than ever before!