31st issue! If you missed them, you can read the previous issues of the Machine Learning Monthly newsletter here.
Hey everyone!
Daniel here, I'm a machine learning engineer who teaches the following beginner-friendly machine learning courses:
I also write regularly about machine learning on my own blog as well as make videos on the topic on YouTube.
Enough about me!
Typically Machine Learning Monthly is a 500ish (+/-1000ish, usually +) word post detailing some of the most interesting things on machine learning I've found in the last month.
But this month is a special edition of Machine Learning Monthly!
To go alongside the recent launch of the Zero to Mastery PyTorch course, the contents of this edition are all to do with PyTorch.
What you missed in July as a Machine Learning Engineer...
My work
The Zero to Mastery PyTorch course is live!
Easily one of the most requested courses has officially been published on Zero To Mastery. Inside, we'll learn and practice writing PyTorch code, the most popular deep learning framework, in a hands-on and beginner-friendly way.
You can view all of the course resources on GitHub, read them online at learnpytorch.io or try out the first half of the course in a single 25-hour-long (yes, 25 hours) video on YouTube:
The YouTube video above contains sections 00-04 of the Learn PyTorch for Deep Learning course.
To continue on with the lectures for sections 05-09, you can join ZTM to access the full course.
Special Edition: PyTorch Extra Resources
Welcome to this special edition of Machine Learning Monthly!
As mentioned, to go alongside the recent launch of the Zero to Mastery PyTorch course, the contents of this edition are all to do with PyTorch (I may have also snuck in a few broader ML resources as well).
Despite the full PyTorch course being over 40 hours, you'll likely finish it excited to learn even more.
After all, the course is a PyTorch momentum builder.
The following resources are collected to extend the course.
A warning though: there's a lot here.
Best to choose 1 or 2 resources from each section (or fewer) to explore further, and put the rest in your bag for later.
There is no best resource either.
If they've made it on this list, you can consider them a quality resource.
Most are PyTorch-specific and fitting extensions to the course. A couple aren't PyTorch-specific, but they're still valuable in the world of machine learning.
Pure PyTorch resources
- PyTorch blog: Stay up to date on the latest on PyTorch right from the source. I check the blog once a month or so for updates.
- PyTorch documentation: We'll have explored this plenty throughout the course but there's still a large amount we haven't touched. No trouble, explore often and when necessary.
- PyTorch Performance Tuning Guide: One of the first things you'll likely want to do after the course is to make your PyTorch models faster (training and inference). The PyTorch Performance Tuning Guide helps you do just that.
- PyTorch Recipes: A collection of small tutorials showcasing common PyTorch features and workflows you may want to create, such as Loading Data in PyTorch and Saving and Loading Models for Inference in PyTorch.
- Setting up PyTorch in VSCode: VSCode is one of the most popular IDEs out there, and its PyTorch support is getting better and better. Throughout the Zero to Mastery PyTorch course, we use Google Colab because of its ease of use. But chances are you'll be developing in an IDE like VSCode soon.
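To make the "saving and loading models for inference" workflow mentioned above a little more concrete, here is a minimal sketch. The tiny model, the file name `tiny_model.pth` and the random input are all made up for illustration; it is one common pattern (save the `state_dict`, reload it into the same architecture, predict in evaluation mode), not the only way to do it.

```python
import os
import tempfile

import torch
from torch import nn

torch.manual_seed(42)

# A tiny hypothetical model, standing in for whatever you trained in the course
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))

# Save only the learned parameters (the state_dict), not the whole Python object
save_path = os.path.join(tempfile.mkdtemp(), "tiny_model.pth")  # hypothetical file name
torch.save(model.state_dict(), save_path)

# Recreate the same architecture, then load the saved parameters into it
loaded_model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
loaded_model.load_state_dict(torch.load(save_path))

# Put both models in evaluation mode and skip gradient tracking for inference
model.eval()
loaded_model.eval()
X = torch.rand(3, 4)  # made-up input batch: 3 samples, 4 features
with torch.inference_mode():
    original_preds = model(X)
    loaded_preds = loaded_model(X)

# Check whether the reloaded model predicts the same values as the original
print(torch.equal(original_preds, loaded_preds))
```

Running predictions under `torch.inference_mode()` is also one of the simpler speed-related habits to pick up, since PyTorch no longer has to track gradients for those calls.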
Libraries that make pure PyTorch better/add features
The course focuses on pure PyTorch (using minimal external libraries) because if you know how to write plain PyTorch, you can learn to use the various extension libraries.
- fast.ai: fastai is an open-source library that takes care of many of the boring parts of building neural networks and makes creating state-of-the-art models possible with a few lines of code. Their free library, course and documentation are all world-class.
- MosaicML for more efficient model training: The faster you can train models, the faster you can figure out what works and what doesn't. MosaicML's open-source Composer library helps you train neural networks with PyTorch faster by implementing speedup algorithms behind the scenes, which means you can get better results out of your existing PyTorch models sooner. All of their code is open-source and their docs are fantastic.
- PyTorch Lightning for reducing boilerplate: PyTorch Lightning takes care of many of the steps that you often have to do by hand in vanilla PyTorch, such as writing a training and test loop, model checkpointing, logging and more. It builds on top of PyTorch to allow you to make PyTorch models with less code.
Libraries that extend/make pure PyTorch better.
Books for PyTorch
Textbooks to learn more about PyTorch as well as deep learning in general.
Resources for Machine Learning and Deep Learning Engineering
Machine Learning Engineering (also referred to as MLOps or ML operations) is the practice of getting the models you create into the hands of others. This may mean via a public app or working behind the scenes to make business decisions.
The following resources will help you learn more about the steps around deploying a machine learning model.
- Designing Machine Learning Systems book by Chip Huyen: If you want to build an ML system, it'd be good to know how others have done it. Chip's book focuses less on building a single machine learning model (though there's plenty of content on that in the book) and more on building a cohesive ML system. It covers everything from data engineering to model building to model deployment (online and offline) to model monitoring. Even better, it's a joy to read. You can tell the book is written by a writer (Chip has previously authored several books).
- Made With ML by Goku Mohandas: Whenever I want to learn or reference something to do with MLOps, I go to madewithml.com/mlops and see if there's a lesson on it. Made With ML not only teaches you the fundamentals of many different ML models but also goes through how to build an end-to-end ML system with plenty of code and tooling examples.
- The Machine Learning Engineering book by Andriy Burkov: Even though this book is available to read online for free, I bought it as soon as it came out. I've used it as a reference and to learn more about ML engineering so much that it's basically always on my desk/within arm's reach. Burkov does an excellent job of getting to the point and referencing further materials when necessary.
- Full Stack Deep Learning course: I first did this course in 2021, and it's continued to evolve to cover the latest and greatest tools in the field. It'll teach you how to plan a project to solve an ML problem, how to source or create data, how to troubleshoot an ML project when it goes wrong and, most of all, how to build ML-powered products.
Resources to improve your machine learning engineering skills (all of the steps that go around building a machine learning model).
Where to find datasets
Machine learning projects begin with data.
No data, no ML.
The following resources are some of the best for finding open-source and often ready-to-use datasets on a wide range of topics and problem domains.
- Paperswithcode Datasets: Search for the most used and common machine learning benchmark datasets, understand what they contain, where they came from and where they can be found. You can often also see the current best-performing model on each dataset.
- HuggingFace Datasets: Not just a resource to find datasets across a wide range of problem domains but also a library to download and start using them within a few lines of code.
- Kaggle Datasets: Find all kinds of datasets that usually accompany Kaggle Competitions, many of which come straight out of industry.
- Google Dataset search: Just like searching Google but specifically for datasets.
These should be plenty to get started. However, for your own specific problems you'll likely want to build your own dataset.
Places to find existing and open-source datasets for a variety of problem spaces.
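If you do end up building your own dataset, one common way to plug it into PyTorch is to subclass `torch.utils.data.Dataset`. The sketch below assumes a small in-memory table of features and labels; the class name `ToyTabularDataset` and the data values are hypothetical, and a real dataset would usually read from files instead.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class ToyTabularDataset(Dataset):
    """Hypothetical dataset wrapping in-memory features and labels."""

    def __init__(self, features, labels):
        if len(features) != len(labels):
            raise ValueError("features and labels must be the same length")
        self.features = torch.tensor(features, dtype=torch.float32)
        self.labels = torch.tensor(labels, dtype=torch.long)

    def __len__(self):
        # Number of samples in the dataset
        return len(self.labels)

    def __getitem__(self, index):
        # Return one (features, label) pair
        return self.features[index], self.labels[index]


# Made-up data: 5 samples with 2 features each
features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0]]
labels = [0, 1, 0, 1, 0]

dataset = ToyTabularDataset(features, labels)
dataloader = DataLoader(dataset, batch_size=2, shuffle=False)

# Collect the shape of each feature batch the DataLoader produces
batch_shapes = [tuple(X_batch.shape) for X_batch, y_batch in dataloader]
print(batch_shapes)
```

Once your data sits behind `__len__` and `__getitem__`, a `DataLoader` can batch and shuffle it the same way it does for the ready-made datasets listed above.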
The following resources are focused on libraries and pretrained models for specific problem domains such as computer vision and recommendation engines/systems.
Computer Vision
We cover computer vision in 03. PyTorch Computer Vision but as a quick recap, computer vision is the art of getting computers to see.
If your data is visual, images, x-rays, production line video or even hand-written documents, it may be a computer vision problem.
- TorchVision: PyTorch's resident computer vision library. Find plenty of methods for loading vision data as well as plenty of pretrained computer vision models to use for your own problems.
- timm (Torch Image Models) library: One of the most comprehensive computer vision libraries and resources for pretrained computer vision models. Almost all new research that uses PyTorch for computer vision leverages the timm library in some way.
- Yolov5 for object detection: If you're looking to build an object detection model in PyTorch, the yolov5 GitHub repository might be the quickest way to get started.
- VISSL (Vision Self-Supervised Learning) library: Self-supervised learning is the art of getting data to learn patterns in itself. Rather than providing labels for different classes and learning a representation from them, self-supervised learning tries to replicate similar results without labels. VISSL provides an easy-to-use way to get started with self-supervised computer vision models in PyTorch.
Natural Language Processing (NLP)
Natural language processing involves finding patterns in text.
For example, you might want to extract important entities in support tickets or classify a document into different categories.
If your problem involves a large amount of text, you'll want to look into the following resources.
- TorchText: PyTorch's in-built domain library for text. Like TorchVision, it contains plenty of pre-built methods for loading data and a healthy collection of pretrained models you can adapt to your own problems.
- HuggingFace Transformers library: The HuggingFace Transformers library has more stars on GitHub than the PyTorch library itself. And there's a reason. Not that HuggingFace Transformers is better than PyTorch, but it's the best at what it does: provide data loaders and pretrained state-of-the-art models for NLP (and a whole bunch more).
- Bonus: To learn more about how to use the HuggingFace Transformers library and all of the pieces around it, the HuggingFace team offers a free online course.
Speech and Audio
If your problem deals with audio files or speech data, such as trying to classify a sound or transcribe speech into text, you'll want to look into the following resources.
- TorchAudio: PyTorch's domain library for everything audio. Find in-built methods for preparing data and pre-built model architectures for finding patterns in audio data.
- SpeechBrain: An open-source library built on top of PyTorch to handle speech problems such as recognition (turning speech into text), speech enhancement, speech processing, text-to-speech and more. You can try out many of their models on the HuggingFace Hub.
Recommendation Engines
The internet is powered by recommendations. YouTube recommends videos, Netflix recommends movies and TV shows, Amazon recommends products, Medium recommends articles.
If you're building an online store or online marketplace, chances are you'll want to start recommending things to your customers.
For that, you'll want to look into building a recommendation engine.
- TorchRec: PyTorch's newest in-built domain library for powering recommendation engines with deep learning. TorchRec comes with recommendation datasets and models ready to try and use. Though if a custom recommendation engine isn't up to par with what you're after (or too much work), many cloud vendors offer recommendation engine services.
Time Series
If your data has a time component and you'd like to leverage patterns from the past to predict the future, such as predicting the price of Bitcoin next year (don't try this, stock forecasting is BS) or the more reasonable problem of predicting electricity demand for a city next week, you'll want to look into time series libraries.
Neither of these libraries necessarily uses PyTorch. However, since time series is such a common problem, I've included them here.
- Salesforce Merlion: Turn your time series data into intelligence by using Merlion's data loaders, pre-built models, AutoML (automated machine learning) hyperparameter tuning and more for time series forecasting and time series anomaly detection, all inspired by practical use cases.
- Facebook Kats: Facebook's entire business depends on prediction: when's the best time to place an advertisement? So you can bet they're invested heavily in their time series prediction software. Kats (Kit to Analyze Time Series data) is their open-source library for time series forecasting, detection and data processing.
How to get a job in ML
Once you've finished an ML course, it's likely you'll want to use your ML skills.
And even better, get paid for them.
The following resources are good guides on what to do to get an ML job.
- "How can a beginner data scientist like me gain experience?" by Daniel Bourke (that's me): I often get the question "how do I get experience?" because many job requirements state "experience needed". Well, it turns out one of the best ways to get experience (and a job) is to start the job before you have it.
- You Don't Really Need Another MOOC by Eugene Yan: MOOC stands for massive open online course. MOOCs are beautiful. They enable people all over the world to learn at their own pace. However, it can be tempting to just continually do MOOCs over and over again thinking "if I just do one more, I'll be ready". The truth is, a few is enough, and the returns of a MOOC quickly start to trail off. Instead, go off the trail, start to build, start to create, start to learn skills that can't be taught. Showcase those skills to get a job.
- Bonus: For the most thorough resource on the internet for machine learning interviews, check out Chip Huyen's free Introduction to Machine Learning Interviews book.
See you next month!
I hope you enjoyed this special resources edition of Machine Learning Monthly.
If this is your first time, you can read the previous issues of the Machine Learning Monthly newsletter here.
As always, let me know if there's anything you think should be included in a future post.
In the meantime, keep learning, keep creating, keep dancing.
See you next month,
Daniel
www.mrdbourke.com | YouTube
By the way, I'm a full-time instructor with Zero To Mastery Academy teaching people Machine Learning in the most efficient way possible. You can see a few of our courses below or check out all Zero To Mastery courses.