Machine Learning Monthly (October 2020)

10th issue! If you missed them, you can read the previous issues of the Machine Learning Monthly newsletter here.

Hey everyone, Daniel here, I'm 50% of the instructors behind the Complete Machine Learning and Data Science: Zero to Mastery course. I also write regularly about machine learning on my own blog and make videos on the topic on YouTube.

Welcome to the 10th edition of Machine Learning Monthly. A 500ish (+/-1000ish, usually +) word post detailing some of the most interesting things on machine learning I've found in the last month.

Since there's a lot going on, the utmost care has been taken to keep things to the point.

What you missed in October as a Machine Learning Engineer…

My work. Your benefit. 👇

Video version of this article!
[Article] 10 Commandments of Self-Taught Machine Learning Engineers — I remixed the original 10 commandments with a machine learning flavour. The essence: everyone should be in charge of their own education, and this article provides guidelines for how you can do so for machine learning.
[Video] State of AI Report 2020 Review — The State of AI report for 2020 (see below) came out at the start of October. It contains a recap of many of the innovations in the field of Artificial Intelligence over the past year. I made a video going through each of the findings and adding my own commentary to it.

The best from the community 🙋‍♀️

Kevin's Mask detecting Telegram Bot

After being inspired by seeing Uber's mask detection app (an app which detects whether or not you're wearing a face mask), Kevin Degila decided he wanted to replicate it. Using fast.ai, Kevin trained a computer vision model on images of people wearing face masks and people not wearing face masks.

Once the model was trained, Kevin deployed it to Heroku and made it accessible through a Telegram Bot. By making it available as a Telegram Bot, this means Kevin's model is able to run on Android or iOS or wherever you have Telegram installed.

Not only did Kevin succeed in replicating Uber's mask detection app's functionality, he did a superb job of sharing his work through Twitter and a detailed blog post of how you can do it too.

Inspiring work Kevin!

See Kevin's Telegram Bot demo on Twitter

Read Kevin's Telegram Bot Machine Learning model deployment guide on Medium

The best from the internet 🕸

NLP Fire Sale 🔥

October must be the month for Natural Language Processing (NLP) resources, because I stumbled upon a plethora of tutorials, roadmaps, notebooks and more.

Modern Practical Natural Language Processing by Jonathon Mugan — From turning text into numbers (vectors) to classifying those numbers into different categories to generating new text from old text, Jonathon's PyTorch-based course + video series will introduce you to many of the most important concepts in NLP.
Getting Started with NLP by Elvis Saravia — I've been following Elvis's work for a couple of years now. His blog post summaries of the latest research have been outstanding. Now he's created a list of his recommendations for those wanting to get started in the field.
The Super Duper NLP Repo by Quantum Stat — Ever wanted a mega repo containing working notebook examples of different NLP techniques? Well, as you might've guessed, The Super Duper NLP Repo is just that. As of writing, it contains 226 notebook examples of NLP problems, from text generation with an RNN to using BERT (a very good NLP model) for text classification.
Natural Language Processing News (issue #53) by Sebastian Ruder — Seb Ruder is one of the titans of the NLP field. If you haven't subscribed to his newsletter, you should. It's how I stay on top of how the field of NLP is tracking and it's where I got many of these NLP resources.
NLP Tutorial(s) by Tae-Hwan Jung — Tae-Hwan has put together a mammoth amount of NLP techniques from embeddings to attention, all coded in PyTorch and TensorFlow. I especially love Jung's GitHub bio: "One person's open source can lead new market with new technology."
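
To make the "turning text into numbers (vectors)" idea from the first resource a little more concrete, here's a minimal bag-of-words sketch in plain Python. It isn't taken from any of the courses above, the function names (`build_vocab`, `vectorize`) and the example sentences are made up for illustration, and real NLP pipelines usually use learned embeddings rather than raw word counts.

```python
from collections import Counter


def build_vocab(texts):
    """Map each unique lowercase word to an integer index, in sorted order."""
    words = sorted({word for text in texts for word in text.lower().split()})
    return {word: index for index, word in enumerate(words)}


def vectorize(text, vocab):
    """Turn a text into a list of word counts, one slot per vocabulary word."""
    counts = Counter(text.lower().split())
    return [counts.get(word, 0) for word in vocab]


# Made-up example sentences (an assumption, not from any dataset).
texts = ["the cat sat", "the dog sat on the cat"]
vocab = build_vocab(texts)
vectors = [vectorize(text, vocab) for text in texts]

print(vocab)
print(vectors)
```

Each text becomes a fixed-length list of numbers, which is the kind of input a classifier can work with.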

Of course, going through all of these resources is probably too much. So my advice is to skim through each and see which sparks your curiosity the most then follow it through. You can always bookmark the rest and come back to them later if you need.

State of AI Report 2020 by Nathan Benaich and Ian Hogarth 🤖

What it is: For the last 3 years, Nathan Benaich and Ian Hogarth have collected the latest and greatest innovations in the field of artificial intelligence and put them together in a report. The State of AI Report 2020 breaks the field down into the following:

Research (latest breakthroughs)
Talent (supply and demand of AI-skills)
Industry (where AI is being used for commercial purposes)
Politics (regulations and geopolitics of AI)
Predictions (what the authors believe will happen in the next 12 months)

Why it matters: I first went through this report last year (the 2019 edition) and absolutely loved it. It can be hard to keep on top of a rapidly changing field but luckily Nathan and Ian have done some huge work by doing it for you.

A 2020 Guide to Data & Infrastructure by Andreessen Horowitz 🏗

What it is: Andreessen Horowitz (famous venture capital firm) published a detailed guide on the state of architectures for data infrastructure. The architecture covers data sources, data ingestion and transformation, storage, historical (for analytics), predictive (for the future) and outputs along with different tools that can be used for each as well as case studies from several large companies on their data infrastructure setups.


A few of my favourite quotes from the report:

“The race towards data is also reflected in the job market. Data analysts, data engineers and machine learning engineers topped LinkedIn’s list of fastest-growing roles in 2019.”
"Netflix generates more than 80% of content views through ML recommendation system."
"Airbnb increased booking conversion rate by ~4% by modelling likelihood of host acceptance."
“Doing machine learning at scale is among the most challenging data problems today.”

Why it matters: The last quote is why the report matters. It's one thing to build a machine learning model in a notebook, it's another thing to deliver the outputs of that model to millions of users.

[Course] Putting ML in Production by Made With ML 📽

What it is: A hands-on (code and videos) guide to putting machine learning in production by Goku Mohandas (founder of Made With ML). Follow Goku along as he builds Made With ML's first machine learning feature, tagifai, an automatic multilabel classification of tags for a project submitted to the site.
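
If "multilabel classification" is a new term, the sketch below shows the core idea: each tag gets its own independent probability, and every tag above a threshold is kept (rather than picking a single winner, as in multiclass classification). This is not how tagifai is implemented; the tag names, the scores and the `predict_tags` helper are all hypothetical.

```python
import math


def sigmoid(x):
    """Squash a raw model score (logit) into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_tags(logits, tags, threshold=0.5):
    """Return every tag whose independent probability clears the threshold."""
    return [tag for tag, logit in zip(tags, logits) if sigmoid(logit) >= threshold]


# Hypothetical tags and made-up raw scores for a single project.
TAGS = ["nlp", "computer-vision", "pytorch", "tensorflow"]
logits = [2.1, -1.3, 0.4, -3.0]

predicted = predict_tags(logits, TAGS)
print(predicted)
```

A project can end up with several tags or none at all, depending on where you set the threshold.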

Why it matters: I've been through the first few videos and have taken away something from each one. Machine learning is much more than just building a model.

How do your users interact with the model? Does it augment or automate a feature for them? How much does it cost to run? How long does it take? How do you maintain it over time?

When you start putting your models into production, these are the kind of questions you'll have to ask yourself. And so far, Goku's course has done a great job of making that clear. I'm looking forward to going through the new videos each week.

[Book] Machine Learning Engineering by Andriy Burkov 📖

What it is: Model building in machine learning has started to mature. Researchers and industry have shared their secrets around best practices for what architectures and algorithms work best for different scenarios. But if you're a beginner machine learner, getting your work into the hands of others (deploying your models) can still seem like black magic. Andriy's book Machine Learning Engineering reveals many of the best practices for building machine learning-powered applications, from model testing to model retraining.

Why it matters: I've been reading this book 10 pages per day for the past month. And I feel like it contains all of my Google searches from the past 2 years. If you're looking for a phenomenal getting-started reference on all of the practices around putting your machine learning models in production, this is it.

The best thing?

You can start reading Machine Learning Engineering online for free before you buy it!

NumPy: The Manifesto 🐍

What it is: If you've written any machine learning code, chances are it's used NumPy under the hood or has been designed around principles contained within the NumPy library. Well, the library we all love has recently published a paper in Nature describing its history, ecosystem and design philosophies.

Why it matters: Going back to Tae-Hwan's quote from before, "One person's open source can lead new market with new technology": NumPy wasn't created by one person, but it is open source and it powers many other open source libraries we all use, such as pandas, Scikit-Learn, matplotlib and SciPy. As a testament to those who created the library, it's worth not only using it but also being familiar with its history. In the discussion, the authors also call for future contributors to help "NumPy meet the needs of the next decade of data science".

[Resources] The Incredible PyTorch 🔦

What it is: A curated list of tutorials, projects, libraries, videos, papers, books and anything related to PyTorch. Seriously, if it's PyTorch related, it's probably here.

Why it matters: According to the State of AI Report 2020 (see above), PyTorch is fast becoming the researchers' deep learning framework of choice, with a majority of research papers being published with PyTorch code samples to go along. So if you're wanting to upskill yourself in the PyTorch domain, The Incredible PyTorch has a plethora of examples and resources you can use to do so.

Optimus Prime vs. CNNs (transformers for images) 💻

What it is: Over the past few years the Transformer architecture has taken the NLP world by storm, achieving state-of-the-art results across the board. Now it's coming for computer vision. In An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, researchers explore the use of the Transformer architecture on several computer vision benchmarks (ImageNet, CIFAR-100) and find that, when pre-trained on large amounts of data, a pure transformer model can match the performance of the best performing convolutional neural network (CNN) architectures whilst requiring substantially fewer computational resources.
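
The "16x16 words" in the paper's title refers to cutting an image into small square patches and treating each patch like a word in a sentence. Here's a rough sketch of just that patch-splitting step in plain Python. It assumes a made-up 32x32 single-channel image stored as nested lists, the `image_to_patches` helper is hypothetical, and it leaves out everything else the model does (projecting patches to embeddings, position information, the Transformer itself).

```python
def image_to_patches(image, patch_size):
    """Split a 2D image (list of rows) into flattened, non-overlapping square patches."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    # Walk the image patch by patch, left to right and top to bottom.
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# A made-up 32x32 "image" where each pixel value is simply its flat index.
image = [[row * 32 + col for col in range(32)] for row in range(32)]
patches = image_to_patches(image, patch_size=16)

# Number of patches (the "sentence length") and values per patch (the "word" size).
print(len(patches), len(patches[0]))
```

The resulting list of patches plays the role of the sequence of tokens a Transformer would normally receive in an NLP setting.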

Why it matters: Same results? Less compute? Why not? Recall that in the June 2020 issue of Machine Learning Monthly we discussed that you'll probably be seeing more of Transformers in the future. Well, I have a feeling this is just the start of what's to come for transformer-powered computer vision.

Learn more about transformers:

[Tutorial] Training a Custom Mobile Object Detection Model (Weekend Project Idea 💡)

What it is: So you want to use a state of the art object detection model and have it work on a mobile device? But you're not sure where to start? Don't worry, the Roboflow team have you covered. This end-to-end tutorial will show you how to prepare a YOLOv4 custom dataset, train a YOLOv4 object detection model (currently state of the art for lightweight architectures), convert the model to TensorFlow Lite and then deploy the model to an Android device (though the model would be fit for iOS or Raspberry Pi devices).

Why it matters: I see tutorials like this as momentum builders. Of course, you can follow the guidelines, but the fun part will happen when you adjust it for your own needs. The tutorial goes through detecting different blood cells in microscopy images, but since the framework is there, it could be adjusted for many other object detection use-cases. A great weekend project!

[Podcast] Gradient Dissent by Weights & Biases 🎧

What it is: Every week or so, the Weights & Biases team (creators of amazing machine learning tools) interview some of the most interesting people in the field. From Kaggle competition winners to deep learning library creators to machine learning researchers. It's quickly becoming my favourite machine learning podcast. I've been listening to episodes whilst I'm driving or playing them in the background whilst I'm coding.

Why it matters: If you're new to the field (or already in it), it's incredible to be able to hear from people who are doing things you'd like to be doing or doing similar things to you. Whenever I listen to the show, I can't help but feel a sense of community and inspiration. I'm especially excited to listen to the latest episode with Ines Montani and Sofie Van Landeghem from Explosion.ai (creators of the NLP library spaCy).

See you next month!

As usual, a massive month for the ML world in October.

As always, let me know if there's anything you think should be included in a future post. Liked something here? Send us a tweet.

In the meantime, keep learning, keep creating.

See you next month,

Daniel www.mrdbourke.com | YouTube

By the way, I'm a full time instructor with Zero To Mastery Academy teaching people Machine Learning in the most efficient way possible. You can see a couple of our courses below or see all Zero To Mastery courses by visiting the courses page.