Machine Learning Monthly 💻🤖

Daniel Bourke

February 1st, 2021

13 min read

13th issue! If you missed them, you can read the previous issues of the Machine Learning Monthly newsletter here.

Hey everyone, Daniel here. I'm 50% of the instructors behind the Complete Machine Learning and Data Science: Zero to Mastery course. I also write regularly about machine learning on my own blog and make videos on the topic on YouTube.

Welcome to the 13th edition of Machine Learning Monthly. A 500ish (+/-1000ish, usually +) word post detailing some of the most interesting things on machine learning I've found in the last month.

Since there's a lot going on, the utmost care has been taken to keep things to the point.

What you missed in January as a Machine Learning Engineer…

My work 👇

Video version of this article!

A new deep learning with TensorFlow course (in progress)

I've been putting together a code-first introduction to deep learning with TensorFlow course.

If you've done a beginner machine learning course and want to dive into deep learning and TensorFlow, as well as potentially take the TensorFlow Developer Certification, this course is for you.

Students of the course will get hands-on practice writing deep learning models with TensorFlow, learn common deep learning troubleshooting techniques, practise searching for how to solve a problem and more.

The best places to get updates as the course gets ready to go live will be:

The course GitHub repo (all code + extra materials will be public)
This Machine Learning Monthly newsletter
The Zero to Mastery Academy (where the course will launch first)

From our ML community 🙋‍♀️

Detecting hateful memes by Niklas Muennighoff.

In May 2020, Facebook AI kicked off the Hateful Memes Challenge on DrivenData to detect whether a meme (image & text) would be offensive.

Not only was there a $100k prize pool for competitors, the competition also involved combining visual and linguistic data (requiring visio-linguistic models), a trend in this month's post.

Niklas finished 2nd (congratulations!) and shared his solution on GitHub. A phenomenal effort for someone who got into AI only 8 months ago (see Niklas's self-created AI Masters Degree; my favourite part is the emphasis on starting a project as soon as possible).

Aurélien Peden's Deep thoughts blog.

There's nothing I love seeing more than someone sharing their work and interpretations of the world. Aurélien is a 3rd year computer science student from France writing about machine learning. For a sample, check out Aurélien's deep dive on residual neural networks.

Thank you for the submissions Niklas & Aurélien!

From the interwebs 🕸

All about OpenAI

Some massive plays from OpenAI last month. Namely:

DALL·E: Creating Images from Text — a model which is able to create images from text (e.g. "a snail made of harp" creates an image of a snail with a harp for a shell... yes... seriously)
CLIP: Connecting Text and Images — a neural network which learns visual features from natural language supervision (e.g. learning about a photo of a dog via its caption "a photo of a dog")

Let's go into each.

DALL·E (the name is a playful combination of the artist Salvador Dalí and Pixar's WALL·E) uses a transformer language model to take in text and an image as a single stream of 1280 tokens (256 for the text and 1024 for the image). Because of this, DALL·E is able to generate an image from scratch as well as fill in holes in an image (from top left to bottom right) given a text prompt.
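To make the "single stream" idea concrete, here's a tiny sketch of how the token budget could be laid out. It only uses the 256 + 1024 split mentioned above; the padding id, the token ids and the function names are all made up for illustration, and this is not OpenAI's code.

```python
TEXT_TOKENS = 256    # text positions in the stream (from the post)
IMAGE_TOKENS = 1024  # image positions in the stream (from the post)
PAD = 0              # hypothetical padding id, not from the paper


def build_stream(text_ids, image_ids):
    """Concatenate text and image tokens into one stream (text first)."""
    if len(text_ids) > TEXT_TOKENS or len(image_ids) > IMAGE_TOKENS:
        raise ValueError("too many tokens for the stream")
    # Pad the text part so the image always starts at position 256.
    text_part = list(text_ids) + [PAD] * (TEXT_TOKENS - len(text_ids))
    return text_part + list(image_ids)


def positions_to_generate(image_ids):
    """Stream positions still to be filled, in top-left to bottom-right order."""
    start = TEXT_TOKENS + len(image_ids)
    return list(range(start, TEXT_TOKENS + IMAGE_TOKENS))


prompt = [11, 42, 7]   # made-up token ids for a caption
top_half = [5] * 512   # made-up tokens for the top half of an image
stream = build_stream(prompt, top_half)
todo = positions_to_generate(top_half)
print(len(stream), len(todo), todo[0])
```

Passing an empty list of image tokens corresponds to generating an image from scratch; passing a partial list corresponds to filling in the rest of the image.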

Two of my favourite examples of DALL·E's work (from the OpenAI blog post):

Text prompt (passed to model): "a living room with two white armchairs and a painting of the colosseum. the painting is mounted above a modern fireplace."

Text prompt (passed to model): "the exact same teapot on the top with 'gpt' written on it on the bottom"

Wild.

See the OpenAI blog post for many more examples.

CLIP (Contrastive Language-Image Pre-training) aims to tackle the main problems which face computer vision today:

Costly datasets: large-scale image datasets require a bunch of work (ImageNet took 25,000 annotators), something which doesn't seem sustainable in the long term.
Narrowness: an ImageNet model works well on the 1000 ImageNet classes. However, show it something outside of these 1000 classes and it collapses.
Poor real-world performance: deep models often report superhuman performance on benchmarks. However, the real world doesn't have benchmarks. Optimising for a single benchmark is akin to a student memorising past exams instead of learning principles applicable after graduation.

How does it address these?

For costly datasets, CLIP trains on image and text pairs readily available on the internet. However, my main question: what's the training data? When they say CLIP is trained off of data (images + text) from the internet, this is quite broad. Did they scrape them? Or did they partner with Microsoft to get images off of Bing? I'm curious. Maybe the paper will reveal more.

For narrowness, because CLIP is trained on language-based descriptions of images as well as the images themselves, it can be adapted to other tasks by "telling" it what to do. Instead of gathering more data for your avocado image classifier, just tell CLIP "give me an avocado classifier". Again, wild.
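Here's a rough sketch of what "telling" the model could look like under the hood: compare one image embedding against the embeddings of a few text prompts and pick the closest. The 3-number embeddings, the prompts and the temperature value are all made up for illustration (a real model would produce the embeddings with its image and text encoders), so treat this as a toy, not CLIP's API.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_classify(image_embedding, prompt_embeddings, temperature=0.07):
    """Score an image against text prompts; temperature is an arbitrary choice."""
    labels = list(prompt_embeddings)
    scores = [cosine(image_embedding, prompt_embeddings[label]) / temperature
              for label in labels]
    return dict(zip(labels, softmax(scores)))


# Made-up embeddings standing in for real encoder outputs.
prompt_embeddings = {
    "a photo of an avocado": [0.9, 0.1, 0.0],
    "a photo of a dog": [0.0, 0.2, 0.9],
    "a photo of a harp": [0.1, 0.9, 0.1],
}
image_embedding = [0.8, 0.2, 0.1]

probs = zero_shot_classify(image_embedding, prompt_embeddings)
best = max(probs, key=probs.get)
print(best)
```

The nice part of this setup is that changing the classifier means changing the list of prompts, not collecting a new labelled dataset.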

For poor real-world performance, CLIP can be evaluated on several benchmarks (e.g. ImageNet, ObjectNet, ImageNet Adversarial) without being explicitly trained on those benchmarks (and it performs very well across all of them). This keeps it from over-performing on any one specific benchmark and under-performing elsewhere.

One more thing I love about CLIP is the model card. Think of a nutrition label for a machine learning model.

Why is this important?

As models get more powerful (CLIP is crazy), people should be aware of their capabilities and potential exploits.

See more on CLIP:

Does this paper include video?

This issue of the monthly really does have a red thread... visuals + text. This time it's a little more manual (sort of).

Amit's Papers with Video Chrome extension answers the question of whether or not an arXiv paper has a video version.

Why it matters: The first time I read a paper it might as well be in a language I don't speak. So I read it again and look for other interpretations and explanations, video being one of the main ones. I like to hear someone explain something and see how their explanation lines up with what I'm reading. Amit's extension adds video links to 3700 papers (and counting).

Amit also has an incredible developer blog. Check it out for posts on everything from semi-supervised learning to text data augmentation.

No more bottlenecks, Transformers take over vision

A new paper, Bottleneck Transformers for Visual Recognition, just dropped and it could be the driving force for transformers to take over from CNNs for visual data.

Readers of previous ML monthly issues have seen this coming (also, see CLIP and DALL·E above).

Making a simple change to the popular ResNet backbone (replacing the spatial convolutions in its final bottleneck blocks with multi-head self-attention), the authors created a network called BoTNet which performs up to 2.33x faster whilst matching current state-of-the-art CNN-based computer vision models (e.g. EfficientNet).
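If you're wondering what attention over an image even looks like, here's a heavily simplified sketch: treat every position in a feature map as a token and let each one attend to all the others. This is a single head with no learned query/key/value weights and no position encodings (the paper's layer has learned projections and relative position encodings), so it's an illustration of the idea rather than BoTNet itself.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention_2d(feature_map):
    """feature_map: H x W grid of C-dim vectors. Returns a grid of the same shape."""
    height, width = len(feature_map), len(feature_map[0])
    tokens = [vec for row in feature_map for vec in row]  # flatten to H*W tokens
    channels = len(tokens[0])
    scale = math.sqrt(channels)
    out = []
    for query in tokens:
        # Every position looks at every other position (global, unlike a 3x3 conv).
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        weights = softmax(scores)
        mixed = [sum(w * value[c] for w, value in zip(weights, tokens))
                 for c in range(channels)]
        out.append(mixed)
    # Reshape back to H x W so the output has the same layout as the input.
    return [out[r * width:(r + 1) * width] for r in range(height)]


feature_map = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 1.0], [0.0, 0.0]],
]
attended = self_attention_2d(feature_map)
print(attended[1][1])
```

Because the output has the same shape as the input, a layer like this can sit in the slot a convolution used to occupy, which is what makes the swap a "simple change".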

Why this matters: Two things benefit the deep learning world: better results and faster performance. If you can't get one, you might as well get the other. And it seems the Transformer architecture is bringing all of its NLP gains to vision.

Bonus #1: If you're reading this and excited about the Transformer architecture but aren't sure where to start, check out Full-Stack Deep Learning's recent end-to-end introduction to Transformers with PyTorch thread below. It's all done in a Colab notebook so you can run and rewrite the code yourself.

🛠️Tooling Tuesday🛠️
Today, we share a @GoogleColab notebook implementing a Transformer with @PyTorch, trained using @PyTorchLightnin.

We show both encoder and decoder, train with teacher forcing, and implement greedy decoding for inference.https://t.co/dB1IL8WEGB

👇1/N
— Full Stack Deep Learning (@full_stack_dl) January 13, 2021

Made With ML's pivot

Made With ML used to be one of my favourite websites for machine learning. Then it pivoted and changed its offering and became even more of one of my favourite websites for machine learning 😂.

Made With ML used to showcase the different things people had made with ML. Kind of like this newsletter but scalable: instead of me writing this in a local cafe, people all over the world could submit their work and have it upvoted by others.

Now Made With ML is focused on less but better:

Teaching the fundamentals of machine learning (see ML Foundations on GitHub)
Teaching how to apply your ML knowledge to real-world problems (see Applied ML on GitHub)

Why this matters: Too many things lead to overwhelm. Made With ML's founder Goku Mohandas (follow him on Twitter for a treasure trove of ML content) realised this and decided to pivot the website to something more beneficial over the long-term. Big props for this. I'm a huuuuge fan of the website and am learning a lot from Goku's lessons.

Awesome production machine learning

Ever want the best tools and resources for productionizing your machine learning models in one place?

Well then, the awesome production machine learning repo is for you. From data labelling to model versioning, you'll find it there.

The page is so good there's too much for anyone to comprehend. Best to bookmark it and keep it for when you need it.

The missing piece of full-stack ML

Josh Tobin is a beast in the machine learning field. Previously at OpenAI and now working on a stealth-startup (this sounds almost Batman-like), he recently gave a talk on his idea of the Evaluation Store or in other words, the missing piece of full-stack machine learning 👇

What is full-stack machine learning?

Take all of the parts of the puzzle: data collection, data verification, data preprocessing, data modelling, model deployment, user-interface design, model monitoring, etc. That's full-stack machine learning.

And the one thing they all have in common?

They all require some kind of evaluation. In other words, how do you know how your system is performing at each stage? Because if one stage is performing poorly, the others get exponentially affected.

Josh's idea of the Evaluation Store is the missing piece — a way to track and monitor how your system is doing at every stage.
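To picture what tracking every stage might involve, here's a minimal sketch of my own. The class name, method names, stage names and numbers are all hypothetical (this is not Josh's design or any real product), and it assumes higher metric values are better.

```python
from collections import defaultdict


class EvaluationStore:
    """Hypothetical in-memory log of metric values for each pipeline stage."""

    def __init__(self):
        self._records = defaultdict(list)

    def log(self, stage, metric, value):
        """Append the newest value for a (stage, metric) pair."""
        self._records[(stage, metric)].append(value)

    def latest(self, stage, metric):
        """Most recently logged value for a (stage, metric) pair."""
        return self._records[(stage, metric)][-1]

    def regressions(self, tolerance=0.0):
        """Pairs whose latest value fell below the previous one by more than tolerance."""
        flagged = []
        for (stage, metric), values in self._records.items():
            if len(values) >= 2 and values[-1] < values[-2] - tolerance:
                flagged.append((stage, metric))
        return sorted(flagged)


store = EvaluationStore()
store.log("data_verification", "valid_row_rate", 0.99)  # made-up numbers
store.log("data_verification", "valid_row_rate", 0.91)
store.log("modelling", "accuracy", 0.87)
store.log("modelling", "accuracy", 0.88)

flagged = store.regressions(tolerance=0.01)
print(flagged)
```

Even a toy like this shows the appeal: one place to ask "which stage got worse since last time?" instead of digging through each stage's own logs.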

Another idea I loved from the talk is the concept of the "data flywheel", creating a system to continually collect, verify and model data based on what's most needed (this could be tracked by the Evaluation Store).

Overview of the data flywheel effect: collect data, clean and label, train, test, deploy, monitor, collect more data, repeat. Source: A Missing Link in the ML Infrastructure Stack - Josh Tobin (Stealth Startup, UC Berkeley, OpenAI)

A great example of a data flywheel is Tesla's fleet. If an engineer finds cars are performing poorly making right turns next to big yellow buses (I'm making this up and simplifying), they might query their database for more examples of their cars turning right next to yellow buses and upgrade their models, in turn closing the loop.

Watch the full missing piece of full-stack ML talk above. Big thanks to Josh and Ternary Data for putting it together.

Bonus #2: The Full Stack Deep Learning 2021 session (Josh is one of the instructors) starts March 1st. If you want to get your deep learning models out in the wild, you should check it out. I've signed up and can't wait.

Apple's system for delivering ML-powered features to over a billion devices

I picked this one up from Josh's talk above. It turns out Apple published a paper in September 2019 describing Overton, their system for helping engineers build machine-learning-based applications without writing any code in frameworks like TensorFlow.

If they're not writing TensorFlow, what do the engineers using Overton do?

They focus on higher-level tasks such as:

Fine-grained quality monitoring: which subsets of data are performing the most poorly? How can these be improved?
Support for multi-component pipelines: which part of your machine learning pipeline is causing the most issues? How do you tackle those first?
Updating supervision: supervision (labelling data) is typically performed by annotators but this doesn't scale very well. How can you increase programmatic supervision which is both scalable and privacy preserving?
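The first of these, fine-grained quality monitoring, is easy to try on your own projects. Here's a small sketch that splits evaluation examples into named subsets (slices) and finds the weakest one. The slice names, field names and examples are made up, and this is my own illustration rather than anything from Overton.

```python
from collections import defaultdict


def accuracy_by_slice(examples):
    """examples: iterable of dicts with 'slice', 'label' and 'prediction' keys."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for example in examples:
        total[example["slice"]] += 1
        correct[example["slice"]] += int(example["label"] == example["prediction"])
    return {name: correct[name] / total[name] for name in total}


def worst_slice(examples):
    """Name of the slice with the lowest accuracy."""
    scores = accuracy_by_slice(examples)
    return min(scores, key=scores.get)


# Made-up evaluation results for two hypothetical question types.
examples = [
    {"slice": "height_questions", "label": "a", "prediction": "a"},
    {"slice": "height_questions", "label": "b", "prediction": "c"},
    {"slice": "height_questions", "label": "d", "prediction": "e"},
    {"slice": "age_questions", "label": "f", "prediction": "f"},
    {"slice": "age_questions", "label": "g", "prediction": "g"},
]

scores = accuracy_by_slice(examples)
weakest = worst_slice(examples)
print(weakest, scores[weakest])
```

An overall accuracy of 60% here would hide the fact that one slice is perfect and the other is struggling, which is the whole point of looking at subsets.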

The Overton paper (easily one of the most beautiful papers I've ever read) runs through a machine learning scenario with the goal of developing a model to answer the question "how tall is the president of the united states?".

In answering that question, it paints the picture of a day in the life of an Overton engineer (covering both improving an existing feature and a cold-start use case, i.e. using Overton for a new feature). It also shows how Overton takes in a schema (task definition and model details) and supervised data (often weakly or programmatically supervised) and outputs a deployable model (to prevent bottlenecks in getting the model into production testing).

Figure 1: Schema and supervision data are input to Overton, which outputs a deployable model. Engineers monitor and improve the model via supervision data. Source: https://arxiv.org/pdf/1909.05372.pdf

If you're looking to set up a complete machine learning system, I'd highly recommend checking out the full Overton paper.

The best part? Apple battle-tested their framework for over a year before publishing the paper, so the theory comes from practical grounding (the best kind of theory).

See you next month!

What a massive month for the ML world in January! So much happening in the vision + text & MLOps space!

As always, let me know if there's anything you think should be included in a future post. Liked something here? Tell a friend!

In the meantime, keep learning, keep creating, keep dancing.

See you next month,

Daniel www.mrdbourke.com | YouTube

PS. You can see video versions of these articles on my YouTube channel (usually a few days after the article goes live). Watch the previous month's here.
