26th issue! If you missed them, you can read the previous issues of the Machine Learning Monthly newsletter here.
Daniel here, I'm 50% of the instructors behind Zero To Mastery's Machine Learning and Data Science Bootcamp course and our new TensorFlow for Deep Learning course! I also write regularly about machine learning on my own blog, as well as making videos on the topic on YouTube.
Welcome to this edition of Machine Learning Monthly. A 500ish (+/-1000ish, usually +) word post detailing some of the most interesting things on machine learning I've found in the last month.
Flooding at the farm. From green paradise to swampland in 3 days.
Uber's business model relies on providing accurate lead times for picking up passengers, transporting freight and delivering food.
Poor estimated time of arrival (ETA) predictions and demand calculations result in lost revenue.
So anything Uber can do to improve its ETA predictions translates into a direct increase in profits for the company.
In the past, Uber Engineering created more and more sophisticated versions of XGBoost (a popular structured data algorithm) to handle their workloads.
But eventually the XGBoost well started to run dry and further performance gains required something new.
That's when they turned to deep learning.
After exploring several algorithms such as MLPs (multilayer perceptrons), NODE (neural oblivious decision ensembles for deep learning on tabular data), TabNet (attentive interpretable tabular learning), transformers and more, they landed on the transformer as the architecture of choice.
Illustration of the DeepETA model structure created by Uber Engineering. Note the combination of continuous, discrete and type features as inputs to the model. Source: Uber Engineering Blog.
More specifically, the linear transformer.
Why?
Because latency matters, a lot.
A slow prediction means a poor customer experience.
So the linear transformer was chosen as one of a handful of ways Uber sped up the model while maintaining performance.
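The linear transformer swaps standard self-attention, which scales quadratically with the number of input tokens, for a kernel feature map that lets the same computation run in linear time. Below is a minimal PyTorch sketch of that trick using the elu(x) + 1 feature map from the original linear transformer paper ("Transformers are RNNs"). It illustrates the general idea only and isn't Uber's DeepETA implementation.

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """Linear-time attention via the elu(x) + 1 kernel feature map.

    q, k, v: tensors of shape (batch, heads, seq_len, head_dim).
    Cost is O(seq_len * head_dim^2) instead of O(seq_len^2 * head_dim).
    """
    q = F.elu(q) + 1
    k = F.elu(k) + 1

    # Aggregate keys and values once over the sequence dimension...
    kv = torch.einsum("bhsd,bhse->bhde", k, v)
    # ...then each query only needs a dot product against that aggregate.
    normaliser = 1.0 / (torch.einsum("bhsd,bhd->bhs", q, k.sum(dim=2)) + eps)
    return torch.einsum("bhsd,bhde,bhs->bhse", q, kv, normaliser)

# Example: 8 requests, 4 heads, 64 input features/tokens, 32 dims per head.
q = k = v = torch.randn(8, 4, 64, 32)
print(linear_attention(q, k, v).shape)  # torch.Size([8, 4, 64, 32])
```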
Self-supervised learning is starting to touch every part of machine learning.
It works by getting a model to learn a representation of the data itself without any labels.
Basic self-supervised learning training setup. Start with unlabelled data to create a representation of the data itself. Then use the representation as the starting point for a supervised model. Source: The TensorFlow Blog.
One way to do this is via contrastive learning.
Contrastive learning teaches a model to identify an image as being the same image when seen from multiple points of view (e.g. a default image and augmented versions of itself).
Doing this enables a model to build a representation of what similar images look like and what they don't look like.
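To make the idea concrete, here's a simplified sketch of a SimCLR-style contrastive (NT-Xent) loss in TensorFlow. It only contrasts embeddings across the two views (full implementations also include within-view negatives and other details), so treat it as an illustration of the principle rather than a drop-in replacement for a library version.

```python
import tensorflow as tf

def nt_xent_loss(z_a, z_b, temperature=0.5):
    """SimCLR-style contrastive loss where z_a[i] and z_b[i] are embeddings
    of two augmented views of the same image (shape: [batch, embed_dim])."""
    z_a = tf.math.l2_normalize(z_a, axis=1)
    z_b = tf.math.l2_normalize(z_b, axis=1)

    # Cosine similarity between every view-A embedding and every view-B embedding.
    logits = tf.matmul(z_a, z_b, transpose_b=True) / temperature

    # The positive pair for row i is column i; every other column is a negative.
    labels = tf.range(tf.shape(z_a)[0])
    loss_a = tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)
    loss_b = tf.keras.losses.sparse_categorical_crossentropy(labels, tf.transpose(logits), from_logits=True)
    return tf.reduce_mean(loss_a + loss_b) / 2

# Example: embeddings for a batch of 16 images under two different augmentations.
z_a, z_b = tf.random.normal((16, 128)), tf.random.normal((16, 128))
print(nt_xent_loss(z_a, z_b))
```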
TensorFlow's Similarity module (tensorflow_similarity) now includes examples and code to get started using self-supervised learning on your own datasets.
The example notebook shows how you can use a self-supervised learning algorithm such as SimCLR, SimSiam or Barlow Twins to learn a representation of CIFAR10 (a popular image classification dataset with 10 different classes) as a pretraining step for a model.
The model with the self-supervised pretraining step outperforms a traditional supervised model by almost 2x.
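The general "pretrain, then fine-tune" pattern, stripped of library specifics, looks something like the plain Keras sketch below. The randomly initialised ResNet50 here is just a stand-in for a backbone you'd pretrain with SimCLR, SimSiam or Barlow Twins; see the tensorflow_similarity notebooks for the real end-to-end version.

```python
import tensorflow as tf

# Placeholder backbone: in practice this would be a model whose weights came
# from a self-supervised pretraining step (SimCLR, SimSiam, Barlow Twins, etc.).
backbone = tf.keras.applications.ResNet50(
    include_top=False, weights=None, pooling="avg", input_shape=(32, 32, 3)
)
backbone.trainable = False  # keep the pretrained representation fixed to start with

# Supervised head on top: CIFAR10 has 10 classes.
model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
model.fit(x_train / 255.0, y_train,
          validation_data=(x_test / 255.0, y_test),
          epochs=5)  # a handful of epochs just for illustration
```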
In a blog post titled "Good News About the Carbon Footprint of Machine Learning", Google shares some important research on how much carbon machine learning models emit.
We all love training models.
But model training isn't free.
It costs in hardware, time and electricity.
And often the electricity comes with an attached carbon emission from burning fossil fuels.
This isn't ideal if you'd like to take care of the environment. It also isn't ideal if you're training larger and larger models.
But in a recent paper titled "The Carbon Footprint of Machine Learning Training will Plateau, Then Shrink" (included in the blog post linked above), Google outlines the 4Ms (Model, Machine, Mechanization and Map): best practices to reduce energy and carbon footprints.
The rest of the blog post explains more about how previous estimations for carbon emissions of machine learning models are wrong.
Many of them forget to take the 4Ms into account.
For one particular model, the researchers found previous estimations were the equivalent of estimating the carbon emissions to manufacture a car, multiplying that by 100x and then saying that's how much carbon comes from driving the car.
A time when being wrong is a good thing.
So you want to take your model and put it in an application or production setting where others can use it?
You're going to have to start getting familiar with the term MLOps.
MLOps stands for machine learning operations.
It's a process that involves the operations around building a machine learning model and then incorporating it into an application or service someone can use.
You can think of these operations as the steps around machine learning model building.
Data collection, data verification, data processing, model training, model evaluation, model deployment.
The building blocks of ML systems. Note how small a segment the ML code is. All of the sections around it could be considered part of "MLOps". Source: Google Cloud documentation.
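As a toy illustration (plain Python and scikit-learn, not any particular MLOps framework), those steps chained into a single script might look like the sketch below. Real pipelines split each step into its own component run by an orchestrator such as Kubeflow Pipelines or Airflow.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
import joblib

def run_pipeline(deploy_threshold: float = 0.9) -> float:
    # 1. Data collection (a bundled dataset stands in for a real data source).
    X, y = load_iris(return_X_y=True)

    # 2. Data verification (sanity-check the data before going further).
    assert X.shape[0] == y.shape[0], "features and labels are misaligned"

    # 3. Data processing.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    # 4. Model training.
    model = RandomForestClassifier(random_state=42).fit(X_train, y_train)

    # 5. Model evaluation.
    accuracy = accuracy_score(y_test, model.predict(X_test))

    # 6. Model deployment (saving the model artefact stands in for a real deploy step).
    if accuracy >= deploy_threshold:
        joblib.dump(model, "model.joblib")

    return accuracy

print(f"Test accuracy: {run_pipeline():.3f}")
```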
Google Cloud's documentation breaks MLOps down into three levels:
An MLOps Level 0 workflow as defined by Google Cloud. Many of the steps here are manual: each code script, preprocessing or training notebook is triggered by the data scientist, and then the model is deployed manually too. The goal of moving up the levels is to automate as many of the steps as possible. Source: Google Cloud documentation.
Some exciting papers over the past couple of weeks, continuing the trend of mixing vision and language with sprinkles of self-supervised learning.
Example of BLIP-generated image captioning. The image is of a farm I've been working on lately. I love the caption, "this is a farm where two chickens are hanging out". You could also use the same demo for visual question answering, such as asking "what's the weather like in the photo?". Source: BLIP HuggingFace Spaces demo.
SEER uses a combination of architecture design (CNN/vision transformers), training methods (SwAV, the self-supervised learning algorithm), scale (10B parameter model and 1B random images) and randomness (there's no curation on the images) to achieve state of the art across a wide wide wide range of computer vision benchmarks. Source: SEER GitHub.
DETIC trains on detection-labelled data as well as image-labelled data (classification labels). For classes without object detection labels, or classes not in the classification data, DETIC uses CLIP embeddings of the class names as the classifier weights, allowing it to generalize to classes never seen in the detection labels.
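The core trick of turning class names into classifier weights with CLIP looks roughly like the sketch below. It's a hypothetical zero-shot classifier built with Hugging Face's transformers CLIP model, not DETIC's actual code, and the class names are made up for the example.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical class names for the example; they could be anything, labelled or not.
class_names = ["chicken", "tractor", "water trough"]
text_inputs = processor(
    text=[f"a photo of a {name}" for name in class_names],
    return_tensors="pt",
    padding=True,
)

with torch.no_grad():
    # Text embeddings act as the classifier weights, one row per class.
    class_weights = model.get_text_features(**text_inputs)
    class_weights = class_weights / class_weights.norm(dim=-1, keepdim=True)

def classify(image_features: torch.Tensor) -> torch.Tensor:
    """Score image (or region) features of shape [N, 512] against every class name."""
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    return image_features @ class_weights.T  # cosine-similarity logits, shape [N, num_classes]

# Example with random features standing in for real image/region embeddings.
print(classify(torch.randn(4, 512)).shape)  # torch.Size([4, 3])
```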
TorchStudio, an IDE built for PyTorch, with integrations for torchvision, HuggingFace Hub, AWS, Azure and Google Cloud. It's just come out of private beta so expect a few roadblocks, but it looks solid so far. Could be a great companion to the upcoming ZTM PyTorch course!
TorchStudio has a bunch of tools integrated with the PyTorch ecosystem, from loading datasets to visualizing models and tracking experiments. Source: TorchStudio homepage.
What a massive month for the ML world in February!
As always, let me know if there's anything you think should be included in a future post.
Liked something here? Tell a friend using those widgets on the left!
In the meantime, keep learning, keep creating, keep dancing.
See you next month, Daniel
By the way, I'm a full-time instructor with Zero To Mastery Academy teaching people Machine Learning in the most efficient way possible. You can see a couple of our courses below or see all Zero To Mastery courses by visiting the courses page.