27th issue! If you missed them, you can read the previous issues of the Machine Learning Monthly newsletter here.
Daniel here, I'm 50% of the instructors behind Zero To Mastery's Machine Learning and Data Science Bootcamp course and our new TensorFlow for Deep Learning course! I also write regularly about machine learning on my own blog as well as make videos on the topic on YouTube.
Welcome to this edition of Machine Learning Monthly. A 500ish (+/-1000ish, usually +) word post detailing some of the most interesting things on machine learning I've found in the last month.
MLOps (machine learning operations) is the process of going from data to predictive models to intelligence.
The following image showcases all of the tools in the MLOps space:
Ummm...
What?
I've never heard of 80% of these and I spend all day writing ML code and dancing around the ML space.
Clearly there's a lot going on.
Mihail Eric's piece argues that this is to be expected. Since MLOps is still a new field, of course, things are going to be all over the place.
And that's kind of what you'd want to begin with.
Different people trying different things and then seeing what turns out the best.
My advice for getting started in the world of MLOps?
Keep it simple. Build something end-to-end (deploy the models you build in notebooks) and see what the whole process is like first-hand.
That's what I'm doing with Nutrify.
Although there are plenty of shiny tools I'd like to try, I'm finding I can do most of what I need with well-known tools like TensorFlow and plain vanilla JavaScript.
You may have seen the term feature store floating around the ML space lately.
A feature store holds precomputed values (features) that a machine learning model needs as inputs but that the person (or service) calling the model might not have on hand at prediction time.
For example, let's say you're Uber and at the start of each day you compute the demand forecast as well as how many drivers you had on the road yesterday.
This sounds trivial but rather than compute it every single time you want to make a prediction, you might store this value and query it instead (querying is often faster than computing).
So when someone requests an Uber, instead of recomputing the already-calculated value, it's pulled from the feature store and incorporated into a model that predicts ETA.
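To make the Uber example concrete, here's a minimal, hypothetical sketch in Python of the precompute-then-look-up pattern. The dictionary below stands in for a real low-latency feature store, and the feature names and "model" are made up for illustration.

# Hypothetical sketch: compute expensive features once in a batch job,
# then look them up at prediction time instead of recomputing them.
feature_store = {}  # stand-in for a real low-latency key-value store

def compute_daily_features(city):
    # Imagine this aggregates yesterday's trip data - slow, so it runs once per day
    return {"demand_forecast": 1.23, "drivers_yesterday": 4567}

def refresh_features(city):
    feature_store[city] = compute_daily_features(city)  # batch job each morning

def predict_eta_minutes(city, distance_km):
    stored = feature_store[city]  # fast lookup instead of an expensive recomputation
    # Combine precomputed + request-time features in a made-up "model"
    return 2.0 * distance_km + 0.5 * stored["demand_forecast"] + 0.0001 * stored["drivers_yesterday"]

refresh_features("sydney")
print(predict_eta_minutes("sydney", distance_km=5.0))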
This is only a high-level example; there are other ways of adding features to a model.
There are pros and cons to each.
And that's what Lak discusses in his post. He finishes it off with a nice decision tree for deciding if you need one or not (generally not; it often adds quite a lot of complexity for what it's worth, though if you have the resources, it can improve latency/performance).
"Do you need to use a feature store?" decision tree. Source: Lak Lakshmanan.
With the upcoming ZTM PyTorch course, I've been paying close attention to everything and anything PyTorch.
With that being said, there have been a fair few updates to PyTorch across the board (I'm making sure to include the most useful of these in the new course).
TorchData and functorch. TorchData contains modular data loading steps for constructing flexible data pipelines (and all ML projects start with the data!), while functorch, inspired by JAX, enables several function transforms that are currently hard to do with pure PyTorch. See the PyTorch blog for more.
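To give a flavour of both, here's a quick sketch based on the early torchdata/functorch releases (module paths may change between versions):

# TorchData: build a data pipeline out of small, composable "datapipe" steps
from torchdata.datapipes.iter import IterableWrapper

pipe = IterableWrapper(range(10)).map(lambda x: x * 2).shuffle().batch(4)
for batch in pipe:
    print(batch)  # order varies because of shuffle()

# functorch: JAX-style composable function transforms on top of PyTorch
import torch
from functorch import grad, vmap

def f(x):
    return (x ** 2).sum()

x = torch.randn(5)
print(grad(f)(x))             # gradient of f with respect to x (equals 2 * x)
print(vmap(torch.square)(x))  # vectorise an element-wise op over the batch dimension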
Combining vision and language data sources has been the theme of ML Monthly for the last few months.
But by the looks of Meta AI's latest round of research, they've started to combine almost everything they can.
From images to videos to 3D depth maps, Omnivore can handle them all. Omnivore combines several vision modalities into one model.
My favourite part is that they used all off-the-shelf datasets to build the model (all of the data is publicly available).
Omnivore can handle multiple different vision modalities and still perform better than models specifically trained for a certain modality. Source: Omnivore: A Single Model for Many Visual Modalities
FLAVA: A Foundational Language and Vision Alignment Model can handle 35 different tasks... getting closer and closer to one model to handle them all. FLAVA also uses a large number of public datasets (referred to as PMD: Public Multimodal Datasets in the paper). The FLAVA architecture combines a text encoder, image encoder and multi-modal encoder (image and text) to learn as much as possible from each data source.
CM3: A Causal Masked Multimodal Model of the Internet uses nearly a terabyte of pure HTML code to create the first hyper-text language and image model. Because of the scale, the model is able to generate some gnarly images given a text prompt.
Images generated based on text-prompts given to CM3. Source: CM3 paper.
But thatβs not all, CM3 is capable of filling in masked portions of images, masked portions of text and even doing the reverse of the image above, generating captions when given an image.
We've talked about Google's Health AI efforts in previous issues of ML Monthly but they've recently released a whole bunch of research (and shipped products).
I downloaded the Google Fit app and tried out the respiratory and heart rate trackers. I had mixed results depending on what kind of lighting I was in. A cool project would be to replicate this.
Everyone wants to make their models train faster.
And one way to do so is to use a GPU.
But let's say you've got a GPU and you've seen a good speedup, how do you push things further?
Or how do you figure out what's preventing your model from training faster?
Horace He works on the PyTorch team and shares his learnings on how to make your deep learning models go brrrrr using three first principles: compute, memory bandwidth, and overhead.
He uses the analogy of a factory to explain things: memory is the supplies warehouse, the factory running is the compute, and shipping supplies back and forth between the two is the memory bandwidth cost (everything else is overhead).
Computing as a factory. Memory stores all the supplies, moving them back and forth costs bandwidth, and the factory (GPU) does all the computing. Source: Making Deep Learning Go Brrrr From First Principles by Horace He.
One of my favourite takeaways from the article was the power of operator fusion.
You want to minimize the time spent transferring data and operator fusion is one of the best ways to do so.
So instead of calling operations one by one, you can chain them together.
# No operator fusion (extra overhead from materialising the intermediate value)
import torch

x = torch.randn(1000)
x1 = torch.cos(x)
x2 = torch.cos(x1)

# Same operation using operator fusion (ops chained into a single expression)
x2 = torch.cos(torch.cos(x))
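In practice you usually wouldn't fuse operations by hand; a compiler can do it for you. Here's one option as a sketch (not the only way), using TorchScript, whose fuser can merge chains of pointwise ops like this into a single GPU kernel:

import torch

@torch.jit.script  # TorchScript's fuser can merge chained pointwise ops into one kernel
def double_cos(x: torch.Tensor) -> torch.Tensor:
    return torch.cos(torch.cos(x))

print(double_cos(torch.randn(1000)))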
Rachel Thomas is the co-founder of fast.ai, one of my favourite AI organizations.
In this piece, she argues that a lot of math education overemphasizes techniques rather than meaning.
And this scares a lot of people off because as soon as you miss a single technique, you feel like you're "not a math person".
When really, learning math is like learning any skill, with time and effort you improve.
I like the concept of teaching/learning the whole game rather than just focusing on a single technique.
When you learn to drive a car, you don't necessarily need to know how an internal combustion engine works.
You learn to drive from place to place.
And that momentum carries you forward if you'd like to learn more.
Head of Tesla AI, Andrej Karpathy, travels back in time to one of the first-ever working neural networks: a 1989 paper by Yann LeCun et al., Backpropagation Applied to Handwritten Zip Code Recognition.
Perhaps surprisingly, Andrej states:
this paper reads remarkably modern today, 33 years later - it lays out a dataset, describes the neural net architecture, loss function, optimization, and reports the experimental classification error rates over training and test sets.
It sounds a lot like some of the papers I read this week.
Karpathy replicated (as best he could) the original training setup and was able to pull off training the whole network in ~90 seconds on an M1 MacBook Air CPU (a ~3,000x speedup over the original paper).
And by applying some modern techniques such as dropout, the Adam optimizer and data augmentation, Andrej was able to reduce the paper's original error rate by 60%.
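For a rough idea of what those modernisations look like in PyTorch (a sketch, not Andrej's actual code; the layer sizes assume 16x16 greyscale digits like the 1989 paper):

import torch
from torch import nn
from torchvision import transforms

# Dropout inside the model: regularisation that didn't exist back in 1989
model = nn.Sequential(
    nn.Conv2d(1, 12, kernel_size=5), nn.ReLU(),
    nn.Flatten(),
    nn.Dropout(p=0.25),
    nn.Linear(12 * 12 * 12, 10),  # 16x16 input -> 12 channels of 12x12 after the conv
)

# Adam optimizer instead of plain SGD
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)

# Light data augmentation for small digit images
augment = transforms.Compose([
    transforms.RandomAffine(degrees=5, translate=(0.05, 0.05)),
    transforms.ToTensor(),
])

print(model(torch.randn(8, 1, 16, 16)).shape)  # sanity check: torch.Size([8, 10])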
Towards the end, Andrej offers his predictions for the future of neural networks.
The main one being that perhaps the concept of a single network for a single task will be old news in the future (much like where research is heading now by combining modalities).
And it's crazy to think that in another 33 years you might be able to train today's state-of-the-art models on commodity hardware in a few minutes.
You can see the code on GitHub.
I've just discovered the mlxtend library by Sebastian Raschka, author of the popular Machine Learning with PyTorch and Scikit-Learn book.
I don't know what took me so long to find it.
It's full of helpful features you often need in machine learning and data science that aren't quite ready to go in some other libraries.
One of my favourites is plotting a confusion matrix with plot_confusion_matrix().
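For example (a quick sketch with made-up confusion matrix values):

import numpy as np
import matplotlib.pyplot as plt
from mlxtend.plotting import plot_confusion_matrix

# Rows = true labels, columns = predicted labels (made-up numbers)
conf_mat = np.array([[50, 2],
                     [5, 43]])

fig, ax = plot_confusion_matrix(conf_mat=conf_mat, show_normed=True, colorbar=True)
plt.show()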
There's a new SOTA (state of the art) for ImageNet with Model soups achieving 90.94% top-1 accuracy.
We're getting closer and closer to 91%!
Model soups combine the weights of other models to form a "soup".
The usual process is to train a bunch of models (via hyperparameter tuning) and then discard all of them except the best.
But model soups keep all of the extra models and combine their weights, either by uniform averaging or greedily (if adding a model's weights doesn't improve the overall model, that model is discarded).
A model soup saves on inference compute compared to an ensemble because it ends up being only one model (rather than multiple).
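Here's a minimal sketch of the uniform-averaging idea in PyTorch (averaging every floating-point parameter across models that share an architecture); the paper's greedy variant adds a validation check before each model joins the soup:

import copy
import torch

def uniform_soup(models):
    # Average the weights of several trained models with identical architectures
    soup_state = copy.deepcopy(models[0].state_dict())
    for key, value in soup_state.items():
        if value.is_floating_point():
            soup_state[key] = torch.stack(
                [m.state_dict()[key] for m in models]
            ).mean(dim=0)
    soup_model = copy.deepcopy(models[0])
    soup_model.load_state_dict(soup_state)
    return soup_model

# Usage (hypothetical models): soup = uniform_soup([model_a, model_b, model_c])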
It's hard to overstate the progress in machine learning over the past 10 years...
I mean, in their latest NVIDIA GTC (GPU technology conference) 2022 keynote, NVIDIA states they've increased accelerated computing by 1,000,000x over the last 10 years.
Check out machine learning, it's leaving the chart.
Via a combination of specialized hardware (GPUs) and software (CUDA), NVIDIA has accelerated machine learning computing off the charts over the past 10 years. Source: NVIDIA GTC 2022 keynote.
And their latest hardware continues the trend.
The H100 Tensor Core GPU (H is for Hopper, as in, Grace Hopper) offers up to 9x faster training and 30x faster inference over the previous generation (A100).
What???
I mean we knew it would be faster...
But those speedups are crazy!
It's a server-side GPU, so potentially that means new NVIDIA consumer GPUs are on the way too.
There's a bunch more from NVIDIA GTC 2022 as well. I've been watching some of the presentations and workshops, particularly the ones on PyTorch.
It requires a signup, but they're free to watch on the NVIDIA GTC website.
What a massive month for the ML world in March!
As always, let me know if there's anything you think should be included in a future post.
In the meantime, keep learning, keep creating, keep dancing.
See you next month, Daniel
By the way, I'm a full-time instructor with Zero To Mastery Academy teaching people Machine Learning in the most efficient way possible. You can see a couple of our courses below or check out all Zero To Mastery courses.