32nd issue! If you missed them, you can read the previous issues of the Machine Learning Monthly newsletter here.
Hey everyone!
Daniel here, I'm a Machine Learning Engineer who teaches the following beginner-friendly machine learning courses:
I also write regularly about machine learning on my own blog, as well as make videos on the topic on YouTube.
Enough about me!
You're here for this month's Machine Learning Monthly Newsletter. Typically a 500ish (+/-1,000ish, usually +) word post detailing some of the most interesting things on machine learning I've found in the last month.
The Zero to Mastery PyTorch course is live!
Easily one of the most requested courses is officially live. Inside, we'll learn and practice writing code with PyTorch, the most popular deep learning framework, in a hands-on and beginner-friendly way.
PyTorch Paper Replicating and PyTorch Model Deployment materials completed.
The final two milestone projects of the ZTM PyTorch course have been completed and the videos are being edited as you read this. Expect to see them on your ZTM Academy dashboard within the next couple of weeks.
In the meantime, you can read the materials on learnpytorch.io:
I've been seeing a lot about model optimization lately.
How do you prepare your model in the best way possible for the most efficient results?
As in, if you want to deploy your model to a system for it to perform inference, how do you make that inference happen as fast as possible?
In the case of a Tesla self-driving car, model optimization might mean making predictions fast while keeping energy usage low and accuracy high.
One of the best ways to optimize a model for deployment is via quantization.
Quantization in machine learning is the practice of reducing how much memory the parts of a neural network (weights, biases, activations) require.
Why would you want to do this?
Well, if a neural network requires 1GB of space (this number is arbitrary, some models require more, some require less) and your device only has 512MB of space, the neural network won't be able to run on the device.
So one of the steps you might take is to quantize the model in an effort to reduce how much space it takes up.
This practice of reducing space is often referred to as "reducing precision".
What's precision?
Computers don't represent numbers exactly.
Instead, they use varying degrees of precision, using different combinations of 0s and 1s to represent a number.
The more 0s and 1s used to represent a number, the higher the precision, but also the higher the storage cost.
For example, if using 32-bit precision (also called float32 or single-precision floating-point format), a computer represents a number using 32 bits:
number (float32) = 01010101010101010101010101010101
The above example is a made-up sequence of 32 0s and 1s to represent some number.
Many deep learning libraries (such as PyTorch) use float32 as the default datatype.
However, due to the flexibility of neural networks, you can often use lower precision datatypes to represent numbers without many tradeoffs in performance (of course this will vary from problem to problem).
The float16 (half-precision floating-point format) datatype uses 16 bits to represent a number:
number (float16) = 0101010101010101
The number gets represented with fewer bits but is still precise enough to be used in a neural network.
Using lower precision often speeds up training (fewer numbers to manipulate) and reduces model storage size (fewer bits representing a network).
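To make this concrete, here's a minimal PyTorch sketch (the tensor sizes are made up for illustration) comparing how much memory the same tensor takes in float32 versus float16:

```python
import torch

# Create the same tensor in two different precisions
x_fp32 = torch.rand(1000, 1000, dtype=torch.float32)
x_fp16 = x_fp32.to(torch.float16)

# Bytes per element: 4 bytes (32 bits) vs 2 bytes (16 bits)
print(x_fp32.element_size(), x_fp16.element_size())

# Total memory: ~4MB vs ~2MB for the same number of elements
print(x_fp32.element_size() * x_fp32.nelement())
print(x_fp16.element_size() * x_fp16.nelement())
```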
PyTorch offers a way to use float16 representation during training through torch.amp, where amp stands for Automatic Mixed Precision. This means PyTorch will automatically use mixed forms of precision (float32 and float16) where possible to improve training speed whilst attempting to maintain model performance.
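As a rough sketch of what this looks like in practice (assuming a CUDA-capable GPU, with a made-up tiny model and random data for illustration), a mixed precision training loop with torch.amp might look something like this:

```python
import torch
from torch import nn

device = "cuda"  # mixed precision with float16 assumes a CUDA-capable GPU

# Made-up model, loss function and optimizer for illustration
model = nn.Linear(10, 2).to(device)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()  # scales gradients to avoid float16 underflow

# Random data standing in for a real dataset
X = torch.rand(32, 10, device=device)
y = torch.randint(0, 2, (32,), device=device)

for step in range(5):
    optimizer.zero_grad()
    # Forward pass runs in float16 where PyTorch deems it safe, float32 elsewhere
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        y_pred = model(X)
        loss = loss_fn(y_pred, y)
    scaler.scale(loss).backward()  # backprop on the scaled loss
    scaler.step(optimizer)         # unscales gradients, then updates the weights
    scaler.update()                # adjusts the scale factor for the next step
```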
Finally, one of the most aggressive forms of reducing precision is to convert neural network components to int8 representation (the datatype most often meant when the term quantization is used), where a number is represented by 8 bits:
number (int8) = 01010101
Again, fewer bits to represent a single number.
So stepping through the examples above, we've gone from float32 to float16 to int8.
This is one of the main pros of quantization: a 4x reduction in the number of bits needed to represent a number (from 32 bits down to 8).
Scale this process throughout all of the elements in a neural network (weights, activations, biases) and you often get a smaller model size and faster inference time (lower latency).
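One practical way to try this in PyTorch is post-training dynamic quantization, which converts the weights of supported layers (such as nn.Linear) to int8 and quantizes activations on the fly. A minimal sketch with a made-up model:

```python
import torch
from torch import nn

# Made-up float32 model for illustration
model_fp32 = nn.Sequential(
    nn.Linear(784, 256),
    nn.ReLU(),
    nn.Linear(256, 10)
)

# Convert the Linear layers' weights to int8 (dynamic quantization runs on CPU)
model_int8 = torch.quantization.quantize_dynamic(
    model_fp32,
    {nn.Linear},       # which layer types to quantize
    dtype=torch.qint8
)

# Inference works the same way, just with a smaller, faster model
x = torch.rand(1, 784)
print(model_int8(x).shape)  # torch.Size([1, 10])
```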
However, because the neural network is now performing inference using "less precise" numerical representations, you also often get a degradation in performance.
For example, consider two versions of the same model: one kept in float32 and one quantized to int8. I've made these numbers up, but they illustrate what you can often expect with quantization: the quantized model requires half the storage of the non-quantized model and performs inference 5x faster, but achieves 1.5% less accuracy.
So you take a hit on performance but you get nice gains on storage requirements and inference time.
These improvements in storage size and inference time may be critical if you're trying to deploy your model to a device with limited compute power (such as a mobile device).
However, if you're going for the best performance possible with unlimited compute power, you'd likely opt for the biggest model you can.
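If you want to sanity check these tradeoffs on your own hardware, here's a rough sketch (made-up model and batch size, CPU-only dynamic quantization) comparing inference latency before and after int8 quantization:

```python
import time
import torch
from torch import nn

# Made-up float32 model and its int8-quantized counterpart
model_fp32 = nn.Sequential(nn.Linear(784, 2048), nn.ReLU(), nn.Linear(2048, 10))
model_int8 = torch.quantization.quantize_dynamic(model_fp32, {nn.Linear}, dtype=torch.qint8)

x = torch.rand(64, 784)  # random batch standing in for real data

def time_inference(model, x, iters=100):
    # Average forward pass time over a number of iterations
    with torch.inference_mode():
        start = time.perf_counter()
        for _ in range(iters):
            model(x)
        return (time.perf_counter() - start) / iters

print(f"float32: {time_inference(model_fp32, x) * 1000:.3f} ms per batch")
print(f"int8:    {time_inference(model_int8, x) * 1000:.3f} ms per batch")
```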
For more on quantization and different datatypes, I'd recommend the following:
In light of the above, here's a collection of resources for applying quantization in practice.
Going from 1060.16 FPS (float32) to 3172.54 FPS (int8), an increase of ~3x!
Hardware requirements for running large language models (LLMs). Note that prior to LLM.int8() these models were often only usable on much larger compute resources. Source: LLM.int8() paper.
Stable Diffusion is a machine learning model capable of generating images from text prompts.
If you've seen DALL·E by OpenAI, then consider Stable Diffusion the open-source version.
Images generated from the prompt "fun machine learning course being taught in the city of Atlantis". Source: Stable Diffusion Hugging Face Space Demo.
This is a super exciting release in the world of AI. And it's very telling of where the field is going: more open, with more opportunities for all.
Stable Diffusion is also the model powering the new dreamstudio.ai, an interface for generating images from text prompts:
Generating an image with the prompt "fun machine learning course being taught in the city of Atlantis". Source: DreamStudio.
And perhaps the best of all, you can find all of the code for the model(s) on GitHub.
A classic essay from 2019 (yes, 2019 is now considered classic in the world of machine learning) by Eric Colson from Stitchfix.
Eric argues that data science is not an assembly line (like a pin factory where everyone has a very specific role) but rather an environment that requires much trial and error across a wide range of topics.
In an assembly line, you can optimize things because you know your outcomes.
However, in data science, you can't necessarily optimize a process because you don't know your outcomes.
The cure?
Plenty of trial and error.
For data scientists and machine learning engineers, experiment often (especially if failure is low cost) and create a demo as soon as possible.
And if the cost of failure is high, use tried practices.
Once you've deployed a machine learning model, you'll likely want to know how it's performing in production.
This practice is called machine learning monitoring.
However, monitoring doesn't stop with just the model. In Monitoring ML systems in production. Which metrics should you track? by EvidentlyAI (a tool for monitoring ML systems), the authors extend ML model monitoring to ML system monitoring with four parts:
Table of different things to monitor in a machine learning system along with who's involved. Source: EvidentlyAI blog.
Eugene Yan's latest blog post, Simplicity is An Advantage but Sadly Complexity Sells Better, discusses how too often software and technology projects drown in complexity because it looks and sounds cool.
Simplicity, on the other hand, offers several benefits: easier onboarding for new staff/users (if you keep things simple, people can learn how to use and build on them faster) and a higher probability of longevity (simplicity often means using battle-tested tools, such as Instagram scaling to 10+ million users with PostgreSQL).
And when it comes to machine learning techniques, even with all the latest breakthroughs, a lot of traditional techniques outperform newer, more complex methods:
Older/simpler machine learning techniques can often outperform newer/more complex techniques across a range of problem types. Source: Eugene Yan blog.
Two cool papers caught my eye this month (out of the many curated from Paperswithcode.com):
Do As I Can, Not As I Say: Grounding Language in Robotic Affordances
Robotics meets language modelling. Google AI and Everyday Robots partner up to create a robot with a single arm and many cameras to act as a language modelβs hands and eyes.
Given an instruction such as "I spilled my drink, can you help?", the robot interprets the instruction using the language model and then executes actions such as finding and picking up a sponge with its cameras and arm.
See the blog post and project website for cool video demos.
A Walk in the Park: Learning to Walk in 20 Minutes With Model-Free Reinforcement Learning
Another robotics paper, this time using reinforcement learning to teach an off-the-shelf robot dog to walk in ~20 minutes in a variety of environments, such as outdoors and on dirt.
Very cool to see reinforcement learning coming to the real world. A few years ago, I thought reinforcement learning only really worked in video games, but more and more research is proving this wrong.
See the paper website for more cool video demos.
What a massive month for the ML world in August!
As always, let me know if there's anything you think should be included in a future post.
In the meantime, keep learning, keep creating, keep dancing.
See you next month, Daniel
By the way, I'm a full-time instructor with Zero To Mastery Academy teaching people Machine Learning in the most efficient way possible. You can see a few of our courses below or check out all Zero To Mastery courses.