26th issue! If you missed them, you can read the previous issues of the Machine Learning Monthly newsletter here.
Daniel here, I'm 50% of the instructors behind Zero To Mastery's Machine Learning and Data Science Bootcamp course and our new TensorFlow for Deep Learning course! I also write regularly about machine learning on my own blog, as well as making videos on the topic on YouTube.
Welcome to this edition of Machine Learning Monthly. A 500ish (+/-1000ish, usually +) word post detailing some of the most interesting things on machine learning I've found in the last month.
Flooding at the farm. From green paradise to swampland in 3 days.
Uber's business model relies on providing accurate lead times for picking up passengers, transporting freight and delivering food.
Poor estimated time of arrival (ETA) predictions and demand calculations result in lost revenue.
So anything Uber can do to improve its ETA predictions translates into a direct increase in profits for the company.
In the past, Uber Engineering created more and more sophisticated versions of XGBoost (a popular structured data algorithm) to handle their workloads.
But eventually the XGBoost well started to run dry and further performance gains required something new.
That's when they turned to deep learning.
After exploring several algorithms such as MLPs (multilayer perceptrons), NODE (neural oblivious decision ensembles for deep learning on tabular data), TabNet (attentive interpretable tabular learning), transformers and more, they landed on the transformer as the architecture of choice.
Illustration of the DeepETA model structure created by Uber Engineering. Note the combination of continuous, discrete and type features as inputs to the model. Source: Uber Engineering Blog.
More specifically, the linear transformer.
Why?
Because latency matters, a lot.
A slow prediction means a poor customer experience.
So the linear transformer was chosen as one of a handful of ways Uber sped up the model while maintaining performance.
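The linear transformer swaps standard self-attention, which scales quadratically with the number of input tokens, for a kernel feature map that lets the same computation run in linear time. Below is a minimal PyTorch sketch of that trick using the elu(x) + 1 feature map from the original linear transformer paper ("Transformers are RNNs"). It illustrates the general idea only and isn't Uber's DeepETA implementation.

```python
import torch
import torch.nn.functional as F

def linear_attention(q, k, v, eps=1e-6):
    """Linear-time attention via the elu(x) + 1 kernel feature map.

    q, k, v: tensors of shape (batch, heads, seq_len, head_dim).
    Cost is O(seq_len * head_dim^2) instead of O(seq_len^2 * head_dim).
    """
    q = F.elu(q) + 1
    k = F.elu(k) + 1

    # Aggregate keys and values once over the sequence dimension...
    kv = torch.einsum("bhsd,bhse->bhde", k, v)
    # ...then each query only needs a dot product against that aggregate.
    normaliser = 1.0 / (torch.einsum("bhsd,bhd->bhs", q, k.sum(dim=2)) + eps)
    return torch.einsum("bhsd,bhde,bhs->bhse", q, kv, normaliser)

# Example: 8 requests, 4 heads, 64 input features/tokens, 32 dims per head.
q = k = v = torch.randn(8, 4, 64, 32)
print(linear_attention(q, k, v).shape)  # torch.Size([8, 4, 64, 32])
```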
Self-supervised learning is starting to touch every part of machine learning.
It works by getting a model to learn a representation of the data itself without any labels.
Basic self-supervised learning training setup. Start with unlabelled data to create a representation of the data itself. Then use the representation as the starting point for a supervised model. Source: The TensorFlow Blog.
One way to do this is via contrastive learning.
Contrastive learning teaches a model to identify an image as being the same image when seen from multiple points of view (e.g. a default image and augmented versions of itself).
Doing this enables a model to build a representation of what similar images look like and what they don't look like.
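To make the idea concrete, here's a simplified sketch of a SimCLR-style contrastive (NT-Xent) loss in TensorFlow. It only contrasts embeddings across the two views (full implementations also include within-view negatives and other details), so treat it as an illustration of the principle rather than a drop-in replacement for a library version.

```python
import tensorflow as tf

def nt_xent_loss(z_a, z_b, temperature=0.5):
    """SimCLR-style contrastive loss where z_a[i] and z_b[i] are embeddings
    of two augmented views of the same image (shape: [batch, embed_dim])."""
    z_a = tf.math.l2_normalize(z_a, axis=1)
    z_b = tf.math.l2_normalize(z_b, axis=1)

    # Cosine similarity between every view-A embedding and every view-B embedding.
    logits = tf.matmul(z_a, z_b, transpose_b=True) / temperature

    # The positive pair for row i is column i; every other column is a negative.
    labels = tf.range(tf.shape(z_a)[0])
    loss_a = tf.keras.losses.sparse_categorical_crossentropy(labels, logits, from_logits=True)
    loss_b = tf.keras.losses.sparse_categorical_crossentropy(labels, tf.transpose(logits), from_logits=True)
    return tf.reduce_mean(loss_a + loss_b) / 2

# Example: embeddings for a batch of 16 images under two different augmentations.
z_a, z_b = tf.random.normal((16, 128)), tf.random.normal((16, 128))
print(nt_xent_loss(z_a, z_b))
```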
TensorFlow's Similarity module (tensorflow_similarity) now includes examples and code to get started using self-supervised learning on your own datasets.
The example notebook shows how you can use a self-supervised learning algorithm such as SimCLR, SimSiam or Barlow Twins to learn a representation of CIFAR10 (a popular image classification dataset with 10 different classes) as a pretraining step for a model.
The model with the self-supervised pretraining step outperforms a traditional supervised model by almost 2x.
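The general "pretrain, then fine-tune" pattern, stripped of library specifics, looks something like the plain Keras sketch below. The randomly initialised ResNet50 here is just a stand-in for a backbone you'd pretrain with SimCLR, SimSiam or Barlow Twins; see the tensorflow_similarity notebooks for the real end-to-end version.

```python
import tensorflow as tf

# Placeholder backbone: in practice this would be a model whose weights came
# from a self-supervised pretraining step (SimCLR, SimSiam, Barlow Twins, etc.).
backbone = tf.keras.applications.ResNet50(
    include_top=False, weights=None, pooling="avg", input_shape=(32, 32, 3)
)
backbone.trainable = False  # keep the pretrained representation fixed to start with

# Supervised head on top: CIFAR10 has 10 classes.
model = tf.keras.Sequential([
    backbone,
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
model.fit(x_train / 255.0, y_train,
          validation_data=(x_test / 255.0, y_test),
          epochs=5)  # a handful of epochs just for illustration
```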
In a blog post titled "Good News About the Carbon Footprint of Machine Learning", Google shares some important research on how much carbon machine learning models emit.
We all love training models.
But model training isn't free.
It costs in hardware, time and electricity.
And often the electricity comes with an attached carbon emission from burning fossil fuels.
This isn't ideal if you'd like to take care of the environment. It also isn't ideal if you're training larger and larger models.
But in a recent paper titled "The Carbon Footprint of Machine Learning Training will Plateau, Then Shrink" (included in the blog post linked above), Google outlines the 4Ms (Model, Machine, Mechanization and Map): best practices to reduce energy and carbon footprints.
The rest of the blog post explains more about how previous estimations for carbon emissions of machine learning models are wrong.
Many of them forget to take the 4Ms into account.
For one particular model, the researchers found previous estimations were the equivalent of estimating the carbon emissions to manufacture a car, multiplying that by 100x and then saying that's how much carbon comes from driving the car.
A time when being wrong is a good thing.
So you want to take your model and put it in an application or production setting where others can use it?
You're going to have to start getting familiar with the term MLOps.
MLOps stands for machine learning operations.
It's a process that involves the operations around building a machine learning model and then incorporating it into an application or service someone can use.
You can think of these operations as the steps around machine learning model building.
Data collection, data verification, data processing, model training, model evaluation, model deployment.
The building blocks of ML systems. Note how small a segment the ML code is. All of the sections around it could be considered part of "MLOps". Source: Google Cloud documentation.
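As a toy illustration (plain Python and scikit-learn, not any particular MLOps framework), those steps chained into a single script might look like the sketch below. Real pipelines split each step into its own component run by an orchestrator such as Kubeflow Pipelines or Airflow.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
import joblib

def run_pipeline(deploy_threshold: float = 0.9) -> float:
    # 1. Data collection (a bundled dataset stands in for a real data source).
    X, y = load_iris(return_X_y=True)

    # 2. Data verification (sanity-check the data before going further).
    assert X.shape[0] == y.shape[0], "features and labels are misaligned"

    # 3. Data processing.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    # 4. Model training.
    model = RandomForestClassifier(random_state=42).fit(X_train, y_train)

    # 5. Model evaluation.
    accuracy = accuracy_score(y_test, model.predict(X_test))

    # 6. Model deployment (saving the model artefact stands in for a real deploy step).
    if accuracy >= deploy_threshold:
        joblib.dump(model, "model.joblib")

    return accuracy

print(f"Test accuracy: {run_pipeline():.3f}")
```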
Google Cloud's documentation breaks MLOps down into three levels:
An MLOps Level 0 workflow as defined by Google Cloud. Many of the steps here are manual: each code script, preprocessing or training notebook is triggered by the data scientist, and then the model is deployed manually too. The goal of moving up the levels is to automate as many of the steps as possible. Source: Google Cloud documentation.
Some exciting papers over the past couple of weeks, continuing the trend of mixing vision and language with sprinkles of self-supervised learning.
Example of BLIP-generated image captioning. The image is of a farm I've been working on lately. I love the caption, "this is a farm where two chickens are hanging out". You could also use the same demo for visual question answering, such as asking "what's the weather like in the photo?". Source: BLIP HuggingFace Spaces demo.
SEER uses a combination of architecture design (CNN/vision transformers), training methods (SwAV, the self-supervised learning algorithm), scale (10B parameter model and 1B random images) and randomness (there's no curation on the images) to achieve state of the art across a wide wide wide range of computer vision benchmarks. Source: SEER GitHub.
DETIC trains on detection-labelled data as well as image-labelled data (classification labels). For classes without object detection labels, or classes not in the classification data, DETIC uses CLIP embeddings of the class names as the classifier weights, allowing it to generalize to classes never seen in the detection labels.
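The core trick of turning class names into classifier weights with CLIP looks roughly like the sketch below. It's a hypothetical zero-shot classifier built with Hugging Face's transformers CLIP model, not DETIC's actual code, and the class names are made up for the example.

```python
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical class names for the example; they could be anything, labelled or not.
class_names = ["chicken", "tractor", "water trough"]
text_inputs = processor(
    text=[f"a photo of a {name}" for name in class_names],
    return_tensors="pt",
    padding=True,
)

with torch.no_grad():
    # Text embeddings act as the classifier weights, one row per class.
    class_weights = model.get_text_features(**text_inputs)
    class_weights = class_weights / class_weights.norm(dim=-1, keepdim=True)

def classify(image_features: torch.Tensor) -> torch.Tensor:
    """Score image (or region) features of shape [N, 512] against every class name."""
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    return image_features @ class_weights.T  # cosine-similarity logits, shape [N, num_classes]

# Example with random features standing in for real image/region embeddings.
print(classify(torch.randn(4, 512)).shape)  # torch.Size([4, 3])
```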
TorchStudio, an IDE built for PyTorch, with integrations for torchvision, HuggingFace Hub, AWS, Azure and Google Cloud. It's just come out of private beta so expect a few roadblocks, but it looks solid so far. Could be a great companion to the upcoming ZTM PyTorch course!
TorchStudio has a bunch of tools integrated with the PyTorch ecosystem, from loading datasets to visualizing models and tracking experiments. Source: TorchStudio homepage.
What a massive month for the ML world in February!
As always, let me know if there's anything you think should be included in a future post.
Liked something here? Tell a friend using those widgets on the left!
In the meantime, keep learning, keep creating, keep dancing.
See you next month, Daniel
By the way, I'm a full-time instructor with Zero To Mastery Academy teaching people Machine Learning in the most efficient way possible. You can see a couple of our courses below or see all Zero To Mastery courses by visiting the courses page.