13th issue! If you missed them, you can read the previous issues of the Machine Learning Monthly newsletter here.
Hey everyone, Daniel here, I'm 50% of the instructors behind the Complete Machine Learning and Data Science: Zero to Mastery course. I also write regularly about machine learning on my own blog, as well as make videos on the topic on YouTube.
Welcome to the 13th edition of Machine Learning Monthly. A 500ish (+/-1000ish, usually +) word post detailing some of the most interesting things on machine learning I've found in the last month.
Since there's a lot going on, the utmost care has been taken to keep things to the point.
I've been putting together a code-first introduction to deep learning with TensorFlow course.
If you've done a beginner machine learning course and want to dive into deep learning and TensorFlow, as well as potentially take the TensorFlow Developer Certification, this course is for you.
Students of the course will get hands-on practice writing deep learning models with TensorFlow, learn common deep learning troubleshooting techniques, practice searching for how to solve a problem and more.
The best places to get updates as the course gets ready to go live will be:
In May 2020, Facebook AI kicked off the Hateful Memes Challenge on DrivenData to detect whether a meme (image & text) would be offensive.
Not only was there a $100k prize pool for competitors, the competition also involved combining visual and linguistic data (requiring visio-linguistic models), a trend in this month's post.
Niklas finished 2nd (congratulations!) and shared his solution on GitHub. A phenomenal effort for someone who got into AI only 8 months ago (see Niklas's self-created AI Masters Degree; my favourite part is the emphasis on starting a project as soon as possible).
There's nothing I love seeing more than someone sharing their work and interpretations of the world. Aurélien is a 3rd-year computer science student from France writing about machine learning. For a sample, check out Aurélien's deep dive on residual neural networks.
Thank you for the submissions Niklas & Aurélien!
Some massive plays from OpenAI last month. Namely:
Let's go into each.
DALL·E (the name is a playful combination of the artist Salvador Dalí and Pixar's WALL·E) uses a transformer language model to take in text and an image as a single stream of 1280 tokens (256 for the text and 1024 for the image). Because of this, DALL·E is able to generate an image from scratch, as well as fill holes in an image (working from top left to bottom right) given a text prompt.
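To make the "single stream" idea concrete, here's a tiny sketch (mine, not OpenAI's code) of what concatenating text and image tokens into one sequence could look like. The 256/1024 token counts come from the blog post; the vocabulary sizes and everything else are assumptions for illustration only.

```python
import torch

# Hypothetical illustration of DALL·E's single-stream input (not OpenAI's code).
# 256 text tokens + 1024 image tokens = 1280 tokens in one sequence.
TEXT_LEN, IMAGE_LEN = 256, 1024
TEXT_VOCAB, IMAGE_VOCAB = 16384, 8192  # vocab sizes assumed for illustration

text_tokens = torch.randint(0, TEXT_VOCAB, (1, TEXT_LEN))  # stand-in for a BPE-encoded caption
image_tokens = torch.randint(0, IMAGE_VOCAB, (1, IMAGE_LEN)) + TEXT_VOCAB  # stand-in for discrete image codes, offset into a shared vocabulary

# The transformer models this one stream autoregressively: given the caption
# (and any image tokens so far), predict the next image token, one at a time.
stream = torch.cat([text_tokens, image_tokens], dim=1)
print(stream.shape)  # torch.Size([1, 1280])
```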
Two of my favourite examples of DALL·E's work (from the OpenAI blog post):

Text prompt (passed to model): "a living room with two white armchairs and a painting of the colosseum. the painting is mounted above a modern fireplace."
Text prompt (passed to model): "the exact same teapot on the top with 'gpt' written on it on the bottom"
Wild.
See the OpenAI blog post for many more examples.
CLIP (Contrastive Language-Image Pre-training) aims to tackle the main problems facing computer vision today:
How does it address these?
For costly datasets, CLIP trains on image and text pairs readily available on the internet. However, my main question: what's the training data? When they say CLIP is trained on data (images + text) from the internet, this is quite broad. Did they scrape it themselves? Or did they partner with Microsoft to get images off of Bing? I'm curious. Maybe the paper will reveal more.
For narrowness, because CLIP is trained on language-based descriptions of images as well as the images themselves, it can be adapted to other tasks by "telling" it what to do. Instead of gathering more data for your avocado image classifier, just tell CLIP "give me an avocado classifier". Again, wild.
For poor real-world performance, CLIP can be evaluated on several benchmarks (e.g. ImageNet, ObjectNet, ImageNet Adversarial) without being explicitly trained on those benchmarks (and it performs very well across all of them). This prevents it from over-performing on any one specific benchmark and under-performing elsewhere.
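To make the "tell it what to do" point concrete, here's a minimal sketch of zero-shot classification with the openai/CLIP package (installable from the GitHub repo). The image path and class names are placeholders of mine; the gist is encoding class descriptions as text and comparing them with the image embedding.

```python
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# "Tell" CLIP what to classify by describing the classes in plain text.
class_names = ["avocado", "apple", "banana"]  # hypothetical classes
text = clip.tokenize([f"a photo of {c}" for c in class_names]).to(device)
image = preprocess(Image.open("my_fruit_photo.jpg")).unsqueeze(0).to(device)  # placeholder path

with torch.no_grad():
    # Similarity between the image embedding and each text embedding -> class scores
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1)

print({c: float(p) for c, p in zip(class_names, probs[0])})
```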
One more thing I love about CLIP is the model card. Think of it as a nutrition label for a machine learning model.
Why is this important?
As models get more powerful (CLIP is crazy), people should be aware of their capabilities and potential exploits.
See more on CLIP:
This issue of the monthly really does have a common thread... visuals + text. This time it's a little more manual (sort of).
Amit's Papers with Video Chrome extension answers the question of whether or not an arXiv paper has a video version.
Why it matters: The first time I read a paper it might as well be in a language I don't speak. So I read it again and look for other interpretations and explanations, video being one of the main ones. I like to hear someone explain something and see how their explanation lines up with what I'm reading. Amit's extension adds video links to 3700 papers (and counting).
Amit also has an incredible developer blog. Check it out for posts on everything from semi-supervised learning to text data augmentation.
A new paper, Bottleneck Transformers for Visual Recognition, just dropped and it could be the driving force for transformers to take over from CNNs for visual data.
Readers of previous ML monthly issues have seen this coming (also, see CLIP and DALL·E above).
By making a simple change to the popular ResNet backbone (replacing the convolution layer with multi-headed self-attention), the authors created a network called BoTNet, which performs up to 2.33x faster whilst matching current state-of-the-art CNN-based computer vision models (e.g. EfficientNet).
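Here's a rough sketch of that change (my own, not the authors' code): inside a ResNet-style bottleneck block, the spatial convolution is swapped for multi-head self-attention over the H×W positions. The real BoTNet blocks also use relative position encodings, which this sketch leaves out.

```python
import torch
import torch.nn as nn

class BottleneckSelfAttention(nn.Module):
    """Rough sketch of a BoTNet-style block: the spatial conv in a ResNet
    bottleneck is replaced with multi-head self-attention over H*W positions
    (relative position encodings from the paper are omitted here)."""
    def __init__(self, in_channels: int, bottleneck_channels: int, num_heads: int = 4):
        super().__init__()
        self.reduce = nn.Conv2d(in_channels, bottleneck_channels, kernel_size=1)
        self.attn = nn.MultiheadAttention(bottleneck_channels, num_heads, batch_first=True)
        self.expand = nn.Conv2d(bottleneck_channels, in_channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = x
        x = self.reduce(x)                      # 1x1 conv: C -> bottleneck channels
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)      # (B, H*W, C): each spatial position is a token
        seq, _ = self.attn(seq, seq, seq)       # all-to-all self-attention replaces the 3x3 conv
        x = seq.transpose(1, 2).reshape(b, c, h, w)
        x = self.expand(x)                      # 1x1 conv: bottleneck channels -> C
        return torch.relu(x + identity)         # residual connection, as in a normal ResNet block

block = BottleneckSelfAttention(in_channels=256, bottleneck_channels=64)
print(block(torch.randn(2, 256, 14, 14)).shape)  # torch.Size([2, 256, 14, 14])
```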
Why this matters: Two things benefit the deep learning world: better results and faster performance. If you can't get one, you might as well get the other. And it seems the Transformer architecture is bringing all of its NLP gains to vision.
Bonus #1: If you're reading this and excited about the Transformer architecture but aren't sure where to start, check out Full-Stack Deep Learning's recent end-to-end introduction to Transformers with PyTorch thread below. It's all done in a Colab notebook so you can run and rewrite the code yourself.
🛠️Tooling Tuesday🛠️
Today, we share a @GoogleColab notebook implementing a Transformer with @PyTorch, trained using @PyTorchLightnin.
We show both encoder and decoder, train with teacher forcing, and implement greedy decoding for inference. https://t.co/dB1IL8WEGB
👇1/N
— Full Stack Deep Learning (@full_stack_dl) January 13, 2021
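If "greedy decoding" is new to you, here's a tiny sketch of the idea (mine, not the notebook's code): at inference time, repeatedly feed the model everything generated so far and append the single most likely next token, stopping at an end-of-sequence token. The toy model below is a stand-in with random weights.

```python
import torch
import torch.nn as nn

VOCAB, BOS, EOS = 100, 1, 2  # toy vocabulary size and special token ids

class ToySeq2Seq(nn.Module):
    """Stand-in seq2seq model: maps (src, tgt_so_far) -> logits over the vocab."""
    def __init__(self):
        super().__init__()
        self.src_emb = nn.Embedding(VOCAB, 32)
        self.tgt_emb = nn.Embedding(VOCAB, 32)
        self.transformer = nn.Transformer(d_model=32, nhead=4, num_encoder_layers=1,
                                          num_decoder_layers=1, dim_feedforward=64,
                                          batch_first=True)
        self.out = nn.Linear(32, VOCAB)

    def forward(self, src, tgt):
        return self.out(self.transformer(self.src_emb(src), self.tgt_emb(tgt)))

def greedy_decode(model, src, max_len=20):
    tgt = torch.tensor([[BOS]])                      # start with the beginning-of-sequence token
    for _ in range(max_len):
        logits = model(src, tgt)                     # (1, tgt_len, VOCAB)
        next_token = logits[0, -1].argmax()          # greedy: take the single most likely next token
        tgt = torch.cat([tgt, next_token.view(1, 1)], dim=1)
        if next_token.item() == EOS:                 # stop once the model says it's done
            break
    return tgt

model = ToySeq2Seq().eval()
print(greedy_decode(model, torch.randint(3, VOCAB, (1, 10))))  # random "source sentence"
```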
Made With ML used to be one of my favourite websites for machine learning. Then it pivoted and changed its offering and became even more of one of my favourite websites for machine learning 😂.
Made With ML used to showcase the different things people had made with ML, kind of like this newsletter but scalable: instead of me writing this in a local cafe, people all over the world could submit their work and have it upvoted by others.
Now Made With ML is focused on less but better:
Why this matters: Too many things lead to overwhelm. Made With ML's founder Goku Mohandas (follow him on Twitter for a treasure trove of ML content) realised this and decided to pivot the website to something more beneficial over the long-term. Big props for this. I'm a huuuuge fan of the website and am learning a lot from Goku's lessons.
Ever want the best tools and resources for productionizing your machine learning models in one place?
Well then, the awesome production machine learning repo is for you. From data labelling to model versioning, you'll find it there.
The page is so good there's too much for anyone to comprehend. Best to bookmark it and keep it for when you need it.
Josh Tobin is a beast in the machine learning field. Previously at OpenAI and now working on a stealth startup (this sounds almost Batman-like), he recently gave a talk on his idea of the Evaluation Store, or in other words, the missing piece of full-stack machine learning 👇
What is full-stack machine learning?
Take all of the parts of the puzzle: data collection, data verification, data preprocessing, data modelling, model deployment, user-interface design, model monitoring, etc. That's full-stack machine learning.
And the one thing they all have in common?
They all require some kind of evaluation, meaning: how do you know how your system is performing at each stage? Because if one stage is performing poorly, the others get exponentially affected.
Josh's idea of the Evaluation Store is the missing piece — a way to track and monitor how your system is doing at every stage.
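To give the idea some shape, here's a purely hypothetical sketch of what a minimal evaluation store interface could look like (the concept is Josh's; this toy code and its names are my own invention): log a metric for each stage of the system, then ask whether any stage has regressed.

```python
from collections import defaultdict

class EvaluationStore:
    """Toy sketch of an evaluation store: track how every stage of an ML system
    is doing over time and flag regressions (names and design are hypothetical)."""
    def __init__(self):
        self.metrics = defaultdict(list)  # {(stage, metric): [values over time]}

    def log(self, stage: str, metric: str, value: float) -> None:
        self.metrics[(stage, metric)].append(value)

    def regressed(self, stage: str, metric: str, tolerance: float = 0.02) -> bool:
        values = self.metrics[(stage, metric)]
        return bool(values) and values[-1] < max(values) - tolerance

store = EvaluationStore()
store.log("offline_eval", "accuracy", 0.91)  # test-set performance
store.log("production", "accuracy", 0.89)    # monitored performance after deployment
store.log("production", "accuracy", 0.81)    # drift! the deployed model is slipping
print(store.regressed("production", "accuracy"))  # True -> time to collect more data and retrain
```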
Another idea I loved from the talk is the concept of the "data flywheel", creating a system to continually collect, verify and model data based on what's most needed (this could be tracked by the Evaluation Store).
Overview of the data flywheel effect: collect data, clean and label, train, test, deploy, monitor, collect more data, repeat. Source: A Missing Link in the ML Infrastructure Stack - Josh Tobin (Stealth Startup, UC Berkeley, OpenAI)
A great example of a data flywheel is Tesla's fleet. If an engineer finds cars are performing poorly when making right turns next to big yellow buses (I'm making this up and simplifying), they might query their database for more examples of their cars turning right next to yellow buses and upgrade their models, in turn closing the loop.
Watch the full missing piece of full-stack ML talk above. Big thanks to Josh and Ternary Data for putting it together.
Bonus #2: The Full Stack Deep Learning 2021 session (Josh is one of the instructors) starts March 1st. If you want to get your deep learning models out in the wild, you should check it out. I've signed up and can't wait.
I picked this one up from Josh's talk above. It turns out Apple published a paper in September 2019 describing Overton, their system for helping engineers build machine-learning-based applications without writing any code in frameworks like TensorFlow.
If they're not writing TensorFlow, what do the engineers using Overton do?
They focus on higher-level tasks such as:
The Overton paper (easily one of the most beautiful papers I've ever read) runs through a machine learning scenario with the goal of developing a model to answer the question "how tall is the president of the united states?".
In answering that question it paints the picture of a day in the life of an Overton engineer (covering both improving an existing feature and a cold-start use case, i.e. using Overton for a new feature), as well as how Overton takes in a schema (task definition and model details) and supervised data (often weakly or programmatically supervised) and outputs a deployable model (to prevent getting the model into production from becoming a bottleneck).
Figure 1: Schema and supervision data are input to Overton, which outputs a deployable model. Engineers monitor and improve the model via supervision data. Source: https://arxiv.org/pdf/1909.05372.pdf
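To make the schema idea a little more concrete, here's a loose, hypothetical sketch of what an Overton-style schema could look like for the "how tall is the president?" example. The field names below are mine, not the paper's; the point is that the engineer declares payloads (inputs and intermediate representations) and tasks (what to predict), and Overton handles building, training and deploying the model.

```python
# Hypothetical sketch of an Overton-style schema (field names are illustrative, not Apple's).
# The engineer declares *what* the model should consume and predict;
# Overton handles the *how* (architecture, training, deployment).
schema = {
    "payloads": {  # inputs and intermediate representations
        "query": {"type": "text"},  # "how tall is the president of the united states?"
        "tokens": {"type": "sequence", "base": "query"},
        "candidate_entity": {"type": "text"},  # e.g. an entity pulled from a knowledge base
    },
    "tasks": {  # the outputs, each supervised (often weakly or programmatically)
        "intent": {"input": "query", "labels": ["height", "age", "spouse", "other"]},
        "entity_linking": {"input": ["tokens", "candidate_entity"], "labels": ["match", "no_match"]},
    },
}

# Paired with (possibly weakly labelled) supervision data, a schema like this is
# everything Overton needs to output a deployable model, with no hand-written TensorFlow required.
```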
If you're looking to set up a complete machine learning system I'd highly recommend checking out the full Overton paper.
The best part? Apple battle-tested their framework for over a year before publishing the paper, so the theory comes from practical grounding (the best kind of theory).
What a massive month for the ML world in January! So much happening in the vision + text & MLOps space!
As always, let me know if there's anything you think should be included in a future post. Liked something here? Tell a friend!
In the meantime, keep learning, keep creating, keep dancing.
See you next month,
Daniel www.mrdbourke.com | YouTube
PS. You can see video versions of these articles on my YouTube channel (usually a few days after the article goes live). Watch previous months' videos here.
By the way, I'm a full time instructor with Zero To Mastery Academy teaching people Machine Learning in the most efficient way possible. You can see a couple of our courses below or see all Zero To Mastery courses by visiting the courses page.