28th issue! If you missed them, you can read the previous issues of the Machine Learning Monthly newsletter here.
Daniel here, I'm 50% of the instructors behind Zero To Mastery's Machine Learning and Data Science Bootcamp course and our new TensorFlow for Deep Learning course! I also write regularly about machine learning on my own blog, as well as make videos on the topic on YouTube.
Welcome to this edition of Machine Learning Monthly. A 500ish (+/-1000ish, usually +) word post detailing some of the most interesting things on machine learning I've found in the last month.
Eric Jang recently went on a job hunt for new machine learning roles.
He based his decision on a number of parameters.
The main one being: "which company has the largest technological lead towards artificial general intelligence (AGI)?"
And he landed on Halodi Robotics to build humanoid robots leading towards solving AGI in the next ~20 years.
Eric's article sums up his journey and the different pros and cons of joining some of the leading AI companies like Facebook (Meta), Google, Tesla, OpenAI and more.
One of my favourite quotes from the article is that eventually, every (tech) company will become an AGI company.
Meaning that, for technology companies, all roads eventually lead to AGI (Rome).
All roads for large technology companies lead to some form of AGI (whatever that is). Source: All Roads Lead to Rome: The Machine Learning Job Market in 2022 by Eric Jang.
I also liked his point that if you become CEO of a company (your own company), eventually you get taken off the tools, as in you spend more time managing company things rather than creating things.
I've seen friends move from engineering roles to management roles and not be fans.
As for me?
Keep me on the tools.
Large language models (LLMs) are starting to change how computing is done.
With models like GPT-3, LLMs are starting to be integrated into all different kinds of services.
Instead of pressing certain buttons on different apps to do things, in the future it's likely you'll use a natural-language-based interface.
Weβre starting to see examples like this now.
Such as searching your photos app for βphotos of my dog last Aprilβ.
But this is only the beginning.
Russell Kaplan's Tweet thread discusses how LLMs could be used as the new form of compute moat for large technology companies.
As in, you'll use an LLM as an API to build things and then you'll be at the mercy of large companies as to whether or not you can use that LLM.
But I see this as no different from what happens now.
Right now many people leverage the services of the internet to run their businesses.
Their services are provided by similar tech companies to the ones building the LLMs.
So alongside renting compute in the cloud to run your service, you could rent an LLM to generate information for your service.
Kaplan's point on governments eventually needing to create their own supercomputer clusters as a form of national security is thought-provoking too:
https://twitter.com/russelljkaplan/status/1513128014879490050?s=20&t=6oFSQF-Gm8Qk2YTsI1K-1A
As for LLM use cases, imagine being able to describe to Google Slides or Keynote what kind of diagram you'd like: "please create a diagram with two arrows on the left and then make those arrows move around a circle in the middle..."
Or perhaps I just say "put together a list of 10 of the best resources in machine learning from the past month"... and include a "little spicy comment" about each.
I've been creating a new ZTM PyTorch course for the last couple of months.
And I've found a bunch of resources for improving PyTorch code.
Jack's article with tips and tricks is fantastic.
From loading data to model architectures to data operations to inference and validation, it covers almost everything you'll need.
Some of my favourites include:
- Set num_workers in your DataLoader to a positive integer.
- Set pin_memory=True in your DataLoader to save time transferring memory from CPU to GPU.
- Use the @autocast decorator on your forward() to use automatic mixed-precision (this uses lower-precision computing where possible to save time on operations).

For plenty more, see Jack's article.
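Here's a minimal sketch of those three tips together (assuming PyTorch is installed; the dataset and model below are placeholders, not code from Jack's article):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy dataset standing in for your real one
dataset = TensorDataset(torch.randn(64, 3, 32, 32), torch.randint(0, 10, (64,)))

loader = DataLoader(
    dataset,
    batch_size=16,
    num_workers=2,    # >0 loads batches in parallel worker processes
    pin_memory=True,  # speeds up CPU -> GPU memory transfer (when a GPU is present)
)

class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(3 * 32 * 32, 10)

    # autocast runs the forward pass in mixed precision where it is safe to do so
    @torch.autocast(device_type="cuda" if torch.cuda.is_available() else "cpu")
    def forward(self, x):
        return self.layer(x.flatten(start_dim=1))

model = TinyModel()
images, labels = next(iter(loader))
outputs = model(images)
print(outputs.shape)  # torch.Size([16, 10])
```

The speedups from num_workers and pin_memory are most noticeable with large datasets and a GPU; on a tiny CPU-only example like this they won't show up, but the pattern is the same.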
In the theme of large language models, what if every problem started to look like a language problem?
Pix2Seq is a new paradigm by Google AI to treat Object Detection as a sequence problem.
It does so by treating the labels (bounding box coordinates) for an image as a sequence of tokens, much like a language model might treat a sentence.
For example, a label for an image might look like:
[y_min, x_min, y_max, x_max, class_label]
Pix2Seq would see this sequence as "describing" the object of interest.
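As an illustration, here's a hypothetical sketch of this kind of label tokenization (my own simplified version, not the paper's exact scheme): quantize each normalized coordinate into one of a fixed number of discrete bins, then append a class token offset past the coordinate vocabulary.

```python
def bbox_to_tokens(y_min, x_min, y_max, x_max, class_id, bins=1000):
    """Turn one bounding box label into a short sequence of integer tokens."""
    coords = [y_min, x_min, y_max, x_max]  # coordinates normalized to [0, 1]
    coord_tokens = [min(int(c * bins), bins - 1) for c in coords]
    class_token = bins + class_id  # class tokens live after the coordinate bins
    return coord_tokens + [class_token]

# A box covering the top-left quadrant of the image, class 3
print(bbox_to_tokens(0.0, 0.0, 0.5, 0.5, class_id=3))
# [0, 0, 500, 500, 1003]
```

A sequence model can then be trained to emit these tokens one at a time, the same way a language model emits words.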
The Pix2Seq workflow is to perceive an image and then generate a sequence of tokens (bounding box coordinates) for each object in the image. Source: Pix2Seq: A New Language Interface for Object Detection
This exciting new approach to object detection yields competitive or better results than previous methods using more complicated approaches.
You can never have enough Python skills.
And Iβve loved checking out the following resources for a bunch of tips and tricks:
- args and kwargs
- using FastAPI to build web applications with Python

I plan on going through a fair few of these over the next couple of months.

DALL·E 2 was recently released by OpenAI (the blog post/paper but not the model/weights).
It uses a combination of diffusion (learning to return an image to its original form after it has been corrupted with noise) as well as CLIP (a combination of image and language pairs) to do some wild things with images.
It can generate images given a text prompt with an astonishing level of detail:
Using a text-based caption to create an image, mixing old with new... I don't think many 16th-century paintings have cell phones in them. Source: Eric Jang + DALL·E 2.
But thatβs not all...
It can edit images based on language prompts.
And it can create new versions of the same image, such as viewing a painting in several different styles.
This Tweet from David Schnurr shows some more DALL·E 2 creations.
To learn more about DALL·E 2, I'd recommend the following:
Self-supervised learning (SSL) allows a model to learn a representation of data directly from the data itself.
One of the main methods of doing this is by masking out portions of the data and getting a model to fill in the blanks.
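This "fill in the blanks" setup can be sketched in a few lines of PyTorch (assumed details for illustration, not any specific paper's method): hide some patches of the input behind a mask token, then score the model's reconstruction only on the hidden positions.

```python
import torch

torch.manual_seed(0)
patches = torch.randn(16, 32)              # 16 patches, 32 features each
mask = torch.zeros(16, dtype=torch.bool)
mask[:4] = True                            # hide the first 4 patches (random in practice)
mask_token = torch.zeros(32)               # a learnable embedding in practice

corrupted = patches.clone()
corrupted[mask] = mask_token               # replace masked patches with the mask token

model = torch.nn.Linear(32, 32)            # stand-in for a real encoder
reconstruction = model(corrupted)

# The loss is computed on the masked positions only
loss = torch.nn.functional.mse_loss(reconstruction[mask], patches[mask])
print(loss.item() > 0)  # True
```

The key point is that the supervision signal comes from the data itself: no human labels are needed, only the original (unmasked) patches.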
Another method is by using a teacher and student model, known as self-distillation or knowledge distillation. In this setup, a teacher model learns one representation of the image and the student tries to reconstruct the teacher's representation.
Meta AI has created a self-supervised learning demo for images to explore different self-supervised concepts.
On the demo, you can upload an image and have DINO (self-distillation with no labels), a self-supervised learning model, segment the image, retrieve similar patches from other images, or match your image to similar images across a dataset of 5 million images.
Example of using Meta AI's SSL demo to match an image to other images with similar scenarios happening. Source: Custom image on Meta AI's SSL demo page.
Many tricks have come out in the past few years to train computer vision models to state-of-the-art (SOTA) results.
Methods such as cutmix, mixup, learning rate decay, weight decay and many more have contributed to improved results over time.
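As an example of one of these tricks, mixup can be written in a few lines (a simplified sketch; in practice the mixing coefficient lambda is usually sampled from a Beta distribution rather than fixed):

```python
import torch

def mixup(x1, y1, x2, y2, lam=0.7):
    """Return a convex combination of two (image, one-hot label) pairs."""
    x = lam * x1 + (1 - lam) * x2
    y = lam * y1 + (1 - lam) * y2
    return x, y

# Two dummy "images" and their one-hot labels
x1, x2 = torch.ones(3, 4, 4), torch.zeros(3, 4, 4)
y1, y2 = torch.tensor([1.0, 0.0]), torch.tensor([0.0, 1.0])
x, y = mixup(x1, y1, x2, y2, lam=0.7)
print(y)  # tensor([0.7000, 0.3000])
```

Training on blended images with blended labels tends to regularize the model and smooth its decision boundaries.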
But often, with every new model came a new, specific way of training that model.
Having a unique training method for every new model is as tedious as it sounds.
But a new Unified Training Scheme for ImageNet (USI) from the Alibaba research group shows a way to produce SOTA results across a wide range of model architectures (transformers, CNNs, mobile-based architectures) using the same training methodology.
USI uses a knowledge distillation (KD) layer via a student and teacher method with several well-known computer vision training techniques.
The KD layer includes the prediction probabilities from a teacher model as well as the ground truth labels from the data as input to a student model.
This combination leads to more robust learning from the student model than just pure ground truth labels.
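A sketch of what such a knowledge distillation loss could look like in PyTorch (the temperature, weighting, and loss choices here are common conventions, not the exact USI recipe):

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: KL divergence between softened teacher and student distributions
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: standard cross-entropy with the ground-truth labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

student = torch.randn(8, 10)   # student predictions for a batch of 8
teacher = torch.randn(8, 10)   # teacher predictions for the same batch
labels = torch.randint(0, 10, (8,))
loss = kd_loss(student, teacher, labels)
print(loss.item() > 0)  # True
```

Here, alpha balances the soft (teacher) and hard (label) targets, and the temperature T softens both distributions before comparing them.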
This method is not too dissimilar to some of the training methods recently seen in SOTA self-supervised learning architectures.
The universal training scheme for ImageNet leverages knowledge distillation (KD) by combining a teacher and student network to form the final predictions. Source: Solving ImageNet: a Unified Scheme for Training any Backbone to Top Results paper.
Having a universal training scheme allows many researchers to get started with a SOTA training method and know they're getting excellent results out of the box.
With the increase in generative models comes the increased chance of generating synthetic data to train new and existing models.
But should we use generated (synthetic) data to train new and existing models?
Vikash Sehwag's latest article argues yes, we should, because it can (if the generated data is of high enough quality):
I'm rereading the classic post The Bitter Lesson by Rich Sutton (first mentioned in Machine Learning Monthly November 2020).
It's a good reminder given the rise of all the new kinds of models coming out.
The bitter lesson being: over the past 70 years of AI research, general methods that leverage computation have ultimately proven the most effective, and by a large margin.
Meaning many hand-crafted methods for AI over the past few decades have been superseded by methods that leverage randomness the best.
And the ones that leverage randomness the best are usually the ones that leverage compute power the best (because the more compute power you have, the more randomness you can simulate).
This is where methods like self-supervised learning (learning from the data instead of labels) and knowledge distillation (again learning more from the data as well as labels) come into play.
My takeaway from this is: keep your learning method as general as possible and learn from the data itself (a compressed version of reality) as much as possible.
The more you try to handcraft specific things, the more bitter things will taste when a more general method comes along.
What a massive month for the ML world in April!
As always, let me know if there's anything you think should be included in a future post.
In the meantime, keep learning, keep creating, keep dancing.
See you next month, Daniel
By the way, I'm a full-time instructor with Zero To Mastery Academy teaching people Machine Learning in the most efficient way possible. You can see a couple of our courses below or check out all Zero To Mastery courses.