34th issue! If you missed them, you can read the previous issues of the Machine Learning Monthly newsletter here.
Hey everyone!
Daniel here, I'm a machine learning engineer who teaches the following beginner-friendly machine learning courses:
I also write regularly about machine learning on my own blog as well as make videos on the topic on YouTube.
Since there's a lot going on, I've taken care to keep things to the point.
Enough about me!
You're here for this month's Machine Learning Monthly Newsletter. Typically a 500ish (+/-1,000ish, usually +) word post detailing some of the most interesting things on machine learning I've found in the last month.
The Zero to Mastery PyTorch for Deep Learning course is 100% live! That's 300+ videos and 48+ hours across 10 sections, going from the fundamentals all the way to real-world model deployment.
If you're looking for the most beginner-friendly way to learn PyTorch and deep learning on the internet, the Zero to Mastery PyTorch course is for you.
Thank you to Apoorv for this kind review:
Stable Diffusion and other text-to-image generation models have been taking over the internet lately.
And one of the internet's best machine learning concept explainers, Jay Alammar, is back with a fantastic article explaining how they work!
A visual outline of the Stable Diffusion architecture. Jay's blog post is full of colorful descriptive images like this.
Machine learning algorithms can achieve excellent results when they're paired with good data.
And when Stitch Fix tried to build a custom style-recommendation system called Freestyle, their data was okay but it wasn't as good as it could be.
So they created a system pairing machine learning with style experts so that the results the algorithms produced would continually improve.
For bad outfit recommendations, the experts would remove them from the dataset.
For good outfit recommendations, the experts would keep them in the dataset.
Repeat this process enough and you get a terrific, custom style recommender (there's a toy code sketch of this loop below).
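If you like to think in code, here's what that kind of expert-in-the-loop cycle might look like. The `model` and `expert_approves` objects are hypothetical stand-ins, not Stitch Fix's actual Freestyle code:

```python
# Toy sketch of an expert-in-the-loop data-cleaning cycle.
# `model` and `expert_approves` are hypothetical stand-ins.
def refine(dataset, model, expert_approves, rounds=3):
    """Train, recommend, keep only expert-approved outfits, repeat."""
    for _ in range(rounds):
        model.fit(dataset)                  # train on the current data
        recs = model.recommend()            # generate outfit recommendations
        good = [r for r in recs if expert_approves(r)]  # experts drop bad outfits
        dataset = dataset + good            # approved outfits feed the next round
    return model, dataset
```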
Much of the recent development in Stable Diffusion models has been powered by the open-source LAION5B dataset (5 billion image and text pairs).
And now they've added another dataset to their list of open-source contributions.
LAION-COCO is a dataset of 600M images with machine-generated captions based on several different models.
In many cases, the model generates captions that are perhaps more descriptive than the original. Source: LAION AI blog.
Based on testing, the generated captions rated similarly in quality to human-written captions but with a slightly larger standard deviation.
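If you want to try machine-generated captioning yourself, here's a minimal sketch using the open-source BLIP model via Hugging Face transformers. This is just the general idea, not LAION's exact captioning pipeline:

```python
# Illustrative machine captioning with BLIP via Hugging Face transformers
# (not LAION's exact pipeline).
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # any image URL works
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

inputs = processor(images=image, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```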
One of my favourite AI companies, Comma AI, creators of installable self-driving car systems, released a blog post discussing their code design philosophy.
The main point being: over the space of 30 months, their codebase hasn't increased in size, yet they've added support for plenty of new features.
This is inspiring to me because if there's one thing I've learned, it's that it's tempting to keep adding new code, but every line of new code is something you'll have to maintain in the future.
Perhaps a junior developer solves a problem by writing more code, while a senior developer solves it by deleting code.
With answers from 23,997 people from 173 countries around the world, Kaggle shares the results from their State of Data Science and Machine Learning questionnaire.
A couple of my most notable takeaways:
Results from the Kaggle 2022 State of Data Science and Machine Learning survey. PyTorch continues to grow, whereas frameworks such as TensorFlow, Keras and XGBoost remain stable or see a slight drop.
The annual State of AI Report for 2022 has just been posted.
It's packed with updates on what's been happening in the field over the past year across research, industry, politics and safety, plus predictions for what's to come.
Roberto Rocha explores how to take an unformatted block of text and turn it into tabular text with various prompts to GPT-3 (a large language model). There's a rough sketch of this kind of prompt at the end of this section.
At this point, what can't you do with language models?
I remember having a similar data-entry job as a teenager: taking building files and adding them to an Excel spreadsheet.
I would've much rather been developing prompts than manually entering things.
Language model prompts feel like magic.
And if it gets it wrong?
Just create another prompt to fix the mistakes…
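To make the idea concrete, here's a rough sketch of prompting GPT-3 for table extraction using the OpenAI Python library of the time. The prompt, example text and model choice are my own assumptions, not Roberto's exact workflow:

```python
# Rough sketch of prompting GPT-3 to extract a table from free text
# (my own example prompt and data).
import openai

openai.api_key = "YOUR_API_KEY"  # assumes you have an OpenAI API key

raw_text = (
    "Acme Corp raised $12M in a Series A in 2020. "
    "Globex followed with a $30M Series B in 2021."
)

prompt = (
    "Extract a CSV table with columns company,amount,round,year "
    f"from the following text.\n\nText: {raw_text}\n\nCSV:"
)

response = openai.Completion.create(
    model="text-davinci-002",  # a GPT-3 model available at the time
    prompt=prompt,
    max_tokens=200,
    temperature=0,  # keep extraction output deterministic
)
print(response["choices"][0]["text"].strip())
```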
Researchers and engineers from Google and TensorFlow have created a model capable of sorting out different kinds of plastic waste (and many other kinds of waste) in Material Recovery Facilities (MRFs or recycling plants).
Currently, much of the waste that goes through MRFs gets sorted manually.
But if machine learning can help reduce that burden (which I'm sure it can), then that's a win for all!
A segmentation model recognizing different types of waste on a conveyor belt. Source: TensorFlow Blog.
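For flavour, here's what running a generic segmentation model from TensorFlow Hub looks like. The model handle below is a placeholder, since I don't know whether the waste-sorting model itself is publicly downloadable:

```python
# Sketch of running a segmentation model from TF Hub on a single frame.
# The model handle is a placeholder, not a real published model.
import tensorflow as tf
import tensorflow_hub as hub

MODEL_HANDLE = "https://tfhub.dev/<publisher>/<segmentation-model>/1"  # hypothetical

model = hub.load(MODEL_HANDLE)

image = tf.io.decode_jpeg(tf.io.read_file("conveyor_frame.jpg"), channels=3)
image = tf.image.resize(tf.cast(image, tf.float32) / 255.0, (512, 512))

# Many TF Hub segmentation models take a batched float image and
# return per-pixel class predictions (exact outputs vary by model)
masks = model(image[tf.newaxis, ...])
print(masks.shape)  # e.g. (1, 512, 512, num_classes)
```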
Google's new Interview Warmup tool will listen to your responses to various job interview-type questions and then transcribe them and provide insights on what you said, such as:
Example of a question being asked by Google's Interview Warmup, with the answer transcribed and different insights then made available. Source: Google Blog.
The seventh iteration of the public Open Images dataset has been released.
This time, point annotations have been added across a wide variety of classes (38.6M new point annotations covering 5.8k classes over 1.4M images).
The point annotations are collected by asking annotators whether a particular point in an image falls on a given object or not.
Example of the labelling interface for point annotations. It took annotators an average of 1.1 seconds per image to label these, meaning the updated dataset contains about two years' worth of new annotations. Source: Google AI blog.
These new point-based labels are much faster to collect than other annotations such as segmentation masks. And it turns out these sparse point labels can be used to train segmentation models that still achieve comparable quality to training on full labels (see the sketch below).
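Here's a toy illustration of how point supervision can work in PyTorch: compute the loss only at the sparsely labelled points and ignore every other pixel. This is my own sketch, not the Open Images authors' training code:

```python
# Toy illustration of point-supervised segmentation loss.
import torch
import torch.nn.functional as F

batch, num_classes, h, w = 2, 5, 64, 64
logits = torch.randn(batch, num_classes, h, w, requires_grad=True)

# -1 = unlabelled pixel; only a handful of pixels carry a class index
point_labels = torch.full((batch, h, w), -1, dtype=torch.long)
point_labels[0, 10, 20] = 3  # annotator said: point (10, 20) is class 3
point_labels[1, 40, 5] = 1

# ignore_index skips unlabelled pixels, so gradients flow only
# from the annotated points
loss = F.cross_entropy(logits, point_labels, ignore_index=-1)
loss.backward()
print(loss.item())
```

Setting `ignore_index=-1` is what makes the sparse labels work: unlabelled pixels contribute nothing to the loss or the gradients.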
It was only a matter of time before language models came to audio.
And Google have done it with AudioLM.
I played some of the generated music examples to my girlfriend, and she couldn't tell the difference between the AI and the human.
And she's a skillful musician.
The only reason I could tell the AI-generated speech from the real thing was because I was looking at the demo website while playing the samples.
Check out the AudioLM research website for a bunch of cool demos.
Crazy to think where this will be in a year or two.
Imagine being able to change Siri to any voice you want…
Paper: Can synthetic data increase the performance of your classifier?
There has been a huge increase in generative models over the past 12 months, especially in the image domain. This paper explores whether adding generated images to an existing real-world dataset can significantly improve the performance of existing classifier models (spoiler: yes).
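The core recipe is simple to sketch in PyTorch: treat the generated images as just another dataset and concatenate it with the real one. The folder paths below are placeholders, not the paper's exact setup:

```python
# Minimal sketch of mixing real and generated images into one training set.
from torch.utils.data import ConcatDataset, DataLoader
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

real_ds = datasets.ImageFolder("data/real", transform=transform)
synth_ds = datasets.ImageFolder("data/synthetic", transform=transform)  # e.g. diffusion-model outputs

# Train the classifier on the combined dataset exactly as you would
# on real data alone
combined = ConcatDataset([real_ds, synth_ds])
loader = DataLoader(combined, batch_size=32, shuffle=True)
```

From there, training proceeds exactly as it would on real data alone, which is what makes this kind of augmentation so cheap to try.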
My favourite things from Tesla AI Day
Last month's Machine Learning Monthly featured Tesla's AI Day 2022 video (one of my favourite days of the year!). I watched the full thing and here are a couple of my favourite takeaways:
Tesla's Data Engine overview. From a fleet of cars to a massive dataset to a model and then back again. Source: Tesla AI Day video at 1:53:33.
Tesla's Language of Lanes turns different lanes and direction options into a language problem. Source: Tesla AI Day video at 1:26:03.
[Video/Podcast] Andrej Karpathy and Lex Fridman
Andrej Karpathy (former AI lead at Tesla) recently went on the Lex Fridman podcast and talked about everything from synthetic biology to his role growing the Autopilot team at Tesla (from 0 to 1,000+ employees in five years).
I especially liked the conversation on Tesla's data engine at 1:23:46. It ties in well with Tesla's AI Day video.
Riley Goodside's Twitter account
Known as the GPT-3 whisperer, Riley's Twitter account is one of my favourite places to see just how intricate and detailed you can get with language models.
He regularly posts tweets about crazy prompts and methods to get GPT-3 to do incredible things. His account is also a great example of how just posting about what you're interested in can lead to new opportunities: Riley just quit his job to start something related to the work he's been sharing on Twitter.
What a massive month for the ML world in October!
As always, let me know if there's anything you think should be included in a future post.
In the meantime, keep learning, keep creating, keep dancing.
See you next month, Daniel
By the way, I'm a full-time instructor with Zero To Mastery Academy teaching people Machine Learning in the most efficient way possible. You can see a few of our courses below or check out all Zero To Mastery courses.