34th issue! If you missed them, you can read the previous issues of the Machine Learning Monthly newsletter here.
Hey everyone!
Daniel here, I’m a machine learning engineer who teaches the following beginner-friendly machine learning courses:
I also write regularly about machine learning on my own blog as well as make videos on the topic on YouTube.
Since there's a lot going on, I've done my best to keep things to the point.
Enough about me!
You're here for this month's Machine Learning Monthly Newsletter. Typically a 500ish (+/-1,000ish, usually +) word post detailing some of the most interesting things on machine learning I've found in the last month.
The Zero to Mastery PyTorch for Deep Learning course is 100% live! — 300+ videos and 48+ hours across 10 sections, from the fundamentals to real-world model deployment.
If you’re looking for the most beginner-friendly way to learn PyTorch and deep learning on the internet, the Zero to Mastery PyTorch course is for you.
Thank you to Apoorv for this kind review 🙏:
Stable Diffusion and other text-to-image generation models have been taking over the internet lately.
And one of the internet’s best machine learning concept explainers, Jay Alammar, is back with a fantastic article explaining how they work!
A visual outline of the Stable Diffusion architecture. Jay’s blog post is full of colorful descriptive images like this.
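If you'd like to try a text-to-image model yourself, here's a minimal sketch using the Hugging Face diffusers library (the model ID, prompt and settings are my own example, not from Jay's post):

```python
# Minimal text-to-image sketch with Hugging Face diffusers (illustrative only).
# Assumes: `pip install diffusers transformers torch` and a CUDA GPU.
import torch
from diffusers import StableDiffusionPipeline

# Load a pretrained Stable Diffusion checkpoint (example model ID).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

# Turn a text prompt into an image.
prompt = "a watercolour painting of a robot reading a newsletter"
image = pipe(prompt).images[0]
image.save("robot_newsletter.png")
```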
Machine learning algorithms can achieve excellent results when they’re paired with good data.
And when Stitch Fix tried to build a custom style-recommendation system called Freestyle, their data was okay but it wasn’t as good as it could be.
So they created a system that pairs machine learning with style experts so that the results the algorithms give continually improve.
For bad outfit recommendations, the experts would remove them from the dataset.
For good outfit recommendations, the experts would keep them in the dataset.
Repeat this process enough and you get a terrific and custom style recommender.
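The core loop is simple enough to sketch in a few lines of Python. The functions and data below are entirely my own toy example, not Stitch Fix's actual Freestyle system:

```python
# Toy sketch of an expert-in-the-loop feedback cycle (illustrative only;
# not Stitch Fix's actual Freestyle system).
import random

def recommend(dataset, n=3):
    """Stand-in for a recommender: sample outfits from the current dataset."""
    return random.sample(dataset, min(n, len(dataset)))

def expert_review(outfit):
    """Stand-in for a style expert: here, 'bad' outfits are just flagged ones."""
    return not outfit.get("flagged", False)

# Start with a mixed-quality dataset of outfit pairings.
dataset = [
    {"top": "linen shirt", "bottom": "chinos"},
    {"top": "hoodie", "bottom": "suit trousers", "flagged": True},
    {"top": "blazer", "bottom": "dark jeans"},
]

# One round of the loop: recommend, review, keep the good, drop the bad.
for outfit in recommend(dataset):
    if not expert_review(outfit):
        dataset.remove(outfit)

print(f"{len(dataset)} outfits remain after expert review")
```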
Much of the recent development in Stable Diffusion models has been powered by the open-source LAION5B dataset (5 billion image and text pairs).
And now they’ve added another dataset to their list of open-source contributions.
LAION-COCO is a dataset of 600M images with machine-generated captions based on several different models.
In many cases, the model generates captions that are perhaps more descriptive than the original. Source: LAION AI blog.
Based on testing, the generated captions were rated as similar in quality to human-generated captions but had a slightly larger standard deviation.
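To get a feel for what machine-generated captions look like, here's a rough sketch using a BLIP captioning model via Hugging Face transformers (the exact models, checkpoints and ranking pipeline LAION used may differ, and the image URL is just a placeholder):

```python
# Image-captioning sketch with BLIP via Hugging Face transformers
# (illustrative; LAION-COCO's exact model/ranking setup may differ).
# Assumes: `pip install transformers pillow requests torch`.
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

# Load an example image from the web (placeholder URL).
url = "https://raw.githubusercontent.com/pytorch/hub/master/images/dog.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# Generate a caption for the image.
inputs = processor(images=image, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=30)
caption = processor.decode(output_ids[0], skip_special_tokens=True)
print(caption)
```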
One of my favourite AI companies, Comma AI, creators of installable self-driving car systems, released a blog post discussing their code design philosophy.
The main points being:
Over the space of 30 months, their codebase hasn't increased in size, yet they've added support for plenty of new features.
This is inspiring to me because if there's one thing I've learned, it's that adding new code is tempting, but every new line is something you'll have to maintain in the future.
Perhaps a junior developer solves a problem by writing more code, while a senior developer solves it by deleting code.
With answers from 23,997 people across 173 countries around the world, Kaggle shares the results from their State of Data Science and Machine Learning questionnaire.
A couple of my most notable takeaways:
Results from the Kaggle 2022 State of Data Science and Machine Learning survey. PyTorch continues to grow whereas frameworks such as TensorFlow, Keras and XGBoost remain stable or see a slight drop.
The annual State of AI Report for 2022 has just been posted.
Packed with updates on what's been happening in the field over the past year across research, industry, politics and safety, plus predictions for what's to come.
Roberto Rocha explores how to take an unformatted block of text and turn it into tabular text with various prompts to GPT-3 (a large language model).
At this point, what can’t you do with language models?
I remember having a similar record-entry job as a teenager. Taking building files and adding them to an Excel spreadsheet.
I would've much rather been developing prompts than manually entering things.
Language model prompts feel like magic.
And if it gets it wrong?
Just create another prompt to fix the mistakes…
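The basic recipe is just a carefully written prompt. Here's a rough sketch with the OpenAI Python client — the model name, prompt and example text are my own (not Roberto's), and the API surface may have changed since:

```python
# Rough sketch of prompting a large language model to turn free text into a table
# (model, prompt and data are illustrative; see Roberto's post for his actual setup).
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

unformatted_text = """
Order #1042 shipped to Jane Smith on 3 March for $89.50.
Order #1043 shipped to Bob Lee on 5 March for $120.00.
"""

prompt = f"""Extract the orders below into a table with the columns:
order_id | customer | ship_date | amount

Text:
{unformatted_text}

Table:"""

response = openai.Completion.create(
    model="text-davinci-003",
    prompt=prompt,
    max_tokens=200,
    temperature=0,
)

print(response["choices"][0]["text"])
```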
Researchers and engineers from Google and TensorFlow have created a model capable of sorting out different kinds of plastic waste (and many other kinds of waste) in Material Recovery Facilities (MRFs or recycling plants).
Currently, much of the waste that goes through MRFs gets sorted manually.
But if machine learning can help (which I’m sure it can) to reduce the burden, then that’s a win for all!
A segmentation model recognizing different types of waste on a conveyor belt. Source: TensorFlow Blog.
Google’s new Interview Warmup tool will listen to your responses to various job interview-type questions and then transcribe them and provide insights on what you said, such as:
Example of a question being asked by Google’s Interview Warmup and the answer being transcribed and then different insights being available. Source: Google Blog.
The seventh iteration of the public Open Images dataset has been released.
This time, point annotations have been added across a wide variety of classes (38.6M new point annotations covering 5.8k classes over 1.4M images).
The point annotations are collected by asking annotators a simple yes/no question: is this point in the image on the object or not?
Example of the labelling interface for point annotations. It took annotators an average of 1.1 seconds per image to label these, meaning the updated dataset contains about 2 years' worth of new annotations. Source: Google AI blog.
These new point-based labels are much faster than other annotations such as segmentation. And it turns out these sparse point labels can be used for segmentation models and still achieve comparable quality to training on full labels.
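One common way to train on sparse point labels (a general sketch, not Google's exact training recipe) is to compute the segmentation loss only at the labelled pixels and ignore everything else:

```python
# Sketch: training a segmentation model with sparse point labels by masking the
# loss to labelled pixels only (illustrative; not the Open Images training setup).
import torch
import torch.nn.functional as F

num_classes = 5
# Model output for a batch of 2 images: (batch, classes, height, width).
logits = torch.randn(2, num_classes, 64, 64, requires_grad=True)

# Sparse targets: most pixels are unlabelled (-1), a handful carry point labels.
targets = torch.full((2, 64, 64), -1, dtype=torch.long)
targets[0, 10, 20] = 3   # one labelled point in image 0
targets[1, 40, 5] = 1    # one labelled point in image 1

# ignore_index skips unlabelled pixels, so the loss (and gradients) come only
# from the labelled points.
loss = F.cross_entropy(logits, targets, ignore_index=-1)
loss.backward()
print(loss.item())
```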
It was only a matter of time before language models came to audio.
And Google have done it with AudioLM.
I played some of the generated music examples to my girlfriend, and she couldn't tell the difference between the AI and the human.
And she’s a skillful musician.
The only reason I could tell the AI speech apart from the human speech was because I was looking at the demo website while playing them.
Check out the AudioLM research website for a bunch of cool demos.
Crazy to think where this will be in a year or two.
Imagine being able to change Siri to being any voice you want…
Paper: Can synthetic data increase the performance of your classifier?
There has been a huge increase in generative models over the past 12 months, especially in the image domain. This paper explores whether adding generated images to an existing real-world dataset can significantly improve the performance of existing classifier models (spoiler: yes).
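In PyTorch terms, the setup can be as simple as concatenating a folder of real images with a folder of generated ones before training. A minimal sketch, assuming two ImageFolder-style directories (the paths are placeholders):

```python
# Sketch: mixing real and synthetic images into one training set
# (assumes ImageFolder-style directories; paths are placeholders).
from torch.utils.data import ConcatDataset, DataLoader
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

real_data = datasets.ImageFolder("data/real_train", transform=transform)
synthetic_data = datasets.ImageFolder("data/synthetic_train", transform=transform)

# Train the classifier on the combined dataset.
combined = ConcatDataset([real_data, synthetic_data])
train_loader = DataLoader(combined, batch_size=32, shuffle=True)

print(f"Real: {len(real_data)}, Synthetic: {len(synthetic_data)}, Combined: {len(combined)}")
```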
My favourite things from Tesla AI Day
Last month’s Machine Learning Monthly featured Tesla’s AI Day 2022 video (one of my favourite days of the year!). I watched the full thing and here are a couple of my favourite takeaways:
Tesla’s Data Engine overview. From a fleet of cars to a massive dataset to a model and then back again. Source: Tesla AI Day video 1:53:33.
Tesla’s Language of Lanes turns different lanes and direction options into a language problem. Source: Tesla AI Day video at 1:26:03.
[Video/Podcast] Andrej Karpathy and Lex Fridman
Andrej Karpathy (former AI lead at Tesla) recently went on the Lex Fridman podcast and talked about everything from synthetic biology to his role growing the Autopilot team at Tesla (from 0 to 1,000+ employees in 5 years).
I especially liked the conversation on Tesla’s data engine at 1:23:46. It ties in well with Tesla’s AI Day video.
Riley Goodside’s Twitter account
Known as the GPT-3 whisperer, Riley’s Twitter account is one of my favourite places to see just how intricate and detailed you can get with language models.
He regularly posts Tweets about crazy prompts and methods to get GPT-3 to do incredible things. He's also a great example of how just posting what you're interested in can lead to new opportunities: Riley just quit his job to start something related to the work he's been sharing on Twitter.
What a massive month for the ML world in October!
As always, let me know if there's anything you think should be included in a future post.
In the meantime, keep learning, keep creating, keep dancing.
See you next month, Daniel
By the way, I'm a full-time instructor with Zero To Mastery Academy teaching people Machine Learning in the most efficient way possible. You can see a few of our courses below or check out all Zero To Mastery courses.