34th issue! If you missed them, you can read the previous issues of the Machine Learning Monthly newsletter here.
Hey everyone!
Daniel here, I'm a machine learning engineer who teaches the following beginner-friendly machine learning courses:
I also write regularly about machine learning on my own blog as well as make videos on the topic on YouTube.
Since there's a lot going on, I've taken care to keep things to the point.
Enough about me!
You're here for this month's Machine Learning Monthly Newsletter. Typically a 500ish (+/-1,000ish, usually +) word post detailing some of the most interesting things on machine learning I've found in the last month.
The Zero to Mastery PyTorch for Deep Learning course is 100% live! That's 300+ videos and 48+ hours across 10 sections, going from the fundamentals all the way to real-world model deployment.
If you're looking for the most beginner-friendly way to learn PyTorch and deep learning on the internet, the Zero to Mastery PyTorch course is for you.
Thank you to Apoorv for this kind review:
Stable Diffusion and other text-to-image generation models have been taking over the internet lately.
And one of the internet's best machine learning concept explainers, Jay Alammar, is back with a fantastic article explaining how they work!
A visual outline of the Stable Diffusion architecture. Jay's blog post is full of colorful descriptive images like this.
Machine learning algorithms can achieve excellent results when they're paired with good data.
And when Stitch Fix tried to build a custom style-recommendation system called Freestyle, their data was okay but it wasn't as good as it could be.
So they created a system pairing machine learning with style experts so that the results the algorithms produced would continually improve.
For bad outfit recommendations, the experts would remove them from the dataset.
For good outfit recommendations, the experts would keep them in the dataset.
Repeat this process enough and you get a terrific, custom style recommender (there's a toy code sketch of this loop below).
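If you like to think in code, here's what that kind of expert-in-the-loop cycle might look like. The `model` and `expert_approves` objects are hypothetical stand-ins, not Stitch Fix's actual Freestyle code:

```python
# Toy sketch of an expert-in-the-loop data-cleaning cycle.
# `model` and `expert_approves` are hypothetical stand-ins.
def refine(dataset, model, expert_approves, rounds=3):
    """Train, recommend, keep only expert-approved outfits, repeat."""
    for _ in range(rounds):
        model.fit(dataset)                  # train on the current data
        recs = model.recommend()            # generate outfit recommendations
        good = [r for r in recs if expert_approves(r)]  # experts drop bad outfits
        dataset = dataset + good            # approved outfits feed the next round
    return model, dataset
```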
Much of the recent development in Stable Diffusion models has been powered by the open-source LAION5B dataset (5 billion image and text pairs).
And now they've added another dataset to their list of open-source contributions.
LAION-COCO is a dataset of 600M images with machine-generated captions based on several different models.
In many cases, the model generates captions that are perhaps more descriptive than the original. Source: LAION AI blog.
Based on testing, the generated captions rated similarly in quality to human-written captions but with a slightly larger standard deviation.
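If you want to try machine-generated captioning yourself, here's a minimal sketch using the open-source BLIP model via Hugging Face transformers. This is just the general idea, not LAION's exact captioning pipeline:

```python
# Illustrative machine captioning with BLIP via Hugging Face transformers
# (not LAION's exact pipeline).
import requests
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # any image URL works
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

inputs = processor(images=image, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```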
One of my favourite AI companies, Comma AI, creators of installable self-driving car systems, released a blog post discussing their code design philosophy.
The main point being: over the space of 30 months, their codebase hasn't increased in size, yet they've added support for plenty of new features.
This is inspiring to me because if there's one thing I've learned, it's that it's tempting to keep adding new code, but every line of new code is something you'll have to maintain in the future.
Perhaps a junior developer solves a problem by writing more code, while a senior developer solves it by deleting code.
With answers from 23,997 people from 173 countries around the world, Kaggle shares the results from their State of Data Science and Machine Learning questionnaire.
A couple of my most notable takeaways:
Results from the Kaggle 2022 State of Data Science and Machine Learning survey. PyTorch continues to grow, whereas frameworks such as TensorFlow, Keras and XGBoost remain stable or see a slight drop.
The annual State of AI Report for 2022 has just been posted.
It's packed with updates on what's been happening in the field over the past year across research, industry, politics and safety, plus predictions for what's to come.
Roberto Rocha explores how to take an unformatted block of text and turn it into tabular text with various prompts to GPT-3 (a large language model). There's a rough sketch of this kind of prompt at the end of this section.
At this point, what can't you do with language models?
I remember having a similar data-entry job as a teenager: taking building files and adding them to an Excel spreadsheet.
I would've much rather been developing prompts than manually entering things.
Language model prompts feel like magic.
And if it gets it wrong?
Just create another prompt to fix the mistakes…
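To make the idea concrete, here's a rough sketch of prompting GPT-3 for table extraction using the OpenAI Python library of the time. The prompt, example text and model choice are my own assumptions, not Roberto's exact workflow:

```python
# Rough sketch of prompting GPT-3 to extract a table from free text
# (my own example prompt and data).
import openai

openai.api_key = "YOUR_API_KEY"  # assumes you have an OpenAI API key

raw_text = (
    "Acme Corp raised $12M in a Series A in 2020. "
    "Globex followed with a $30M Series B in 2021."
)

prompt = (
    "Extract a CSV table with columns company,amount,round,year "
    f"from the following text.\n\nText: {raw_text}\n\nCSV:"
)

response = openai.Completion.create(
    model="text-davinci-002",  # a GPT-3 model available at the time
    prompt=prompt,
    max_tokens=200,
    temperature=0,  # keep extraction output deterministic
)
print(response["choices"][0]["text"].strip())
```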
Researchers and engineers from Google and TensorFlow have created a model capable of sorting out different kinds of plastic waste (and many other kinds of waste) in Material Recovery Facilities (MRFs or recycling plants).
Currently, much of the waste that goes through MRFs gets sorted manually.
But if machine learning can help reduce that burden (which I'm sure it can), then that's a win for all!
A segmentation model recognizing different types of waste on a conveyor belt. Source: TensorFlow Blog.
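For flavour, here's what running a generic segmentation model from TensorFlow Hub looks like. The model handle below is a placeholder, since I don't know whether the waste-sorting model itself is publicly downloadable:

```python
# Sketch of running a segmentation model from TF Hub on a single frame.
# The model handle is a placeholder, not a real published model.
import tensorflow as tf
import tensorflow_hub as hub

MODEL_HANDLE = "https://tfhub.dev/<publisher>/<segmentation-model>/1"  # hypothetical

model = hub.load(MODEL_HANDLE)

image = tf.io.decode_jpeg(tf.io.read_file("conveyor_frame.jpg"), channels=3)
image = tf.image.resize(tf.cast(image, tf.float32) / 255.0, (512, 512))

# Many TF Hub segmentation models take a batched float image and
# return per-pixel class predictions (exact outputs vary by model)
masks = model(image[tf.newaxis, ...])
print(masks.shape)  # e.g. (1, 512, 512, num_classes)
```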
Google's new Interview Warmup tool will listen to your responses to various job interview-type questions and then transcribe them and provide insights on what you said, such as:
Example of a question being asked by Google's Interview Warmup, with the answer transcribed and different insights then made available. Source: Google Blog.
The seventh iteration of the public Open Images dataset has been released.
This time, point annotations have been added across a wide variety of classes (38.6M new point annotations covering 5.8k classes over 1.4M images).
The point annotations are collected by asking annotators whether a particular point in an image falls on a given object or not.
Example of the labelling interface for point annotations. It took annotators an average of 1.1 seconds per image to label these, meaning the updated dataset contains about two years' worth of new annotations. Source: Google AI blog.
These new point-based labels are much faster to collect than other annotations such as segmentation masks. And it turns out these sparse point labels can be used to train segmentation models that still achieve comparable quality to training on full labels (see the sketch below).
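Here's a toy illustration of how point supervision can work in PyTorch: compute the loss only at the sparsely labelled points and ignore every other pixel. This is my own sketch, not the Open Images authors' training code:

```python
# Toy illustration of point-supervised segmentation loss.
import torch
import torch.nn.functional as F

batch, num_classes, h, w = 2, 5, 64, 64
logits = torch.randn(batch, num_classes, h, w, requires_grad=True)

# -1 = unlabelled pixel; only a handful of pixels carry a class index
point_labels = torch.full((batch, h, w), -1, dtype=torch.long)
point_labels[0, 10, 20] = 3  # annotator said: point (10, 20) is class 3
point_labels[1, 40, 5] = 1

# ignore_index skips unlabelled pixels, so gradients flow only
# from the annotated points
loss = F.cross_entropy(logits, point_labels, ignore_index=-1)
loss.backward()
print(loss.item())
```

Setting `ignore_index=-1` is what makes the sparse labels work: unlabelled pixels contribute nothing to the loss or the gradients.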
It was only a matter of time before language models came to audio.
And Google have done it with AudioLM.
I played some of the generated music examples to my girlfriend, and she couldn't tell the difference between the AI and the human.
And she's a skillful musician.
The only reason I could tell the AI-generated speech from the real thing was because I was looking at the demo website while playing the samples.
Check out the AudioLM research website for a bunch of cool demos.
Crazy to think where this will be in a year or two.
Imagine being able to change Siri to any voice you want…
Paper: Can synthetic data increase the performance of your classifier?
There has been a huge increase in generative models over the past 12 months, especially in the image domain. This paper explores whether adding generated images to an existing real-world dataset can significantly improve the performance of existing classifier models (spoiler: yes).
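The core recipe is simple to sketch in PyTorch: treat the generated images as just another dataset and concatenate it with the real one. The folder paths below are placeholders, not the paper's exact setup:

```python
# Minimal sketch of mixing real and generated images into one training set.
from torch.utils.data import ConcatDataset, DataLoader
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

real_ds = datasets.ImageFolder("data/real", transform=transform)
synth_ds = datasets.ImageFolder("data/synthetic", transform=transform)  # e.g. diffusion-model outputs

# Train the classifier on the combined dataset exactly as you would
# on real data alone
combined = ConcatDataset([real_ds, synth_ds])
loader = DataLoader(combined, batch_size=32, shuffle=True)
```

From there, training proceeds exactly as it would on real data alone, which is what makes this kind of augmentation so cheap to try.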
My favourite things from Tesla AI Day
Last month's Machine Learning Monthly featured Tesla's AI Day 2022 video (one of my favourite days of the year!). I watched the full thing and here are a couple of my favourite takeaways:
Tesla's Data Engine overview. From a fleet of cars to a massive dataset to a model and then back again. Source: Tesla AI Day video at 1:53:33.
Tesla's Language of Lanes turns different lanes and direction options into a language problem. Source: Tesla AI Day video at 1:26:03.
[Video/Podcast] Andrej Karpathy and Lex Fridman
Andrej Karpathy (former AI lead at Tesla) recently went on the Lex Fridman podcast and talked about everything from synthetic biology to his role growing the Autopilot team at Tesla (from 0 to 1,000+ employees in five years).
I especially liked the conversation on Tesla's data engine at 1:23:46. It ties in well with Tesla's AI Day video.
Riley Goodside's Twitter account
Known as the GPT-3 whisperer, Riley's Twitter account is one of my favourite places to see just how intricate and detailed you can get with language models.
He regularly posts tweets about crazy prompts and methods to get GPT-3 to do incredible things. His account is also a great example of how just posting about what you're interested in can lead to new opportunities: Riley just quit his job to start something related to the work he's been sharing on Twitter.
What a massive month for the ML world in October!
As always, let me know if there's anything you think should be included in a future post.
In the meantime, keep learning, keep creating, keep dancing.
See you next month, Daniel
By the way, I'm a full-time instructor with Zero To Mastery Academy teaching people Machine Learning in the most efficient way possible. You can see a few of our courses below or check out all Zero To Mastery courses.