33rd issue! If you missed them, you can read the previous issues of the Machine Learning Monthly newsletter here.
Hey everyone!
Daniel here. I'm a machine learning engineer who teaches the following beginner-friendly machine learning courses:
I also write regularly about machine learning on my own blog as well as make videos on the topic on YouTube.
There's a lot going on this month, so I've done my best to keep things to the point.
Enough about me!
You're here for this month's Machine Learning Monthly Newsletter. Typically a 500ish (+/-1,000ish, usually +) word post detailing some of the most interesting things on machine learning I've found in the last month.
The Zero to Mastery PyTorch for Deep Learning course is 100% live!
Two huge sections with 100+ total videos were added in the last month:
While the free tier of Google Colab stays unchanged, you can now upgrade your paid subscription to add even more compute opportunities (new and faster GPUs) on a pay-per-use basis.
This offers even more access to NVIDIA GPUs with a few clicks, and plenty of opportunities for experimenting, experimenting, experimenting!
The recent Zero to Mastery PyTorch course helps beginners learn the fundamentals of PyTorch.
But once you've learned the fundamentals, how do you improve your models even further?
A recent article on the PyTorch blog shares how Meta (Facebook) finds bottlenecks in their PyTorch models and then offers tips for how to improve each one.
Four optimization steps for PyTorch models found by the Meta team. Source: PyTorch blog.
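The post centers on profiling, so if you want to try a similar workflow on your own models, here's a minimal sketch (my own example, not Meta's exact setup) using torch.profiler to see which operators dominate runtime:

```python
# Minimal profiling sketch (not Meta's exact setup): use torch.profiler
# to see which operators dominate a model's runtime.
import torch
import torchvision

model = torchvision.models.resnet18().eval()
inputs = torch.randn(8, 3, 224, 224)  # dummy batch of 8 images

with torch.profiler.profile(
    activities=[torch.profiler.ProfilerActivity.CPU],
    record_shapes=True,
) as prof:
    with torch.no_grad():
        model(inputs)

# Print the 10 most expensive operators by total CPU time
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=10))
```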
LAION AI has released the best-performing open-source CLIP model, with a blog post detailing how they made it happen.
CLIP stands for "contrastive language-image pretraining", which means the model is capable of many vision and language tasks, such as matching text with images or images with images.
It also makes it possible to perform zero-shot image classification based on how similar a piece of text is to an image.
For example, you could create a "cat", "dog" and "chicken" image classification model, despite not having any labelled images of "cat", "dog" or "chicken".
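Here's a rough sketch of what zero-shot classification looks like with the OpenCLIP library (the model tag and image path below are illustrative placeholders; see the OpenCLIP README for exact usage):

```python
# Zero-shot image classification sketch with OpenCLIP.
# The model tag and image path are placeholders, see the OpenCLIP README.
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer = open_clip.get_tokenizer("ViT-B-32")

labels = ["cat", "dog", "chicken"]
text = tokenizer([f"a photo of a {label}" for label in labels])
image = preprocess(Image.open("photo.jpg")).unsqueeze(0)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalize so the dot product equals cosine similarity
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(dict(zip(labels, probs.squeeze().tolist())))  # highest score = predicted class
```

No "cat"/"dog"/"chicken" training images required: the model simply scores how well each text prompt matches the image.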
Check out the blog post for training details and tips as well as the OpenCLIP GitHub for code and model weights.
Many of today's modern computer vision models are pretrained on large open-source datasets such as ImageNet-1K (1+ million images, 1,000 classes) and ImageNet-21K (13+ million images, 21,000 classes).
But datasets this large are often created in a crowd-sourced fashion, so they contain mistakes (mismatched labels) and plenty of duplicates (according to new research, ImageNet-21K has 1+ million duplicate images).
However, there's now an open-source tool called fastdup that can compute image statistics (such as brightness, darkness, sharpness, blurriness, size, unique colours and more), find duplicates (including images taken from slightly different points of view), find corrupted and broken images, detect outliers (images that don't match the distribution of the rest), find wrong labels and much more.
See more on the fastdup GitHub page as well as the video tutorial.
Discovering different image statistics on the Food101 dataset (used in the Zero to Mastery PyTorch course) with fastdup. Source: fastdup GitHub.
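If you want to try it, here's a minimal sketch based on fastdup's README (the paths are placeholders and the API may have evolved, so check the GitHub page for current usage):

```python
# Minimal fastdup sketch: the paths are placeholders and the API may
# have changed since writing, so check the fastdup README for usage.
import fastdup

# One call runs the full analysis: it embeds every image, finds
# duplicates and outliers, and writes statistics and similarity
# results into work_dir for further inspection.
fastdup.run(input_dir="images/", work_dir="fastdup_output/")
```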
Salesforce has open-sourced a library called LAVIS (LAnguage-and-VISion intelligence) to make vision-language research more reproducible.
Inside you'll find a Pythonic API for 10+ vision and language tasks such as image retrieval, image captioning, visual question answering and more.
You'll also have access to 20+ vision and language datasets and 30+ pretrained state-of-the-art vision-language models such as BLIP and CLIP.
See more on the LAVIS GitHub page.
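As a quick taste, here's an image-captioning sketch that roughly follows the LAVIS README (the image path is a placeholder):

```python
# Image captioning with LAVIS, roughly following the README
# (the image path is a placeholder).
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load a pretrained BLIP captioning model plus matching image preprocessors
model, vis_processors, _ = load_model_and_preprocess(
    name="blip_caption", model_type="base_coco", is_eval=True, device=device)

raw_image = Image.open("photo.jpg").convert("RGB")
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)

# Generate a caption for the image
print(model.generate({"image": image}))
```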
I love this!
It's all about the embeddings!
Good embeddings (numerical representations of data) generally lead to good results.
And Towhee helps you turn almost any type of unstructured data (images, audio, text, 3D molecular structures) into embeddings (also called feature vectors).
My favourite part is how quickly you can get access to Towhee's 700+ pretrained models. For example, check out this code snippet for creating a text-to-image search:
Installing and creating a text-to-image vector search with Towhee in ~10 lines of code. Source: Towhee GitHub.
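To make the embeddings idea concrete, here's the core of any vector search in plain PyTorch (illustrating the concept only, not Towhee's API): once everything is a vector, search is just nearest-neighbour lookup.

```python
# Concept sketch in plain PyTorch (not Towhee's API): once data is
# embedded, search reduces to nearest-neighbour lookup in vector space.
import torch
import torch.nn.functional as F

# Stand-in embeddings: 1,000 image vectors and 1 query text vector,
# assumed to share a 512-dimensional space (as with CLIP-style models).
image_embeddings = F.normalize(torch.randn(1000, 512), dim=-1)
query_embedding = F.normalize(torch.randn(1, 512), dim=-1)

# Cosine similarity via dot product of normalized vectors
scores = query_embedding @ image_embeddings.T

# Indices of the 5 most similar images to the query
top_scores, top_indices = scores.topk(k=5)
print(top_indices)
```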
A fantastic breakdown of whether you need MLOps or not (chances are you're overthinking it).
Lak's main argument for building machine learning applications: KISS (keep it simple, stupid).
This is fairly new but it's exciting.
Imagine being able to go to a machine learning repo on GitHub, press "." and have a fully interactive, GPU-powered Jupyter server running right within the browser.
That's what's already possible with GitHub Codespaces on many different repos, but the machine learning use case hasn't quite been there (until now).
If it keeps going the way it is, GitHub Codespaces could be a fantastic Google Colab alternative: coding with all the machine learning requirements (a GPU, a Jupyter notebook) right within a GitHub repo.
I'm excited to try it out in the next couple of months!
I'm a big fan of the idea of coding in the browser.
The less setup on different local machines the better.
In an ideal world, I'd go to any GitHub repo, press a button and start coding straight away.
GitHub Codespaces is working towards making this happen (see the above link).
And so is Replit.
Especially with their new GhostWriter mode, which is their equivalent of GitHub Copilot but faster (according to them).
I've really liked using GitHub Copilot lately for helping me learn web development.
But GhostWriter looks incredible too, and it's great to have more competitors in the space.
I really liked the blog post by Replit announcing how they created the model (and the challenges that came with deployment) as well as how they integrated it with their online app (as much of a challenge as training the model).
Different optimization techniques used by the Replit team to make their GhostWriter model available for pair-programming applications with an average response time of 400ms. Source: Replit blog.
My favourite quote from the release article (bold mine):
What do you do when you're not a multi-trillion multi-national corporation (yet) with tons of ML research scientists, infinite budget for training, billions in industry partnerships, and store most of the world's code but still want to bring state-of-the-art AI to production? You start from open-source!
A fantastic insight into why the rise of large language models (LLMs) such as GPT-3 and their integration into almost every kind of application is going to be the next phase of computing.
Always bet on text.
Netflix has long been one of the most open companies on how it uses machine learning to drive its business.
And now they've started a new blog series discussing how they use machine learning not only to curate media but to create media.
From using computer vision for video understanding and editing to visual effects and computer graphics to generate media and digitize actors/props and sets.
Tesla's AI Day 2022 is live!
It's packed with AI-first updates on how they're building the world's largest fleet of self-driving cars, as well as using the same technology to create the Tesla Bot.
No exact spoilers from this one as it's only a day or two old and I haven't watched all of it yet (I'm currently on vacation in Europe).
But if this comment on the Learn PyTorch in a day video reveals anything, it's that the Zero to Mastery PyTorch course lines up pretty well with Tesla's philosophy:
Comment from the Learn PyTorch in a day. Literally. YouTube video.
I'll include my favourites from AI Day in next month's Machine Learning Monthly!
What a massive month for the ML world in September!
As always, let me know if there's anything you think should be included in a future post.
In the meantime, keep learning, keep creating, keep dancing.
See you next month, Daniel
By the way, I'm a full-time instructor with Zero To Mastery Academy teaching people Machine Learning in the most efficient way possible. You can see a few of our courses below or check out all Zero To Mastery courses.