33rd issue! If you missed them, you can read the previous issues of the Machine Learning Monthly newsletter here.
Hey everyone!
Daniel here, I'm a machine learning engineer who teaches the following beginner-friendly machine learning courses:
I also write regularly about machine learning on my own blog as well as make videos on the topic on YouTube.
Since there's a lot going on, I've done my best to keep things to the point.
Enough about me!
You're here for this month's Machine Learning Monthly Newsletter. Typically a 500ish (+/-1,000ish, usually +) word post detailing some of the most interesting things on machine learning I've found in the last month.
The Zero to Mastery PyTorch for Deep Learning course is 100% live!
Two huge sections with 100+ total videos were added in the last month:
While the free tier of Google Colab stays unchanged, you can now upgrade your paid subscription to add even more compute (new and faster GPUs) on a pay-per-use basis.
This offers even more access to NVIDIA GPUs with a few clicks, and plenty of opportunities for experimenting, experimenting, experimenting!
The recent Zero to Mastery PyTorch course helps beginners learn the fundamentals of PyTorch.
But once you've learned the fundamentals, how do you improve your models even further?
A recent article on the PyTorch blog shares how Meta (Facebook) finds bottlenecks in their PyTorch models and then offers tips for how to improve each one.
Four optimization steps for PyTorch models found by the Meta team. Source: PyTorch blog.
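The article walks through Meta's own profiling workflow, but if you want to start hunting for bottlenecks in your own models, PyTorch's built-in profiler is a good first step. Here's a minimal sketch (the model and batch are placeholders, and it assumes a CUDA GPU is available):

```python
import torch
import torchvision
from torch.profiler import profile, record_function, ProfilerActivity

# Placeholder model and batch, swap in your own (assumes a CUDA GPU).
model = torchvision.models.resnet18().cuda()
inputs = torch.randn(32, 3, 224, 224).cuda()

# Record CPU and GPU activity during a forward pass.
with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    with record_function("model_inference"):
        model(inputs)

# Print the most expensive operations first, these are your bottleneck candidates.
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```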
Laion AI has released the best-performing open-source CLIP model with a blog post detailing how they made it happen.
CLIP stands for "contrastive language-image pretraining", which means the model is capable of many vision and language tasks such as matching text with images or images with images.
It also makes it possible to perform zero-shot image classification based on how similar a piece of text is to an image.
For example, you could create a "cat", "dog" and "chicken" image classification model, despite not having any labelled images of "cat", "dog" or "chicken".
Check out the blog post for training details and tips as well as the OpenCLIP GitHub for code and model weights.
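To give you a feel for zero-shot classification in code, here's a minimal sketch using OpenCLIP (the model and pretrained tags plus the image path are example values, see the OpenCLIP GitHub for the current options):

```python
import torch
import open_clip
from PIL import Image

# Example model and pretrained tags, see the OpenCLIP GitHub for the full list.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)

image = preprocess(Image.open("my_image.jpg")).unsqueeze(0)  # placeholder image path
text = open_clip.tokenize(["a photo of a cat", "a photo of a dog", "a photo of a chicken"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)

    # Normalize so the dot product becomes cosine similarity.
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)

    # Highest probability = predicted class, no labelled training images required.
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(probs)
```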
Many of today's modern computer vision models are pretrained on large open-source datasets such as ImageNet-1K (1+ million images, 1,000 classes) and ImageNet-21K (13+ million images, 21,000 classes).
But large datasets like this are often created in crowd-sourcing fashion and often contain mistakes (mismatched labels) and plenty of duplicates (according to new research, ImageNet-21K has 1+ million duplicate images).
However, there's now an open-source tool called fastdup that computes image statistics (such as brightness, darkness, sharpness, blur, size, unique colours and more), finds duplicates (including near-duplicates taken from slightly different points of view), finds corrupted and broken images, detects outliers (images that don't match the distribution of the rest), finds wrong labels and much more.
See more on the fastdup GitHub page as well as the video tutorial.
Discovering different image statistics on the Food101 dataset (used in the Zero to Mastery PyTorch course) with fastdup. Source: fastdup GitHub.
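The basic workflow is only a few lines. Here's a minimal sketch based on the fastdup README (the paths are placeholders, check the GitHub page for the current API):

```python
import fastdup

# Run the full analysis on a folder of images: computes statistics,
# embeds each image and finds duplicates, outliers and broken files.
fastdup.run(input_dir="images/", work_dir="fastdup_output/")

# Build an HTML gallery of the duplicate pairs found during the run.
fastdup.create_duplicates_gallery(
    similarity_file="fastdup_output/similarity.csv",
    save_path="fastdup_output/",
)
```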
Salesforce has open-sourced a library called LAVIS (LAnguage-and-VISion intelligence) to make vision-language research more reproducible.
Inside you'll find a Pythonic API for 10+ vision and language tasks such as image retrieval, image captioning, visual question answering and more.
You'll also have access to 20+ vision and language datasets and 30+ pretrained state-of-the-art vision-language models such as BLIP and CLIP.
See more on the LAVIS GitHub page.
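To give you an idea of the API, here's roughly what image captioning with a pretrained BLIP model looks like in LAVIS (the image path is a placeholder):

```python
import torch
from PIL import Image
from lavis.models import load_model_and_preprocess

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load a pretrained BLIP captioning model and its matching preprocessors.
model, vis_processors, _ = load_model_and_preprocess(
    name="blip_caption", model_type="base_coco", is_eval=True, device=device
)

# Preprocess an image (placeholder path) and generate a caption for it.
raw_image = Image.open("my_image.jpg").convert("RGB")
image = vis_processors["eval"](raw_image).unsqueeze(0).to(device)
print(model.generate({"image": image}))
```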
I love this!
It's all about the embeddings!
Good embeddings (numerical representations of data) generally lead to good results.
And Towhee helps you turn almost any type of unstructured data (images, audio, text, 3D molecular structures) into embeddings (also called feature vectors).
My favourite is how quickly you can access Towhee's 700+ pretrained models. For example, check out this code snippet for creating a text-to-image search:
Installing and creating a text-to-image vector search with Towhee in ~10-lines of code. Source: Towhee GitHub.
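As a rough sketch of what that pipeline looks like in code (the operator and model names here are illustrative, grab the exact snippet from the Towhee GitHub):

```python
import towhee

# Embed a folder of images with a CLIP operator from the Towhee hub
# (operator/model names are illustrative).
image_embeddings = (
    towhee.glob["path"]("./images/*.jpg")
    .image_decode["path", "img"]()
    .image_text_embedding.clip["img", "vec"](
        model_name="clip_vit_base_patch16", modality="image"
    )
    .select["path", "vec"]()
)

# Embed a text query with the same model so both live in the same vector
# space, then rank images by similarity to the query vector.
text_embedding = (
    towhee.dc["text"](["a photo of a dog playing in snow"])
    .image_text_embedding.clip["text", "vec"](
        model_name="clip_vit_base_patch16", modality="text"
    )
    .select["vec"]()
)
```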
A fantastic breakdown of whether you need MLOps or not (chances are you're overthinking it).
Lak's main argument for building machine learning applications: KISS (keep it simple, stupid).
This is fairly new but it's exciting.
Imagine being able to go to a machine learning repo on GitHub, press "." and have a fully interactive, GPU-powered Jupyter server running right within the browser.
That's what's already possible with GitHub Codespaces on many different repos but the machine learning use-case hasn't quite been there (until now).
If it keeps going how it is, GitHub Codespaces could be a fantastic Google Colab alternative, with all the machine learning requirements (a GPU, a Jupyter notebook) right within GitHub.
I'm excited to try it out in the next couple of months!
I'm a big fan of the idea of coding in the browser.
The less setup on different local machines the better.
In an ideal world, I'd go to any GitHub repo, press a button and start coding straight away.
GitHub Codespaces is working towards making this happen (see the above link).
And so is Replit.
Especially with their new GhostWriter mode, which is their equivalent of GitHub Copilot but faster (according to them).
I've really liked using GitHub Copilot lately for helping me learn web development.
But GhostWriter looks incredible too and it's great to have more competitors in the space.
I really liked the blog post by Replit announcing how they created the model (and the challenges that came with deployment) as well as how they integrated it with their online app (as much of a challenge as training the model).
Different optimization techniques used by the Replit team to make their GhostWriter model available for pair-programming applications with an average response time of 400ms. Source: Replit blog.
My favourite quote from the release article (bold mine):
What do you do when you're not a multi-trillion multi-national corporation (yet) with tons of ML research scientists, infinite budget for training, billions in industry partnerships, and store most of the world's code but still want to bring state-of-the-art AI to production? You start from open-source!
A fantastic insight into why the rise of large language models (LLMs) such as GPT-3 and their integration into almost every kind of application is going to be the next phase of computing.
Always bet on text.
Netflix has long been one of the most open companies on how it uses machine learning to drive its business.
And now they've started a new blog series discussing how they use machine learning not only to curate media but to create media.
From using computer vision for video understanding and editing, to visual effects and computer graphics for generating media and digitizing actors, props and sets.
Tesla's AI Day 2022 is live!
Packed with AI-first updates on how they're building the world's largest fleet of self-driving cars as well as using the same technology to create the Tesla bot.
No exact spoilers from this one as it's only a day or two old and I haven't watched all of it yet (I'm currently on vacation in Europe).
But if this comment on the Learn PyTorch in a day video reveals anything, it's that the Zero to Mastery PyTorch course lines up pretty well with Tesla's philosophy:
Comment from the Learn PyTorch in a day. Literally. YouTube video.
I'll include my favourites from AI Day in next month's Machine Learning Monthly!
What a massive month for the ML world in September!
As always, let me know if there's anything you think should be included in a future post.
In the meantime, keep learning, keep creating, keep dancing.
See you next month, Daniel
By the way, I'm a full-time instructor with Zero To Mastery Academy teaching people Machine Learning in the most efficient way possible. You can see a few of our courses below or check out all Zero To Mastery courses.