36th issue! If you missed them, you can read the previous issues of the Machine Learning Monthly newsletter here.
Happy New Year everyone!
Daniel here, I'm a machine learning engineer who teaches the following beginner-friendly machine learning courses:
I also write regularly about machine learning on my own blog as well as make videos on the topic on YouTube.
Since there's a lot going on, the utmost care has been taken to keep things to the point.
Enough about me!
You're here for this month's Machine Learning Monthly Newsletter. Typically a 500ish (+/-1,000ish, usually +) word post detailing some of the most interesting things on machine learning I've found in the last month.
[Blog Post] The Three Most Common Errors in PyTorch and How to Fix Them
Going through the new ZTM PyTorch course, you might find that we run into some errors when coding together. Namely, these three errors come up most often:
You can find an in-depth blog post I wrote about these errors here as well as the code on GitHub.
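For a taste of what fixing these looks like, here's a minimal illustrative sketch of one classic PyTorch error and its fix, a device mismatch between a model and its data (the example is mine, not lifted from the post):

```python
# Illustrative example (not from the blog post): a device mismatch between model and data.
import torch
from torch import nn

model = nn.Linear(in_features=10, out_features=1)
x = torch.rand(32, 10)

if torch.cuda.is_available():
    model = model.to("cuda")
    # model(x)  # RuntimeError: Expected all tensors to be on the same device...
    x = x.to("cuda")  # fix: put the data on the same device as the model

print(model(x).shape)  # torch.Size([32, 1])
```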
Multi-modal machine learning involves more than one type of modality.
For example, OpenAI's CLIP (Contrastive Language-Image Pretraining) deals with images (vision) and language (text) and combines the two.
This means it's capable of dealing with two modalities: vision and language.
James Briggs' article walks through how you can use the open-source version of CLIP to create text and image embeddings that have a relationship to each other.
For example, you could create your own photo searching app capable of searching with natural language such as "singing happy birthday". Because the image embeddings are aligned with the text embeddings, the search should return the images most relevant to the text "singing happy birthday".
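To make that concrete, here's a minimal sketch of text-to-image search with the open-source CLIP weights on Hugging Face (the model name and photo filenames are my assumptions, not from James' article):

```python
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

images = [Image.open("party.jpg"), Image.open("beach.jpg")]  # hypothetical photos
inputs = processor(text=["singing happy birthday"], images=images,
                   return_tensors="pt", padding=True)

outputs = model(**inputs)
# logits_per_text: similarity of the text query to each image (higher = more relevant)
print(outputs.logits_per_text.softmax(dim=-1))
```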
INSTRUCTOR is an open-source and instruction-finetuned text embedding model that can create text embeddings specific to any task.
By passing the model an instruction describing how to embed a certain piece of text, as well as the target text itself, INSTRUCTOR will create embeddings of the given text based on the instruction.
For example, if you wanted to create classification embeddings, you could say, "Represent the text for classification; Input: INPUT_TEXT_HERE" and you would get back a set of embeddings of the input text tailored for classification.
The same setup can be applied to different problem spaces such as classification, retrieval, clustering, similarity and more.
As well as different domains such as science, finance, business, medicine and more.
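As a rough sketch of what that looks like in code (the instruction wording and model size here are my own assumptions, not taken from the project's docs):

```python
from InstructorEmbedding import INSTRUCTOR

model = INSTRUCTOR("hkunlp/instructor-large")

# Each input is an [instruction, text] pair.
texts_with_instructions = [
    ["Represent the finance statement for classification: ", "Revenue grew 20% year over year."],
    ["Represent the science sentence for retrieval: ", "Mitochondria are the powerhouse of the cell."],
]

embeddings = model.encode(texts_with_instructions)
print(embeddings.shape)  # (number of inputs, embedding dimension)
```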
Overview of the process to create INSTRUCTOR: multiple tasks and instructions are fed as input, giving the INSTRUCTOR model the ability to deal with an almost unlimited number of NLP tasks. Source: INSTRUCTOR web page.
See the following for more:
Speaking of embeddings, OpenAI's embedding API just got improved and is 99.8% cheaper.
For example, you could embed 100k pages of text for $40 USD.
The new embedding model (text-embedding-ada-002) improves performance across several benchmarks (except for text-similarity-davinci-001 on text classification).
This makes it possible for world-class text-based apps to be created at a much cheaper price.
But it also raises the question of whether you should go paid vs. free (open-source).
As discussed above, there are now state-of-the-art embedding models such as INSTRUCTOR available on Hugging Face to use for free.
The benefit of using the paid service is that the API is usually clean and easy to use.
However, the code for using a Hugging Face model is also clean and easy to use.
Best to experiment with which works better for your use case.
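For a sense of what "clean and easy" means on both sides, here's a minimal sketch (the model choices are illustrative, and the OpenAI call reflects the API style at the time of writing):

```python
# Paid: OpenAI's embedding endpoint.
import openai

openai.api_key = "YOUR_API_KEY"  # hypothetical key
response = openai.Embedding.create(input="machine learning monthly",
                                   model="text-embedding-ada-002")
openai_embedding = response["data"][0]["embedding"]  # 1536-dimensional vector

# Free: an open-source model from Hugging Face via sentence-transformers.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
open_source_embedding = model.encode("machine learning monthly")  # 384-dimensional vector
```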
Bonus: If you'd like to see the kinds of things you can do with the OpenAI embeddings, the OpenAI Cookbook repository on GitHub is full of code examples.
In I Built an AI Chatbot Based On My Favourite Podcast, Dan Shipper discusses how he created a chatbot using text embeddings (the same OpenAI embeddings discussed above) of the transcripts of the Huberman Lab podcast.
He shares the prompt that powers the chatbot, which looks like this:
Answer the question as truthfully as possible using the provided context, and if the answer is not contained within the text below, say "I don't know."
[ relevant sections of Huberman Lab transcripts ]
Q: What is task bracketing?
A:
This is an epic example of the kinds of apps of the future: a chatbot grounded in a specific corpus that's able to give back all kinds of relevant information about a narrow topic.
As for a project idea, you could build the same thing but with your own favourite podcast. If you do, let me know, I'd love to see it.
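To get started, a minimal sketch of the retrieve-then-prompt pattern Dan describes might look like this (the chunking, model name and file path are my assumptions, not his implementation):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# 1. Embed the transcript chunks once and store them.
chunks = open("podcast_transcript.txt").read().split("\n\n")  # hypothetical transcript file
chunk_embeddings = embedder.encode(chunks, normalize_embeddings=True)

# 2. Embed the question and find the most relevant chunks.
question = "What is task bracketing?"
question_embedding = embedder.encode(question, normalize_embeddings=True)
top_k = np.argsort(chunk_embeddings @ question_embedding)[-3:]

# 3. Build the prompt with the retrieved context and send it to a language model.
context = "\n".join(chunks[i] for i in top_k)
prompt = (
    "Answer the question as truthfully as possible using the provided context, "
    'and if the answer is not contained within the text below, say "I don\'t know."\n\n'
    f"{context}\n\nQ: {question}\nA:"
)
```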
All the embeddings getting an upgrade this month!
With the release of EVA (Exploring the Limits of Masked Visual Representation Learning at Scale), an open-source 1-billion parameter vision model, there's now a way to turn your images into state-of-the-art embeddings as well!
EVA achieves 89.7% accuracy on ImageNet and comes with several models:
I'm already thinking of ideas for how to use the EVA (and EVA-CLIP) embeddings in creating Nutrify (take a photo of food and learn about it), perhaps by embedding the images and then using the embeddings to improve the data and find weaknesses in it.
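A rough sketch of what getting EVA image embeddings could look like via timm (the exact checkpoint name is an assumption, check timm.list_models("eva*") for what's actually available):

```python
import timm
import torch
from PIL import Image

model_name = "eva_giant_patch14_224.clip_ft_in1k"  # hypothetical EVA checkpoint name
model = timm.create_model(model_name, pretrained=True, num_classes=0)  # num_classes=0 -> return embeddings
model.eval()

config = timm.data.resolve_data_config({}, model=model)
transform = timm.data.create_transform(**config)

image = Image.open("pizza.jpg").convert("RGB")  # hypothetical food photo
with torch.no_grad():
    embedding = model(transform(image).unsqueeze(0))

print(embedding.shape)  # (1, embedding dimension)
```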
See more about EVA at the following resources:
I've stumbled upon Tivadar Danka's incredible math-related Tweets a fair few times.
But I never knew there were blog posts to go along with them!
For example, his post on how the dot product measures similarity is a must-read for anyone who wants to know why matrix multiplication uses the dot product (the dot product is also the main driver of the attention mechanism, which is the main driver of the Transformer architecture).
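For reference, the standard definition (my rendering, not Tivadar's figure):

```latex
\[
\mathbf{a} \cdot \mathbf{b} \;=\; \sum_{i=1}^{n} a_i b_i \;=\; \lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert \cos\theta
\]
% When the vectors point in the same direction, cos(theta) is close to 1 (similar);
% when they are orthogonal, it is 0 (unrelated).
```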
Tivadar is also writing an upcoming book on the Mathematics of Machine Learning which you can find out more about on his website.
The dot product formula, one of the many beautiful images on Tivadar's website.
With the rise of large language models, it makes sense to try and make them faster.
Especially on inference time.
After all, a large language model might take a week or a month to train, but it will then be serving inference requests for far longer after that. If each inference takes a long time, at the scale these models get used, that time adds up quickly.
Google AI's new CALM (Confident Adaptive Language Modeling) technique uses early exiting as a strategy to speed up language model inference time.
For example, if the model is already confident about an output at an earlier layer, it can send that output straight to the end rather than pass it through all of the remaining layers. But if the model isn't confident about an output, it can keep using more compute as necessary.
It's like when someone asks you a question you know the answer to, such as "how old are you?": you don't sit there evaluating all the options, you just say "29". Whereas if someone asks you for directions from where you're standing to the local supermarket, you may have to think a bit more about which way to go, so the answer becomes more elaborate.
Using the CALM technique enables similar quality outputs with a ~2-3x improvement in inference time.
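Here's a toy sketch of the early-exit idea (my illustration of the concept, not Google's CALM implementation, and it assumes a single hidden vector for simplicity):

```python
import torch

def generate_token(hidden_state, layers, classifier, confidence_threshold=0.9):
    """Run transformer layers one at a time and exit early once the model is confident enough."""
    for layer in layers:
        hidden_state = layer(hidden_state)
        probs = torch.softmax(classifier(hidden_state), dim=-1)
        confidence, token = probs.max(dim=-1)
        if confidence.item() >= confidence_threshold:
            return token  # confident: skip the remaining layers
    return token  # not confident: use the full stack of layers
```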
Seeing this makes me wonder if the same could be done for vision models.
The CALM technique demonstrated: outputs exit early at layers where the model is confident, and skipping ahead saves computation time in the later layers. Source: Google AI blog.
I've been learning more and more about representing data as embeddings (as seen by the resources discussed above).
And one of the ways to store embedding representations is via a vector database.
A vector database is really just a normal database but specialized for embedding vectors.
For example, if you store your images as embeddings and you wanted to search those embeddings, what kind of search techniques could you use (e.g. similarity search, nearest neighbours, etc)?
Vector databases help you perform these kinds of searches.
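As an example of the kind of search in question, here's a minimal local version using FAISS (the dimensions and data are stand-ins). A vector database wraps this up with approximate indexes, metadata filtering, persistence and an API:

```python
import faiss
import numpy as np

dimension = 512
image_embeddings = np.random.rand(10_000, dimension).astype("float32")  # stand-in for real embeddings

index = faiss.IndexFlatL2(dimension)  # exact L2 (Euclidean) search
index.add(image_embeddings)

query = np.random.rand(1, dimension).astype("float32")  # stand-in for a query embedding
distances, neighbour_ids = index.search(query, 5)  # the 5 most similar embeddings
print(neighbour_ids)
```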
Dmitry Kan's article goes through the pros and cons of many of the most popular vector database offerings out there, such as Pinecone, Weaviate, Milvus, Vespa, Vald and Qdrant.
At the start of 2022, Comma AI tweeted that they'd be driven to Taco Bell, without a disengagement, in a stock production car running open-source self-driving code (openpilot) on a comma three.
The original Comma AI Tweet at the start of 2022 stating their goal. Source: comma_ai Twitter.
And they did it!!!
Seeing this performed by such a small team and watching the improvements happen little by little is nothing short of inspiring.
See more about the drive here:
A nice post on MLOps from the Outerbounds team recently: The 4 Main Tasks in the Production ML Lifecycle, MLOps and Data-Centric AI.
Inside they talk about a recent paper that reviewed many of the practices companies go through to create machine learning based services and compared them to how machine learning is typically taught (textbook style).
They broke it down into four main tasks:
They also talk about the three V's, which are important parts of each step:
As I build my own full-stack project, Nutrify, I'm thinking about how each of these fits in.
What a massive month for the ML world in December!
I can't wait to see what's in store for 2023. Make sure to subscribe below to follow along each month.
And as always, let me know if there's anything you think should be included in a future post.
In the meantime, keep learning, keep creating, keep dancing.
See you next month,
Daniel
By the way, I'm a full-time instructor with Zero To Mastery Academy teaching people Machine Learning in the most efficient way possible. You can see a few of our courses below or check out all Zero To Mastery courses.