36th issue! If you missed them, you can read the previous issues of the Machine Learning Monthly newsletter here.
Happy New Year everyone!
Daniel here, I'm a machine learning engineer who teaches the following beginner-friendly machine learning courses:
- Complete A.I. Machine Learning and Data Science Bootcamp: Zero to Mastery
- TensorFlow for Deep Learning: Zero to Mastery
- PyTorch for Deep Learning: Zero to Mastery
- [NEW] Project: Build a custom text classifier and demo with Hugging Face Transformers
- [NEW] Machine Learning with Hugging Face Bootcamp
I also write regularly about machine learning on my own blog as well as make videos on the topic on YouTube.
Since there's a lot going on, the utmost care has been taken to keep things to the point.
Enough about me!
You're here for this month's Machine Learning Monthly Newsletter. Typically a 500ish (+/-1,000ish, usually +) word post detailing some of the most interesting things on machine learning I've found in the last month.
What you missed in December as a Machine Learning Engineer…
My work
[Blog Post] The Three Most Common Errors in PyTorch and How to Fix Them
Going through the new ZTM PyTorch course, you might find that we run into some errors when coding together. These three errors come up most often:
- shape errors (the shapes of your tensors don't line up)
- datatype errors (the datatypes of your tensors or operations don't line up)
- device issues (the device you're running compute on doesn't match the device your data is stored on)
You can find an in-depth blog post I wrote about these errors here as well as the code on GitHub.
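To give you a tiny taste before you read the full post, here's roughly what the first (and most common) error looks like and the usual fix (the tensors here are made up for illustration):

```python
# A shape error: matrix multiplication needs the inner dimensions to match.
import torch

a = torch.rand(3, 2)
b = torch.rand(3, 2)

# torch.matmul(a, b)  # RuntimeError: mat1 and mat2 shapes cannot be multiplied (3x2 and 3x2)
print(torch.matmul(a, b.T).shape)  # transposing one tensor fixes it: torch.Size([3, 3])
```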
From the Internet
1. Quick-fire Guide to Multi-Modal ML with OpenAI's CLIP by James Briggs
Multi-modal machine learning involves more than one type of modality.
For example, OpenAI's CLIP (Contrastive Language-Image Pretraining) deals with images (vision) and language (text) and combines the two.
This means it's capable of dealing with two modalities: vision and language.
James Briggs' article walks through how you can use the open-source version of CLIP to create text and image embeddings that have a relationship to each other.
For example, you could create your own photo searching app capable of searching with natural language such as "singing happy birthday" and because the image embeddings are crossed with text embeddings, the text search should return the most relevant images to the text "singing happy birthday".
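If you'd like to play with the idea before reading the article, here's a minimal sketch of comparing an image against a couple of text prompts with the open-source CLIP weights via Hugging Face Transformers (the checkpoint and example image are placeholders, not necessarily what James uses):

```python
# Compare one image against two text prompts with open-source CLIP.
import requests
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Example image of two cats (from the COCO dataset)
image = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)
texts = ["singing happy birthday", "two cats sleeping on a couch"]

inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# Higher probability means the text is a better match for the image
print(outputs.logits_per_image.softmax(dim=1))
```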
2. Open-source state-of-the-art instruction-finetuned embeddings (INSTRUCTOR)
INSTRUCTOR is an open-source and instruction-finetuned text embedding model that can create text embeddings specific to any task.
By passing the model an instruction for how to embed a certain piece of text, along with the target text, INSTRUCTOR creates embeddings of the given text tailored to the instruction.
For example, if you wanted to create classification embeddings, you could say, "Represent the text for classification; Input: INPUT_TEXT_HERE" and you would get back an embedding of the input text tailored for classification.
The same setup can be applied to different problem spaces such as classification, retrieval, clustering, similarity and more.
As well as different domains such as science, finance, business, medicine and more.
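Here's a minimal sketch of what that looks like in code, assuming the InstructorEmbedding package and the hkunlp/instructor-large checkpoint (check the INSTRUCTOR repo for the exact API and instruction format):

```python
# Instruction-based embeddings with INSTRUCTOR: each item is [instruction, text],
# and the instruction tells the model how (and for what domain/task) to embed the text.
from InstructorEmbedding import INSTRUCTOR

model = INSTRUCTOR("hkunlp/instructor-large")

sentences = [
    ["Represent the Medicine sentence for clustering: ", "Dopamine regulates motivation and reward."],
    ["Represent the Finance sentence for classification: ", "The stock fell 3% after earnings."],
]
embeddings = model.encode(sentences)
print(embeddings.shape)  # one embedding vector per input sentence
```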

Overview of the process used to create INSTRUCTOR: multiple tasks and instructions are fed as input, giving the INSTRUCTOR model the ability to deal with an almost unlimited number of NLP tasks. Source: INSTRUCTOR web page.
See the following for more:
3. OpenAI GPT-3 embeddings are now improved and much, much cheaper
Speaking of embeddings, OpenAI's embedding API just got improved and is 99.8% cheaper.
For example, you could embed 100k pages of text for $40 USD.
The new embedding model (text-embedding-ada-002) improves performance across several benchmarks (except for text-similarity-davinci-001 on text classification).
This makes it possible for world-class text-based apps to be created at a much cheaper price.
But it also raises the question of whether you should go paid vs. free (open-source).
As discussed above, there are now state-of-the-art embedding models such as INSTRUCTOR available on Hugging Face to use for free.
The benefit of using the paid service is that the API is usually clean and easy to use.
However, the code for using a Hugging Face model is also clean and easy to use.
Best to experiment with which works better for your use case.
Bonus: If youβd like to see the kinds of things you can do with the OpenAI embeddings, the OpenAI Cookbook repository on GitHub is full of code examples.
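For reference, creating an embedding is only a handful of lines with the openai Python package (this sketch assumes the v0.x API current at the time of writing and an OPENAI_API_KEY environment variable):

```python
# Create an embedding with OpenAI's text-embedding-ada-002 model.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

response = openai.Embedding.create(
    model="text-embedding-ada-002",
    input="Machine Learning Monthly is a newsletter about ML.",
)
embedding = response["data"][0]["embedding"]  # a list of 1,536 floats
print(len(embedding))
```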
4. AI project idea: Build an AI chatbot based on your favourite podcast
In I Built an AI Chatbot Based On My Favourite Podcast, Dan Shipper discusses how he created a chatbot using text embeddings (the same OpenAI embeddings discussed above) of the transcripts of the Huberman Lab podcast.
He lists the principles of the chatbot as follows:
- It ingests and makes searchable all of the transcripts from the Huberman Lab podcasts.
- When a user asks a question, it searches through all of the transcripts it has available and finds sections that are relevant to the query.
- Then, it takes those sections of text and sends them to GPT-3 with a prompt that looks something like:
Answer the question as truthfully as possible using the provided context, and if the answer is not contained within the text below, say "I don't know."
[ relevant sections of Huberman Lab transcripts ]
Q: What is task bracketing?
A:
This is an epic example of the kind of app we'll see more of in the future: a chatbot grounded in a specific corpus that's able to give back all kinds of relevant information about a narrow topic.
As for a project idea, you could build the same thing but with your own favourite podcast. If you do, let me know; I'd love to see it.
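To make the recipe concrete, here's a rough sketch of the retrieve-then-prompt pattern described above (not Dan's actual code; the model names are OpenAI's current offerings and the transcript chunks are placeholders):

```python
# Embed transcript chunks, find the ones most similar to a question,
# then ask GPT-3 to answer using only those chunks.
# Assumes OPENAI_API_KEY is set in the environment.
import numpy as np
import openai

def embed(texts):
    response = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
    return np.array([item["embedding"] for item in response["data"]])

# 1. Embed your transcript chunks once and store them (a vector database also works)
chunks = ["...transcript chunk 1...", "...transcript chunk 2...", "...transcript chunk 3..."]
chunk_embeddings = embed(chunks)

# 2. Embed the question and find the most similar chunks (cosine similarity)
question = "What is task bracketing?"
q_emb = embed([question])[0]
scores = chunk_embeddings @ q_emb / (
    np.linalg.norm(chunk_embeddings, axis=1) * np.linalg.norm(q_emb)
)
context = "\n".join(chunks[i] for i in np.argsort(scores)[-3:])

# 3. Ask GPT-3 to answer using only the retrieved context
prompt = (
    "Answer the question as truthfully as possible using the provided context, "
    "and if the answer is not contained within the text below, say \"I don't know.\"\n\n"
    f"{context}\n\nQ: {question}\nA:"
)
completion = openai.Completion.create(model="text-davinci-003", prompt=prompt, max_tokens=200)
print(completion["choices"][0]["text"].strip())
```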
5. Open-source billion-scale vision foundation model (state-of-the-art image embeddings)
All the embeddings are getting an upgrade this month!
With the open-source release of EVA (Exploring the Limits of Masked Visual Representation Learning at Scale), a 1-billion parameter vision model, there's now a way to turn your images into state-of-the-art embeddings as well!
EVA achieves 89.7% accuracy on ImageNet and comes with several models:
- EVA or EVA-CLIP for image and video classification (1.0B parameters)
- EVA for object detection and image segmentation (1.0B parameters)
I'm already thinking of ideas for how to use the EVA (and EVA-CLIP) embeddings in creating Nutrify (take a photo of food and learn about it), perhaps by embedding the images and then using those embeddings to find and fix weaknesses in the data.
See more about EVA at the following resources:
6. Tivadar Danka's epic math for machine learning blog
I've stumbled upon Tivadar Danka's incredible math-related Tweets a fair few times.
But I never knew there were blog posts to go along with them!
For example, his post on how the dot product measures similarity is a must read for anyone who wants to know why matrix multiplication uses the dot product (the dot product is also the main driver of the attention mechanism which is the main driver of the Transformer architecture).
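As a quick refresher on why the dot product works as a similarity measure, here's a tiny example (dividing by the vector lengths gives cosine similarity):

```python
# The dot product as a similarity measure: vectors pointing the same way score
# close to 1, opposite ways close to -1, and orthogonal vectors score 0.
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 3.0])
print(cosine_similarity(a, np.array([2.0, 4.0, 6.0])))    # 1.0 (same direction)
print(cosine_similarity(a, np.array([-1.0, -2.0, -3.0])))  # -1.0 (opposite direction)
print(cosine_similarity(a, np.array([3.0, -1.5, 0.0])))    # 0.0 (orthogonal)
```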
Tivadar is also writing an upcoming book on the Mathematics of Machine Learning which you can find out more about on his website.

The dot product formula, one of the many beautiful images on Tivadar's website.
7. CALM (Confident Adaptive Language Modeling) language modeling technique leads to ~2x faster inference time
With the rise of large language models, it makes sense to try and make them faster.
Especially on inference time.
After all, a large language model might take a week or a month to train, but it will then be serving inference requests for a long time after that.
If each inference is slow, at the scale these models get used, that time quickly adds up.
Google AIβs new CALM (Confident Adaptive Language Modeling) technique uses early exiting as a strategy to speed up language model inference time.
For example, if a model is confident on an output in earlier layers, it could send the output straight to the output layer rather than send it through all subsequent layers.
But if a model isn't confident on an output, it could use more compute when necessary.
Like when someone asks you a question you know the answer to, such as "how old are you?", you don't sit there evaluating all the options, you just say "29".
Whereas if someone asks you directions from where you're standing to the local supermarket, you may have to think a bit more about which way to go, so the answer becomes more elaborate.
Using the CALM technique enables similar quality of outputs with a ~2-3x improvement on inference time.
Seeing this makes me wonder if the same could be done for vision models.
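To illustrate the idea (this is a conceptual sketch, not CALM's actual implementation), early exiting boils down to checking a confidence score after each layer and stopping once it's high enough:

```python
# A conceptual sketch of confidence-based early exiting: after each layer, an output
# head estimates confidence in the current prediction, and if it clears a threshold
# we stop and return immediately instead of running every remaining layer.
import torch
import torch.nn as nn

class EarlyExitLM(nn.Module):
    def __init__(self, layers: nn.ModuleList, lm_head: nn.Linear, threshold: float = 0.9):
        super().__init__()
        self.layers = layers
        self.lm_head = lm_head  # shared output head used at every exit point
        self.threshold = threshold

    def forward(self, hidden_states: torch.Tensor):
        for i, layer in enumerate(self.layers):
            hidden_states = layer(hidden_states)
            probs = self.lm_head(hidden_states[:, -1]).softmax(dim=-1)
            confidence = probs.max(dim=-1).values
            if bool((confidence > self.threshold).all()):
                return probs, i + 1  # exit early: skip the remaining layers
        return probs, len(self.layers)  # no early exit: used the full depth

layers = nn.ModuleList([nn.Sequential(nn.Linear(16, 16), nn.ReLU()) for _ in range(6)])
model = EarlyExitLM(layers, lm_head=nn.Linear(16, 100), threshold=0.5)
probs, layers_used = model(torch.randn(1, 4, 16))
print(layers_used)  # how many layers ran (6 here unless confidence cleared the threshold)
```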

The CALM technique demonstrated: outputs exit early at layers that are confident, and skipping ahead saves computation time in the later layers. Source: Google AI blog.
8. Not All Vector Databases are Made Equal by Dmitry Kan
I've been learning more and more about representing data as embeddings (as seen by the resources discussed above).
And one of the ways to store embedding representations is via a vector database.
A vector database is really just a normal database but specialized for embedding vectors.
For example, if you store your images as embeddings and you wanted to search those embeddings, what kind of search techniques could you use (e.g. similarity search, nearest neighbours, etc)?
Vector databases help you perform these kinds of searches.
Dmitry Kan's article goes through the pros and cons of many of the most popular vector database offerings out there, such as Pinecone, Weaviate, Milvus, Vespa, Vald and Qdrant.
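To make the idea concrete, here's the kind of nearest-neighbour lookup a vector database performs under the hood, sketched with FAISS (a library this kind of search is often built on); the embeddings are random stand-ins:

```python
# Store 10,000 image embeddings and find the 5 closest matches for a query embedding.
import faiss
import numpy as np

dim = 512
image_embeddings = np.random.rand(10_000, dim).astype("float32")

index = faiss.IndexFlatL2(dim)  # exact L2 (Euclidean) nearest-neighbour index
index.add(image_embeddings)     # store the vectors

query = np.random.rand(1, dim).astype("float32")
distances, ids = index.search(query, 5)  # the 5 closest stored vectors
print(ids, distances)
```

A hosted vector database adds persistence, metadata filtering and scaling on top of this kind of index, which is where the trade-offs in Dmitry's article come in.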
9. Comma AI (open-source self-driving) drives to Taco Bell
At the start of 2022, Comma AI Tweeted that they'd be driven to Taco Bell in a self-driving car powered by open source code (openpilot) in a stock production car using a comma three without disengagement.

The original Comma AI Tweet at the start of 2022 stating their goal. Source: comma_ai Twitter.
And they did it!!!
Seeing this performed by such a small team and watching the improvements happen little by little is nothing short of inspiring.
See more about the drive here:
10. The 4 main tasks of MLOps in production
A nice recent post on MLOps from the Outerbounds team: The 4 main tasks in the Production ML Lifecycle, MLOps and Data-Centric AI.
Inside, they talk about a recent paper that reviewed many of the practices companies go through to create machine learning-based services and compared them to how machine learning is typically taught (textbook style).
They broke it down into four main tasks:
- Data collection: where does the data come from?
- Experimentation: what experiments can you do (data-based and model-based)?
- Evaluation and deployment: how does your experiment perform locally and deployed?
- Monitoring and response: what improvements can be made to the model once results from deployment are back?

They also talk about the three Vs, which are important parts of each step:
- Velocity: how fast can your experiments happen? (ideally, your experiments happen as quickly as possible with the right tradeoff for validation)
- Validation: how can you make sure your model performs as well in production as on your own system? (too much validation slows velocity, too little validation results in shipping poor models)
- Versioning: how do you version everything that goes into an ML experiment? In other words, what data gets used to create what model?
As I build my own full-stack project, Nutrify, I'm thinking about how each of these fits in.
11. Quick-fire resources
- Docker support comes to Hugging Face Spaces (now you can create and host your own custom apps!)
- Weights & Biases releases a free MLOps course
- PyTorch 2.0 beta gets released with upgraded model speed on newer GPUs in just one line of code (torch.compile(), see the sketch after this list)! Full release coming in 2023
- An epic YouTube playlist of videos and talks about learning with limited and imperfect data from Computer Vision and Pattern Recognition Conference (CVPR) 2022
- Tweet idea on bootstrapping an object detection dataset (e.g. download a large unlabelled dataset then use an open-source model to create prediction labels for it)
- Another Tweet idea on bootstrapping your own dataset using a small amount of labelled data to train a bigger model to label more data
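Following up on the PyTorch 2.0 item above, here's what the "one line of code" looks like in practice (a sketch assuming a PyTorch 2.0+ install and a torchvision model; speedups are biggest on newer GPUs):

```python
# Wrap an existing model with torch.compile() and the rest of your code stays the same.
import torch
import torchvision

model = torchvision.models.resnet50()
compiled_model = torch.compile(model)  # the one line

x = torch.randn(8, 3, 224, 224)
with torch.no_grad():
    out = compiled_model(x)  # first call compiles, later calls run the optimized graph
print(out.shape)  # torch.Size([8, 1000])
```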
See you next month!
What a massive month for the ML world in December!
I can't wait to see what's in store for 2023. Make sure to subscribe below to follow along each month.
And as always, let me know if there's anything you think should be included in a future post.
In the meantime, keep learning, keep creating, keep dancing.
See you next month,
Daniel
By the way, I'm a full-time instructor with Zero To Mastery Academy teaching people Machine Learning in the most efficient way possible. You can see a few of our courses below or check out all Zero To Mastery courses.