Machine Learning Monthly Newsletter 💻🤖

Daniel Bourke

36th issue! If you missed them, you can read the previous issues of the Machine Learning Monthly newsletter here.

Happy New Year everyone!

Daniel here, I’m a machine learning engineer who teaches beginner-friendly machine learning courses with Zero To Mastery.

I also write regularly about machine learning on my own blog as well as make videos on the topic on YouTube.

Since there's a lot going on, I've done my best to keep things to the point.

Enough about me!

You're here for this month's Machine Learning Monthly Newsletter. Typically a 500ish-word post (+/-1,000ish, usually +) detailing some of the most interesting things in machine learning I've found in the last month.

What you missed in December as a Machine Learning Engineer…

My work 👇

[Blog Post] The Three Most Common Errors in PyTorch and How to Fix Them

Going through the new ZTM PyTorch course, you might find we run into some errors while coding together. The three most common are:

  • shape errors (the shape of your tensors doesn’t line up)
  • datatype errors (the datatypes of your tensors or operations don’t line up)
  • device errors (the device you’re running compute on doesn’t match the device your data is stored on)

You can find an in-depth blog post I wrote about these errors here as well as the code on GitHub.

From the Internet 🥅

1. Quick-fire Guide to Multi-Modal ML with OpenAI’s CLIP by James Briggs

Multi-modal machine learning involves more than one modality (type of data).

For example, OpenAI’s CLIP (Contrastive Language-Image Pretraining) deals with two modalities, images (vision) and language (text), and learns to connect them.

James Briggs’ article walks through how you can use the open-source version of CLIP to create text and image embeddings that have a relationship to each other.

For example, you could create your own photo search app that accepts natural language queries such as “singing happy birthday”. Because the image and text embeddings share the same space, the search should return the images most relevant to the text “singing happy birthday”.
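As a taste, here’s a minimal sketch of scoring images against a text query with the open-source CLIP weights on Hugging Face (the model name is real, the example photo filenames are made up):

```python
# A minimal sketch using the open-source CLIP weights on Hugging Face.
# The photo filenames are stand-ins, swap in your own images.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

images = [Image.open("birthday_party.jpg"), Image.open("beach_day.jpg")]
inputs = processor(text=["singing happy birthday"], images=images,
                   return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds the similarity of each image to the text query,
# so the highest-scoring image is the best match for the search
print(outputs.logits_per_image.squeeze())
```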

2. Open-source state-of-the-art instruction-finetuned embeddings (INSTRUCTOR 👨‍🏫)

INSTRUCTOR is an open-source and instruction-finetuned text embedding model that can create text embeddings specific to any task.

By passing the model an instruction describing how to embed a certain piece of text, along with the target text itself, INSTRUCTOR will create embeddings of the given text based on the instruction.

For example, if you wanted to create classification embeddings, you could say, “Represent the text for classification; Input: INPUT_TEXT_HERE” and you would get back embeddings of the input text tailored for classification.

The same setup can be applied to different problem spaces such as classification, retrieval, clustering, similarity and more.

As well as different domains such as science, finance, business, medicine and more.
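Here’s a minimal sketch of the instruction + text input format using the InstructorEmbedding package (the example instruction and sentence are my own stand-ins):

```python
# A minimal sketch of INSTRUCTOR's [instruction, text] input format
# using the InstructorEmbedding package (example texts are made up).
from InstructorEmbedding import INSTRUCTOR

model = INSTRUCTOR("hkunlp/instructor-large")

# The instruction tells the model what the embedding will be used for
pairs = [
    ["Represent the Finance sentence for classification: ",
     "The company's quarterly revenue grew 12% year over year."],
]
embeddings = model.encode(pairs)
print(embeddings.shape)  # (1, 768) for instructor-large
```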


Overview of the process used to create INSTRUCTOR: multiple tasks and instructions are fed as input, enabling the model to deal with an almost unlimited number of NLP tasks. Source: INSTRUCTOR web page.

See the following for more:

3. OpenAI GPT-3 embeddings are now improved and much, much cheaper

Speaking of embeddings, OpenAI’s embedding API just got improved and is 99.8% cheaper.

For example, you could embed 100k pages of text for $40 USD.

The new embedding model (text-embedding-ada-002) improves performance across several benchmarks (though text-similarity-davinci-001 still wins on text classification).

This makes it possible to build world-class text-based apps at a much lower cost.

But it also raises the question of whether you should go paid vs. free (open-source).

As discussed above, there are now state-of-the-art embedding models such as INSTRUCTOR available on Hugging Face to use for free.

The benefit of using the paid service is that the API is usually clean and easy to use.

However, the code for using a Hugging Face model is also clean and easy to use.

Best to experiment with which works better for your use case.
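For reference, here’s roughly what the paid route looks like with the openai Python library at the time of writing (the API key and input text are placeholders):

```python
# A sketch of the paid route using the openai Python library
# (API key and input text are placeholders).
import openai

openai.api_key = "YOUR_API_KEY"

response = openai.Embedding.create(
    model="text-embedding-ada-002",
    input="Machine Learning Monthly is a newsletter about ML.",
)

embedding = response["data"][0]["embedding"]
print(len(embedding))  # 1536 dimensions for text-embedding-ada-002
```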

Bonus: If you’d like to see the kinds of things you can do with the OpenAI embeddings, the OpenAI Cookbook repository on GitHub is full of code examples.

4. AI project idea: Build an AI chatbot based on your favourite podcast

In I Built an AI Chatbot Based On My Favourite Podcast, Dan Shipper discusses how he created a chatbot using text embeddings (the same OpenAI embeddings discussed above) of the transcripts of the Huberman Lab podcast.

He lists the principles of the chatbot as follows:

  1. It ingests and makes searchable all of the transcripts from the Huberman Lab podcasts.
  2. When a user asks a question, it searches through all of the transcripts it has available and finds sections that are relevant to the query.
  3. Then, it takes those sections of text and sends them to GPT-3 with a prompt that looks something like:

Answer the question as truthfully as possible using the provided context, and if the answer is not contained within the text below, say "I don't know."

[ relevant sections of Huberman Lab transcripts ]

Q: What is task bracketing?

A:
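In other words: embed the query, find the most similar transcript chunks, then stuff them into the prompt. Here’s a rough sketch of the pattern, assuming the chunks are already embedded (every function name here is a hypothetical stand-in):

```python
# A rough sketch of the retrieve-then-prompt pattern, assuming the transcript
# chunks are already embedded. embed_fn and llm_fn are hypothetical stand-ins
# for your embedding model and language model calls.
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def answer_question(question, chunks, chunk_embeddings, embed_fn, llm_fn, top_k=3):
    # 1. Embed the question and score it against every transcript chunk
    q_emb = embed_fn(question)
    scores = [cosine_similarity(q_emb, emb) for emb in chunk_embeddings]

    # 2. Keep the most relevant chunks as context
    best = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)[:top_k]
    context = "\n\n".join(chunks[i] for i in best)

    # 3. Send context + question to the language model
    prompt = ("Answer the question as truthfully as possible using the provided "
              "context, and if the answer is not contained within the text below, "
              'say "I don\'t know."\n\n'
              f"{context}\n\nQ: {question}\n\nA:")
    return llm_fn(prompt)
```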

This is an epic example of the kind of app we’ll see more of in the future: a chatbot grounded in a certain corpus (via embeddings and retrieval) that can give back all kinds of relevant information about a narrow topic.

As for a project idea, you could build the same thing but with your own favourite podcast. If you do, let me know, I’d love to see it.

5. Open-source billion-scale vision foundation model (state-of-the-art image embeddings 🌅)

All the embeddings are getting an upgrade this month!

EVA (Exploring the Limits of Masked Visual Representation Learning at Scale), a 1-billion parameter vision model, has been released as open-source, meaning there’s now a way to turn your images into state-of-the-art embeddings as well!

EVA achieves 89.7% accuracy on ImageNet and comes with several models:

  • EVA or EVA-CLIP for image and video classification (1.0B parameters)
  • EVA for object detection and image segmentation (1.0B parameters)

I’m already thinking of ideas for how to use the EVA (and EVA-CLIP) embeddings in creating Nutrify (take a photo of food and learn about it), perhaps by embedding the images and then using those embeddings to find and fix weaknesses in the data.
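For example, here’s a hedged sketch of pulling image embeddings out of an EVA model via timm (the exact checkpoint name is an assumption, check timm’s model list for what’s actually available):

```python
# A hedged sketch of extracting image embeddings with timm.
# The exact EVA checkpoint name is an assumption, check timm's model list.
import timm
import torch

model = timm.create_model("eva_giant_patch14_224.clip_ft_in1k",
                          pretrained=True,
                          num_classes=0)  # num_classes=0 returns pooled features
model.eval()

image = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed food photo
with torch.no_grad():
    embedding = model(image)

print(embedding.shape)  # (1, feature_dim)
```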

See more about EVA at the following resources:

6. Tivadar Danka’s epic math for machine learning blog

I’ve stumbled upon Tivadar Danka’s incredible math-related Tweets a fair few times.

But I never knew there were blog posts to go along with them!

For example, his post on how the dot product measures similarity is a must-read for anyone who wants to know why matrix multiplication uses the dot product (the dot product also powers the attention mechanism, which in turn powers the Transformer architecture).
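If you want the one-line intuition: the dot product equals the product of the vectors’ magnitudes and the cosine of the angle between them, so aligned vectors score high and orthogonal vectors score zero. A quick demo:

```python
# Why the dot product measures similarity: it equals |x||y|cos(theta),
# so aligned vectors score high and orthogonal vectors score zero.
import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.0, 6.0])   # same direction as x
z = np.array([-2.0, 1.0, 0.0])  # orthogonal to x

print(np.dot(x, y))  # 28.0 -> large and positive (very similar)
print(np.dot(x, z))  # 0.0  -> no similarity at all

# Dividing by the magnitudes gives cosine similarity in [-1, 1]
print(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))  # 1.0
```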

Tivadar is also writing an upcoming book on the Mathematics of Machine Learning which you can find out more about on his website.


The dot product formula, one of the many beautiful images on Tivadar’s website.

7. CALM (Confident Adaptive Language Modeling) leads to ~2x faster language model inference

With the rise of large language models, it makes sense to try and make them faster.

Especially at inference time.

After all, a large language model might take a week or a month to train, but inference will be performed for an indefinite amount of time after that.

If each inference takes a long time, that time adds up quickly at the scale these models run at.

Google AI’s new CALM (Confident Adaptive Language Modeling) technique uses early exiting as a strategy to speed up language model inference time.

For example, if the model is confident in an output at an earlier layer, it can produce the output from that layer rather than passing it through all of the subsequent layers.

But if the model isn’t confident in an output, it can spend more compute where necessary.

It’s like when someone asks you a question you know the answer to, such as “how old are you?”: you don’t sit there evaluating all the options, you just say “29”.

Whereas if someone asks you directions from where you’re standing to the local supermarket, you may have to think a bit more about which way to go so the answer becomes more elaborate.

Using the CALM technique enables similar-quality outputs with a ~2-3x improvement in inference time.
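To make the idea concrete, here’s a toy sketch of confidence-based early exiting (an illustration of the concept, not Google’s actual CALM code; the layers, head and threshold are stand-ins):

```python
# A toy illustration of confidence-based early exiting, not Google's actual
# CALM implementation. The layers, lm_head and threshold are stand-ins.
import torch
import torch.nn as nn

def generate_token(hidden, layers, lm_head, threshold=0.9):
    """Run one token's hidden state through the layers, exiting once confident."""
    for i, layer in enumerate(layers):
        hidden = layer(hidden)
        probs = torch.softmax(lm_head(hidden), dim=-1)
        if probs.max().item() >= threshold:  # confident -> skip remaining layers
            print(f"Exited early at layer {i + 1}/{len(layers)}")
            break
    return lm_head(hidden).argmax(dim=-1)  # predicted token id

# Toy usage with stand-in modules
d_model, vocab_size = 16, 100
layers = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(12)])
lm_head = nn.Linear(d_model, vocab_size)
token_id = generate_token(torch.randn(1, d_model), layers, lm_head)
```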

Seeing this makes me wonder if the same could be done for vision models.


The CALM technique demonstrated: outputs exit early from layers that are confident, skipping the later layers and saving computation time. Source: Google AI blog.

8. Not All Vector Databases are Made Equal by Dmitry Kan

I’ve been learning more and more about representing data as embeddings (as seen by the resources discussed above).

And one of the ways to store embedding representations is via a vector database.

A vector database is really just a normal database but specialized for embedding vectors.

For example, if you store your images as embeddings and want to search them, what kind of search techniques could you use (e.g. similarity search, nearest neighbours)?

Vector databases help you perform these kinds of searches.
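To get a feel for the core operation these databases are built around, here’s a tiny nearest-neighbour search sketch with FAISS (a vector index library rather than one of the hosted databases below; the data is random for illustration):

```python
# A tiny nearest-neighbour search with FAISS (a vector index library rather
# than a hosted database; random data here for illustration).
import faiss
import numpy as np

dim = 512  # e.g. the size of your image or text embeddings
db_vectors = np.random.rand(10_000, dim).astype("float32")

index = faiss.IndexFlatL2(dim)  # exact L2 search; ANN indexes trade accuracy for speed
index.add(db_vectors)

query = np.random.rand(1, dim).astype("float32")
distances, ids = index.search(query, 5)  # the 5 closest stored embeddings
print(ids)
```

The hosted options discussed below essentially wrap this kind of index with storage, filtering and an API.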

Dmitry Kan’s article goes through the pros and cons of many of the most popular vector database offerings out there such as Pinecone, Weaviate, Milvus, Vespa, Vald and Qdrant.

9. Comma AI (open-source self-driving) drives to Taco Bell

At the start of 2022, Comma AI tweeted that they’d drive to Taco Bell in a stock production car running their open-source self-driving software (openpilot) on a comma three, without a disengagement.

comma-ai-driving-tweet

The original Comma AI tweet from the start of 2022 stating their goal. Source: comma_ai Twitter.

And they did it!!!

Seeing this performed by such a small team and watching the improvements happen little by little is nothing short of inspiring.

See more about the drive here:

10. The 4 main tasks of MLOps in production

A nice post on MLOps from the Outerbounds team recently: The 4 Main Tasks in the Production ML Lifecycle, MLOps and Data-Centric AI.

Inside, they talk about a recent paper that reviewed many of the practices companies go through to create machine learning-based services and compared them to how machine learning is typically taught (textbook style).

They broke it down into four main tasks:

  1. Data collection — where does the data come from?
  2. Experimentation — what experiments can you do (data based and model based)?
  3. Evaluation and deployment — how does your experiment perform locally and deployed?
  4. Monitoring and response — what improvements can be made to the model once results from deployment are back?


They also talk about the three V’s, which are important parts of each step:

  1. Velocity — how fast can your experiments happen? (ideally, your experiments happen as quickly as possible with the right tradeoff for validation)
  2. Validation — how can you make sure your model performs as well in production as on your own system? (too much validation slows velocity, too little validation results in shipping poor models)
  3. Versioning — how do you version everything that goes into an ML experiment? In other words, what data gets used to create what model?

As I build my own full-stack ML project, Nutrify, I’m thinking about how each of these fits in.

11. Quick-fire resources 🔥

See you next month!

What a massive month for the ML world in December!

I can't wait to see what's in store for 2023. Make sure to subscribe below to follow along each month.

And as always, let me know if there's anything you think should be included in a future post.

In the meantime, keep learning, keep creating, keep dancing.

See you next month,
Daniel

www.mrdbourke.com | YouTube

By the way, I'm a full-time instructor with Zero To Mastery Academy teaching people Machine Learning in the most efficient way possible. You can see a few of our courses below or check out all Zero To Mastery courses.

More from Zero To Mastery

ZTM Career Paths: Your Roadmap to a Successful Career in Tech

Whether you’re a beginner or an experienced professional, figuring out the right next step in your career or changing careers altogether can be overwhelming. We created ZTM Career Paths to give you a clear step-by-step roadmap to a successful career.

The 3 Most Common PyTorch Errors (And How To Solve Them)

PyTorch is one of the largest ML libraries available, so it's common to make mistakes when using it. Here are the top 3 user errors and how to fix them.

Python Monthly Newsletter 💻🐍

37th issue of Andrei Neagoie's must-read monthly Python Newsletter: Generators, Nested For-loops, and the origins of Python. All this, and much more. Read the full newsletter to get up-to-date with everything you need to know from last month.