
Machine Learning Monthly Newsletter 💻🤖

Daniel Bourke

40th issue! If you missed them, you can read the previous issues of the Machine Learning Monthly newsletter here.

Hey there, Daniel here.

I’m a Machine Learning Engineer who also teaches beginner-friendly machine learning courses with Zero To Mastery.

I also write regularly about machine learning on my own blog as well as make videos on the topic on YouTube.

Since there's a lot going on, I've done my best to keep things to the point.

Enough about me! You're here for this month's Machine Learning Monthly Newsletter.

Typically a 500ish (+/-1,000ish, usually +) word post detailing some of the most interesting things on machine learning I've found in the last month.

What you missed in April 2023 as a Machine Learning Engineer…

My work 👇

  • PyTorch 2.0 tutorials are on the way to ZTM! — I just finished recording 20 or so videos for the PyTorch 2.0 Tutorial (the text + code materials are already on learnpytorch.io). They’re being edited as you read this and will be live on ZTM shortly!
  • AI anxiety (blog post) — A fair few people (myself included) caught a small wave of anxiety with all the new releases in the AI world lately, asking questions like “What happens if AI takes over?” (similar questions have been asked for decades). I wrote a blog post collecting some of my unfinished thoughts.

From the Internet

1. Large Language Model (LLM) use cases in production by Chip Huyen

Chip Huyen explores the use cases of large language models (LLMs) in businesses.

It’s one thing to show off a cool demo, but how many demos get turned into products?

This is especially the case for LLMs.

Although the field is flourishing with new models, there are still a bunch of things holding them back from being as productized as many would like:

  • Cost — Many of the best LLMs require a paid API call each time they’re used. This can quickly add up (see the rough estimate after this list).
  • Latency — For some mission-critical applications, latency cannot be negotiated on. And because LLMs are so new, the hardware to run them is still being worked out.
  • Reliability — There’s still no guaranteed way to get a desired outcome from an LLM (the number 1 rule of machine learning: don’t use it if a simple program will do).
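To make the cost point concrete, here’s a rough back-of-the-envelope estimate. The per-token price and usage numbers are illustrative assumptions, not quotes from any provider:

```python
# Back-of-the-envelope LLM API cost estimate.
# All figures below are illustrative assumptions, not real provider pricing.
price_per_1k_tokens = 0.002      # USD per 1,000 tokens (assumed)
tokens_per_request = 500 + 500   # prompt + completion tokens (assumed)
requests_per_day = 100_000       # assumed traffic

daily_cost = requests_per_day * (tokens_per_request / 1_000) * price_per_1k_tokens
print(f"${daily_cost:,.0f}/day -> ${daily_cost * 30:,.0f}/month")
# -> $200/day -> $6,000/month, just for inference
```

Small per-call costs multiply quickly at production traffic levels.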

[Image: breakdown of LLMOps inference costs]

Most of the cost of using LLMs comes from inference.

2. Three different text embedding explorations

One of the best use cases for large language models is creating text embeddings (other modalities of embeddings, such as image, audio and video, are starting to emerge now too).

Embeddings in general are learned transformations that make data more useful.

For example, when I search for “tacos with chicken recipe”, I don’t have to find resources titled exactly “tacos with chicken recipe”; I can find resources with similar meaning.

The following three articles are good explorations of practical ways to create text embeddings, both paid (OpenAI’s API) and open-source (sentence-transformers), and then use them for several different applications (see the short sketch after this list):

  1. Text embeddings cost comparison — Nils Reimers explores the cost and performance of a range of available text-embedding models. The numbers are from January 2022 so take them with a grain of salt, but they show just how good the results from open-source models can be!
  2. Creating a vector database to search over using OpenAI’s Embedding Model — Teemu Maatta shows how you can embed different sources of text, group them into clusters (via K-means) and then summarise those clusters with OpenAI’s API. Very helpful for exploring large knowledge bases and finding similar resources!
  3. Three mistakes when introducing embeddings and vector search — Jo Kristian Bergum discusses the pitfalls of using off-the-shelf embedding models to build your search system, namely: skipping fine-tuning (which can lead to poor results), using fine-tuned models on out-of-domain data, and not understanding vector search trade-offs (such as latency, throughput and accuracy).
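To make the embedding workflow concrete, here’s a minimal sketch of semantic search with the open-source sentence-transformers library. The model name is a common default choice (not necessarily what the articles above used) and the toy corpus is made up:

```python
# Minimal semantic search sketch with sentence-transformers.
# Assumes: pip install sentence-transformers
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # a common default model

corpus = [
    "Chicken taco recipe with homemade salsa",
    "How to train a neural network in PyTorch",
    "Slow-cooked beef burrito bowls",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)

query_embedding = model.encode("tacos with chicken recipe", convert_to_tensor=True)

# Cosine similarity surfaces resources with similar meaning,
# not just resources with matching words
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(f"{hit['score']:.3f} | {corpus[hit['corpus_id']]}")
```

The same embeddings could also be clustered (e.g. with K-means, as in the second article) to explore a large knowledge base.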

[Image: three considerations for text embedding usage]

When using text embedding models, which are rather new in the grand scheme of things, what things should you take into consideration?

3. Open-source ChatGPTs are here! (HuggingChat, StableLM and OpenAssistant are live!)

Three different versions of open-source large language models launched in the past month!

  1. HuggingChat is an open-source, ChatGPT-like interface powered by OpenAssistant (see below) and hosted by Hugging Face. It’s fast too!
  2. OpenAssistant is a completely open-source, ChatGPT-like assistant that also comes with a paper, dataset, model weights and a community discussing everything about how it was built and how it will be improved over time (see the quick inference sketch after this list).
  3. Stability AI (creators of Stable Diffusion) launches StableLM. Not quite 100% open-source (it’s research-only rather than commercial) but still open to use, Stability joins the LLM party with a model trained on an experimental dataset containing 3x more data than “the Pile” (an already really large text-based dataset).
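If you want to poke at one of these open models yourself, here’s a rough sketch using Hugging Face transformers. The checkpoint name and the <|prompter|>/<|assistant|> prompt format come from the OpenAssistant model card (double-check it before running), and a 12B model needs serious hardware:

```python
# Rough sketch: running an OpenAssistant checkpoint locally with transformers.
# Assumes: pip install transformers torch, plus enough GPU memory for a 12B model.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="OpenAssistant/oasst-sft-4-pythia-12b-epoch-3.5",
)

# Prompt format per the OpenAssistant model card
prompt = "<|prompter|>Explain text embeddings in one sentence.<|endoftext|><|assistant|>"
print(generator(prompt, max_new_tokens=64)[0]["generated_text"])
```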

4. Text-to-Image models get better at… text? ft. DeepFloyd IF

Stability AI releases another text-to-image model called DeepFloyd IF (research only for now, fully open-source model coming later).

Some notable features include:

  • Deep text understanding (thanks to a T5-XXL-1.1 text encoder model).
  • Text rendering in generated images (e.g. text-to-image generations can now include legible written text within the image).
  • Aspect ratio shift, enabling the generation of images with non-standard aspect ratios.

[Image: DeepFloyd IF example generations]

Some examples of images generated by DeepFloyd IF. Notice the quality of the text generations as well as the different aspect ratio on the image in the bottom left.

5. A new kind of data science competition by LAION AI

LAION-AI, creators of OpenCLIP, LAION-2B and LAION-5B (the datasets that power Stable Diffusion) are hosting a new kind of data science competition.

Instead of iterating on a model architecture and training scheme, the DataComp competition focuses on improving a dataset.

The goal is to train the best performing CLIP-style model with the least compute.

How?

Iterate on the dataset!

As part of the competition, the DataComp team released a new dataset called CommonPool: a large-scale dataset of 12.8B (12,800,000,000!!!) image and text pairs pooled from the internet.

There are several scales of competition to participate in: small (12.8M samples), medium (128M samples), large (1.28B samples) and xlarge (12.8B samples).

To my knowledge, this competition is the first of its kind.

I can’t wait to see what comes of it!

I’ll probably be playing around with the CommonPool dataset for my own project Nutrify (take a photo of food and learn about it) at some point too.
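For a feel of what a trained CLIP-style model does, here’s a minimal sketch with LAION’s OpenCLIP. The pretrained tag is one existing LAION-2B checkpoint (not a DataComp baseline) and the image path is a placeholder:

```python
# Minimal sketch: scoring image-text similarity with a CLIP-style model,
# the kind of model DataComp entrants will train on their curated datasets.
# Assumes: pip install open_clip_torch pillow
import torch
import open_clip
from PIL import Image

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")

image = preprocess(Image.open("taco.jpg")).unsqueeze(0)  # placeholder image
text = tokenizer(["a photo of a taco", "a photo of a dog"])

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Normalise, then cosine similarity -> a distribution over the captions
    image_features /= image_features.norm(dim=-1, keepdim=True)
    text_features /= text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

print(probs)  # higher probability for the caption that matches the image
```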

See more at:

6. FastAI 2023 Part 2 is out! All the way from the foundations to Stable Diffusion

Part 2 of the fast.ai Practical Deep Learning for Coders course is live!

The 2023 edition is titled “From Deep Learning Foundations to Stable Diffusion”.

So you can imagine what the content contains.

Building upon a plethora of practical deep learning knowledge from the first part of the course, Part 2 will take you all the way through to making your own diffusion model (the same kind of model that powers Stable Diffusion!).

I started my AI journey with fast.ai courses and I’m thrilled to see the latest edition packed with so many fantastic updates!

All of the materials, including 30 hours of video lectures, code and text-based lectures are available free on the fast.ai course website.

7. Meta AI takes image segmentation to a whole new level with Segment Anything

Segmentation is the practice of identifying which pixels belong to an object in an image or video.

For example, if you have a photo of a cup on a table, to segment the cup, you could draw a line around its outline.

However, doing this by hand takes a significant amount of time.

The goal of a segmentation model is to do this automatically.

And that’s what Meta’s new Segment Anything Model (SAM) does.

It automatically segments a wide range of objects in an image (it can create object detection boxes too).

Trained on over 1 billion existing segmentation masks, it has learned the general concept of an “object” and thus can be applied to almost any image with excellent results.

The SAM model is open-sourced under the Apache 2.0 license (which means it can be used for commercial products!).

Data was collected via an iterative process: label data, use the model to predict the next labels, fix the labels, update the model, repeat (a data engine effect).
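Here’s a minimal usage sketch, assuming you’ve installed Meta’s segment-anything package and downloaded the ViT-H checkpoint (file names follow the repo’s README; the image path is a placeholder):

```python
# Minimal sketch: automatic mask generation with SAM.
# Assumes: pip install opencv-python and
# pip install git+https://github.com/facebookresearch/segment-anything
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

image = cv2.cvtColor(cv2.imread("cup_on_table.jpg"), cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(image)  # one dict per segmented object

# Each mask dict includes a boolean "segmentation" map and a "bbox"
print(len(masks), masks[0]["bbox"])
```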

[Image: Segment Anything demo]

The SAM model makes this kind of segmented image possible with a few clicks or even just one.

See more on:

8. Detect objects and segment images with language via GroundingDINO

The Segment-Anything Model (SAM) mentioned above has a fantastic ability to select objects in a given image.

However, it lacks an idea of what those objects are.

In come GroundingDINO and Grounded Segment Anything.

These models, along with SAM, are able to find objects in an image given a language prompt.

For example, you could prompt the model with the word “dog” and GroundingDINO or Grounded Segment Anything will select the dog(s) in the image.

[Image: GroundingDINO and Grounded-SAM demo]

Example of what can be done with Grounded-SAM as a semi-automatic labelling system. Be sure to see the GitHub repo for an example of a fully automatic labelling system with image captioning.

This opens a huge possibility of use cases!

My favourite one is automatic data annotation.

Say you had a large corpus of images and you’d like them to be labelled with bounding boxes and segmentation masks.

You could use an image captioning model to describe what’s in each image, then use GroundingDINO or Grounded Segment Anything to label the image based on that caption (see the rough sketch below).
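Here’s a rough sketch of the text-prompted detection step with GroundingDINO, following the repo’s inference utilities. The config, checkpoint and image paths are placeholders based on the README:

```python
# Rough sketch: text-prompted object detection with GroundingDINO.
# Assumes the GroundingDINO repo is installed and its SwinT config +
# checkpoint are downloaded (paths below are placeholders).
from groundingdino.util.inference import load_model, load_image, predict

model = load_model(
    "groundingdino/config/GroundingDINO_SwinT_OGC.py",
    "weights/groundingdino_swint_ogc.pth",
)
image_source, image = load_image("dog_in_park.jpg")  # placeholder image

boxes, logits, phrases = predict(
    model=model,
    image=image,
    caption="dog",        # the language prompt that selects the objects
    box_threshold=0.35,
    text_threshold=0.25,
)
print(phrases, boxes)  # these boxes can then be fed to SAM as prompts for masks
```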

See more on:

9. A recipe book for designing machine learning applications

Eugene Yan’s latest blog post, Design Patterns for Machine Learning Systems, is an excellent overview of several different design patterns in machine learning systems.

It’s a great walkthrough of the pros and cons of each pattern and how machine learning models get integrated into software systems.

Building the model is often the easier part.

Deploying it into a full-blown system is harder.

[Image: ML demos are easy, ML products are hard]

There have been lots of very capable AI demos recently. But not too many make their way into full-blown products.

Source: Brad Neuberg Twitter.

10. A few interesting stories and takes on AI

  • A few things I believe about AI by Dan Shipper — Dan Shipper’s blog posts on AI over the last few months have been fantastic. From setting up a semantic search engine for his favourite podcast to how GPT-3 is the best journal he’s ever used. His latest post details a few things he foresees as important next steps in AI, such as horizontal integration (owning a process start to finish will be a big advantage).
  • Why would we want AGIs? by Aniket — Reading through Aniket’s work provides a good perspective on many of the current ideas about AGI. If an AGI did exist, one of the first things we’d likely want it to do is explore places we can’t yet go: space.
  • What happens when you set up ChatGPT to talk to a 3-year-old? by Arvind Narayanan — One of my favourite new newsletters and blogs is AI Snake Oil. Their latest post explores what happens when you connect a ChatGPT interface (via various voice-to-text and text-to-voice services) with a 3-year-old (Arvind’s daughter). Of course, if these AI models are here to stay, best to explore what they can do with the next generation early, right?

11. Fun to finish

Why would something need to be super smart to enslave humanity?

Why does it need to be AGI?

Who’s to say cats don’t already rule?

[Image: cats dominate] I’m a dog person. But I can appreciate this sentiment.

Source: Yann LeCun Twitter.

Ha!

I don’t buy the takes about AGI taking over.

Maybe I’m wrong.

But it doesn’t make sense.

See you next month!

What a massive month for the ML world in April!

As always, let me know if there's anything you think should be included in a future post.

Liked something here? Leave a comment below.

In the meantime, keep learning, keep creating, keep dancing.

See you next month,

Daniel

www.mrdbourke.com | YouTube

By the way, I'm a full-time instructor with Zero To Mastery Academy teaching people Machine Learning in the most efficient way possible. You can see a few of our courses below or check out all Zero To Mastery courses.
