AI & Machine Learning Monthly Newsletter 💻🤖

Daniel Bourke

53rd issue! If you missed them, you can read the previous issues of my monthly A.I. & Machine Learning newsletter here.

Hey there, Daniel here.

I’m an A.I. & Machine Learning Engineer who also teaches beginner-friendly machine learning courses:

I also write regularly about A.I. and machine learning on my own blog as well as make videos on the topic on YouTube.

Since there's a lot going on, the utmost care has been taken to keep things to the point.

Enough about me! You're here for this month's A.I. & Machine Learning Monthly Newsletter.

Typically a 500ish (+/-1,000ish, usually +) word post detailing some of the most interesting things on machine learning I've found in the last month.

Here's what you might have missed in May 2024 as an A.I. & Machine Learning Engineer... let's get you caught up!

My Work 👇

Nutrify 1.2 is out now!

My brother and I have been working on a food and nutrition tracking/education app and we just released version 1.2.

This update adds the ability to set custom calorie and macronutrient goals, simple breakdowns of whole food intake, and 57 new foods in the FoodVision AI model/Nutridex.

If you or someone you know wants to learn more about whole foods and track their nutrition intake, check out Nutrify on the iOS App Store.


New views and features available in Nutrify 1.2 — create custom calorie intake goals or let Nutrify calculate them for you as well as get simple breakdowns of your whole food intake over time. Source: Nutrify 1.2 blog post.

From the Internet 🧠

This month’s theme is: bringing LLMs into production.

A common trend I’ve seen with the following resources is that many of the previous problems with bringing ML models to production apply to LLMs.

This includes use cases, custom datasets, input optimization/data preprocessing (e.g. prompt engineering), latency and evaluations.

And so even though the capabilities of LLMs are continuing to be discovered and refined, many of the engineering steps around bringing them to production are similar to previous generations of machine learning models.

1. What We Learned from a Year of Building with LLMs (Part 1)

By Eugene Yan, Bryan Bischof, Charles Frye, Hamel Husain, Jason Liu and Shreya Shankar

Several researchers and practitioners from across the industry share insights into what it takes to put LLMs into production.

The report is broken into three parts: tactical, operational and strategic.

It’s filled with tried and tested advice such as:

  • Have small prompts that can do one thing and only one thing, well.
  • Having a larger context window doesn’t necessarily mean you need to use it all (larger contexts = more tokens = more cost + latency).
  • Start with simple assert-based evaluations to check your LLM outputs (for example, check if the LLM outputs have “Here is your summary:” if you don’t want to include that text).
  • Simplify LLM-as-judge annotations into binary comparisons (comparing one sample against another is more efficient than creating a new best example from scratch). This can be very helpful to bootstrap an evaluation system.
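The assert-based evaluation idea can be sketched in a few lines of Python. This is a minimal illustration of my own (the function and checks are hypothetical, not from the article):

```python
# Minimal sketch of assert-style evaluations for LLM outputs.
# The checks below are illustrative examples, not a real test suite.

def check_summary(output: str) -> list[str]:
    """Return a list of failed checks for an LLM-generated summary."""
    failures = []
    # Unwanted boilerplate the model sometimes prepends
    if output.startswith("Here is your summary:"):
        failures.append("contains unwanted preamble")
    # Guard against empty outputs
    if not output.strip():
        failures.append("empty output")
    # Rough length guardrail (words as a cheap token proxy)
    if len(output.split()) > 150:
        failures.append("summary too long")
    return failures

print(check_summary("Here is your summary: The report covers Q1 sales."))
# → ['contains unwanted preamble']
```

Checks like these run instantly and for free, which makes them a good first line of defence before reaching for more expensive LLM-as-judge evaluations.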


One way to use an LLM to evaluate outputs/options is to do so comparatively. For example, compare two examples and pick the best one. Source: What we learned from a year of building with LLMs (part 1).

Many of the points I’ve found work really well in practice during my own experimentation and usage of LLMs in production.

A highly recommended read for anyone looking to build LLM-powered applications.

If this was Part 1, I’m looking forward to Part 2.

2. LinkedIn’s notes on how they deployed their own LLM features

The LinkedIn team share their lessons learned from productionizing an LLM-powered system that can answer career-focused and job-focused questions.

Some important findings:

  • Prompting is more art than science. Much of the generative AI application building was prompt-tweaking.
  • Baseline system took ~1 month to create. But making it production ready took an additional 4 months.
  • Evaluations are an ongoing process. How do you ensure an LLM replies with an empathetic response when assessing whether a person’s current skills are a fit for a job? And how do you ensure this happens at scale?

3. GoDaddy’s 10 LLM lessons from the trenches

GoDaddy share 10 lessons from deploying an LLM-based application to help with customer support channels, which receive 60,000+ contacts per day.

Some of my favourites:

  • Sometimes one prompt isn’t enough — GoDaddy started by putting many topics into a single prompt. However, they found that over time this started to get bloated. So instead, they broke it down into smaller, more manageable systems that connect to each other. One task, one prompt.
  • Improving structured outputs — They found that minimizing temperature can help with structured outputs (e.g. getting back JSON). And that more advanced (and more costly) models are generally better at returning structured content.
  • Prompts aren’t portable across models — Before you choose to upgrade to the latest model and ship it to production, be careful it doesn’t break your previous tests. In other words, prompts used for a model like GPT might not work the same for Claude or Gemini or Mistral. GoDaddy found the same could be true across model versions of the same model, e.g. different versions of GPT-4.
  • Adaptive model selection is the future — Beware the dependence on a single model provider. ChatGPT systems went down for a few hours and in turn, so did all of GoDaddy’s support systems that relied on ChatGPT. Now in an ideal world, downtimes would be rare. However, an even more ideal scenario may mean having multiple model providers on standby in the event of an outage.
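The structured-outputs lesson can be made concrete with a small validation step after every model call. Here’s a hedged sketch (the field names are hypothetical, not GoDaddy’s schema):

```python
import json

# Fields we expect back from the model (hypothetical example schema).
REQUIRED_FIELDS = {"intent", "confidence"}

def parse_structured_reply(reply: str):
    """Parse a model reply as JSON, returning None if it's malformed
    or missing required fields (so the caller can retry or fall back)."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return None
    if not REQUIRED_FIELDS.issubset(data):
        return None
    return data

print(parse_structured_reply('{"intent": "billing", "confidence": 0.92}'))
# → {'intent': 'billing', 'confidence': 0.92}
```

Validating every reply like this turns “the model usually returns JSON” into a guarantee your downstream code can rely on, whatever model or temperature you're using.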


If you rely too much on one model for servicing your application, what happens if that model goes down? In an ideal world, another model is called upon to pick up. Source: GoDaddy blog.
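The adaptive model selection idea boils down to a fallback chain: try the primary provider, and if it errors out, move to the next. A minimal sketch, where the provider functions are stand-ins for real API clients:

```python
def call_primary(prompt: str) -> str:
    raise ConnectionError("primary provider is down")  # simulate an outage

def call_backup(prompt: str) -> str:
    return f"backup answer to: {prompt}"

def generate(prompt: str, providers=(call_primary, call_backup)) -> str:
    """Try each provider in order until one succeeds."""
    last_error = None
    for provider in providers:
        try:
            return provider(prompt)
        except Exception as err:
            last_error = err  # in production: log and alert here
    raise RuntimeError("all providers failed") from last_error

print(generate("How do I reset my password?"))
# → backup answer to: How do I reset my password?
```

In a real system you’d also want per-provider prompts (since, as above, prompts aren’t portable across models) and tests for each provider’s outputs.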

4. Building LLM applications, the MLOps way by Willem Meints

Willem writes an excellent post about how many of the current problems with deploying and using LLMs are similar to previous problems with other kinds of machine learning models.

Some excerpts:

  • MLOps in general: Getting a machine learning model in production takes a fair few extra steps after you’ve trained one in a notebook.
  • Version tracking: Tracking the version of the LLM you used in production is important because the default API model may change. As in, you were using gpt-4-0125 and it performed well, but the latest update breaks a few of your tasks.
  • Evaluation: There are many parts of the LLM pipeline such as retrieval, prompt templating, parameters to the LLM (e.g. temperature, top_p) and output parsing. While you may not own the LLM you can still monitor the steps around it. To see how your model is going in the real world, you can evaluate production data. But obtaining high-quality production data for LLMs can be hard. Many people don’t want their conversations stored. Evaluations are an ongoing process.
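The version-tracking point above can be sketched as: pin the exact model snapshot in config and log it with every response, so production outputs can be traced back to the model that produced them (the model name and record fields here are illustrative):

```python
import datetime
import json

# Pin the exact snapshot, never a floating alias like "gpt-4".
MODEL_CONFIG = {
    "model": "gpt-4-0125-preview",  # example snapshot name
    "temperature": 0.2,
    "top_p": 0.9,
}

def log_response(prompt: str, response: str) -> str:
    """Return a JSON log record tying a response to the exact model config."""
    record = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "prompt": prompt,
        "response": response,
        **MODEL_CONFIG,
    }
    return json.dumps(record)

print(log_response("Summarise this doc", "A short summary..."))
```

Records like this double as evaluation data later on: if a model upgrade breaks a task, you can replay old prompts against the new version and diff the results.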

5. Fine-tuning a Vision-Language Model (VLM) to generate fashion product descriptions

The AWS team share details on how they fine-tuned a BLIP-2 model to generate fashion descriptions for a product image. This is a great way to enrich a website with metadata information on products which is helpful for search engines and search in general.

There are also some great insights on using Hugging Face Transformers (for baseline models), Accelerate (for faster model training) and PEFT (parameter-efficient fine-tuning for quicker model customization) in combination.


AWS’s example fine-tuning and deployment pipeline for a model which is capable of generating product descriptions given an image of an item of clothing. Source: AWS tech blog.

6. How to fine-tune LLMs in 2024 by Phil Schmid

Philipp Schmid, Technical Lead at Hugging Face, shares code examples and a walkthrough on fine-tuning your own LLM in 2024.

Inside you’ll find:

  • Use case development
  • Dataset creation
  • Model fine-tuning
  • Model evaluation
  • Model serving

For a recent concrete example, I’d also recommend checking out Phil’s article on how to fine-tune Google Gemma.

Heck, check out all of Phil’s blog in general, it’s full of ML epicness.

7. Two time series foundation models

Google and Salesforce have released two open-source deep learning based time series foundation models (what a mouthful!).

TimesFM by Google is a decoder-only Transformer (similar to an LLM but for time series) trained on 100B time series points from a variety of sources. The model is 200M parameters, so it can run on quite small GPUs.

Despite its small size, the model performs very well in a zero-shot setting across a wide range of benchmarks, even against models which have been explicitly trained on the target data.

Code is available on GitHub and the TimesFM model is available on Hugging Face.

Moirai by Salesforce comes in 3 sizes, small (14M parameters), base (91M parameters) and large (311M parameters).

These models have been trained on 29B data points across 9 different domains including energy, transport, climate, sales, healthcare and more.

Code is available on GitHub and the dataset and models are available on Hugging Face.

Open-source 📖


Example of the Marigold depth-estimation model. Source: Marigold Hugging Face demo.

Papers and Research 🔬

Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach

Researchers at Meta, INRIA, Université Paris Saclay and Google show how to make highly-curated datasets automatically from large-scale datasets with hierarchical k-means clustering.

As in, you could take a large corpus of images from the internet of 1+ billion samples and then filter it down to the target domain you’re looking to work with, say 10+ million images of cars.

This highly curated dataset could then be used for self-supervised learning or supervised training.

See the code on GitHub.
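As a toy illustration (my own sketch on 1-D data, not the paper’s code), two-level hierarchical k-means curation works like this: cluster the data, cluster again inside each cluster, then take an even sample budget from each leaf cluster so rare modes aren’t drowned out by dominant ones:

```python
import random

def kmeans_1d(points, k, iters=10):
    """Tiny 1-D k-means: returns a cluster id for each point."""
    centers = random.sample(points, k)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: abs(p - centers[c])) for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels

def curate(points, k_top=2, k_sub=2, per_leaf=2):
    """Sample an even budget from each leaf of a 2-level clustering."""
    top = kmeans_1d(points, k_top)
    curated = []
    for c in range(k_top):
        members = [p for p, l in zip(points, top) if l == c]
        if len(members) < k_sub:
            curated.extend(members)  # cluster too small to split further
            continue
        sub = kmeans_1d(members, k_sub)
        for s in range(k_sub):
            leaf = [p for p, l in zip(members, sub) if l == s]
            curated.extend(leaf[:per_leaf])  # even budget per leaf
    return curated

random.seed(42)
# 90% of points near 0, a rare mode near 10 — even sampling keeps both.
data = [random.gauss(0, 1) for _ in range(90)] + [random.gauss(10, 1) for _ in range(10)]
print(len(curate(data)))
```

The paper does this hierarchically on learned image embeddings at billion-sample scale; the balancing intuition is the same.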


Outline of hierarchical k-means filtering workflow for curating large-scale datasets into target datasets. Source: Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach paper.

ODGEN: Domain-specific Object Detection Data Generation with Diffusion Models

Researchers at Apple show how you can fine-tune diffusion models to generate images with objects similar to your own dataset. Detection models trained with real and synthetic data received a boost of up to 25.3% mAP@.50:.95.

Synthetic data has come a long way over the past few years. And this shows how it’s expanding to even more specific tasks such as object detection.


Qualitative example of Apple’s ODGEN method for generating domain-specific objects in bounding boxes. Source: Apple Machine Learning blog.

An Introduction to Vision-Language Modeling

Several researchers across many institutions such as Meta, Université de Montréal, McGill University, University of Toronto, MIT and more have collaborated to create a comprehensive introduction to Visual-Language Modeling (VLMs).

The introduction is designed to be a “start here” for many of the concepts in VLMs, such as the different types of VLMs, how to train them, which VLM to use, how to evaluate them, and how to extend VLMs from images and text to videos and more.

If you’re looking to learn more about how LLMs can be bridged to vision data, this is an excellent place to start.

Releases and announcements 📣

Presentations, Courses and Tutorials 📽

Vicky Boykis is one of my favourite voices in the world of machine learning. And her recent talk at PyCon Italia is filled with excellent advice. The main one: when thinking of building a feature with machine learning or AI, get closer to the metal (as in, the metal your code is running on).

How?

  • Pick a single task.
  • Pick a measurable goal.
  • Pick the smallest piece of code you can reproduce locally.


How to stay closer to the Metal by Vicky Boykis. Get to the smallest possible piece you can and make it reproducible, then scale it up. Source: Vicky Boykis blog.

And start there.

Another great takeaway I enjoyed was the philosophy of Unix quote by Doug McIlroy, “Make each program do one thing well. To do a new job, build afresh rather than complicate old programs by adding new features.”

Finally, what would you do if you had 1000 interns? That’s a way you can think of LLMs. Are there tasks you’d delegate to 1000 smart interns? Maybe they’d be suitable for LLMs.

You can watch the full talk on YouTube (it starts at 8:25:16 in the video).

See you next month!

What a massive month for the ML world in May!

As always, let me know if there's anything you think should be included in a future post.

In the meantime, keep learning, keep creating, keep dancing.

See you next month,

Daniel

www.mrdbourke.com | YouTube

By the way, I'm also an instructor with Zero To Mastery Academy teaching people Machine Learning & AI in the most efficient way possible. You can see a few of our courses below or check out all Zero To Mastery courses.

More from Zero To Mastery

The No BS Way To Getting A Machine Learning Job preview
The No BS Way To Getting A Machine Learning Job

Looking to get hired in Machine Learning? Our ML expert tells you how. If you follow his 5 steps, we guarantee you'll land a Machine Learning job. No BS.

6-Step Framework To Tackle Machine Learning Projects (Full Pipeline) preview
6-Step Framework To Tackle Machine Learning Projects (Full Pipeline)

Want to apply Machine Learning to your business problems but not sure if it will work or where to start? This 6-step guide makes it easy to get started today.

Python Monthly Newsletter 💻🐍 preview
Python Monthly Newsletter 💻🐍

54th issue of Andrei Neagoie's must-read monthly Python Newsletter: 100x Python, Python 3.13, and much more. Read the full newsletter to get up-to-date with everything you need to know from last month.