๐ŸŽ Give the #1 gift request of 2024... a ZTM membership gift card! ๐ŸŽ

AI & Machine Learning Monthly Newsletter 💻🤖

Daniel Bourke

53rd issue! If you missed them, you can read the previous issues of my monthly A.I. & Machine Learning newsletter here.

Hey there, Daniel here.

I'm an A.I. & Machine Learning Engineer who also teaches a number of beginner-friendly machine learning courses.

I also write regularly about A.I. and machine learning on my own blog as well as make videos on the topic on YouTube.

Since there's a lot going on, the utmost care has been taken to keep things to the point.

Enough about me! You're here for this month's A.I. & Machine Learning Monthly Newsletter.

Typically a 500ish (+/-1,000ish, usually +) word post detailing some of the most interesting things on machine learning I've found in the last month.

Here's what you might have missed in May 2024 as an A.I. & Machine Learning Engineer... let's get you caught up!

My Work 👇

Nutrify 1.2 is out now!

My brother and I have been working on a food and nutrition tracking/education app and we just released version 1.2.

This update includes being able to set custom calorie and macronutrient goals, simple breakdowns of whole food intakes and 57 new foods in the FoodVision AI model/Nutridex.

If you or someone you know wants to learn more about whole foods and track their nutrition intake, check out Nutrify on the iOS App Store.


New views and features available in Nutrify 1.2: create custom calorie intake goals or let Nutrify calculate them for you, as well as get simple breakdowns of your whole food intake over time. Source: Nutrify 1.2 blog post.

From the Internet 🧠

This month's theme is: bringing LLMs into production.

A common trend I've seen across the following resources is that many of the existing problems with bringing ML models to production also apply to LLMs.

This includes use cases, custom datasets, input optimization/data preprocessing (e.g. prompt engineering), latency and evaluations.

And so even though the capabilities of LLMs are continuing to be discovered and refined, many of the engineering steps around bringing them to production are similar to previous generations of machine learning models.

1. What We Learned from a Year of Building with LLMs (Part 1)

By Eugene Yan, Bryan Bischof, Charles Frye, Hamel Husain, Jason Liu and Shreya Shankar

Several researchers and practitioners from across the industry team up to share insights into what it takes to put LLMs into production.

The report is broken into three parts: tactical, operational and strategic.

It's filled with tried and tested advice such as:

  • Have small prompts that do one thing, and only one thing, well.
  • Having a larger context window doesn't necessarily mean you need to use it all (larger contexts = more tokens = more cost + latency).
  • Start with simple assert-based evaluations to check your LLM outputs (for example, check whether the output starts with "Here is your summary:" if you don't want that text included; see the sketch after this list).
  • Simplify LLM-as-judge annotations into binary comparisons (comparing one sample against another is more efficient than creating a new best example from scratch). This can be very helpful for bootstrapping an evaluation system.
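
To make the assert-based idea concrete, here's a minimal sketch in plain Python. The `call_llm` function is a hypothetical placeholder for whatever client you actually use; the cheap, deterministic checks are the point.

```python
# Minimal assert-based evaluation sketch. `call_llm` is a hypothetical
# placeholder for your real LLM client call.
UNWANTED_PREFIXES = ("Here is your summary:", "Sure, here is")

def call_llm(prompt: str) -> str:
    # Placeholder: swap in your actual model call.
    return "The article covers three ways to reduce inference latency."

def test_summary_output() -> None:
    output = call_llm("Summarize the following article: ...")
    # Check the model didn't echo boilerplate you don't want to ship.
    for prefix in UNWANTED_PREFIXES:
        assert not output.startswith(prefix), f"Unwanted prefix: {prefix!r}"
    # Cheap sanity check on length (the bounds are arbitrary examples).
    assert 5 <= len(output.split()) <= 100, "Summary length out of range"

if __name__ == "__main__":
    test_summary_output()
    print("All assert-based checks passed.")
```

Checks like these are crude, but they run on every output for free and catch regressions before any LLM-as-judge step gets involved.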


One way to use an LLM to evaluate outputs/options is to do so comparatively. For example, compare two examples and pick the best one. Source: What we learned from a year of building with LLMs (part 1).

Many of these points match what I've found works well in practice during my own experimentation with and usage of LLMs in production.

A highly recommended read for anyone looking to build LLM-powered applications.

If this was Part 1, I'm looking forward to Part 2.

2. LinkedIn's notes on how they deployed their own LLM features

The LinkedIn team share their lessons learned from productionizing an LLM-powered system that can answer career-focused and job-focused questions.

Some important findings:

  • Prompting is more art than science. Much of the generative AI application building was prompt-tweaking.
  • The baseline system took ~1 month to create, but making it production-ready took an additional 4 months.
  • Evaluations are an ongoing process: how do you ensure an LLM replies with an empathetic response when telling a person whether their current skills are a fit for a job? And how do you ensure this happens at scale?

3. GoDaddy's 10 LLM lessons from the trenches

GoDaddy share 10 lessons from deploying an LLM-based application to help with customer support channels that receive 60,000+ contacts per day.

Some of my favourites:

  • Sometimes one prompt isn't enough: GoDaddy started by putting many topics into a single prompt. However, they found that over time this got bloated. So instead, they broke it down into smaller, more manageable systems that connect to each other. One task, one prompt.
  • Improving structured outputs: they found that minimizing temperature can help with structured outputs (e.g. getting back JSON), and that more advanced (and more costly) models are generally better at returning structured content.
  • Prompts aren't portable across models: before you choose to upgrade to the latest model and ship it to production, be careful it doesn't break your previous tests. In other words, prompts used for a model like GPT might not work the same for Claude or Gemini or Mistral. GoDaddy found the same could be true across versions of the same model, e.g. different versions of GPT-4.
  • Adaptive model selection is the future: beware of depending on a single model provider. ChatGPT systems went down for a few hours and in turn, so did all of GoDaddy's support systems that relied on ChatGPT. In an ideal world, downtimes would be rare. However, an even better scenario is having multiple model providers on standby in the event of an outage (see the sketch after this list).
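
As a rough illustration of that last lesson, here's a minimal sketch of a fallback chain. The `query_*` functions are hypothetical stand-ins for real provider SDK calls, not GoDaddy's actual setup.

```python
# Sketch of adaptive model selection: try providers in order, falling back
# to the next on failure. The query functions are hypothetical stand-ins.
from typing import Callable

def query_primary(prompt: str) -> str:
    raise TimeoutError("primary provider is down")  # simulate an outage

def query_backup(prompt: str) -> str:
    return f"Backup model answer to: {prompt!r}"

PROVIDERS: list[tuple[str, Callable[[str], str]]] = [
    ("primary", query_primary),
    ("backup", query_backup),
]

def robust_completion(prompt: str) -> str:
    last_error: Exception | None = None
    for name, query in PROVIDERS:
        try:
            return query(prompt)
        except Exception as error:  # catch provider-specific errors in practice
            last_error = error
            print(f"{name} failed ({error}), trying next provider...")
    raise RuntimeError("All providers failed") from last_error

print(robust_completion("How do I point my domain at a new host?"))
```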


If you rely too much on one model to service your application, what happens when that model goes down? In an ideal world, another model is called on to pick up the slack. Source: GoDaddy blog.

4. Building LLM applications, the MLOps way by Willem Meints

Willem writes an excellent post about how many of the current problems with deploying and using LLMs are similar to previous problems with other kinds of machine learning models.

Some excerpts:

  • MLOps in general: getting a machine learning model into production takes a fair few extra steps after you've trained one in a notebook.
  • Version tracking: tracking the version of the LLM you use in production is important because the default API model may change. For example, you were using gpt-4-0125 and it performed well, but the latest update breaks a few of your tasks (see the sketch after this list).
  • Evaluation: there are many parts of the LLM pipeline, such as retrieval, prompt templating, parameters to the LLM (e.g. temperature, top_p) and output parsing. While you may not own the LLM, you can still monitor the steps around it. To see how your model is going in the real world, you can evaluate production data. But obtaining high-quality production data for LLMs can be hard, as many people don't want their conversations stored. Evaluations are an ongoing process.
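
To make the version-tracking point concrete, here's a minimal sketch using the OpenAI Python client. The model name is just an illustrative dated snapshot; pin whichever version you actually tested.

```python
# Sketch: pin an exact, dated model snapshot instead of a moving alias, and
# log what actually ran. Model names are illustrative and change over time.
from openai import OpenAI

MODEL_VERSION = "gpt-4-0125-preview"  # dated snapshot, not the "gpt-4" alias

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model=MODEL_VERSION,
    messages=[{"role": "user", "content": "Summarize MLOps in one sentence."}],
    temperature=0,  # lower temperature also helps reproducibility
)

print(response.choices[0].message.content)
print(f"Model that served the request: {response.model}")  # log this
```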

5. Fine-tuning a Vision-Language Model (VLM) to generate fashion product descriptions

The AWS team share details on how they fine-tuned a BLIP-2 model to generate fashion descriptions for a product image. This is a great way to enrich a website with product metadata, which is helpful for search engines and search in general.

There are also some great insights on combining Hugging Face Transformers (for baseline models), Accelerate (for faster model training) and PEFT (parameter-efficient fine-tuning for quicker model customization).
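
This isn't AWS's exact pipeline, but a minimal sketch of the Transformers + PEFT side of that combination might look like the following (the LoRA rank and target modules are illustrative assumptions):

```python
# Sketch: wrap BLIP-2 in a LoRA adapter via PEFT for parameter-efficient
# fine-tuning. Hyperparameters are illustrative, not AWS's configuration.
import torch
from peft import LoraConfig, get_peft_model
from transformers import Blip2ForConditionalGeneration, Blip2Processor

model_id = "Salesforce/blip2-opt-2.7b"
processor = Blip2Processor.from_pretrained(model_id)
model = Blip2ForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
)

lora_config = LoraConfig(
    r=8,  # low-rank dimension
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # attention projections (assumption)
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a tiny fraction of weights train
```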


AWS's example fine-tuning and deployment pipeline for a model capable of generating product descriptions given an image of an item of clothing. Source: AWS tech blog.

6. How to fine-tune LLMs in 2024 by Phil Schmid

Philipp Schmid, Technical Lead at Hugging Face, shares code examples and a walkthrough on fine-tuning your own LLM in 2024.

Inside you'll find:

  • Use case development
  • Dataset creation
  • Model fine-tuning
  • Model evaluation
  • Model serving
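
As a taste of the fine-tuning step, here's a rough sketch using TRL's SFTTrainer (which Phil's guide builds on). The model, dataset and hyperparameters are placeholders, and the exact SFTTrainer arguments vary between trl versions.

```python
# Sketch of supervised fine-tuning (SFT) with TRL. Dataset, base model and
# hyperparameters are placeholders; check your trl version's exact API.
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

dataset = load_dataset("json", data_files="train.json", split="train")

trainer = SFTTrainer(
    model="mistralai/Mistral-7B-v0.1",  # base model to fine-tune (example)
    train_dataset=dataset,
    dataset_text_field="text",  # column holding the formatted samples
    max_seq_length=2048,
    args=TrainingArguments(
        output_dir="finetuned-model",
        per_device_train_batch_size=2,
        num_train_epochs=3,
        learning_rate=2e-4,
    ),
)

trainer.train()
trainer.save_model()
```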

For a recent concrete example, I'd also recommend checking out Phil's article on how to fine-tune Google Gemma.

Heck, check out all of Phil's blog in general. It's full of ML epicness.

7. Two time series foundation models

Google and Salesforce have released two open-source deep learning based time series foundation models (what a mouthful!).

TimesFM by Google is a decoder-only Transformer (similar to an LLM but for time series) trained on 100B time series points from a variety of sources. The model is 200M parameters, so it can run on quite small GPUs.

Despite its small size, the model performs very well in a zero-shot setting across a wide range of benchmarks, even against models which have been explicitly trained on the target data.

Code is available on GitHub and the TimesFM model is available on Hugging Face.

Moirai by Salesforce comes in 3 sizes: small (14M parameters), base (91M parameters) and large (311M parameters).

These models have been trained on 29B data points across 9 different domains including energy, transport, climate, sales, healthcare and more.

Code is available on GitHub, and the dataset and models are available on Hugging Face.

Open-source 📖


Example of the Marigold depth-estimation model. Source: Marigold Hugging Face demo.

Papers and Research 🔬

Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach

Researchers at Meta, INRIA, Université Paris-Saclay and Google show how to make highly curated datasets automatically from large-scale datasets with hierarchical k-means clustering.

As in, you could take a large corpus of images from the internet of 1+ billion samples and then filter it down to the target domain you're looking to work with, say 10+ million images of cars.

This highly curated dataset could then be used for self-supervised learning or supervised training.

See the code on GitHub.
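
For intuition, here's a toy sketch of the two-level clustering idea, not the paper's implementation: it uses random vectors in place of learned image embeddings and scikit-learn's KMeans at both levels.

```python
# Toy sketch of two-level (hierarchical) k-means curation. Real pipelines
# cluster billions of learned image embeddings; random vectors stand in here.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
embeddings = rng.normal(size=(10_000, 128))  # stand-in for image embeddings

# Level 1: coarse clusters over the whole corpus.
coarse = KMeans(n_clusters=20, n_init=10, random_state=42).fit(embeddings)

# Level 2: re-cluster within each coarse cluster, then sample evenly from
# the fine clusters so the curated set isn't dominated by frequent concepts.
curated_indices = []
for c in range(coarse.n_clusters):
    members = np.where(coarse.labels_ == c)[0]
    fine = KMeans(n_clusters=5, n_init=10, random_state=42).fit(
        embeddings[members]
    )
    for f in range(fine.n_clusters):
        fine_members = members[fine.labels_ == f]
        curated_indices.extend(fine_members[:10])  # a few per fine cluster

print(f"Curated {len(curated_indices)} of {len(embeddings)} samples")
```

Sampling evenly across fine clusters is what gives the balanced, curated subset; the paper's pipeline applies the same principle at much larger scale.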


Outline of hierarchical k-means filtering workflow for curating large-scale datasets into target datasets. Source: Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach paper.

ODGEN: Domain-specific Object Detection Data Generation with Diffusion Models

Researchers at Apple show how you can fine-tune diffusion models to generate images with objects similar to your own dataset. Detection models trained with real and synthetic data received a boost of up to 25.3% mAP@.50:.95.

Synthetic data has come a long way over the past few years. And this shows how it's expanding to even more specific tasks such as object detection.


Qualitative example of Appleโ€™s ODGEN method for generating data-specific items in bounding boxes. Source: Apple Machine Learning blog.

An Introduction to Vision-Language Modeling

Several researchers across many institutions such as Meta, Université de Montréal, McGill University, University of Toronto, MIT and more have collaborated to create a comprehensive introduction to Vision-Language Models (VLMs).

The introduction is designed to be a "start here" for many of the concepts in VLMs, such as the different types of VLMs, how to train VLMs, which VLM to use, how to evaluate VLMs, extending VLMs from images and text to videos, and more.

If you're looking to learn more about how LLMs can be bridged to vision data, this is an excellent place to start.

Releases and announcements 📣

Presentations, Courses and Tutorials 📽

Vicky Boykis is one of my favourite voices in the world of machine learning. And her recent talk at PyCon Italia is filled with excellent advice. The main one: when thinking of building a feature with machine learning or AI, get closer to the metal (as in, the metal your code is running on).

How?

  • Pick a single task.
  • Pick a measurable goal.
  • Pick the smallest piece of code you can reproduce locally.


How to stay closer to the Metal by Vicky Boykis. Get to the smallest possible piece you can and make it reproducible, then scale it up. Source: Vicky Boykis blog.

And start there.

Another great takeaway I enjoyed was the Unix philosophy quote by Doug McIlroy: "Make each program do one thing well. To do a new job, build afresh rather than complicate old programs by adding new features."

Finally, what would you do if you had 1,000 interns? That's one way to think of LLMs. Are there tasks you'd delegate to 1,000 smart interns? Maybe they'd be suitable for LLMs.

You can watch the full talk on YouTube (it starts at 8:25:16 in the video).

See you next month!

What a massive month for the ML world in May!

As always, let me know if there's anything you think should be included in a future post.

In the meantime, keep learning, keep creating, keep dancing.

See you next month,

Daniel

www.mrdbourke.com | YouTube

By the way, I'm also an instructor with Zero To Mastery Academy teaching people Machine Learning & AI in the most efficient way possible. You can see a few of our courses below or check out all Zero To Mastery courses.

More from Zero To Mastery

The No BS Way To Getting A Machine Learning Job

Looking to get hired in Machine Learning? Our ML expert tells you how. If you follow his 5 steps, we guarantee you'll land a Machine Learning job. No BS.

6-Step Framework To Tackle Machine Learning Projects (Full Pipeline)

Want to apply Machine Learning to your business problems but not sure if it will work or where to start? This 6-step guide makes it easy to get started today.

Python Monthly Newsletter 💻🐍

54th issue of Andrei Neagoie's must-read monthly Python Newsletter: 100x Python, Python 3.13, and much more. Read the full newsletter to get up-to-date with everything you need to know from last month.