[May 2023] Machine Learning Monthly Newsletter 💻🤖

41st issue! If you missed them, you can read the previous issues of the Machine Learning Monthly newsletter here.

Hey there, Daniel here.

I’m a Machine Learning Engineer who also teaches the following beginner-friendly machine learning courses:

I also write regularly about machine learning on my own blog as well as make videos on the topic on YouTube.

Since there's a lot going on, the utmost care has been taken to keep things to the point.

Enough about me! You're here for this month's Machine Learning Monthly Newsletter.

Typically a 500ish (+/-1,000ish, usually +) word post detailing some of the most interesting things on machine learning I've found in the last month.

What you missed in May 2023 as a Machine Learning Engineer…

My work 👇

PyTorch 2.0 videos + tutorials now live on ZTM — Over 3 hours of new PyTorch 2.0 focused content is now available via the ZTM PyTorch course. See the code on learnpytorch.io.

From the Internet

Making AIs more human-like with Reinforcement Learning from Human Feedback (RLHF) 🔄

With the continuing rise of AI-assistants such as OpenAI’s ChatGPT, Google’s Bard, Microsoft’s Bing, Anthropic’s Claude and OpenAssistant, there’s a secret sauce behind many of them that makes them so useful.

It’s called Reinforcement Learning from Human Feedback or RLHF for short.

In the context of chat, the whole goal of RLHF is to get the model (a large language model or LLM) to output responses that humans like.

Of course, correct, unbiased and factual outputs are important, but often when making a product, the perception of it being good is just as important as (sometimes more important than) it being correct.

It breaks down into four phases (or three, depending on where you look), each of which builds on the previous one.

Key:

🤖 = mostly automated inputs, e.g. raw internet data
😄 = mostly human-generated inputs, e.g. curated input/output pairs

Steps:

1. 🤖 Pretraining (get a base model) — Train a large language model (LLM) on an internet's worth of text data. For example, GPT-X models try to predict the next word in a sentence given all the previous words. In the sentence “the dog jumped over the _____”, which is more likely: “fence” or “elephant”? It turns out that when you do this with enough data, you get a very good baseline model.
2. 😄 Supervised fine-tuning (SFT) — Here's where you take the base model and fine-tune it to output ideal responses, such as dialogue-focused text, given an input. The training data here is high-quality input and output pairs (usually created by contractors or interns or scraped from the internet).
3. 😄 Training a reward model — Once you've got your fine-tuned model, to start moving it towards human preferences, you can train a reward model. This usually involves generating a number of outputs for the same input (e.g. 1 input → 5 outputs) and then having a person rank the outputs in order of preference (e.g. 5 = best, 1 = worst). Why rank? Because ranking is a far less intensive task than generating new ideal outputs from scratch. Note: this step assumes your model from steps 1 and 2 is already generating pretty good outputs.
4. 🤖😄 Fine-tune the LLM with reinforcement learning (RL) — Here's where the model from step 2 continually generates outputs with the goal of maximising the score given by the reward model from step 3. For example, it will take into consideration the reward value for a certain kind of output and try to make sure its outputs get as close to the highest reward value as possible. In essence, the objective is to generate responses that a person would rank as the "best".
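To make the reward model step a little more concrete, here's a minimal sketch. Big assumptions: the reward model here is just a linear function over two made-up features (`helpfulness`, `rudeness`), and it's trained with a pairwise "preferred vs rejected" loss, which is one common formulation. Real reward models are LLMs with a scalar output, so treat every name and number below as hypothetical.

```python
import math
from itertools import combinations

def ranking_to_pairs(ranked_outputs):
    """Turn a best-to-worst ranking into (preferred, rejected) pairs."""
    return list(combinations(ranked_outputs, 2))

def reward(weights, features):
    """Toy linear reward model: a weighted sum of the features."""
    return sum(w * f for w, f in zip(weights, features))

def train_reward_model(pairs, n_features, lr=0.1, epochs=200):
    """Maximise log(sigmoid(reward(preferred) - reward(rejected))) over all pairs."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for preferred, rejected in pairs:
            margin = reward(weights, preferred) - reward(weights, rejected)
            prob_correct = 1.0 / (1.0 + math.exp(-margin))
            # Gradient ascent step on the log-likelihood of the human preference
            for i in range(n_features):
                weights[i] += lr * (1.0 - prob_correct) * (preferred[i] - rejected[i])
    return weights

# Hypothetical features per output: (helpfulness, rudeness), ranked best to worst by a person
ranked = [(0.9, 0.1), (0.6, 0.2), (0.4, 0.7), (0.1, 0.9)]
pairs = ranking_to_pairs(ranked)
weights = train_reward_model(pairs, n_features=2)
scores = [reward(weights, features) for features in ranked]
print(len(pairs), weights, scores)
```

Notice that a single ranking of 4 outputs turns into 6 training pairs, which is part of why ranking is such a cheap way to collect preference data.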


How to get to ChatGPT: start with a base large language model, then tune it to be helpful and follow a dialogue-like structure.

Source: State of GPT talk by Andrej Karpathy.

The vast majority of the compute required for training these kinds of models is in the pretraining step (step 1), with some sources quoting 98%+.

This is because the input to the base model is often raw internet data (and lots of it). This is what enables a language model to build a “model of language” (understanding what word should come where).

The later steps require far less data in comparison (10,000s of samples vs billions+).
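If you'd like a feel for what "predict the next word" means, here's a tiny sketch. It's a bigram counter (it only looks at the single previous word), not a neural network, and the four-sentence corpus is made up for illustration. Real LLMs work on tokens, use all of the previous context and train on vastly more data.

```python
from collections import Counter, defaultdict

def train_bigram_model(corpus):
    """Count which word follows which across every sentence in the corpus."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.lower().split()
        for current_word, next_word in zip(words, words[1:]):
            counts[current_word][next_word] += 1
    return counts

def next_word_probabilities(counts, word):
    """Turn the raw follow-up counts for a word into probabilities."""
    following = counts[word]
    total = sum(following.values())
    return {w: c / total for w, c in following.items()}

# Made-up toy corpus
corpus = [
    "the dog jumped over the fence",
    "the cat jumped over the fence",
    "the horse jumped over the fence",
    "the dog sat near the elephant",
]
model = train_bigram_model(corpus)
probs = next_word_probabilities(model, "the")
print(probs["fence"], probs["elephant"])
```

With this corpus, "fence" comes out as more likely than "elephant" after "the", simply because it showed up more often in the data.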

If all of this sounds fairly straightforward, it’s because new technologies often sound straightforward in retrospect.

It’s still kind of magic that it works at all.

If you’d like to learn more about how RLHF is applied from several different points of view, I’d recommend the following resources:

RLHF: Reinforcement Learning from Human Feedback by Chip Huyen — A fantastic overview of RLHF as a 3 step process (step 1 and step 2 from above merged) along with mathematical and paper references.
RLHF.md by Joao Lages — An excellent step-by-step breakdown of the four steps above with examples of what a reward model creation step would look like.
State of GPT talk by Andrej Karpathy — This talk could just as well be titled “How to train and use ChatGPT” because it goes through several different concepts relating to GPTs with a focus on ChatGPT and GPT-4. I took plenty of screenshots throughout the talk. Something to note: LLaMA (the research-only, open-source LLM by Meta) got several mentions throughout the talk.
How to create a high quality dataset for RLHF with Label Studio by Jimmy Whitaker — Jimmy Whitaker shares a setup with the open-source tool Label Studio to create your own human feedback dataset.


Why is RLHF so effective? One reason is that it’s easier for most people to rank things in order of preference than to create something better from scratch. So you can gather much more data on rankings than you can on new, better outputs created from scratch.

Source: State of GPT talk by Andrej Karpathy.

Google I/O 2023 AI and TensorFlow Updates

Google had their major release event, Google I/O 2023, at the start of May.

And as you can imagine, there was a big focus on generative AI, with the release of new PaLM 2 APIs (Google’s version of GPT-4, still behind a waitlist at the time of writing).

You can read the full I/O recap on the TensorFlow blog as well as dedicated TensorFlow updates on the TensorFlow blog.

I’ve collected three (plus or minus a couple) of my favourite updates:

MediaPipe gets turned up — MediaPipe is a new (updated) framework for on-device machine learning, building on top of TensorFlow Lite (a little confusing, yes). But there are now plenty more ready-to-go solutions for running ML models on smaller devices (the benefits here are of course: speed, lower and privacy). Read the MediaPipe docs, watch the video demo.
KerasCV and KerasNLP provide easy access to state-of-the-art computer vision and NLP — Keras already provided templates for accessing incredible models but now two new dedicated libraries provide even easier access to battle-tested models such as EfficientNetV2 in KerasCV and BERT in KerasNLP. See KerasCV docs, KerasNLP docs, watch the demo video.
(Coming soon) TensorFlow Quantization API — Quantization is the process of making your model smaller and faster (sometimes with a small hit to performance). Coming later this year, TensorFlow plans to integrate a dedicated quantization API, making it incredibly easy to make sure your model is trained with quantization in mind (see the code example below). Watch the video demo (the quantization API section starts towards the back end of the video).

import tensorflow as tf

# Create model
model = ...

# Setup quantization-aware model (coming soon)
tf.quantization.apply_quantization_on_model(model, config_map, ...)

# Compile, fit and save model as normal
model.compile(...)
model.fit(...)
model.save(...)

# Export model to TFLite
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()
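To see why quantization makes a model smaller (and why it can cost a little accuracy), here's a rough sketch of the arithmetic in plain Python. It is not TensorFlow's implementation. It assumes one common scheme (affine quantization with a scale and a zero point) and the weight values are made up.

```python
def quantize(values, num_bits=8):
    """Map floats onto integers in [0, 2**num_bits - 1] using a scale and zero point."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 if all values are equal
    zero_point = round(qmin - lo / scale)
    quantized = [min(qmax, max(qmin, round(v / scale) + zero_point)) for v in values]
    return quantized, scale, zero_point

def dequantize(quantized, scale, zero_point):
    """Recover approximate floats from the stored integers."""
    return [(q - zero_point) * scale for q in quantized]

# Made-up float weights, stored as 8-bit integers instead of 32-bit floats
weights = [-1.5, -0.25, 0.0, 0.4, 2.0]
q, scale, zero_point = quantize(weights)
restored = dequantize(q, scale, zero_point)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, restored, max_error)
```

Each weight now fits in 8 bits rather than 32, and the restored values are close to (but not exactly) the originals. That small rounding error is the "small hit to performance".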

Some Guidance on Prompt Engineering 🔍

Prompt engineering is the art and science of getting a large language model to do what you want.

In essence, how can you engineer your input (prompt) to an LLM to give you an ideal output?
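As a small illustration, one simple way to engineer a prompt is to build a few-shot template: an instruction, a couple of worked examples and then the new input. The function name and the `Input:`/`Output:` layout below are my own assumptions for the sketch, not a standard.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, worked examples and a new query into one prompt string."""
    lines = [instruction, ""]
    for example_input, example_output in examples:
        lines.append(f"Input: {example_input}")
        lines.append(f"Output: {example_output}")
        lines.append("")
    # Leave the final output blank for the LLM to complete
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    instruction="Classify the sentiment of each review as positive or negative.",
    examples=[
        ("Loved this course, learned a lot!", "positive"),
        ("The videos were too long and boring.", "negative"),
    ],
    query="Great explanations and fun projects.",
)
print(prompt)
```

The resulting string is what you'd send to whichever LLM you're using.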

The following resources are a great collection of techniques and tools to help with prompt engineering:

Tutorial: The Beginners Guide to Prompt Engineering — Nice little 10-minute intro tutorial from the ZTM team's YouTube channel. Start here.
Blog post: You probably aren’t doing prompt engineering correctly — The author claims this post could be called “features missing from LLM front ends that should exist” and demonstrates with words and code several techniques such as prompt alternating, prompt editing, prompt weighting, prompt blending and prompt fusion.
Guide: BrexHQ’s guide to prompt engineering — A comprehensive guide to many of the fundamental and emerging techniques of prompt engineering. A must read for anyone wanting to get up to speed fast. One of my favourites is the HyDE technique for semantic search (pictured below).
Library: Guidance — An open-source library from Microsoft for structured prompting inputs and outputs. For example, fill a JSON based on existing fields but don't recreate the whole thing, only fill the JSON.


An excerpt from BrexHQ’s prompt engineering guide on how to use LLMs for semantic search via the HyDE (Hypothetical Document Embedding) technique. If the query is small, ask the model to invent a document related to the query (even an imaginary one) and then search for the embedding of the hypothetical document.
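Here's a rough sketch of the HyDE idea described above. Two loud assumptions: `fake_llm_write_document` is a hypothetical stand-in for a real LLM call (it just returns a canned answer) and `embed` is a toy bag-of-words counter rather than a real embedding model. The documents are made up too.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding' (stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def fake_llm_write_document(query):
    """Hypothetical stand-in for asking an LLM to invent a document answering the query."""
    canned = {
        "reset password": "to reset your password open account settings "
                          "and click forgot password then follow the email link",
    }
    return canned.get(query, query)

def hyde_search(query, documents):
    """Search with the embedding of the hypothetical document, not the short query."""
    hypothetical_doc = fake_llm_write_document(query)
    query_vector = embed(hypothetical_doc)
    return max(documents, key=lambda doc: cosine_similarity(query_vector, embed(doc)))

documents = [
    "billing faq how invoices and refunds work",
    "account settings guide click forgot password and follow the email link to choose a new one",
    "office locations and opening hours",
]
best = hyde_search("reset password", documents)
print(best)
```

The short query on its own shares very few words with the right document, but the longer invented document overlaps with it heavily, which is the whole trick.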

For and against LLMs ⚖️

With all the talk of LLMs lately, it can be hard to understand exactly what they can be used for in business, and how.

As in, does your business have a use case?

Or is it just hype that in the end is a distraction?

Well, the following resources can help answer those questions:

Course: LLM Bootcamp by Full Stack Deep Learning — An online course dedicated to showcasing use cases of different LLM functionality both conceptually and practically. From prompt engineering to launching an LLM app in one hour. P.S. Word on the street is that ZTM is launching some new LLM and AI courses very soon... make sure to subscribe to the newsletter to find out when they drop 😉
Blog post: Against LLM maximalism by Matthew Honnibal (founder of spaCy) — LLMs can seem like the thing to use. After all, you just tell the model what to do and it'll do it, right? For many cases, yes, but in many cases, it's probably overkill. In a terrific blog post, Matthew Honnibal shares how, for the majority of current business use cases, LLMs are likely far too expensive and slow to use. Though they're excellent for initial prototyping, if you want something more reliable, you can probably achieve similar, better or faster results with a supervised model on a single GPU.


Some tips for using LLMs in practice by Matthew Honnibal, see more in the full blog post: Against LLM maximalism.

Quick fire round 🔥

ImageBind = six modalities into one — ImageBind is a multi-modality model from Meta that learns depth, image, text, audio, thermal (heat map) and inertial measurement unit (IMU) data in one embedding space. A big step forward into the world of AR/VR representations. Code on GitHub, blog post.
Intuition on the attention mechanism by Eugene Yan — The attention mechanism (and variations of it) is the operation that's the backbone of the Transformer architecture, the architecture that powers many of the latest LLMs. Eugene Yan explains how one might think about the different terms (query, key, value) in a way that makes it easier to comprehend.
BLOOMChat is an open-source multilingual LLM by SambaNova Systems that takes Hugging Face’s open-source 176B parameter BLOOM model and gives it chat-like capabilities. In a study, people favour BLOOMChat’s outputs 44% of the time versus GPT-4 (almost 50/50).
Run the transformers package from Hugging Face in the browser with the recently updated Transformers.js v2.0. Tweet, GitHub.
Yann LeCun (VP & Chief AI Scientist at Meta) goes on the 20VC podcast and discusses why he thinks AI will create more jobs than it replaces and why it won’t take over humanity. One of my favourite listens for the year so far on AI. Listen on Spotify, Apple Podcasts.
Tiny corp, the creator of tinygrad (a neural network framework, like PyTorch but much much smaller), is now a company. Their goal is to get neural networks to run fast almost anywhere (starting with a focus on AMD chips to give NVIDIA some much needed competition). There are currently a bunch of bounties on their website to implement functionality in the library & get paid!!! Start the job before you have it! Read the blog post announcement.
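For the attention mechanism mentioned in the quick fire round above, here's a minimal sketch of scaled dot-product attention for a single query in plain Python. The vectors are made up and there are no learned weights, multiple heads or batching here, so think of it as the core arithmetic only.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Weight each value by how well its key matches the query."""
    d_k = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k) for key in keys]
    weights = softmax(scores)
    output = [
        sum(w * value[i] for w, value in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return output, weights

# Made-up 2D vectors: three keys, each with a matching value
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [2.0, 0.0]
output, weights = attention(query, keys, values)
print(weights, output)
```

The query points the same way as the first key (and partly the third), so those values get more weight in the output than the second one.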

See you next month!

What a massive month for the ML world in May!

As always, let me know if there's anything you think should be included in a future post.

Liked something here? Leave a comment below.

In the meantime, keep learning, keep creating, keep dancing.

See you next month,

Daniel

www.mrdbourke.com | YouTube

By the way, I'm a full-time instructor with Zero To Mastery Academy teaching people Machine Learning in the most efficient way possible. You can see a few of our courses below or check out all Zero To Mastery courses.