43rd issue! If you missed them, you can read the previous issues of the Machine Learning Monthly newsletter here.
Hey there, Daniel here.
I’m a Machine Learning Engineer who also teaches the following beginner-friendly machine learning courses:
I also write regularly about machine learning on my own blog as well as make videos on the topic on YouTube.
Since there's a lot going on, I've done my best to keep things to the point.
Enough about me! You're here for this month's Machine Learning Monthly Newsletter.
Typically a 500ish (+/-1,000ish, usually +) word post detailing some of the most interesting things on machine learning I've found in the last month.
ZTM State of AI Tools and Coding: 2023 Edition — 3,240 developers answered questions about their use of AI tools (think GitHub Copilot, ChatGPT, Bard & more) and the results are in! Some of my favourites:
Percentage of developers from different regions who use AI tools. Source: ZTM 2023 State of AI Tools & Coding.
Check out this thread for some more quick hitting insights:
How are programmers using AI in 2023 🤔?
We surveyed 3,240 of them to find out.
No wasting time, let's dive right into some of the fascinating insights 👇 pic.twitter.com/JNMQEjV9bv
— Zero To Mastery (@zerotomasteryio) August 1, 2023
Heard about embeddings? Wondering which ones you should use?
The Rabbit Hole Syndrome YouTube channel has an excellent deep-dive on which embeddings are best (paid and free) and shows an end-to-end example of using them in a web application.
Following on from the video above, MTEB is a leaderboard that compares the best text embedding models across a range of metrics, including model size, sequence length, embedding dimensions and more.
A snippet of the MTEB leaderboard. Many of the best text embedding models are available free to download on Hugging Face. Source: Hugging Face Massive Text Embedding Benchmark.
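If you want to try one of the open-source models from the leaderboard, here's a minimal sketch using the sentence-transformers library (the model name below is just one example from the board, swap in whichever suits your use case):

```python
# A quick sketch of using an open-source embedding model with sentence-transformers.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

sentences = ["How do I fine-tune an LLM?",
             "Steps for fine-tuning a large language model",
             "Best hiking trails near Sydney"]

# Encode the sentences into fixed-size vectors (normalised so dot product = cosine similarity)
embeddings = model.encode(sentences, normalize_embeddings=True)

# Compare the first sentence to the other two, the first pair should score much higher
print(util.cos_sim(embeddings[0], embeddings[1:]))
```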
Compare many of the latest state-of-the-art language models and their variants across a wide range of metrics, such as throughput and average score on a suite of benchmarks.
Rob Mulla shares how you can combine an open-source LLM, an open-source chat interface and your own data to chat with your own documents (all with the privacy of your own machine).
One of my favourite things to stumble upon is a series of tricks that someone has found and shared through their own experimentation.
Alistair Pullen's code search company went viral, which is good but expensive, so they had to figure out a way to make their results better. They discovered a few tricks along the way: custom embeddings (take pre-made embeddings and adjust them for your own setting), making the question look like the answer (HyDE, or hypothetical document embeddings), meta-characteristic search (create descriptions for items and search over those too) and resilient embeddings (even with only 40% of a piece of code embedded, searches are still okay).
For a fast model, try an SVM on top of embeddings (e.g. OpenAI) with a few hundred labelled examples. Source: Mark Tenenholtz Twitter.
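Here's a rough sketch of that trick, swapping OpenAI embeddings for an open-source model via sentence-transformers and using scikit-learn's SVC as the classifier (the texts and labels below are made up):

```python
# Embed labelled examples with a pre-trained model, then fit a classic SVM on top.
from sentence_transformers import SentenceTransformer
from sklearn.svm import SVC

texts = ["refund my order please", "the app crashes on launch", "where is my package?"]
labels = ["billing", "bug", "shipping"]

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # any embedding model works here
X = embedder.encode(texts)  # shape: (num_texts, embedding_dim)

clf = SVC(kernel="linear")
clf.fit(X, labels)

# Classify a new, unseen piece of text
print(clf.predict(embedder.encode(["my parcel hasn't arrived"])))  # most likely ['shipping']
```

In practice you'd use a few hundred labelled examples rather than three, but the pattern is the same: the embeddings do the heavy lifting and the SVM is fast to train and serve.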
Photo generated with Stable Diffusion XL 1.0, prompt: Cool cover photo for a machine learning newsletter. Old school magazine style. Colourful robot on the front.
curated-transformers is an open-source library by the team behind spaCy (the incredible natural language processing library) for high-quality and reproducible Transformer model code. Because of its modularity, it's also a great educational resource for anyone looking to create their own Transformers. See it on GitHub, read the release Tweet.
Combining vision, language and actions to create Robotic Transformer 2 (RT-2)
DeepMind's latest research shows how you can improve robotic actions from natural language (e.g. "put the strawberry into the bowl") by combining vision and language models (VLMs) with robotic action data to create vision-language-action models (VLAs).
The research shows you can treat robotic actions (such as "rotate X, rotate Y…") as token sequences and pass them to a language model just like text. Doing so resulted in up to a 3x improvement over RT-1 and much better generalisation capabilities.
Turning robot actions into a sequence that can be modelled by a large language model. What can’t be turned into a language? Source: DeepMind Blog.
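To make the idea concrete, here's a toy sketch of discretising a continuous robot action into integer bins so the whole action becomes a short "sentence" a language model can predict token by token (the bin count and action dimensions are illustrative, not RT-2's exact setup):

```python
import numpy as np

def action_to_tokens(action: np.ndarray, low: float = -1.0, high: float = 1.0, n_bins: int = 256) -> str:
    """Map each continuous action dimension to an integer bin and join the bins into a string."""
    bins = np.clip(((action - low) / (high - low) * (n_bins - 1)).round().astype(int), 0, n_bins - 1)
    return " ".join(str(b) for b in bins)

# 7-dimensional action: 3 translation, 3 rotation, 1 gripper (all illustrative)
action = np.array([0.1, -0.2, 0.05, 0.0, 0.3, -0.1, 1.0])
print(action_to_tokens(action))  # a space-separated string of 7 bin indices
```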
Less but better: getting terrific language model results by fine-tuning on only 1000 curated samples
Researchers show in LIMA: Less Is More for Alignment that with 1,000 high-quality curated prompts and responses, you can get an open-source LLM (LLaMA 65B) to perform on par with models such as GPT-4 (up to a 43% preference rate).
Instead of training a model with RLHF (reinforcement learning from human feedback), the researchers achieve their results by simply fine-tuning the initial weights of the LLM on the high-quality data for 15 epochs. This shows that an incredible amount of knowledge is learned during model pretraining.
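For flavour, here's roughly what that kind of plain supervised fine-tuning looks like with Hugging Face transformers. This is a minimal sketch with a small stand-in model and made-up data, not the paper's actual setup:

```python
# LIMA-style recipe: no RLHF, just causal language model fine-tuning on curated prompt/response pairs.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "gpt2"  # stand-in for demonstration; LIMA fine-tuned LLaMA 65B
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# A (tiny) set of curated prompt/response pairs formatted as single training texts
examples = [{"text": "### Question: What is overfitting?\n### Answer: Overfitting is when a model memorises its training data and fails to generalise."}]
dataset = Dataset.from_list(examples).map(
    lambda x: tokenizer(x["text"], truncation=True, max_length=512),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="lima_style_sft", num_train_epochs=15, per_device_train_batch_size=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # standard causal LM objective
)
trainer.train()
```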
There are some limitations, though: LIMA was found to be less robust than productised systems such as GPT-4, Claude and Bard, and thus more open to adversarial prompts.
It makes me wonder how it would perform when/if Llama 2 is used 🤔. I’m also thinking about how I could carefully craft a dataset for Nutrify in the image domain.
KNN + Gzip beats deep learning models on text classification (or does it?)
There was a paper recently that shared how KNN + Gzip compression (yes, the gzip module in Python) can potentially beat deep learning models for text classification across a wide range of datasets.
However, thanks to the beauty of the ML community, it turns out there may be a few bugs in the code that make the results look better than they are, the main one being data leakage (test data leaking into the training data, a mistake we've all made).
Sebastian Raschka has a brilliant write-up about the implementation as well as the bug in his newsletter, Ahead of AI.
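If you're curious what the core idea looks like, here's a minimal sketch of gzip-based kNN text classification using normalised compression distance (my own simplified take with a plain majority vote, not the paper's exact code):

```python
import gzip
import numpy as np

def ncd(x: str, y: str) -> float:
    """Normalised compression distance: how much better do x and y compress together than apart?"""
    cx = len(gzip.compress(x.encode()))
    cy = len(gzip.compress(y.encode()))
    cxy = len(gzip.compress((x + " " + y).encode()))
    return (cxy - min(cx, cy)) / max(cx, cy)

def knn_gzip_predict(query: str, train_texts: list[str], train_labels: list[str], k: int = 3) -> str:
    """Label the query by majority vote over its k nearest training texts (by NCD)."""
    distances = [ncd(query, text) for text in train_texts]
    nearest = np.argsort(distances)[:k]
    votes = [train_labels[i] for i in nearest]
    return max(set(votes), key=votes.count)

# Toy usage (real experiments use proper train/test splits, which is where the leakage issue comes in)
train_texts = ["the team won the match", "stocks fell sharply today", "the striker scored twice"]
train_labels = ["sport", "finance", "sport"]
print(knn_gzip_predict("the goalkeeper saved a penalty", train_texts, train_labels))  # -> 'sport'
```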
BLIP meets diffusion!
BLIP-Diffusion brings multi-modal text-and-subject control to diffusion models. As in, you can create subject-driven images (e.g. provide a subject and then generate images based on it) as well as perform subject-driven editing.
All of this happens up to 20x faster than previous methods with incredible results.
Two of these images are real and original. The others are generated based on the subject in the real image. Can you guess which ones? Source: BLIP-Diffusion website.
Automated LLM attacks
You may have seen prompt injections such as "DAN" ("do anything now"), which are designed to get a large language model such as GPT-4 to produce outputs that may be unfavourable (such as the instructions to create a bomb).
In the paper Universal and Transferable Adversarial Attacks on Aligned Language Models, researchers find a way to automate such attacks and get LLMs to output whatever they want (effectively bypassing the safety checks).
They shared their work with private companies before publishing it (so the hacks they found have been patched), but that doesn't mean there aren't more out there. Good to see this get into the open though; with the current wave of AI, it seems the more public awareness the better.
LLM-Attacks website / Paper / GitHub.
RepViT is a really fast CNN for mobile devices
My brother and I are building Nutrify, an iOS app to take a photo of food and learn about it. So this comes as a really exciting release.
RepViT takes the learnings from the Vision Transformer (ViT) space and applies them to the CNN (convolutional neural network) space for mobile architectures (e.g. MobileNetV3).
It can perform at 78.5% to 81.4% top-1 accuracy on ImageNet at a 0.9ms to 1.3ms latency on an iPhone 12 (~1000 inferences per second!).
Models are available in timm and on GitHub.
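Here's a quick sketch of loading a RepViT model through timm (exact model names vary between timm versions, so the snippet lists what's available rather than hard-coding one):

```python
import timm
import torch

# Discover which RepViT variants your installed timm version ships with
repvit_names = timm.list_models("repvit*")
print(repvit_names)

# Create the first available variant (set pretrained=True to download ImageNet weights)
model = timm.create_model(repvit_names[0], pretrained=False)
model.eval()

# Dummy forward pass on a single 224x224 RGB image
dummy_image = torch.randn(1, 3, 224, 224)
with torch.inference_mode():
    logits = model(dummy_image)
print(logits.shape)  # e.g. (1, 1000) for ImageNet-1k classification
```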
Three cool features and tips from Cohere (a large language model company):
Stack Overflow releases OverflowAI an updated way to search the internet’s most comprehensive developer knowledge base. Blog post / Video.
Apple shares how they discover places of interest in Photos for the Memories feature (e.g. shots of significant locations) whilst maintaining privacy.
A guide on how to manage hallucinations (making things up) in LLMs by Sascha Heyer. The trick? Use retrieval augmented generation (RAG).
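The basic RAG recipe is simple enough to sketch in a few lines, here assuming sentence-transformers for retrieval and leaving the actual LLM call as a placeholder:

```python
# Bare-bones RAG: retrieve the most relevant document for a question and put it in the prompt
# so the LLM answers from your data instead of making things up.
from sentence_transformers import SentenceTransformer, util

documents = [
    "Nutrify is an iOS app for taking a photo of food and learning about it.",
    "RepViT runs ImageNet classification in about 1ms on an iPhone 12.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_embeddings = embedder.encode(documents, normalize_embeddings=True)

def retrieve(question: str, top_k: int = 1) -> list[str]:
    """Return the top_k documents most similar to the question."""
    question_embedding = embedder.encode(question, normalize_embeddings=True)
    scores = util.cos_sim(question_embedding, doc_embeddings)[0]
    top_idx = scores.argsort(descending=True)[:top_k]
    return [documents[int(i)] for i in top_idx]

question = "What does Nutrify do?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
# response = your_llm(prompt)  # placeholder: call whichever LLM you like with the grounded prompt
print(prompt)
```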
What a massive month for the ML world in July!
As always, let me know if there's anything you think should be included in a future post.
In the meantime, keep learning, keep creating, keep dancing.
See you next month,
Daniel
By the way, I'm also an instructor with Zero To Mastery Academy teaching people Machine Learning in the most efficient way possible. You can see a few of our courses below or check out all Zero To Mastery courses.