47th issue! If you missed them, you can read the previous issues of my monthly A.I. & Machine Learning newsletter here.
Hey there, Daniel here.
I’m an A.I. & Machine Learning Engineer who also teaches the following beginner-friendly machine learning courses:
- Complete Machine Learning and Data Science Bootcamp: Zero to Mastery
- Get TensorFlow Developer Certified: Zero to Mastery
- PyTorch for Deep Learning: Zero to Mastery
I also write regularly about A.I. and machine learning on my own blog as well as make videos on the topic on YouTube.
Since there's a lot going on, the utmost care has been taken to keep things to the point.
Enough about me! You're here for this month's A.I. & Machine Learning Monthly Newsletter.
Typically a 500ish (+/-1,000ish, usually +) word post detailing some of the most interesting things on machine learning I've found in the last month.
What you missed in November 2023 as an A.I. & Machine Learning Engineer…
My Work 👇
Nutrify is live on the App Store! 🍍
My brother and I have been working on a food education app called Nutrify for a while. And now version 1.0 is live!
Nutrify allows you to take a photo of food and learn about it.
If you’ve done either of the ZTM TensorFlow or PyTorch courses, you would’ve worked on the Food Vision projects.
Nutrify is an extension of Food Vision, taking it all the way to a fully deployed, functional application.
Nutrify is quick because it runs all the computer vision on device (the models are built with PyTorch and converted to Apple’s CoreML format).
For now, Nutrify can identify 420 foods (& mostly foods available in Australia 🇦🇺) but we’ll be adding many more in the future.
I’ve personally unlocked 333/420 foods with the app so far.
Let me know if you manage to beat it!

Nutrify: The Food App live on the App Store — take a photo of food and learn about it! If you’ve got an iPhone with iOS 16.0+, search “nutrify” on the App Store and look for the pineapple.
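If you're curious what the PyTorch → Core ML step looks like, here's a minimal sketch using coremltools. The model, input shape and file names below are illustrative placeholders, not the actual Nutrify pipeline.
import torch
import torchvision
import coremltools as ct

# Stand-in image model (Nutrify's actual food model is different)
model = torchvision.models.mobilenet_v3_small(weights="DEFAULT").eval()

# Trace the model with an example input so it can be converted
example_input = torch.rand(1, 3, 224, 224)
traced_model = torch.jit.trace(model, example_input)

# Convert the traced PyTorch model to Core ML's ML Program format
mlmodel = ct.convert(
    traced_model,
    convert_to="mlprogram",
    inputs=[ct.TensorType(name="image", shape=example_input.shape)],
)

# Save the model, ready to drop into an Xcode project
mlmodel.save("FoodClassifier.mlpackage")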
From the Internet 🌐
Keras 3.0 is out!
Keras 3.0 is one of the biggest releases in machine learning in the past decade.
A famous question in ML is “should I learn PyTorch or TensorFlow?” (luckily ZTM has courses for both TensorFlow and PyTorch)
But now the answer truly is “both if you want to”.
Why?
Because Keras 3.0 lets you use JAX, TensorFlow OR PyTorch as a backend.
So you can build a model in Keras and use it across any of the major frameworks.
A huge milestone for the machine learning community!
The example code snippet below shows using PyTorch as the backend.
import numpy as np
import os

os.environ["KERAS_BACKEND"] = "torch"  # options: "tensorflow", "jax", "torch"

# Note that Keras should only be imported after the backend
# has been configured. The backend cannot be changed once the
# package is imported.
import keras

# Keras code will now run with PyTorch as the backend
How do I index my embeddings?
An embedding is a useful way to represent a data sample.
For example, if you have a sentence of text, you can use an embedding to represent it numerically. The same for images, audio files and other forms of data.
That numerical representation can then be used to compare it to other numerical representations.
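Here's a minimal sketch of that idea using the sentence-transformers library (the model name is just an example choice, and the sentences are made up).
from sentence_transformers import SentenceTransformer, util

# Load a small general-purpose text embedding model
model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "A photo of a pineapple on a kitchen bench.",
    "A pineapple sitting on a counter.",
    "A stack of machine learning textbooks.",
]

# Turn each sentence into a fixed-size vector (an embedding)
embeddings = model.encode(sentences)

# Compare embeddings with cosine similarity (higher = more similar meaning)
print(util.cos_sim(embeddings[0], embeddings[1]).item())  # high, similar meaning
print(util.cos_sim(embeddings[0], embeddings[2]).item())  # lower, different topic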
This process is very achievable for a small number of samples (e.g. 100 to 10,000 and even 100,000).
However, once you start getting into the millions and billions of data samples, you’ll likely want to create some kind of index on those samples to save on comparison time.
The workflow goes:
Data → Turn data into embeddings (choose algorithm) → Turn embeddings into index (choose index) → Search index to compare embeddings
An index is like a reference.
A flat index means you search and compare one target sample against every other sample.
However, this could take quite a while if you have millions (or billions) of samples.
Instead, you can use an algorithm such as Hierarchical Navigable Small Worlds (HNSW) to create an index on your embeddings.
HNSW is a form of approximate search, meaning it won’t give you 100% of the best results (a flat index will), however, it will be much faster than searching over every sample.
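To make the flat vs. HNSW distinction concrete, here's a minimal sketch using Faiss (linked in the resources below). The embeddings are random stand-ins and the parameters are illustrative.
import numpy as np
import faiss

embedding_dim = 128
num_samples = 100_000

# Random embeddings standing in for real ones
embeddings = np.random.rand(num_samples, embedding_dim).astype("float32")
query = np.random.rand(1, embedding_dim).astype("float32")

# Flat index: exact search, compares the query against every stored vector
flat_index = faiss.IndexFlatL2(embedding_dim)
flat_index.add(embeddings)
flat_distances, flat_ids = flat_index.search(query, 5)

# HNSW index: approximate search, much faster at large scale
hnsw_index = faiss.IndexHNSWFlat(embedding_dim, 32)  # 32 = neighbours per graph node
hnsw_index.add(embeddings)
hnsw_distances, hnsw_ids = hnsw_index.search(query, 5)

print(flat_ids, hnsw_ids)  # the approximate results will usually (but not always) match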
Since I’ve been working with embeddings on a larger scale, I’ve been learning more and more about index creation algorithms like HNSW.
And I’ve found the following collection of resources invaluable:
- [Blog post] Choosing the right index for similarity search by Pinecone (figuring out which index to use)
- [Blog post/guide] How to perform semantic search at billions scale (a guide to creating a semantic search/embedding search system across 5 billion images)
- [Library] Faiss: A library for efficient similarity search by Facebook Research (this will help you make the index)
- [Library] Autofaiss: A library to create the best Faiss index within given memory and speed constraints (create the best index automatically)
- [Library] clip-retrieval: A library to create CLIP (Contrastive Language Image Pretraining) embeddings and then build a retrieval system on top of them (this is helpful because it combines text and images into a single numerical representation space)

A comparison of different indexes to use for embeddings and their results and runtimes. Source: Pinecone blog.
Identifying birds in the backyard with embeddings 🦜
If you want to see an example of a workflow to use embeddings to build an image/text search system, Multimodal Retrieval with Text Embedding and CLIP Image Embedding for Backyard Birds by Wenqi Glantz is a fantastic example.
Given a question about a bird, the system will turn the text into an embedding, find a relevant resource in a database and then return an image along with an answer.
A very cool project idea that would be fun to replicate with a different data source.
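Here's a minimal sketch of the core idea (images and text embedded into the same space so they can be compared) using the CLIP model from Hugging Face Transformers. The image path and bird labels are placeholders, and this isn't the blog post's exact pipeline.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("backyard_bird.jpg")  # placeholder image path
texts = ["a rainbow lorikeet", "a sulphur-crested cockatoo", "an australian magpie"]

# Embed the image and the candidate texts into the same space and compare them
inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

probs = outputs.logits_per_image.softmax(dim=-1)[0]
for text, prob in zip(texts, probs):
    print(f"{text}: {prob.item():.3f}")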

Example of using a multi-modal QA model and it returning text and image based on a text query. Source: Wenqi Glantz blog.
Getting started with Llama 2 (quick!) 🛫
The Llama 2 Large Language Models (LLMs) from Meta are some of the best performing open-source LLMs available.
When they were first released, they were a bit of a challenge to get set up.
However, this has changed thanks to the following resources:
- 5 steps to get started with Llama 2 by Meta AI - A quick guide by the creators of Llama (Meta) themselves on how to get Llama 2 running locally.
- HelloLlama by Meta AI — A quickstart guide on running Llama 2 almost anywhere (cloud, locally and more).
- llamafile by Mozilla and Justine Tunney — A program that lets you turn LLM weights into executable files runnable on various operating systems. See the GitHub repo and walkthrough by Simon Willison for more. Bonus: Mozilla just released a guidebook on AI for beginners, it looks excellent.

Example of running llamafile locally on my MacBook Pro M1 Pro. I was shocked at how easy it was to set up and get going. It’s fast too.
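If you'd rather call a Llama 2 model from Python instead of a standalone executable, here's a minimal sketch using the llama-cpp-python bindings (a separate option to the resources above). The GGUF weights filename is a placeholder you'd download first.
from llama_cpp import Llama

# Load a quantized Llama 2 model from a local GGUF file (placeholder filename)
llm = Llama(model_path="./llama-2-7b-chat.Q4_K_M.gguf")

# Ask a question and generate a short completion
output = llm(
    "Q: What is an embedding in machine learning? A:",
    max_tokens=128,
    stop=["Q:"],
)
print(output["choices"][0]["text"])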
The Little Book of Deep Learning
A free book on deep learning formatted to be read on a phone screen for those with a STEM (science, technology, engineering and mathematics) background.
I’ve got this bookmarked for my holiday reading!

A few pages from The Little Book of Deep Learning by François Fleuret. Source: fleuret.org.
Generative AI for Beginners Course
A free course consisting of 12 lessons on generative AI for absolute beginners by Microsoft. All materials are available on GitHub.
Cool blogs and tutorials
- Airbnb share how they developed a system to extract structured information from unstructured text across 6 different text-based data sources (better structured information = better search).
- GPUs not available on the cloud? Antonis Makropoulos shares tips and tricks on how you can build your own multi-GPU deep learning PC in 2023.
- GitHub Copilot was one of the first products to leverage LLMs in a paid application. In a recent blog post, the GitHub team share the layout of a modern day LLM application.

Overview of a modern day LLM-based application by GitHub. Like many machine learning apps, there are a few more moving parts than just the model. Source: GitHub blog.
- Does GPT-4 (and other LLMs) have ToM (Theory of Mind) capabilities? As in, do they have the ability to understand other people by assigning mental states to them (I copied this line from Wikipedia). Researchers from AI2 believe they don’t.
- gpt-fast pushes the limits of native PyTorch to accelerate LLMs. Turns out with <1000 lines of pure, native PyTorch you can accelerate an LLM such as Llama-7B by 10x (25 tokens/s → 250 tokens/s). Get the code on GitHub.
- Enrich your vision datasets with automatic tagging, object detection and segmentation. Say you’ve got a collection of 1,000s or 100,000s of images and you’d like to assign tags to them. Visual Layer (creators of fastdup, an incredible computer vision library) have you covered with their data enrichment series.
- Looking for a vector database to store all of your embedding vectors? Turns out Supabase ran some tests and found that pgvector (an open-source vector database extension for Postgres) is faster (and cheaper) than Pinecone (a closed-source system).
Releases
- Stable Diffusion XL Turbo (the fastest version of Stable Diffusion yet) is live and available on Hugging Face, ready to generate images from text.
- Cohere released their Embed v3 model with some of the best results on the Hugging Face MTEB leaderboard, though it was later bettered (slightly) by Voyage AI Embeddings. Both models are proprietary and claim better results on real-world data than state-of-the-art open-source methods.
- X.ai (the AI company started by Elon Musk) released their first LLM, called Grok. Grok is designed to answer questions (including the spicy ones) with a bit of wit and humor. It also has access to X (Twitter) data, so it can provide more real-time results.
- GitHub hosted their Universe 2023 conference and announced what is essentially a huge overhaul to put Copilot everywhere. Almost enough to go from GitHub to CopilotHub. AI-powered everything! One of my favourite features so far is Copilot Chat, a GPT-4 powered chat interface built into VS Code to help you with your own codebase. See the video announcement.
Papers & Research
- The DAMO Academy released mPLUG-Owl2, possibly the best available open-source multi-modal large language model (MLLM). GitHub, Paper, Demo on Hugging Face.
- Is this text AI? Ghostbuster.app is an application, model and paper showing that detecting AI-written text can be improved upon but is still hard.
- DiffSeg is a paper by Google Research that looks at combining Stable Diffusion and segmentation. It takes a Stable Diffusion model’s self-attention maps at different resolutions of the same image and iteratively merges the similar features to create a segmentation mask. DiffSeg shows powerful capabilities in the zero-shot segmentation space.
- One of my favourite papers in the last month was Data Filtering Networks (or DFNs for short) by Apple Research. The authors figured out a way to take a really large dataset (e.g. 43 billion image and text pairs) and filter it down to a smaller but still valuable dataset (43B → 5B image and text pairs). Then they trained a CLIP model on the filtered dataset and achieved the best results so far for a zero-shot CLIP model (84.2% on ImageNet)! This is similar to MetaCLIP in the October 2023 issue of Machine Learning Monthly. How do you take a large dataset and customize it for your own problem? You filter it with heuristics or you filter it with another model. Even better, the DFN CLIP models by Apple are available on Hugging Face.
- CapsFusion by BAAI research creates enriched image captions by fusing 1M synthetic and real image captions with ChatGPT and then fine-tuning a Llama model to mimic the fused caption style. Very cool results!
- Google Research released two new papers and a blog post discussing two self-adaptive prompting techniques, Consistency-Based Self-Adaptive Prompting (COSP) and Universal Self-Adaptive Prompting (USP). USP builds on COSP and both improve on zero-shot prompting results. I find it fascinating how many new kinds of prompting techniques are coming out every couple of weeks/months. It goes to show how much we're still finding out about LLMs.
- RAM++ (Recognize Anything) achieves incredible results on zero-shot image tagging (even better than supervised models in some cases!). Get the code and model on GitHub, read the paper.
- Apple Research released EELBERT, a tiny version of BERT capable of creating high-performing embeddings on device. The UNO-EELBERT model achieves a GLUE score within 4% of a fully trained BERT-tiny model whilst being 15x smaller (only 1.2MB!).
Videos
The AI Conference was held recently; two talks (there were many) I picked out, plus a couple of other videos worth watching:
- From Mosquitoes to the Inside of Trash Bins by Danny Bickson — An excellent talk on the power of computer vision and its still untapped use-cases.
- Practical Data Considerations for Building Production-Ready LLM Applications by Jerry Liu — Jerry Liu is the founder and CEO of LlamaIndex (a data framework for connecting LLMs to your own data) and in this talk he shares his practical tips on RAG-based applications.
- Intro to Large Language Models by Andrej Karpathy — Possibly the best 1 hour talk on LLMs on the internet right now. I thoroughly enjoyed watching this.
- Ever wonder how an object detection model knows where something is in an image? I'd highly recommend the classic object detection and localization series by Andrew Ng on the DeepLearningAI YouTube channel.

An overview of image classification, object localization (one object) and object detection (multiple objects). Source: Slide from Andrew Ng’s video on object localization.
See you next month!
What a massive month for the ML world in November!
As always, let me know if there's anything you think should be included in a future post.
In the meantime, keep learning, keep creating, keep dancing.
See you next month,
Daniel
By the way, I'm also an instructor with Zero To Mastery Academy teaching people Machine Learning & AI in the most efficient way possible. You can see a few of our courses below or check out all Zero To Mastery courses.