43rd issue! If you missed them, you can read the previous issues of the Machine Learning Monthly newsletter here.
Hey there, Daniel here.
I’m a Machine Learning Engineer who also teaches the following beginner-friendly machine learning courses:
I also write regularly about machine learning on my own blog as well as make videos on the topic on YouTube.
Since there's a lot going on, I've done my best to keep things to the point.
Enough about me! You're here for this month's Machine Learning Monthly Newsletter.
Typically a 500ish (+/-1,000ish, usually +) word post detailing some of the most interesting things on machine learning I've found in the last month.
ZTM State of AI Tools and Coding: 2023 Edition — 3,240 developers answered questions about their use of AI tools (think GitHub Copilot, ChatGPT, Bard & more) and the results are in! Some of my favourites:
Percentage of developers from different regions who use AI tools. Source: ZTM 2023 State of AI Tools & Coding.
Check out this thread for some more quick hitting insights:
How are programmers using AI in 2023 🤔?
We surveyed 3,240 of them to find out.
No wasting time, let's dive right into some of the fascinating insights 👇 pic.twitter.com/JNMQEjV9bv
— Zero To Mastery (@zerotomasteryio) August 1, 2023
Heard about embeddings? Wondering which ones you should use?
The Rabbit Hole Syndrome YouTube channel has an excellent deep-dive on which embeddings are best (paid and free) and shows an end-to-end example of using them in a web application.
Following on from the video above, MTEB is a leaderboard that compares the best text embedding models across a range of metrics, including model size, sequence length, embedding dimensions and more.
A snippet of the MTEB leaderboard. Many of the best text embedding models are available free to download on Hugging Face. Source: Hugging Face Massive Text Embedding Benchmark.
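If you want to try one of the open-source models from the leaderboard, here's a minimal sketch using the sentence-transformers library (the model name below is just one example from the board, swap in whichever suits your use case):

```python
# A quick sketch of using an open-source embedding model with sentence-transformers.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

sentences = ["How do I fine-tune an LLM?",
             "Steps for fine-tuning a large language model",
             "Best hiking trails near Sydney"]

# Encode the sentences into fixed-size vectors (normalised so dot product = cosine similarity)
embeddings = model.encode(sentences, normalize_embeddings=True)

# Compare the first sentence to the other two, the first pair should score much higher
print(util.cos_sim(embeddings[0], embeddings[1:]))
```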
Compare many of the latest state-of-the-art language models and their variants across a wide range of metrics, such as throughput and average score on a suite of benchmarks.
Rob Mulla shares how you can combine an open-source LLM, an open-source chat interface and your own data to chat with your own documents (all with the privacy of your own machine).
One of my favourite things to stumble upon is a series of tricks that someone has found and shared through their own experimentation.
Alistair Pullen's code search company went viral, which is good but expensive, so they had to figure out a way to make their results better. They discovered a few tricks along the way: custom embeddings (take pre-made embeddings and adjust them for your own setting), making the question look like the answer (HyDE, or hypothetical document embeddings), meta-characteristic search (create descriptions for items and search over those too) and resilient embeddings (even with only 40% of a piece of code embedded, searches are still okay).
For a fast model, try an SVM on top of embeddings (e.g. OpenAI) with a few hundred labelled examples. Source: Mark Tenenholtz Twitter.
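Here's a rough sketch of that trick, swapping OpenAI embeddings for an open-source model via sentence-transformers and using scikit-learn's SVC as the classifier (the texts and labels below are made up):

```python
# Embed labelled examples with a pre-trained model, then fit a classic SVM on top.
from sentence_transformers import SentenceTransformer
from sklearn.svm import SVC

texts = ["refund my order please", "the app crashes on launch", "where is my package?"]
labels = ["billing", "bug", "shipping"]

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # any embedding model works here
X = embedder.encode(texts)  # shape: (num_texts, embedding_dim)

clf = SVC(kernel="linear")
clf.fit(X, labels)

# Classify a new, unseen piece of text
print(clf.predict(embedder.encode(["my parcel hasn't arrived"])))  # most likely ['shipping']
```

In practice you'd use a few hundred labelled examples rather than three, but the pattern is the same: the embeddings do the heavy lifting and the SVM is fast to train and serve.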
Photo generated with Stable Diffusion XL 1.0, prompt: Cool cover photo for a machine learning newsletter. Old school magazine style. Colourful robot on the front.
curated-transformers is an open-source library by the team behind spaCy (the incredible natural language processing library) for high-quality and reproducible Transformer model code. Because of its modularity, it's also a great educational resource for anyone looking to create their own Transformers. See it on GitHub, read the release Tweet.
Combining vision, language and actions to create Robotic Transformer 2 (RT-2)
DeepMind's latest research shows how you can improve robotic actions from natural language (e.g. "put the strawberry into the bowl") by combining vision and language models (VLMs) with robotic action data to create vision-language-action models (VLAs).
The research shows you can treat robotic actions (such as "rotate X, rotate Y…") as token sequences and pass them to a language model just like text. Doing so resulted in up to a 3x improvement over RT-1 and much better generalisation capabilities.
Turning robot actions into a sequence that can be modelled by a large language model. What can’t be turned into a language? Source: DeepMind Blog.
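To make the idea concrete, here's a toy sketch of discretising a continuous robot action into integer bins so the whole action becomes a short "sentence" a language model can predict token by token (the bin count and action dimensions are illustrative, not RT-2's exact setup):

```python
import numpy as np

def action_to_tokens(action: np.ndarray, low: float = -1.0, high: float = 1.0, n_bins: int = 256) -> str:
    """Map each continuous action dimension to an integer bin and join the bins into a string."""
    bins = np.clip(((action - low) / (high - low) * (n_bins - 1)).round().astype(int), 0, n_bins - 1)
    return " ".join(str(b) for b in bins)

# 7-dimensional action: 3 translation, 3 rotation, 1 gripper (all illustrative)
action = np.array([0.1, -0.2, 0.05, 0.0, 0.3, -0.1, 1.0])
print(action_to_tokens(action))  # a space-separated string of 7 bin indices
```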
Less but better: getting terrific language model results by fine-tuning on only 1000 curated samples
Researchers show in LIMA: Less Is More for Alignment that with 1,000 high-quality curated prompts and responses, you can get an open-source LLM (LLaMA 65B) to perform on par with models such as GPT-4 (up to a 43% preference rate).
Instead of training a model with RLHF (reinforcement learning from human feedback), the researchers achieve their results by simply fine-tuning the initial weights of the LLM on the high-quality data for 15 epochs. This shows that an incredible amount of knowledge is learned during model pretraining.
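For flavour, here's roughly what that kind of plain supervised fine-tuning looks like with Hugging Face transformers. This is a minimal sketch with a small stand-in model and made-up data, not the paper's actual setup:

```python
# LIMA-style recipe: no RLHF, just causal language model fine-tuning on curated prompt/response pairs.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "gpt2"  # stand-in for demonstration; LIMA fine-tuned LLaMA 65B
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# A (tiny) set of curated prompt/response pairs formatted as single training texts
examples = [{"text": "### Question: What is overfitting?\n### Answer: Overfitting is when a model memorises its training data and fails to generalise."}]
dataset = Dataset.from_list(examples).map(
    lambda x: tokenizer(x["text"], truncation=True, max_length=512),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="lima_style_sft", num_train_epochs=15, per_device_train_batch_size=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # standard causal LM objective
)
trainer.train()
```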
There are some limitations, though: LIMA was found to be less robust than productised systems such as GPT-4, Claude and Bard, and thus more open to adversarial prompts.
It makes me wonder how it would perform when/if Llama 2 is used 🤔. I’m also thinking about how I could carefully craft a dataset for Nutrify in the image domain.
KNN + Gzip beats deep learning models on text classification (or does it?)
There was a paper recently that shared how KNN + Gzip compression (yes, the gzip module in Python) can potentially beat deep learning models for text classification across a wide range of datasets.
However, thanks to the beauty of the ML community, it turns out there may be a few bugs in the code that make the results look better than they are, the main one being data leakage (test data leaking into the training data, a mistake we've all made).
Sebastian Raschka has a brilliant write-up about the implementation as well as the bug in his newsletter, Ahead of AI.
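If you're curious what the core idea looks like, here's a minimal sketch of gzip-based kNN text classification using normalised compression distance (my own simplified take with a plain majority vote, not the paper's exact code):

```python
import gzip
import numpy as np

def ncd(x: str, y: str) -> float:
    """Normalised compression distance: how much better do x and y compress together than apart?"""
    cx = len(gzip.compress(x.encode()))
    cy = len(gzip.compress(y.encode()))
    cxy = len(gzip.compress((x + " " + y).encode()))
    return (cxy - min(cx, cy)) / max(cx, cy)

def knn_gzip_predict(query: str, train_texts: list[str], train_labels: list[str], k: int = 3) -> str:
    """Label the query by majority vote over its k nearest training texts (by NCD)."""
    distances = [ncd(query, text) for text in train_texts]
    nearest = np.argsort(distances)[:k]
    votes = [train_labels[i] for i in nearest]
    return max(set(votes), key=votes.count)

# Toy usage (real experiments use proper train/test splits, which is where the leakage issue comes in)
train_texts = ["the team won the match", "stocks fell sharply today", "the striker scored twice"]
train_labels = ["sport", "finance", "sport"]
print(knn_gzip_predict("the goalkeeper saved a penalty", train_texts, train_labels))  # -> 'sport'
```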
BLIP meets diffusion!
BLIP-Diffusion brings multi-modal text-and-subject control to diffusion models. As in, you can create subject-driven images (e.g. provide a subject and then generate images based on it) as well as perform subject-driven editing.
All of this happens up to 20x faster than previous methods with incredible results.
Two of these images are real and original. The others are generated based on the subject in the real image. Can you guess which ones? Source: BLIP-Diffusion website.
Automated LLM attacks
You may have seen prompt injections such as "DAN" ("do anything now"), which are designed to get a large language model such as GPT-4 to produce outputs that may be unfavourable (such as the instructions to create a bomb).
In the paper Universal and Transferable Adversarial Attacks on Aligned Language Models, researchers find a way to automate such attacks and get LLMs to output whatever they want (effectively bypassing the safety checks).
They shared their work with private companies before publishing it (so the hacks they found have been patched), but that doesn't mean there aren't more out there. Good to see this get into the open though; with the current wave of AI, it seems the more public awareness the better.
LLM-Attacks website / Paper / GitHub.
RepViT is a really fast CNN for mobile devices
My brother and I are building Nutrify, an iOS app to take a photo of food and learn about it. So this comes as a really exciting release.
RepViT takes the learnings from the Vision Transformer (ViT) space and applies them to the CNN (convolutional neural network) space for mobile architectures (e.g. MobileNetV3).
It can perform at 78.5% to 81.4% top-1 accuracy on ImageNet at a 0.9ms to 1.3ms latency on an iPhone 12 (~1000 inferences per second!).
Models are available in timm and on GitHub.
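Here's a quick sketch of loading a RepViT model through timm (exact model names vary between timm versions, so the snippet lists what's available rather than hard-coding one):

```python
import timm
import torch

# Discover which RepViT variants your installed timm version ships with
repvit_names = timm.list_models("repvit*")
print(repvit_names)

# Create the first available variant (set pretrained=True to download ImageNet weights)
model = timm.create_model(repvit_names[0], pretrained=False)
model.eval()

# Dummy forward pass on a single 224x224 RGB image
dummy_image = torch.randn(1, 3, 224, 224)
with torch.inference_mode():
    logits = model(dummy_image)
print(logits.shape)  # e.g. (1, 1000) for ImageNet-1k classification
```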
Three cool features and tips from Cohere (a large language model company):
Stack Overflow releases OverflowAI an updated way to search the internet’s most comprehensive developer knowledge base. Blog post / Video.
Apple shares how they discover places of interest in Photos for the Memories feature (e.g. shots of significant locations) whilst maintaining privacy.
A guide on how to manage hallucinations (making things up) in LLMs by Sascha Heyer. The trick? Use retrieval augmented generation (RAG).
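The basic RAG recipe is simple enough to sketch in a few lines, here assuming sentence-transformers for retrieval and leaving the actual LLM call as a placeholder:

```python
# Bare-bones RAG: retrieve the most relevant document for a question and put it in the prompt
# so the LLM answers from your data instead of making things up.
from sentence_transformers import SentenceTransformer, util

documents = [
    "Nutrify is an iOS app for taking a photo of food and learning about it.",
    "RepViT runs ImageNet classification in about 1ms on an iPhone 12.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_embeddings = embedder.encode(documents, normalize_embeddings=True)

def retrieve(question: str, top_k: int = 1) -> list[str]:
    """Return the top_k documents most similar to the question."""
    question_embedding = embedder.encode(question, normalize_embeddings=True)
    scores = util.cos_sim(question_embedding, doc_embeddings)[0]
    top_idx = scores.argsort(descending=True)[:top_k]
    return [documents[int(i)] for i in top_idx]

question = "What does Nutrify do?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
# response = your_llm(prompt)  # placeholder: call whichever LLM you like with the grounded prompt
print(prompt)
```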
What a massive month for the ML world in July!
As always, let me know if there's anything you think should be included in a future post.
In the meantime, keep learning, keep creating, keep dancing.
See you next month,
Daniel
By the way, I'm also an instructor with Zero To Mastery Academy teaching people Machine Learning in the most efficient way possible. You can see a few of our courses below or check out all Zero To Mastery courses.