44th issue! If you missed them, you can read the previous issues of the Machine Learning Monthly newsletter here.
Hey there, Daniel here.
I’m a Machine Learning Engineer who also teaches the following beginner-friendly machine learning courses:
I also write regularly about machine learning on my own blog as well as make videos on the topic on YouTube.
Since there's a lot going on, the utmost care has been taken to keep things to the point.
Enough about me! You're here for this month's Machine Learning Monthly Newsletter.
Typically a 500ish (+/-1,000ish, usually +) word post detailing some of the most interesting things on machine learning I've found in the last month.
A new set of state-of-the-art (SOTA) open-source text embeddings are available on Hugging Face.
These new embeddings, called bge (BAAI General Embeddings) and gte (General Text Embeddings), outperform previous sets of embeddings (including OpenAI’s) with smaller model sizes and fewer embedding dimensions.
Wait, what are embeddings?
Embeddings are a numerical representation of some kind of data.
In this case, we’re talking about text embeddings.
In essence, a numerical representation of a passage of text (e.g. “The quick dog ran up the hill” → [0.3, 0.2, 0.6, …]).
Embeddings can be used for many different use cases, such as semantic search, recommendation, clustering, classification and much more.
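For example, here’s a minimal sketch of creating and comparing embeddings with the sentence-transformers library (the model names below are taken from the Hugging Face listings of the new gte/bge models):

```python
# Minimal sketch: create text embeddings with one of the new open-source models
# via sentence-transformers (pip install sentence-transformers).
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("thenlper/gte-small")  # or "BAAI/bge-small-en"

sentences = ["The quick dog ran up the hill",
             "A fast puppy sprinted up the slope"]
embeddings = model.encode(sentences)  # numpy array: one vector per sentence

print(embeddings.shape)                             # e.g. (2, 384) for gte-small
print(util.cos_sim(embeddings[0], embeddings[1]))   # similarity between the two
```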
Four (maybe more?) huuuuugeeee new model releases from Meta in the past month:
Some of the models vary in the licences they’re released under (e.g. commercial or research-only) but it’s outstanding to see such efforts shared openly.
Keep at it Meta!
Supabase is an open-source Firebase alternative built on a Postgres database.
It can be used for storing structured data (SQL) in an efficient way.
My brother and I are using it to store account data and metadata for our app Nutrify.
Supabase recently added support for storing vectors via pgvector (vectors can be embeddings, numerical representations of data, as discussed above).
And now they’ve partnered with Hugging Face to create those vectors via Transformers models (such as the new state-of-the-art open-source embedding models).
In essence, you can now go from data → transform to vector (embedding) → store in database → query later, all with Supabase and Hugging Face, two open-source tools!
For example, using Supabase’s Python vector client vecs:

```python
import vecs
from vecs.adapter import Adapter, ParagraphChunker, TextEmbedding

vx = vecs.create_client("postgresql://<user>:<password>@<host>:<port>/<db_name>")

# Create a new collection with an associated adapter
docs = vx.get_or_create_collection(
    name="docs",
    # here comes the new part
    adapter=Adapter(
        [
            ParagraphChunker(skip_during_query=True),
            TextEmbedding(model='Supabase/gte-small'),
        ]
    )
)

# Upsert
docs.upsert(
    records=[
        (
            "vec0",
            "the diameter of a 747 ...",  # <- inserting text!
            {"publish_year": 2019}
        )
    ]
)

# Search by text
docs.query(data="how many ping pong balls fit in a Boeing ...")
# Results: [...]
```
Roboflow Supervision is an open-source library to help with your computer vision pipelines.
It includes tools to:
It goes along with Roboflow’s fantastic suite of computer vision tools, such as Notebooks (for computer vision examples), Autodistill (automatic computer vision data labelling) and Collect (automatic collection of computer vision data at different intervals).
Meta released the Segment Anything (SAM) model a couple of months ago now.
Since then there’s been a flourishing of open-source variants to both improve its capabilities and speed.
Two of the latest are HQ-SAM and Light HQ-SAM, where HQ stands for High Quality and Light stands for smaller and faster.
HQ-SAM turns the segmentation masks from SAM into higher quality segmentation masks, improving the results and making the masks look overall better.
Light HQ-SAM is a distillation (a way to take what bigger models know and embed it into smaller models) of HQ-SAM, which allows real-time segmentation due to having fewer parameters (40MB vs 5-10GB).
Both are open-source and available via the sam-hq GitHub repo.
Visual comparison of SAM vs SAM-HQ results. Source: sam-hq GitHub.
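If you haven’t used SAM-style models before, here’s a hedged sketch of the prompt-based workflow using the original segment-anything package (the sam-hq repo follows the same predictor-style interface, but check its README for the exact install path and HQ checkpoints):

```python
# Hedged sketch of the SAM prompt-based workflow (original segment-anything API);
# swap in an HQ-SAM/Light HQ-SAM checkpoint per the sam-hq repo's instructions.
import numpy as np
from segment_anything import sam_model_registry, SamPredictor

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = np.zeros((480, 640, 3), dtype=np.uint8)  # placeholder RGB image (H, W, 3)
predictor.set_image(image)

# Prompt with a single foreground point, get candidate masks + quality scores back
masks, scores, logits = predictor.predict(
    point_coords=np.array([[320, 240]]),
    point_labels=np.array([1]),  # 1 = foreground, 0 = background
    multimask_output=True,
)
```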
Keras Core is one of the most exciting things I’ve seen in deep learning frameworks since Hugging Face’s transformers.
The goal of the project is to combine the usability of Keras with the backends of the major frameworks.
As in, it allows you to write your machine learning code in Keras (simple and easy to use) and have it run with TensorFlow, PyTorch OR JAX as the backend.
The big benefit here is you could build and train a model in Keras Core and then integrate your favourite functions from other frameworks.
Or you can find a model trained with PyTorch, import it into your Keras Core environment and have it run without hiccups.
Keras Core will eventually become Keras 3.0 and already has a bunch of functionality ready to use (e.g. many of the existing operations across PyTorch, TensorFlow and JAX, including data loading pipelines all work within Keras Core).
If successful, Keras 3.0 could mean that there’s no longer the question of JAX, TensorFlow or PyTorch…? Because with Keras 3.0, you get any and all of them. Source: Keras Core announcement post.
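Here’s a minimal sketch of what backend switching looks like with the keras-core package (the KERAS_BACKEND environment variable has to be set before the import):

```python
# Minimal sketch: same Keras code, switchable backend (pip install keras-core).
import os
os.environ["KERAS_BACKEND"] = "jax"  # or "tensorflow" or "torch"

import keras_core as keras
import numpy as np

model = keras.Sequential([
    keras.layers.Dense(64, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

# Dummy data just to show the familiar fit() workflow running on the chosen backend
X = np.random.rand(100, 20).astype("float32")
y = np.random.randint(0, 10, size=(100,))
model.fit(X, y, epochs=1, verbose=0)
```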
This is easily one of my favourite papers I’ve read in a long time.
Reinforce Data, Multiply Impact: Improved Model Accuracy and Robustness with Dataset Reinforcement discusses how to improve the results of a model on a dataset by making the dataset better instead of making the model better.
Figure of the dataset reinforcement workflow. Source: Reinforce Data, Multiply Impact paper.
With the data-reinforced ImageNet (ImageNet+), the researchers were able to train an equivalent ResNet-50 model in 2.5x less time compared to knowledge distillation (KD) training.
And compared to standard ImageNet training, ImageNet+ achieved the same results in 150 epochs instead of 1000 (6.9x faster), and with the same training budget (1000 epochs) achieved a 1.7% better result with the same architecture.
The improvements also carried over to a variety of architectures (CNN, ViT, MobileNet), datasets (Food101, Flowers102) and problem spaces (object detection and segmentation).
One of the main benefits is that the act of dataset reinforcement only needs to take place once, then the reinforced dataset can subsequently be used for later training with minimal overheads.
The authors also found that training on reinforced datasets made the models far more robust to out of distribution (OOD) datasets such as ImageNet-(V2, A, R, C, Sketch).
I love this paper because it’s an excellent example of “freeze the model and iterate on the dataset”.
My next question is, how does the reinforced data paradigm work with an online dataset? As in, a dataset that continually changes and grows. My guess is the original dataset could be cached and then new samples could be reinforced and continually added to the system.
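To make the core idea concrete, here’s a hedged toy sketch (not the paper’s code): run an expensive teacher over the dataset once, cache its soft labels, then train any number of students against the cached labels with no teacher in the training loop.

```python
# Toy sketch of the dataset reinforcement idea in PyTorch (illustrative only):
# the expensive teacher is run ONCE, its soft labels are stored, and later
# student trainings reuse them with near-zero extra cost per run.
import torch
import torch.nn.functional as F

def reinforce_dataset(teacher, dataloader):
    """One-off pass: cache (input, teacher soft label) pairs (save to disk in practice)."""
    teacher.eval()
    reinforced = []
    with torch.no_grad():
        for images, _ in dataloader:
            soft_labels = torch.softmax(teacher(images), dim=-1)
            reinforced.extend(zip(images, soft_labels))
    return reinforced

def train_student(student, reinforced, epochs=1, lr=1e-3):
    """Train against the cached soft labels — no teacher forward passes needed."""
    optimizer = torch.optim.AdamW(student.parameters(), lr=lr)
    for _ in range(epochs):
        for image, soft_label in reinforced:
            logits = student(image.unsqueeze(0))
            loss = F.cross_entropy(logits, soft_label.unsqueeze(0))  # soft targets
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```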
RO-ViT (Region-aware pretraining for Open-vocabulary object detection with Vision Transformers) uses random crops of an image to teach a detector what “regions” are, then combines these with an image and text encoder to create an open-vocabulary detector.
This means the model is able to detect and localise classes in an image it has never seen before.
The model achieves +7.8 points AP (average precision) on the LVIS dataset rare class split.
A nice side effect of the improved region-aware embedding is that the model also sees an improvement in image-level representation, achieving state-of-the-art results on 9 out of 12 metrics for image-text retrieval.
AVIS (Autonomous Visual Information Seeking with Large Language Models) is a framework to combine LLMs with computer vision tools, web search tools and image search tools to answer complicated questions about items in images.
CHITA and CHITA++ (Fast as CHITA: Neural Network Pruning with Combinatorial Optimization) is an optimization-based pruning procedure for creating smaller yet still performant neural networks. The results show that CHITA and CHITA++ can prune up to 70% of a model’s weights and still retain good performance.
One caveat: since CHITA performs unstructured pruning (any weight can be removed), resulting networks require software and hardware capable of supporting sparse computations.
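This isn’t CHITA itself (which frames pruning as a combinatorial optimization problem), but for a feel of what unstructured pruning does, here’s a quick sketch using PyTorch’s built-in magnitude pruning utilities:

```python
# Quick illustration of unstructured (per-weight) pruning, not CHITA's algorithm:
# zero out the 70% of weights with the smallest magnitude in a layer.
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(512, 512)
prune.l1_unstructured(layer, name="weight", amount=0.7)

# Pruned weights are masked to zero; real speedups still need sparse-aware
# software/hardware (the caveat above).
sparsity = (layer.weight == 0).float().mean().item()
print(f"Sparsity: {sparsity:.0%}")  # ~70%
```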
Ever wonder how you can say “Hey Siri” or just “Siri” (in the upcoming iOS 17 and macOS Sonoma) and have your Apple device(s) start listening for commands?
Turns out there are a fair few steps that go into making sure Siri triggers when you want and doesn’t trigger when you don’t want.
Namely:
The workflow for triggering Siri on-device all the way to performing some kind of action. Source: Apple Machine Learning blog.
The above steps would be an excellent machine learning project to try and replicate.
You could:
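One starting point: a tiny wake-word classifier over audio features that only triggers above a confidence threshold. Below is a hedged toy sketch (purely illustrative, all names and shapes are made up, and it’s nothing like Apple’s actual pipeline):

```python
# Toy wake-word detector sketch in PyTorch: classify a short window of
# log-mel spectrogram features as "wake word" vs "not", gated by a threshold.
import torch
import torch.nn as nn

class WakeWordDetector(nn.Module):
    def __init__(self, n_mels: int = 40, n_frames: int = 100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Flatten(),
            nn.Linear(n_mels * n_frames, 128),
            nn.ReLU(),
            nn.Linear(128, 1),  # single logit: wake word present or not
        )

    def forward(self, mel_spectrogram):
        return self.net(mel_spectrogram)

detector = WakeWordDetector()
dummy_features = torch.randn(1, 40, 100)  # (batch, mels, frames) placeholder
trigger_prob = torch.sigmoid(detector(dummy_features))
if trigger_prob.item() > 0.9:  # high threshold = fewer false triggers
    print("Wake word detected, start listening for a command...")
```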
If you're looking for more machine learning project ideas, check out this list I made.
Eugene Yan and Chip Huyen are two of my favourite writers in the machine learning and AI space.
Their recent blog posts (one from Chip, two from Eugene) delve into the challenges LLMs face as well as the problems and patterns LLMs can be used for.
I’ve said it before but every so often you stumble upon someone’s work and proceed to read everything they’ve ever written.
Vicki Boykis is one of those people.
She’s been working with data and machine learning for over 10 years at companies big and small.
I’d recommend exploring her blog for whatever sparks your interest but the following are what I’ve really enjoyed:
Rapid fire time!
What a massive month for the ML world in August!
As always, let me know if there's anything you think should be included in a future post.
In the meantime, keep learning, keep creating, keep dancing.
See you next month,
Daniel
By the way, I'm also an instructor with Zero To Mastery Academy teaching people Machine Learning in the most efficient way possible. You can see a few of our courses below or check out all Zero To Mastery courses.