51st issue! If you missed them, you can read the previous issues of my monthly A.I. & Machine Learning newsletter here.
Hey there, Daniel here.
I’m an A.I. & Machine Learning Engineer who also teaches the following beginner-friendly machine learning courses:
I also write regularly about A.I. and machine learning on my own blog as well as make videos on the topic on YouTube.
Since there's a lot going on, the utmost care has been taken to keep things to the point.
Enough about me! You're here for this month's A.I. & Machine Learning Monthly Newsletter.
Typically a 500ish (+/-1,000ish, usually +) word post detailing some of the most interesting things on machine learning I've found in the last month.
Two video tutorials from me this month:
Code your very own RAG pipeline, line by line, to improve the generations of LLMs.
In the video we build NutriChat, a RAG-based system for chatting with a 1,200-page nutrition textbook PDF using Google’s recent Gemma LLM. And the entire pipeline runs locally (or in Google Colab).
Get the code on GitHub.
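If you just want the gist, here’s a minimal sketch of the retrieve-then-generate flow (the toy chunks and the all-MiniLM-L6-v2 embedding model below are stand-ins for illustration, not necessarily what the video uses):

from sentence_transformers import SentenceTransformer, util

# 1. Embed your document chunks (a toy list here instead of a 1,200-page PDF)
chunks = ["Vitamin C is found in citrus fruits.",
          "Protein helps repair and build muscle tissue."]
embedder = SentenceTransformer("all-MiniLM-L6-v2")
chunk_embeddings = embedder.encode(chunks, convert_to_tensor=True)

# 2. Retrieve the chunks most relevant to the user's query
query = "Which foods contain vitamin C?"
query_embedding = embedder.encode(query, convert_to_tensor=True)
hits = util.semantic_search(query_embedding, chunk_embeddings, top_k=1)[0]
context = "\n".join(chunks[hit["corpus_id"]] for hit in hits)

# 3. Augment the prompt with the retrieved context and pass it to an LLM (e.g. Gemma)
prompt = f"Answer the question using the context.\nContext: {context}\nQuestion: {query}"
print(prompt)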
NVIDIA recently released an app called Chat with RTX. It’s designed to be a ChatGPT-like app that runs entirely on your local NVIDIA GPU. You can even upload your own documents, chat with the model and have it reference those documents.
Apple recently updated the Podcasts app with transcriptions. You can now listen to podcast episodes alongside readable transcriptions.
They also released a post on their machine learning blog about the workflow of creating a dataset to measure the performance of the transcription models.
Instead of using WER (word error rate), they used HEWER (human evaluation word error rate). The former counts every error (including “ah”, “umm”, etc.), whereas the latter focuses on errors that make the transcript unreadable or change its meaning.
Often transcripts will have a seemingly large WER but a lower HEWER (the transcript is still readable).
The blog is a sensational example of how a common academic evaluation metric (WER) may not be entirely suited to an actual production deployment.
Left: Transcript of Apple Podcast episode from the Hungry podcast. Right: Apple’s custom evaluation metric for making sure transcripts are still human-readable. Source for right image: Apple’s Machine Learning Blog.
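For reference, WER is the number of word-level edits (substitutions, insertions, deletions) needed to turn the model’s transcript into the reference transcript, divided by the number of words in the reference. Here’s a rough back-of-the-envelope version in Python (my own sketch, not Apple’s implementation), which also shows why a harmless dropped “umm” still counts against WER:

def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic programming table for the Levenshtein distance between word lists
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("umm I love machine learning", "I love machine learning"))
# 0.2 -> one "error" out of five reference words, yet the transcript reads perfectly fine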
Croissant combines metadata, resource file descriptions, data structure and default ML semantics into a single file. It works with existing ML datasets to make them easier to find, use and support with various tools.
If you’re creating an open-source dataset for ML (or even a closed source dataset), you should consider pairing it with the Croissant framework.
It’s backed and built by engineers from Google, Meta, Hugging Face, DeepMind and more.
And it can be used directly with TensorFlow Datasets (TFDS) to help load data into frameworks like TensorFlow, PyTorch and JAX.
See the code on GitHub and TensorFlow Datasets documentation.
import tensorflow_datasets as tfds

# Point the builder at a dataset's Croissant metadata file (the MNIST example from the Croissant repo)
builder = tfds.dataset_builders.CroissantBuilder(
    jsonld="https://raw.githubusercontent.com/mlcommons/croissant/main/datasets/0.8/huggingface-mnist/metadata.json",
    file_format='array_record',
)

# Download and prepare the data, then inspect the first sample of the default split
builder.download_and_prepare()
ds = builder.as_data_source()
print(ds['default'][0])
Trained on millions of files using Keras, Magika is a fast and lightweight (1MB) model that can detect file types in milliseconds.
A similar model to Magika is used at Google to detect different file types and route them appropriately in Gmail, Drive and more. Every week it looks at hundreds of billions of files to make sure they’re the correct kind.
Magika achieves remarkable performance, with an average of 99%+ precision and recall across 120+ file types.
It is available under the Apache 2.0 licence and is easy to get started with.
Get the code on GitHub and see the live demo (works in the browser).
!pip install magika
from pathlib import Path
from magika import Magika
# Create a txt file
text_file_path = "test_file.txt"
with open(text_file_path, "w") as f:
    f.write("Machine Learning Monthly is epic!")
# Classify the filetype with machine learning
m = Magika()
res = m.identify_path(Path(text_file_path))
print(res.output.ct_label)
>>> txt
Distillation is the process of taking the predictive power of a larger model and using it to improve a smaller model.
For example, take the predictive probabilities outputs of a larger model on a given image dataset and get a smaller model to try and replicate those predictive probabilities.
This process can help to get a good balance between model performance and size.
The paper above is a few years old (2021) but showcases how to improve the results of a smaller network using distillation and a relatively small dataset of 4M unlabelled images.
It shows how MobileNetV3-large can improve from 75.2% to 79% on ImageNet and ResNet50-D from 79.1% to 83%.
The same process is used to train Next-ViT models, which offer excellent performance-to-latency ratios on smaller devices and are often fine-tuned as base models in Kaggle competitions.
Framework for distilling the predictions of a larger teacher model into a smaller student model. The student is trained to mimic the predictive probability outputs of a teacher model. Source: https://arxiv.org/abs/2103.05959
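For the curious, the core idea fits in a few lines. Here’s a minimal sketch of a distillation loss in PyTorch (the temperature, batch size and random logits below are placeholders for illustration, not the paper’s exact setup):

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=4.0):
    # Soften both distributions with a temperature, then minimise the KL divergence
    # so the student learns to mimic the teacher's predictive probabilities.
    soft_targets = F.softmax(teacher_logits / temperature, dim=-1)
    soft_student = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale by T^2 so gradient magnitudes stay comparable to a hard-label loss.
    return F.kl_div(soft_student, soft_targets, reduction="batchmean") * temperature ** 2

# Random logits standing in for real model outputs (e.g. a ResNet teacher and a MobileNet student)
teacher_logits = torch.randn(8, 1000)
student_logits = torch.randn(8, 1000)
print(distillation_loss(student_logits, teacher_logits))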
Hamel is a seasoned ML engineer with some of the best production-focused LLM materials on the internet. Three of his blog posts I’ve been reading:
What a custom 7x4090 system looks like. Source: Nathan Odle blog.
Embeddings are usually in float32.
Which means they use 4 bytes per dimension.
For example, if you have an embedding with 1024 dimensions, it’s going to take up 4096 bytes.
This is not much for a single embedding but scale it up to 250M embeddings and you’re going to need 1TB of memory.
However, this can be remedied with binary and scalar quantization.
Binary quantization turns embeddings into 0 or 1 values, dramatically reducing their storage requirements by 32x (compared to float32).
Scalar quantization bins embedding values into the int8 range (256 possible values per dimension), reducing storage requirements by 4x (compared to float32).
Each of these also improves the speed at which embeddings can be queried, with only a minor loss of retrieval performance.
Summary of results of quantizing embeddings in regards to speed, storage and performance. Source: Hugging Face blog.
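The blog post covers library support for doing this properly, but the core idea is simple enough to sketch with plain NumPy (the array sizes below are made up for illustration):

import numpy as np

# Fake float32 embeddings standing in for real model outputs (10,000 embeddings x 1,024 dims)
embeddings = np.random.randn(10_000, 1024).astype(np.float32)

# Binary quantization: keep only the sign of each dimension, pack 8 values per byte
# -> 32x smaller than float32
binary = np.packbits((embeddings > 0).astype(np.uint8), axis=-1)

# Scalar (int8) quantization: map each dimension's observed range onto 256 buckets
# -> 4x smaller than float32
mins, maxs = embeddings.min(axis=0), embeddings.max(axis=0)
scaled = (embeddings - mins) / (maxs - mins)  # values in [0, 1]
int8_embeddings = (scaled * 255 - 128).astype(np.int8)

print(embeddings.nbytes, binary.nbytes, int8_embeddings.nbytes)
# 40,960,000 vs 1,280,000 (32x smaller) vs 10,240,000 (4x smaller) bytes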
Bonus: See the demo of the power of quantized embeddings by searching over 41M Wikipedia embeddings in milliseconds.
Powered by Mixed Bread AI’s powerful new open-source collection of embedding and reranking models.
TacticAI turns players into numerical representations and predicts their best positioning for corner kicks. Expert coaches found TacticAI placements to be indistinguishable from actual placements and often preferred them over their own.
Example of creating a 3D model of a peach from a single image in seconds. Source: TripoSR on Hugging Face.
What a massive month for the ML world in March!
As always, let me know if there's anything you think should be included in a future post.
In the meantime, keep learning, keep creating, keep dancing.
See you next month,
Daniel
By the way, I'm also an instructor with Zero To Mastery Academy teaching people Machine Learning & AI in the most efficient way possible. You can see a few of our courses below or check out all Zero To Mastery courses.