57th issue! If you missed them, you can read the previous issues of my monthly A.I. & Machine Learning newsletter here.
Hey there, Daniel here.
I’m an A.I. & Machine Learning Engineer who also teaches the following beginner-friendly machine learning courses:
I also write regularly about machine learning on my own blog as well as make videos on the topic on YouTube.
Since there's a lot going on, I've done my best to keep things to the point.
Enough about me! You're here for this month's A.I. & Machine Learning Monthly Newsletter.
Typically a 500ish (+/-1,000ish, usually +) word post detailing some of the most interesting things on machine learning I've found in the last month.
Text classification is one of the most common business problems out there.
As a machine learning engineer, I’ve built text classification models for other companies and I’ve created them for my own company (Nutrify).
In this project, we’ll code a custom text classification model using Hugging Face Datasets, Hugging Face Transformers and Hugging Face Spaces.
We’ll follow the motto of “data → model → demo”.
Meaning by the end of the project, you’ll have your own shareable demo live on Hugging Face.
The project we’ll work through together is Food Not Food, a text classification model which predicts whether a sentence is about food or not.
We’ll start with a dataset, build a model to fit the dataset and finally, demo the model as an interactive demo anyone can use in their browser.
Along the way we’ll also learn a bunch about the Hugging Face Ecosystem (one of the most important platforms in the modern AI era).
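To make the task concrete, here's a toy stand-in for the "data → model → demo" loop: a tiny keyword-based food/not-food classifier. The real project uses Hugging Face Datasets and Transformers rather than hand-written rules; this sketch (with a made-up word list) only illustrates the shape of the problem.

```python
# Toy food/not-food text classifier. This is NOT the project's actual model,
# just a rule-based sketch of the task a fine-tuned Transformer will solve.

FOOD_WORDS = {"pizza", "ramen", "salad", "banana", "curry", "sandwich", "cheese"}

def classify_food_not_food(sentence: str) -> str:
    """Return 'food' if the sentence mentions a known food word, else 'not_food'."""
    tokens = {t.strip(".,!?").lower() for t in sentence.split()}
    return "food" if tokens & FOOD_WORDS else "not_food"

print(classify_food_not_food("I had ramen and salad for lunch"))  # food
print(classify_food_not_food("The train was delayed again"))      # not_food
```

A trained model replaces the word list with learned weights, but the interface (sentence in, label out) is exactly what the Gradio demo on Hugging Face Spaces wraps.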
If you’re interested or want to learn more, check out the following:
P.S. Don’t forget you can always ask me questions on the ZTM Discord. My handle is “@mrdbourke”.
AI and LLM-powered tools such as GitHub Copilot can be incredibly helpful programming assistants.
I’ve used Copilot extensively but have since taken a break from it because I found myself relying on it too much.
Yes, it was incredibly helpful for simple things like remembering the matplotlib API (e.g. creating a subplot of 10 images with different indexes).
However, there were several times I got too confident with it and let it write a little too much code that I then had to debug.
In the end, it may have been quicker to just write it thoughtfully myself.
Copilot tools help me write more code, but often what I’m looking to do is write more thoughtful code.
A quote from the article:
AI assistants, while useful, often allow developers to bypass these steps. For instance, rather than deeply understanding the underlying structure of algorithms or learning how to write efficient loops and recursion, programmers can now just accept auto-generated code snippets.
One of the most significant risks of relying on tools like Copilot is the gradual erosion of fundamental programming skills. In the past, learning to code involved hands-on problem-solving, debugging, and a deep understanding of how code works at various levels—from algorithms to low-level implementation details.
There are many more good arguments in the article too.
Like the one about learning opportunities arising when you have to search for the answer rather than it being presented to you.
A new way of performing document retrieval emerged a couple of months ago.
And that’s ColPali (first mentioned in ML/AI Monthly July 2024).
The idea of ColPali is: embed the whole page of a document (e.g. a PDF), text, figures, images and all, and then match query embeddings against the embedded pages.
This is a much simpler system than the traditional pipeline of using OCR (Optical Character Recognition) to extract the text, using layout models to recognize the structure, and then chunking the text into small pieces to embed.
I made a tutorial on RAG (Retrieval Augmented Generation) that uses the text chunking method above.
However, that method only works if the text you get is already formatted well.
Instead, ColPali (and now ColQwen2) lets you embed each page of a document, turn a query into an embedding, compare the query embedding to the page embeddings, and return the best-matching pages.
You can then use a VLM (vision-language model) to answer questions about the relevant pages.
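The "compare query embedding to page embeddings" step in ColPali-style models uses late interaction (MaxSim) scoring. Here is a plain-Python sketch of that scoring rule with tiny hand-made vectors; real ColPali/ColQwen2 models produce one embedding per image patch and per query token, not these toy 2-d vectors.

```python
# Sketch of ColPali-style "late interaction" (MaxSim) scoring.
# Score = for each query-token embedding, take the max dot product against
# any page-patch embedding, then sum those maxima over the query tokens.

def maxsim_score(query_embs, page_embs):
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return sum(max(dot(q, p) for p in page_embs) for q in query_embs)

query = [[1.0, 0.0], [0.0, 1.0]]    # two query-token embeddings (toy values)
page_a = [[0.9, 0.1], [0.2, 0.8]]   # page whose patches match the query
page_b = [[-0.5, 0.1], [0.3, -0.7]] # page with unrelated patches

scores = {name: maxsim_score(query, page)
          for name, page in [("page_a", page_a), ("page_b", page_b)]}
best = max(scores, key=scores.get)
print(best)  # page_a scores higher, so it gets passed to the VLM
```

Ranking pages this way is what lets the system skip OCR entirely: the page image itself is the retrieval unit.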
An overview of how ColPali/ColQwen2 work in practice. Source: Daniel van Strien’s blog.
If your PDFs and documents are rich in visual information as well as text, I’d recommend trying ColPali/ColQwen2 to power your RAG system.
For more on these workflows, check out the following:
Lucas Beyer is one of my favourite computer vision (and now VLM) researchers.
I’d highly recommend checking out his Twitter/X for papers + other ML tidbits.
His latest post shares an insight I hadn’t thought about.
Pixels have an area.
And if you’re not aware of this, it may throw your evaluation metrics off (e.g. imagine an off-by-one error across thousands or millions of pixels in a large image dataset).
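A quick sketch of where the off-by-one hides: a pixel is a unit-area cell, not a point, so a box covering columns 10 through 19 inclusive is 10 pixels wide, not 19 − 10 = 9. Mixing the two conventions silently shifts metrics like IoU.

```python
# Two conventions for bounding-box width over pixel columns x_min..x_max:

def width_as_points(x_min, x_max):
    return x_max - x_min        # treats pixel indices as point coordinates

def width_as_cells(x_min, x_max):
    return x_max - x_min + 1    # counts the unit-area pixels actually covered

x_min, x_max = 10, 19
print(width_as_points(x_min, x_max))  # 9
print(width_as_cells(x_min, x_max))   # 10
```

One pixel per edge sounds trivial, but summed over every box in a large detection or segmentation benchmark it measurably moves the reported numbers.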
The team at .txt (pronounced “dot tee ex tee”) show how to fine-tune a small language model (Mistral/Phi-3) to beat GPT-4 on function calling (a technique where a language model learns to call a coded function to perform some kind of action).
And if you’d like to learn more about generating structured outputs (e.g. JSON) with LLMs, I’d recommend checking out the rest of their blog posts.
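For intuition, here is a minimal sketch of what function calling looks like at runtime: the model is prompted to emit JSON naming a function and its arguments, and your code parses that JSON and dispatches to a real function. `get_weather` is a made-up example, and the JSON string below stands in for actual model output.

```python
import json

# Hypothetical tool registry. In a real system, constrained/structured
# generation (the .txt approach) guarantees the model's output parses.

def get_weather(city: str) -> str:
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def dispatch(model_output: str) -> str:
    """Parse the model's JSON function call and run the named tool."""
    call = json.loads(model_output)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

# Pretend this string came from the fine-tuned language model:
model_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
print(dispatch(model_output))  # Sunny in Paris
```

The fine-tuning in the .txt post is about making a small model reliably produce the JSON half of this loop; the dispatch half is ordinary application code.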
Hugging Face is the place to be for open-source models, datasets and demos.
But there are many more tools the platform offers such as Webhooks (do something when something else is changed), ZeroGPU (free GPU compute for demos), multi-process Docker (run two different applications simultaneously) and more.
In a recent blog post, Derek Thomas walks you through how to create a workflow that pulls data from a source, processes it and stores it as a processed dataset, all using free Hugging Face tools.
A workflow for creating a continually updated and processed dataset on Hugging Face through the power of Webhooks, ZeroGPU and Docker by Derek Thomas.
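To give a flavour of the glue code involved, here is a sketch of the kind of check a webhook handler might run when the platform POSTs an event: only updates to the watched dataset repo should trigger reprocessing. The payload field names below are illustrative, not the exact Hugging Face webhook schema, and the repo name is hypothetical.

```python
# Decide whether an incoming webhook event should kick off dataset
# reprocessing. A real handler would sit behind a web endpoint (e.g. in a
# Docker Space) and call the processing job when this returns True.

WATCHED_REPO = "my-user/raw-dataset"  # hypothetical source dataset

def should_reprocess(payload: dict) -> bool:
    """True only for update events on the watched dataset repo."""
    return (
        payload.get("event", {}).get("action") == "update"
        and payload.get("repo", {}).get("type") == "dataset"
        and payload.get("repo", {}).get("name") == WATCHED_REPO
    )

event = {"event": {"action": "update"},
         "repo": {"type": "dataset", "name": "my-user/raw-dataset"}}
print(should_reprocess(event))  # True
```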
For those who are new to the field of AI, it may seem as though LLMs (Large Language Models) are all there is.
It’s true that LLMs are very useful, but there are many other kinds of AI models used for a wide range of problems.
To learn more about the different use cases of AI as well as the different kinds of models to use for each, check out Christopher Tao’s article Do Not Use LLMs or Generative AI For These Use Cases.
It’s one thing to use an LLM in an existing product such as ChatGPT, but it’s another thing to evaluate your own LLM before publishing and sharing it with others.
Clémentine Fourrier writes on the Hugging Face blog about several ways LLM evaluation gets done, such as:
The article also discusses the why behind evaluation, such as making sure your training is performing well, seeing which model performs best (leaderboards/comparisons) and tracking where the field is going overall (can a model do X yet?).
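One of the simplest automated evaluations is exact-match accuracy over a held-out QA set. The sketch below uses a canned stand-in for a real LLM call and a two-question toy dataset; real harnesses are far more careful about prompt formats and answer normalisation.

```python
# Minimal automated LLM evaluation: exact-match accuracy on a tiny QA set.

eval_set = [
    {"question": "What is 2 + 2?", "answer": "4"},
    {"question": "Capital of France?", "answer": "Paris"},
]

def fake_model(question: str) -> str:
    # Stand-in for a real LLM call; gets one of the two answers wrong.
    return {"What is 2 + 2?": "4", "Capital of France?": "Lyon"}[question]

def exact_match_accuracy(model, dataset):
    correct = sum(model(ex["question"]).strip() == ex["answer"] for ex in dataset)
    return correct / len(dataset)

print(exact_match_accuracy(fake_model, eval_set))  # 0.5
```

Human evaluation and model-as-judge approaches covered in the article replace the `==` comparison with a person or another LLM scoring the output.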
Quick breakdown: rather than just embedding a chunk of text, get a model to generate the context related to that text and embed that as well.
That way, when you have a query, you can retrieve not only the relevant chunk but also the context around it.
Stacking this method alongside a handful of other tricks led to significant improvement across several retrieval experiments.
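The core move can be sketched in a few lines: before indexing, each chunk gets prefixed with a short model-generated description of where it sits in the source document. `generate_context` below is a canned stand-in for the LLM call that produces that situating text.

```python
# Sketch of contextual retrieval: prepend generated document context to
# every chunk before embedding or BM25 indexing, so a retrieved chunk
# carries its surroundings with it.

def generate_context(doc_title: str, chunk: str) -> str:
    # A real system asks an LLM to situate the chunk within the document;
    # here we fake a one-line prefix from the document title.
    return f"From '{doc_title}': "

def contextualize(doc_title: str, chunks: list) -> list:
    """Return chunks with generated context prepended, ready for indexing."""
    return [generate_context(doc_title, c) + c for c in chunks]

chunks = ["Revenue grew 3% over the prior quarter.",
          "Headcount was flat."]
for c in contextualize("ACME Q2 2023 report", chunks):
    print(c)
```

Without the prefix, a chunk like "Revenue grew 3%..." retrieves poorly for queries that name the company or quarter; with it, both embedding and keyword search can match.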
Techniques included:
Workflow for contextual retrieval with reranking for the best results. Source: Anthropic blog.
The world of open-source VLMs continues to flourish!
See markdownify on GitHub for doing the same thing in a manual way.

When you’re surprised at the outputs of an LLM, VLM or image generator, just remember that the internet runs incredibly deep.
As in, no matter how weird you think the output of a generative model may be, chances are, there’s something like it in its training set.
For example, there’s an image of a dog in a microwave (thankfully it doesn’t look real) as sample 564969 in the very commonly used COCO dataset (1,405 research papers have cited this dataset in 2024 as of September).
Not the strangest example on the internet for sure. But even common academic benchmarks have things you may not have ever thought of.
What a massive month for the ML world in September!
As always, let me know if there's anything you think should be included in a future post.
In the meantime, keep learning, keep creating, keep dancing.
See you next month,
Daniel
By the way, I'm also an instructor with Zero To Mastery Academy teaching people Machine Learning & AI in the most efficient way possible. You can see a few of our courses below or check out all Zero To Mastery courses.