46th issue! If you missed them, you can read the previous issues of my monthly A.I. & Machine Learning newsletter here.
Hey there, Daniel here.
I'm an A.I. & Machine Learning Engineer who also teaches the following beginner-friendly machine learning courses:
I also write regularly about A.I. and machine learning on my own blog as well as make videos on the topic on YouTube.
Since there's a lot going on, the utmost care has been taken to keep things to the point.
Enough about me! You're here for this month's A.I. & Machine Learning Monthly Newsletter.
Typically a 500ish (+/-1,000ish, usually +) word post detailing some of the most interesting things on machine learning I've found in the last month.
RAG Resources Repo — The theme of this month's issue is RAG (or Retrieval Augmented Generation).
It's a technique I've been using to improve the outputs of my LLMs. And with that, I've created a GitHub repo to collect all of the best resources I've found so far on learning the technique.
What is RAG?
Retrieval Augmented Generation, or RAG for short, is the process of retrieving relevant context and having a Large Language Model (LLM) generate text based on that context.
The goal of RAG is to reduce hallucinations in LLMs (as in, prevent them from making up information that looks right but isn't).
Think of RAG as a tool to improve your calculator for words.
The workflow looks like this:
What can you use RAG for?
Let's say you're a large company with huge amounts of documentation.
You could build a traditional search engine on top of that documentation.
And this works fairly well.
But sometimes you want to go directly to the answer.
Thatβs where RAG can help.
A RAG workflow would involve someone asking a question (typically called a query), then retrieving passages in the documentation related to that query and having an LLM generate an answer based on the relevant passages.
That way, instead of being generated from the LLM's training data alone, the answer comes from a combination of the LLM's capabilities and your company's own data.
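To make that workflow concrete, here's a minimal sketch of the retrieve-then-generate pattern using the sentence-transformers library. The model name, documents and prompt format are all illustrative, not from any particular RAG system:

```python
# A minimal retrieve-then-generate sketch. The documents, model choice and
# prompt format are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

docs = [
    "Refunds are processed within 5 business days.",
    "Our support team is available 9am-5pm AEST.",
    "All laptops come with a 2-year warranty.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_embeddings = embedder.encode(docs, convert_to_tensor=True)

query = "How long do refunds take?"
query_embedding = embedder.encode(query, convert_to_tensor=True)

# Retrieve the passage most similar to the query
scores = util.cos_sim(query_embedding, doc_embeddings)[0]
best_doc = docs[int(scores.argmax())]

# Generation step: hand the query + retrieved passage to whichever LLM you use
prompt = (
    "Answer the question using only the context.\n"
    f"Context: {best_doc}\n"
    f"Question: {query}"
)
print(prompt)
```

A real system would retrieve the top-k passages from thousands of documents (usually via a vector database), but the pattern is the same: embed, retrieve, then generate.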
The following resources will help you learn more about the topic of RAG.
But stay tuned to the rag-resources GitHub repo for future updates.
RAG-Fusion is a take on RAG that enhances the original search query by generating several similar queries, searching for each of those and then combining the results.
This is similar to the process of "Hypothetical Document Embedding" (HyDE, mentioned in ML Monthly May 2023), as in asking a model "what would a document that answers this question look like?", generating that document and then searching for similar documents to it.
Diagram of RAG Fusion workflow. Source: Adrian Raudaschl blog.
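The "combining the results" step is typically done with reciprocal rank fusion, which scores each document by summing 1 / (k + rank) across every result list. Here's a minimal sketch (the document IDs are illustrative, and k=60 is the conventional constant):

```python
# Reciprocal rank fusion: fuse several ranked lists into one.
# Assumes each generated query has already returned a ranked list of IDs.
def reciprocal_rank_fusion(result_lists, k=60):
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results):
            # Documents ranked highly in many lists accumulate the most score
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Example: ranked results for three similar queries
fused = reciprocal_rank_fusion([
    ["doc_a", "doc_b", "doc_c"],
    ["doc_b", "doc_a", "doc_d"],
    ["doc_c", "doc_b", "doc_e"],
])
print(fused)  # doc_b comes first: it ranks well in every list
```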
Claude is a Large Language Model (LLM) by Anthropic with similar performance to GPT-4. It's one of the best available proprietary LLMs.
In this blog post, the Anthropic team go through a series of experiments to improve Claude's performance (e.g. showing no examples, two examples and five examples). Turns out, with smaller models (Claude 1.2), the examples really matter whereas with larger models (Claude 2), examples still improve results but aren't as necessary.
Bonus: Anthropic also have an LLM cookbook on GitHub with examples such as how to iteratively search Wikipedia for information to help answer a question.
Results of using Claude LLM with varying levels of examples and scratchpad abilities. Source: Anthropic blog.
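If you haven't seen few-shot prompting before, the idea is simply to prepend worked examples ("shots") to the prompt before the real question. A rough sketch of the pattern (the examples and format here are illustrative, not Anthropic's exact setup):

```python
# Build a prompt with n worked examples prepended (few-shot prompting).
examples = [
    ("What is 12 * 8?", "96"),
    ("What is 15 * 4?", "60"),
]

def build_prompt(question, n_shots=2):
    shots = "".join(f"Q: {q}\nA: {a}\n\n" for q, a in examples[:n_shots])
    return f"{shots}Q: {question}\nA:"

print(build_prompt("What is 25 * 3?", n_shots=0))  # zero-shot
print(build_prompt("What is 25 * 3?", n_shots=2))  # few-shot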
A nice and easy to read overview of the RAG landscape along with workflow and tool recommendations such as the Emerging Architectures for LLM Applications by Andreessen Horowitz.
One of the gold-standard posts on how to build a full production-grade RAG application, Anyscale's RAG for production blog post, has been updated with new sections:
I highly recommend reading through this post to see what it's like to build a full-scale RAG-style application.
Overview of a typical RAG workflow to get relevant responses from a query into an LLM to create a response. Source: Anyscale blog.
Researchers from Meta have successfully used signals from magnetoencephalography (MEG, a technique which measures the magnetic fields produced by brain activity) aligned with image encoder representations to generate images given brain waves.
Instead of using a text-to-image prompt, this is brain wave-to-image. Looks like the world of brain-computer interfaces over the next couple of years is going to be wild.
Movie showing the image shown to a person and then the output from an image generation model conditioned on brainwaves. Source: Meta AI blog.
The best way to fine-tune or use LLMs on custom data is still being figured out.
Jeremy Howard from fast.ai discovered a weird effect when trying to fine-tune an LLM on science exam questions... the loss would drop dramatically even with only one custom training sample.
This may mean that the majority of LLM knowledge is learned during pretraining and that the notion of fine-tuning may need to be completely rethought.
Bonus: Jeremy recently went on the Latent Space podcast to talk about his findings.
How do you monitor thousands of cattle on a large and growing dairy farm?
Ideally you want the cows to be as healthy as possible for:
Current solutions involve days of work from multiple people counting and inspecting individual cows.
However, computer vision can potentially help. Using commodity hardware (cheap security cameras) as well as off-the-shelf models (YOLOv5), a team from AWS built a proof of concept to monitor cow health with computer vision.
The model measures a degree of "lameness" (when a cow is sick, it tends to bow its head and walk with smaller strides). Initial results show promising potential to expand the technology to similar use cases.
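The "off-the-shelf" part is genuinely off the shelf. Here's a rough sketch of the starting point, assuming the ultralytics/yolov5 hub model ('cow' is one of the 80 COCO classes it ships with; the image path is illustrative):

```python
# Detect cows in a single camera frame with a pretrained YOLOv5 model.
import torch

model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

results = model("barn_camera_frame.jpg")  # a frame from a security camera
detections = results.pandas().xyxy[0]     # one row per detected object

# Keep only the cow detections; counting and tracking these per frame is
# the starting point for downstream health metrics like gait analysis
cows = detections[detections["name"] == "cow"]
print(f"Cows in frame: {len(cows)}")
```

The lameness scoring itself takes more work (tracking head position and stride across frames), but detection like this is the foundation.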
How do you organize billions of files? It turns out dates are important.
I use dates in my own filenames. For many projects, I name files with the date in reverse order (YYYY-MM-DD), which I find helpful because files then sort in chronological order.
Using a Transformer model and a custom labeled dataset, Dropbox created a model to help rename files to a consistent date format. Since releasing the model in August 2022, they've seen a 40% increase in renamed files.
The blog post is an excellent read on how to build a seemingly small feature that's used by millions.
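As a toy illustration of the renaming idea (Dropbox's actual system uses a Transformer model to handle the messy variety of real filenames; this regex only handles one simple DD-MM-YYYY pattern):

```python
# Move a DD-MM-YYYY date to the front as YYYY-MM-DD so that a plain
# alphabetical sort is also a chronological sort.
import re

def to_iso_filename(filename):
    match = re.search(r"(\d{2})-(\d{2})-(\d{4})", filename)
    if not match:
        return filename  # no date found, leave the name alone
    day, month, year = match.groups()
    rest = filename.replace(match.group(0), "").strip(" -_")
    return f"{year}-{month}-{day} {rest}"

print(to_iso_filename("03-11-2023 meeting notes.txt"))
# -> "2023-11-03 meeting notes.txt"
```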
Many of the largest startups and companies of the past 20 years have been from unbundling Craigslist. As in, taking an existing category from the Craigslist website and turning it into a company.
The reason?
Because Craigslist covered everything but it wasn't really anything.
Benedict Evans argues a similar case for AI and in particular ChatGPT. Because ChatGPT covers everything, does this now mean there's room for more specialized companies to be built for AI specific use cases?
LLMs work with natural language data. But behind the scenes, the computer still needs a way to convert that data into numbers. That's what a tokenizer does. It converts raw text into numerical form so that it can be used with a machine learning model.
In this blog post, Alex Strick van Linschoten goes through several different levels of tokenization methods and why they're required for machine learning systems.
As a bonus, Alex created a follow-up post with several examples of different tokenizers from FastAI, Hugging Face and more.
Different levels of tokenization (turning words into numbers). Source: Alex Strick van Linschoten blog.
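You can see tokenization in action in a couple of lines with the Hugging Face transformers library (the checkpoint choice here is illustrative):

```python
# Turn raw text into sub-word tokens, then into integer IDs.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

text = "Tokenization turns words into numbers."
tokens = tokenizer.tokenize(text)
ids = tokenizer.encode(text)

print(tokens)  # sub-word pieces, e.g. ['token', '##ization', 'turns', ...]
print(ids)     # the integer IDs a model actually consumes
```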
What happens when you give an LLM access to data sources other than language (e.g. audio, images, tables + more)?
It becomes a Large Multimodal Model (LMM)!
This is the kind of model that GPT-4V is. Because it can deal with text and images, it's considered multimodal. Multimodal stands for "multiple modalities", as in, multiple sources of data.
In this comprehensive post, Chip Huyen goes through the why of multimodality (life is multimodal, not just text), how to train multimodal models (mixing text and other sources) and research directions for LMMs (figuring out the best way to incorporate different sources of data).
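To make "mixing modalities" a little more concrete, here's a small sketch using CLIP, which embeds images and text into the same space so they can be compared directly (the image path and labels are illustrative):

```python
# Score how well each text label matches an image using CLIP.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")
labels = ["a photo of a dog", "a photo of a pizza"]

inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# Higher probability means the text better matches the image
probs = outputs.logits_per_image.softmax(dim=1)
print(dict(zip(labels, probs[0].tolist())))
```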
A fascinating essay from 2012 (I originally read it as if it was written last week but was shocked to realize it's over 10 years old) about whether Artificial General Intelligence (AGI) is a computer science, physics or philosophy problem.
Deutsch argues that humans' unique capability over AI is our ability to generate new explanations. And current AI systems are only capable of generating explanations contained within their training set.
As a follow on to this essay, a more recent episode with Deutsch speaking about AGI is available on the Reason is Fun podcast.
Example input and output of Fuyu multimodal model from Adept AI. The model takes in an image and is able to read the text in the image despite never being explicitly trained to do so. Source: Adept AI blog.
Example of OWLv2 outputs with the prompt ["fries", "cheeseburger", "tomato sauce"].
The model is able to successfully identify almost all instances of the given classes. This is a game changer for automatic labelling!
Source: That's a photo of my lunch the other day.
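You can try zero-shot detection like the example above in a few lines with the transformers pipeline and an OWLv2 checkpoint (the image path is illustrative):

```python
# Detect arbitrary text-prompted classes in an image, no training needed.
from transformers import pipeline

detector = pipeline(
    "zero-shot-object-detection",
    model="google/owlv2-base-patch16-ensemble",
)

predictions = detector(
    "lunch.jpg",
    candidate_labels=["fries", "cheeseburger", "tomato sauce"],
)

for pred in predictions:
    print(pred["label"], round(pred["score"], 3), pred["box"])
```

Because the classes are just text prompts, you can point this at a new dataset and get candidate bounding boxes immediately, which is what makes it so useful for automatic labelling.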
"We observe that data quality is much more important than quantity (different from existing open source efforts or ALIGN that mostly scale quantity)."
Get the code on GitHub.
What a massive month for the ML world in October!
As always, let me know if there's anything you think should be included in a future post.
In the meantime, keep learning, keep creating, keep dancing.
See you next month,
Daniel
By the way, I'm also an instructor with Zero To Mastery Academy teaching people Machine Learning & AI in the most efficient way possible. You can see a few of our courses below or check out all Zero To Mastery courses.