[August 2025] AI & Machine Learning Monthly Newsletter

Daniel Bourke

68th issue! If you missed them, you can read the previous issues of my monthly A.I. & Machine Learning newsletter here.

Hey there, Daniel here.

I’m an A.I. & Machine Learning Engineer who also teaches beginner-friendly machine learning courses with Zero To Mastery.

I also write regularly about machine learning on my own blog as well as make videos on the topic on YouTube.

Since there's a lot going on, the utmost care has been taken to keep things to the point.

Enough about me! You're here for this month's A.I. & Machine Learning Monthly Newsletter.

Typically a 500ish (+/-1,000ish, usually +) word post detailing some of the most interesting things on machine learning I've found in the last month.

Here's what you might have missed in August 2025 as an A.I. & Machine Learning Engineer... let's get you caught up!

My work

ZTM object detection project with Hugging Face Transformers

The code and tutorial are done! And the videos are being edited! Stay tuned for them to go live on ZTM soon.

In progress: ZTM LLM fine-tuning with Hugging Face Transformers

I’ve begun work on the next tutorial/project for ZTM. Inside we’ll fine-tune a small LLM to do a specific task. That way you’ll be able to run it on your own hardware without needing to call external APIs.

From the Internet

A few vibe coding related pieces to begin.

Many of them align with my own bias: vibe code when it doesn’t matter or when you need a quick demo, write it yourself when it does.

Vibe code is legacy code by Steve Krouse

We already have a phrase for code that nobody understands: legacy code.

There’s also a bonus talk to go along with the article, The Role of the Human Brain in Programming.

Writing code was never the bottleneck by Pedro Tavares

An eloquent post discussing how more code was never really the problem.

Most code is thinking, reviewing and designing. All of those steps take time. And if they’re done right, writing code becomes the easy part. Source: ordep.dev blog.

Understanding what the problem was and how to best solve it (a moving target) is the problem.

Building AI Products In The Probabilistic Era by Gian Segato

Having worked with ML models in production for a while now, I liked this line:

The goal isn't perfection: by definition you can't nor should aim for it. The goal is to manage the uncertainty.

How do you build a product where the outputs are probabilistic?

Tip: You try as best you can to tip the probability in your favor.

One of the levers to do so in machine learning is data.

Or in LLM world, it’s having good context (more on that soon).

I also liked this passage about how an improvement in underlying model capabilities can change your whole product:

When Replit moved from Sonnet 3.5 to 3.7, Replit’s President Michele had the company rewrite the entire product in less than 3 weeks. We called it Replit v2, but that was quite an understatement. In reality, it was a brand new product.

The architecture, data model, prompting technique, context management, streaming design… it was all new. 3.7 was highly agentic in an entirely novel way, Michele understood it, and decided to lean into it instead of trying to control it.

But of course…

Just because a new model comes out doesn’t mean you have to change to it.

Nor should you expect such large changes each time:

Every model update doesn’t necessarily mean a complete rewrite every time, but it does force you to fundamentally rethink your assumptions each time, making a rewrite a perfectly plausible hypothesis.

You have to follow an empirical method, where the only valid assumption is “I don’t know”. Being an empiricist first is diametrically opposed to being an engineer first.

The motto I say in all of my courses rings true: experiment, experiment, experiment!

Why “Context Engineering” Matters by Drew Breunig

Context Engineering is a new term arising to add a bit more finesse to Prompt Engineering.

Where prompts can be throwaway things, context for an LLM is more of a carefully engineered substance.

Call it semantics.

But I like it.

Prompt to explore.

Context to get repeatable (as possible) workflows.

Drew’s series of blog posts (every so often you stumble upon someone’s blog and devour all of their recent posts, I live for these moments) had me enthralled this week.

I like his definition of context engineering:

Context engineering = systematically engineer contexts in pursuit of an outcome
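
To make the distinction concrete, here’s a minimal sketch (my own illustration, not from Drew’s post) of what systematically engineering a context can look like: the user’s prompt is only one ingredient, and the rest (instructions, retrieved documents, recent history) gets assembled programmatically so the workflow is repeatable.

# Minimal sketch of assembling a context rather than hand-writing a prompt each time.
# The structure and section names are illustrative, not a standard.
def build_context(system_instructions: str,
                  retrieved_docs: list[str],
                  chat_history: list[str],
                  user_query: str,
                  max_doc_chars: int = 2000) -> str:
    """Assemble a repeatable context string for an LLM call."""
    docs = "\n\n".join(doc[:max_doc_chars] for doc in retrieved_docs)  # budget the documents
    history = "\n".join(chat_history[-6:])  # keep only the most recent turns

    return (
        f"### Instructions\n{system_instructions}\n\n"
        f"### Reference documents\n{docs}\n\n"
        f"### Conversation so far\n{history}\n\n"
        f"### User query\n{user_query}"
    )

context = build_context(
    system_instructions="Answer using only the reference documents.",
    retrieved_docs=["Context engineering = systematically engineering contexts..."],
    chat_history=["User: hi", "Assistant: hello!"],
    user_query="What is context engineering?",
)
print(context)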

And this table breaks it down well:

Comparing prompts versus contexts. Source: dbreunig.com

For more on this, I’d recommend reading more of Drew’s recent posts on his blog.

Tokens are getting more expensive by Ethan Ding

Models are getting cheaper.

But they’re getting used more.

Wayyyyyyy more.

A typical query to an LLM used to be:

  • Input tokens
  • Output tokens

Simple right?

Now there’s an unknown middle:

  • Input tokens
  • [Insert reasoning trace tokens here]
  • Output tokens
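
To see why that middle matters for cost, here’s a rough back-of-the-envelope sketch (all prices and token counts below are made-up placeholders, not Ethan’s figures):

# Rough cost sketch: reasoning tokens sit between input and output and are
# typically billed at the output-token rate. All numbers are illustrative.
PRICE_PER_M_INPUT = 1.25    # $ per million input tokens (hypothetical)
PRICE_PER_M_OUTPUT = 10.00  # $ per million output tokens (hypothetical)

def query_cost(input_tokens: int, reasoning_tokens: int, output_tokens: int) -> float:
    """Cost of a single query, counting reasoning tokens as output tokens."""
    cost_in = input_tokens / 1e6 * PRICE_PER_M_INPUT
    cost_out = (reasoning_tokens + output_tokens) / 1e6 * PRICE_PER_M_OUTPUT
    return cost_in + cost_out

# The same query, with and without a long reasoning trace.
print(query_cost(input_tokens=2_000, reasoning_tokens=0, output_tokens=500))       # ~$0.0075
print(query_cost(input_tokens=2_000, reasoning_tokens=20_000, output_tokens=500))  # ~$0.21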

Ethan writes an excellent article on how, because the models are getting better and better, they’re getting used more and more.

And the fixed pricing of $20/month isn’t going to be enough to cover large-scale consumer usage.

If you charge per million tokens in an API, you’re fine.

Price scales with usage.

But flat-fee pricing can only go so far…

I like this point on always just using the best model:

When a new model drops, if it’s significantly better, you don’t hold onto using the previous one. You use the new one. Source: Ethan Ding Substack.

Another takeaway I liked was the set of optimizations Anthropic tried on their Claude Code product to save on cost and token usage:

  1. Charge 10x the price ($200 instead of $20)
  2. Auto-scale models based on load (use Sonnet at $15/M tokens instead of Opus at $75/M tokens)
  3. Offload preprocessing to user machines

Even with these, token usage still went off the charts (when you have no limits, people will figure that out). See viberank.app for an active Claude Code token usage leaderboard.

Are these people paying for the tokens?

Or have they found out that even at $200/month Claude Code is a bargain?

Who knows.

Mistral show how fine-tuning a vision language model can improve satellite imagery recognition

Mistral show how they took Pixtral-12B, an open-source vision language model, from 56% accuracy using prompting to 91% accuracy on 30-class satellite imagery classification using fine-tuning (see chart below).

The model started out with okay results on 30 classes.

But fine-tuning on 8000 training images really stepped things up a notch.

We see similar effects at Nutrify.

Off the shelf models start out with okay performance but dramatically improve with fine-tuning.

Note: The authors suggest that similar results could be achieved with a smaller, specialized vision model (I agree with this) but VLMs also provide the opportunity for more nuanced use cases such as talking with images compared to traditional classification models.

See the Mistral cookbook for code on how to fine-tune Pixtral-12B.

Accuracy comparison of Pixtral-12B using prompting for classification versus using fine-tuning. Source: Mistral blog.

Strategy Letter V by Joel Spolsky

Written in 2002 but still rings true today.

Joel compares the macroeconomics and microeconomics of software.

With the key takeaway being: Smart companies try to commoditize their products’ complements.

Relevant to today’s world of AI when comparing open-source to proprietary models.

OpenAI derives most of their revenue from consumer subscriptions.

Whereas Anthropic gets most of their revenue from API usage.

So what did OpenAI do?

They open-sourced two powerful models, gpt-oss-120b and gpt-oss-20b (more on these below), not as good as Claude 4 but quite good.

And they lowered the prices of GPT-5 in the API.

In essence, OpenAI don’t need to make money on the API since they make so much from the consumer app ChatGPT.

On the other hand, Anthropic relies on revenue from their API.

So by open-sourcing high quality models and lowering the price of their API for GPT-5, OpenAI is effectively commoditizing their products’ complements.

How the Xet storage layer improves uploads and downloads on Hugging Face

When you start working with large datasets, uploading and downloading becomes quite a bottleneck.

For example, the image dataset I work with on Nutrify is about 100GB.

Not the largest.

But not the smallest either.

Hugging Face stores 21 Petabytes of data.

So uploads and downloads matter even more to them.

The good news is that with the new Parquet Content-Defined Chunking (CDC) support in PyArrow and Pandas, you can now perform efficient data operations on Hugging Face storage repos.

For example, instead of uploading and downloading a whole dataset each time, you can just work with the changes.

Say I change 100 rows out of 100,000: I can upload and download just those 100 rows instead of the full 100,000!

It all starts with the new use_content_defined_chunking parameter:

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# Any DataFrame works here (example data for illustration)
df = pd.DataFrame({"id": [1, 2, 3], "label": ["apple", "banana", "carrot"]})

# Write to a Hugging Face dataset repo with content-defined chunking
df.to_parquet(
    "hf://datasets/{user}/{repo}/path.parquet",
    use_content_defined_chunking=True,
)

# The same option is available when writing an Arrow table directly
table = pa.Table.from_pandas(df)
pq.write_table(
    table,
    "hf://datasets/{user}/{repo}/path.parquet",
    use_content_defined_chunking=True,
)
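
To illustrate the “change 100 rows” scenario above, the incremental workflow looks something like this sketch (the repo path is a placeholder and the "label" column is a made-up example):

# Sketch: edit a small slice of an existing dataset and re-upload it.
# With content-defined chunking, only the chunks containing the changed
# rows need to be transferred, not the whole file.
import pandas as pd

REPO_PATH = "hf://datasets/{user}/{repo}/path.parquet"  # placeholder path

df = pd.read_parquet(REPO_PATH)
df.loc[df.index[:100], "label"] = "corrected"  # change 100 rows (assumes a 'label' column)
df.to_parquet(REPO_PATH, use_content_defined_chunking=True)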

See more use cases in the Hugging Face Parquet Content-Defined Chunking guide as well as the article *From Chunks to Blocks: Accelerating Uploads and Downloads on the Hub.*

Daniel’s Open-source AI of the Month

1. AIDC-AI release Ovis2.5

The Ovis (Open Vision) series are some of my favourite VLMs.

And the 2.5 upgrade introduces the following:

  • Ovis2.5-2B
  • Ovis2.5-9B

Both models use SigLIP-2 as the vision encoder.

And Qwen3-1.7B and Qwen3-8B as the language models.

The 9B version beats out GPT-4o on several benchmarks.

And the 2B version is best in class for its size.

One of my favourite new features is the models’ grounding capability.

For example, the models are able to extract tables from images as well as detect objects in images based on a prompt.

Ovis2.5 models can extract text and detect objects based on prompt inputs. For example, you can instruct the model to “extract bounding boxes for all unique foods in the image”. Source: Ovis2.5 paper and custom image.

The models are currently available in Hugging Face Transformers and will be in vLLM soon.

2. Google DeepMind releases Perch 2.0

Perch 2.0 is a bio-acoustics foundation model.

It can create high-quality embeddings of animal and environmental sounds from the wild and classify them into 14,795 classes.

The model uses an EfficientNetB3 backbone and is trained on a large corpus of 1.5M soundbites from public sources.

Researchers can use the Perch 2.0 model backbone to generate embeddings on their own custom data and quickly build classifiers with a small number of examples.

Or you could take an existing recording, say of a bird singing, embed it with Perch 2.0 and then find similar sounds in a dataset.
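
Here’s a rough sketch of that “embed, then build a small classifier” workflow. The embeddings below are random placeholders standing in for real Perch 2.0 outputs (see the Kaggle model page for how to actually load the model and embed audio), and the classifier is plain scikit-learn:

# Few-shot workflow sketch: embed labelled clips with a bio-acoustics
# backbone, then fit a lightweight classifier on top of the embeddings.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
EMBED_DIM = 1536  # placeholder dimensionality, not the real model spec

# Pretend we embedded 20 labelled clips (10 per species).
embeddings = rng.normal(size=(20, EMBED_DIM))
labels = np.array([0] * 10 + [1] * 10)  # two species

clf = LogisticRegression(max_iter=1000).fit(embeddings, labels)

# Embed a new recording (placeholder) and classify it.
new_clip_embedding = rng.normal(size=(1, EMBED_DIM))
print(clf.predict(new_clip_embedding))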

See the demo video on YouTube, read the paper, download the model from Kaggle.

3. Meta releases DINOv3 self-supervised computer vision backbones

DINOv3 is a self-supervised computer vision model trained on 1.7B curated public images from Instagram (Meta call this dataset LVD-1689M).

Using such a large dataset means the model learns incredibly high quality visual features.

DINOv3 short overview. Image data is from public images on Instagram, results show performance on classification tasks and the output features on the right show more defined characteristics than previous foundation model backbones. Source: DINOv3 paper.

The models come in two variants, ViT and ConvNeXt, each in various sizes.

Vision Transformer (ViT) models (distilled from the ViT-7B model):

  • ViT-S/16 distilled: 21M parameters
  • ViT-S+/16 distilled: 29M parameters
  • ViT-B/16 distilled: 86M parameters
  • ViT-L/16 distilled: 300M parameters
  • ViT-H+/16 distilled: 840M parameters
  • ViT-7B/16: 6,716M parameters

ConvNeXt models (for efficient deployment):

  • ConvNeXt Tiny: 29M parameters
  • ConvNeXt Small: 50M parameters
  • ConvNeXt Base: 89M parameters
  • ConvNeXt Large: 198M parameters

DINOv3 backbones can be used as feature extractors and combined with different output heads (e.g. linear layers or clustering models) for various tasks.

For example, you could use DINOv3 models to embed your large image dataset and then use the embedding database for retrieval (pass in a target image and retrieve similar images from the database).
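
Here’s a minimal sketch of that embed-and-retrieve idea using the standard Transformers feature-extraction pattern. The model id and image filenames are my assumptions for illustration; check the Hugging Face DINOv3 collection for the exact checkpoint names:

# Sketch: embed images with a DINOv3 backbone and retrieve by cosine similarity.
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

MODEL_ID = "facebook/dinov3-vits16-pretrain-lvd1689m"  # assumed checkpoint name

processor = AutoImageProcessor.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID).eval()

def embed(image: Image.Image) -> torch.Tensor:
    """Turn an image into a single embedding vector by mean-pooling patch tokens."""
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state.mean(dim=1).squeeze(0)

# Build a tiny "database" of embeddings and query it (placeholder filenames).
database = torch.stack([embed(Image.open(p)) for p in ["pizza.jpg", "salad.jpg"]])
query = embed(Image.open("query.jpg"))

scores = torch.nn.functional.cosine_similarity(query.unsqueeze(0), database)
print(scores.argsort(descending=True))  # indices of the most similar images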

The training code and example use cases are available on GitHub, the models are on Hugging Face and the paper is on arXiv.

4. Google releases Gemma 3 270M, a small language model ready for fine-tuning

Gemma 3 270M is the latest addition to the Gemma series of models.

This is the perfect kind of model to fine-tune for a specific task.

From the Google release notes:

Use cases for the Gemma 3 270M model are focused on high-volume, specific tasks. Because of the smaller model size, it can be run on lightweight systems and even on-device. This means you can experiment faster with task-specific fine-tunes. Source: Google Developers Blog.

The model has a large vocabulary of 256K tokens, which means it handles rare and domain-specific tokens well and can be fine-tuned for environments where specific language or style matters.

Prior to quantization (making the model weights smaller), the model has a footprint of 536MB, which means it can fit on small devices such as mobile phones.

And after quantization it will likely be 2-3x smaller.
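
As a quick sketch of how easy a model this size is to try locally with Transformers (the model id is my assumption based on the release naming, and the instruction-tuned variant may require accepting the Gemma license on Hugging Face first):

# Sketch: run a small instruction-tuned Gemma 3 270M variant locally.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="google/gemma-3-270m-it",  # assumed id for the instruction-tuned variant
    torch_dtype="auto",
)

messages = [{"role": "user", "content": "Rewrite this in a formal tone: see you soon!"}]
output = generator(messages, max_new_tokens=64)
print(output[0]["generated_text"][-1])  # the model's reply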

See the Google Developers Blog release notes for more.

5. OpenAI release gpt-oss-20B and gpt-oss-120B

In the short time I’ve had hands-on with these models, they’re good.

Most of the tasks I do with LLMs are structured data extraction/creation.

So I can attest to them being good there.

I had the 20B version running on my Mac Mini M4 Pro quite fast (50+ tokens/second) in LM Studio.

Example of running gpt-oss-20b locally on my Mac Mini M4. The model works quite fast and the output quality is great for small tasks (based on what I’ve tried so far). Source: Author created image.

For a good technical breakdown of the models I’d recommend the introductory post on Hugging Face.

It’s where I found the following paragraph:

The model weights are quantized in mxfp4 format, which was originally available on GPUs of the Hopper or Blackwell families, but now works on previous CUDA architectures (including Ada, Ampere, and Tesla).

Installing triton 3.4, together with the kernels library, makes it possible to download optimized mxfp4 kernels on first use, achieving large memory savings. With these components in place, you can run the 20B model on GPUs with 16 GB of RAM. This includes many consumer cards (3090, 4090, 5080) as well as Colab and Kaggle!

The tidbit here is the MXFP4 quantization.

This means even though the model says 20B parameters (usually ~48GB of GPU RAM required), it can be run on GPUs with ~16GB of RAM.

And similar with the 120B model.

Thanks to the quantization, it can also be run on a single 80GB GPU such as an NVIDIA H100.

It also means they run fast!

That’s a big deal!
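
If you’d rather try it from Python than LM Studio, here’s a minimal sketch based on the Hugging Face introductory post (it assumes a GPU with roughly 16GB of memory plus the triton/kernels setup mentioned in the quote above):

# Sketch: run gpt-oss-20b via Transformers (pattern from the Hugging Face intro post).
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="openai/gpt-oss-20b",
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Extract the food items from: 'I had eggs, toast and coffee.'"},
]
output = generator(messages, max_new_tokens=128)
print(output[0]["generated_text"][-1])  # the assistant's reply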

For a visual breakdown of the architecture, I’d recommend The Illustrated GPT-OSS by Jay Alammar.

The models come with three different levels of reasoning: low, medium (default) and high.

The fastest way to get hands-on with the models is at gpt-oss.com (a small web app running the models through Hugging Face’s Inference Providers service).

Or you can also download them directly from Hugging Face, LM Studio or Ollama.

6. Allen AI release MolmoAct

MolmoAct is an Action Reasoning Model (ARM), a model which combines vision, language, reasoning and action into one.

The model is able to take in visual perceptions and voice (natural language) commands and then use those for planning actions to take in the real world.

Given instructions, the model draws an action trace in 3D space and then a robot follows those lines. Source: MolmoAct blog.

The model weights and datasets are available online, see the demo video for more.

7. Qwen release Qwen-Image and Qwen-Image-Edit

Two highly performant and open-source models.

Qwen-Image generates images with quality comparable to the best in class.

And Qwen-Image-Edit is capable of taking existing images and applying edits whilst retaining the original details.

8. NVIDIA release a suite of updated models

  • Speech recognition and translation: canary-1b-v2 expands the transcription and translation to 25 different languages (English + 24 European languages).
  • Speech recognition: parakeet-tdt-0.6b-v3 expands support for speech transcription to 25 total languages (English + 24 European languages).

Both models provide timestamps, punctuation and capitalization.

  • Language reasoning model: NVIDIA-Nemotron-Nano-9B-v2 provides better performance and up to 6x faster throughput than Qwen3-8B (an equivalent model size) thanks to a hybrid architecture of Mamba and Transformer layers. See the blog post for more.

All models are available for commercial use.

9. Google releases LangExtract, an open-source library for extracting entities from text with LLMs

What if you had a large number of email chains with various stakeholders and discussion topics and you wanted to extract structured data from them?

Or a history of call logs?

Or even a whole novel?

That’s where LangExtract comes in.

You provide instructions, examples of extractions and a model to use and LangExtract handles the rest.
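
As a sketch of what that looks like (the call pattern below follows the examples in the LangExtract README at the time of writing; treat the exact class and argument names as assumptions and double-check the repo, and note it needs a Gemini API key set in your environment):

# Sketch following the LangExtract README pattern: define a prompt, give a
# worked example of the extractions you want, then run it over new text.
import langextract as lx

prompt = "Extract food and drink mentions from the text."

examples = [
    lx.data.ExampleData(
        text="I had a flat white and a croissant for breakfast.",
        extractions=[
            lx.data.Extraction(extraction_class="drink", extraction_text="flat white"),
            lx.data.Extraction(extraction_class="food", extraction_text="croissant"),
        ],
    )
]

result = lx.extract(
    text_or_documents="Dinner was grilled salmon, brown rice and a glass of water.",
    prompt_description=prompt,
    examples=examples,
    model_id="gemini-2.5-flash",
)

for extraction in result.extractions:
    print(extraction.extraction_class, "->", extraction.extraction_text)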

Example showing LangExtract instructed to extract food and drink entries from a passage of text. Source: Author created.

Get the code for LangExtract, as well as an example of working with extremely long text, on GitHub and see a LangExtract demo on Hugging Face Spaces.

10. OpenGVLab release InternVL-3.5 family and OpenBMB release MiniCPM-V-4.5

The InternVL-3.5 family is a series of VLMs, all achieving close to state-of-the-art results on several VLM benchmarks for their size.

My favourite is the InternVL3.5-8B model (a model capable of running on a local GPU), which performs on par with Claude Sonnet 3.7.

See the InternVL3.5 paper for more.

MiniCPM-V-4.5 is an 8.7B parameter VLM which achieves efficient video inference as well as outstanding OCR results, outperforming models such as GPT-4o-latest on OCRBench.

There’s even a demo of the model running locally on an iPad with an M4 chip.

11. Microsoft release VibeVoice-1.5B for long multi-speaker speech generation

Generating multi-speaker (up to 4 speakers) audio up to 90 minutes long is impressive.

I tried it out on a few smaller pieces of audio and it did… okay?

Perhaps I’m not using it right (most likely).

You can prompt the model with turn-by-turn examples, such as:

Speaker 0: Welcome to Machine Learning monthly!
Speaker 1: This is the video walkthrough of the text-based newsletter that covers the latest and greatest in the world of AI and machine learning.
Speaker 0: But not always the latest...
Speaker 1: It's been a big month so let's see what happened in August 2025.

And then you can choose which voice each speaker gets.

The audio that came out of the example above was about 5/10 for me.

More experimenting required…

Research

  • Google release a paper detailing MLE-STAR, a machine learning engineering agent capable of implementing machine learning tasks and entering Kaggle competitions.
  • Google show how you can achieve sensational results by fine-tuning an LLM with 400x fewer pieces of training data (tip: use ~250 high quality samples instead of 100,000 lower quality samples). I expect much more of this kind of workflow to play out over the next couple of years when smaller, specialized models start playing more of a role (e.g. see Gemma 3 270M above). The most interesting part to me was the curation process of high quality samples.
To curate a subset of high quality samples, Google researchers first labelled a large set of data with LLMs and then iteratively worked through finding samples which had differing labels but overlapped. Source: Google Research blog.

Releases

  • The Gemini API can now directly ingest URLs in the prompt. Models can extract and interact with data from the URL. The content of the URL is charged as input tokens.
  • **Google release Gemini 2.5 Flash native image generation and editing.** Turns out this model had an alias of nano-banana before dropping. I like the alias. It makes you go, what could that be? And it’s a bit easier to remember than gemini-2.5-flash-image-preview. Either way, this is the best image generation model I’ve tried. Even better than Imagen 4 (Google’s other image generation model) in a handful of cases I’ve tried. The model is capable of editing as well, not just generation. Sam Witteveen has a great overview of it.
Gemini 2.5 Flash enables you to generate and edit images all in the same interface. It also does well at keeping consistency through time, for example, keeping the same subject throughout subsequent edits. Source: Author created via Google AI Studio.

  • **OpenAI released GPT-5 in the API and in the ChatGPT interface.** A few people have said it’s not as good as the hype. But I’ve tried it and it’s good… at least for my use cases. Most critics have never shipped anything to 500M+ people though. The major change is that they cleaned up the models in ChatGPT. Instead of the 25 different options, it’s now GPT-5 or GPT-5-Thinking. And the interface will even select the best model for you based on whether your query is simple or requires a bit more time to think about. The models are also much cheaper in the API now (see above: commoditize your complements). Aside from actually trying out the models myself, one of my favourite pieces on it was by Ethan Mollick, GPT-5: It just does stuff.

Videos

  • [Video] Anthropic put out a good Prompting 101 with Claude video. There are still many tricks to be discovered here as “prompting” turns into “context engineering”.
  • [Video] Vibe coding is the worst idea of 2025 by Modern Software Engineering. The title is a bit clickbaity but I like the theme here. Just because you can vibe code, doesn’t mean the fundamentals of software engineering aren’t still worth it.

See you next month!

What a massive month for the ML world in August!

As always, let me know if there's anything you think should be included in a future post.

In the meantime, keep learning, keep creating, keep dancing.

See you next month,

Daniel

www.mrdbourke.com | YouTube

By the way, I'm also an instructor with Zero To Mastery Academy teaching people Machine Learning & AI in the most efficient way possible. You can see a few of our courses below or check out all Zero To Mastery courses.

More from Zero To Mastery

The No BS Way To Getting A Machine Learning Job
19 min read

Looking to get hired in Machine Learning? Our ML expert tells you how. If you follow his 5 steps, we guarantee you'll land a Machine Learning job. No BS.

6-Step Framework To Tackle Machine Learning Projects (Full Pipeline)
30 min read

Want to apply Machine Learning to your business problems but not sure if it will work or where to start? This 6-step guide makes it easy to get started today.

[August 2025] Python Monthly Newsletter šŸ preview
[August 2025] Python Monthly Newsletter šŸ
8 min read

69th issue of Andrei Neagoie's must-read monthly Python Newsletter: Python Performance Myths, Do You Need Classes, Good System Design, and much more. Read the full newsletter to get up-to-date with everything you need to know from last month.