72nd issue! If you missed them, you can read the previous issues of my monthly A.I. & Machine Learning newsletter here.
Hey everyone!
Daniel here, I’m a machine learning engineer who teaches the following beginner-friendly machine learning courses:
- Complete Machine Learning and Data Science Bootcamp: Zero to Mastery
- TensorFlow for Deep Learning: Zero to Mastery
- PyTorch for Deep Learning: Zero to Mastery
- [NEW] 🤗 Machine Learning with Hugging Face Bootcamp: Zero to Mastery
I also write regularly about machine learning on my own blog as well as make videos on the topic on YouTube.
Since there's a lot going on, the utmost care has been taken to keep things to the point.
Here's what you might have missed in December 2025 as an A.I. & Machine Learning Engineer... let's get you caught up!
My work
Happy New Year everyone!
I hope you’re as excited for 2026 as I am.
Despite being full of holidays, the last month of 2025 didn’t slow down the machine learning and AI updates… there are quite a few!
A few notes on my own work and then we’ll get into them.
- My new course is live! The Machine Learning with Hugging Face Bootcamp is live on Zero to Mastery. You can think of Hugging Face as the homepage for modern ML and AI workflows. I personally use Hugging Face every day in both an exploratory and professional sense. This new course rebuilds the exact workflows I use for creating custom models, working with clients and building specific ML and AI projects. Inside we’ll go step by step through several ML projects, each ending with a shareable demo.
- [Video] Ranking the best open-source AI companies and models of the year. Writing these ML Monthly issues means I get to see and interact with quite a few models and open-source AI releases. In this video, I go through my top companies and releases of 2025.
- [Video] Unboxing + setting up the NVIDIA DGX Spark. NVIDIA decided to send me one of their new DGX Sparks (a small AI supercomputer). Stay tuned for more videos exploring how this machine works for local AI development and use.
From the Internet
- Thinking Machines makes Tinker available for all and includes vision fine-tuning. You can now train your own custom LLM and VLM models with managed infrastructure using Tinker. Several Qwen3-VL models are available; see the recipes on GitHub for examples.

Example results of using Tinker for fine-tuning a Qwen3-VL model for image classification versus a pure vision-based model DINOv2. Notice how the Qwen3-VL model drastically improves with a small number of samples. Source: Thinking Machines blog.
- Apple show how the M5 chip’s neural accelerators speed up LLM generation. Compared to the M4 chip in the MacBook Pro, the M5 chip can achieve up to 4x faster Time to First Token (TTFT) on models such as Qwen3-14B-MLX-4bit. It also achieves an average of 25% faster generation speed thanks to the increased memory bandwidth. I think we’re only starting to scratch the surface of what’s possible running local models. This highlights how much can improve in a single generation of hardware.

Examples of how much the M5 chip speeds up Time to First Token (TTFT) in comparison to the M4 chip across various model architectures, often averaging 3-4x improvement. Source: Apple Machine Learning blog.
- AI World shares their 2025 AI Advent Calendar. My favourite was the top downloaded models of the year. Unsurprisingly, the Qwen models came out on top for the most downloads. According to the website, the Qwen2.5-VL-3B-Instruct model has been downloaded more than 300,000,000 times 😲. Though I’m not sure what counts as a download… is it always a fresh install? If so, that’s a much larger number than I thought.

- [Essay] Why your boss isn’t worried about AI by Boyd Kane. Thought-provoking piece on how AI models sometimes get misinterpreted as regular software, so the potential downsides of AI get ignored because we assume we understand them the way we understand errors in regular software. I like the recurring theme throughout: it largely comes down to what data the AI was trained on.
- Google Colab comes to VS Code. No local GPU? No problem. You can now connect a Google Colab backend with a cloud-hosted GPU and run your local notebooks on Google Colab directly from VS Code. This means you can write experimentation code locally and then, when it comes time to run it on a GPU, connect to Google Colab without leaving your editor.
- Google published a nice review of all their releases in 2025 (there’s quite a few). After sitting behind OpenAI in terms of raw model performance for 3 years (since GPT-4 was released), Google is now arguably on top by almost every metric. They’ve been on an absolute roll. I’m personally using Gemini far more at the end of 2025 than at the start.
- Google DeepMind release WeatherNext 2. Based on a new Functional Generative Network (FGN), the WeatherNext 2 model outperforms the previous generation on 99.9% of measurable metrics including temperature and wind predictions as well as lead times (0-15 days).
- Philipp Schmid releases a series of blog posts to help with Gemini 3 and AI agent understanding. From Gemini 3 Prompt Best Practices and Why Senior Engineers Struggle to Build AI Agents (a good overview of how sometimes more knowledge can hold you back when learning a new concept) to A Practical Guide on Building an Agent from Scratch with Gemini 3 and Context Engineering for AI Agents (one of my favourite takeaways was the point that as AI models get better and better, the need for excessive instructions and harnesses goes away).
- The Hugging Face Tokenizers library gets several upgrades as part of Transformers v5. Tokenization is the process of turning raw data into numbers (e.g. text into numerical form so a machine learning model can work with it). As part of the transformers v5 release, the tokenizers library (which is built in to transformers) gets upgrades including a speedy Rust-based tokenizer by default and simpler ways to create and customize your own tokenizers (see the minimal sketch after the next item for what tokenization looks like in code). Bonus: Tokenizer behaviour can be tricky because not all tokenizers are made the same. And if you use the wrong tokenizer for the wrong model, you can get less than ideal results. See the Gotchas in Tokenizer Behavior article for more.
- Beej’s Guides (by Brian Hall). Every so often you stumble across an exceptional blog someone has been toiling away on for years. This month’s find for me is beej.us. There are some great guides on there I’d suggest checking out if you like to read HTML-style tech blogs and straightforward posts (one of my favourite styles). I’ve been reading the one on Git and plan on reading the one on Computer Science next. There’s also an upcoming one on Python programming.
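As promised above, here’s a minimal sketch of what tokenization looks like in code using the long-standing AutoTokenizer interface from transformers (the checkpoint name and example text are placeholders of my choosing, not anything specific to the v5 release):
from transformers import AutoTokenizer
# Load a tokenizer from the Hugging Face Hub (checkpoint name is just an example)
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
text = "Tokenization turns raw text into numbers a model can work with."
# Encode: text -> token IDs (the numbers a model actually sees)
encoded = tokenizer(text)
print(encoded["input_ids"])
# Inspect the individual tokens the text was split into
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
# Decode: token IDs -> text (round-trips back to something close to the input)
print(tokenizer.decode(encoded["input_ids"], skip_special_tokens=True))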
- [Essay] The Future of Software Development is Software Developers by Jason Gorman. Jason Gorman has been a computer programmer for 43 years, seeing trends come and go. He writes that LLMs are no different, and argues that not only will LLMs not necessarily replace human programmers, they will demand more of them. Jevons paradox in full swing. I’m seeing the same on the ground in my own work: now that more prototypes can be created, the urge to create them is even higher. But as always, a prototype is not a finished product. So in the end, your job is still to deliver code you have proven to work.
- [Guide] How to speed up open-source LLMs with EAGLE-3 by Lmsys.org. EAGLE-3 (Extrapolative Attention Guided LEarning) is a speculative decoding technique that speeds up LLM decoding by 2-3x by adding a small ‘draft head’ to the model, around 2-5% of the model’s total parameters. The draft head generates candidate tokens (fast) and the large model verifies them, keeping the ones it agrees with and generating its own token where it disagrees (see the toy sketch after the figure below).

Example of EAGLE 3 in use. The original model is fitted with an EAGLE 3 head which is used for generating a tree of draft output tokens which can be selected by the larger model for final outputs. Source: Lmsys blog.
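To give a feel for the draft-and-verify idea, here’s a toy sketch of greedy speculative decoding in general (not the actual EAGLE-3 implementation, and both “models” below are made-up stand-ins):
import random
# Toy stand-ins for real models: each "model" just maps a context to a next token ID.
# In EAGLE-3 the draft model is a small head attached to the target model itself.
def draft_next_token(context):
    return (sum(context) * 31 + 7) % 100  # cheap to run, sometimes wrong
def target_next_token(context):
    # Agrees with the draft most of the time, occasionally produces something else
    return draft_next_token(context) if random.random() < 0.8 else random.randrange(100)
def speculative_decode_step(context, num_draft_tokens=4):
    # 1. The draft head proposes a short run of candidate tokens (fast)
    draft = []
    for _ in range(num_draft_tokens):
        draft.append(draft_next_token(context + draft))
    # 2. The target model verifies the candidates (in practice this is a single
    #    forward pass over all of them, which is where the speedup comes from)
    accepted = []
    for token in draft:
        expected = target_next_token(context + accepted)
        if token == expected:
            accepted.append(token)  # draft token accepted
        else:
            accepted.append(expected)  # disagreement: keep the target model's token and stop
            break
    return accepted
context = [1, 2, 3]
for _ in range(5):
    context += speculative_decode_step(context)
print(context)  # several tokens can be appended per verification step instead of one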
Daniel’s Open-Source AI of the Month
- Meta release OmniLingual, an open-source Automatic Speech Recognition (ASR) model capable of performing speech recognition for 1,600+ languages and outperforming Whisper v3. There are several model sizes available, ranging from 300M to 9B parameters.
- Ai2 release Molmo2, a series of open-source VLMs (Vision Language Models) with a focus on openness, multi-image and video capabilities. For example, the model is highly capable of pointing to objects in images based on a text prompt such as “point to all the seafood items” (see the sketch after the figure below for how point outputs can be used). It can also track objects and items in videos over a series of frames. The release comes not only with open model weights, all of the data is available as well.

Example of Molmo2’s pointing capabilities. Given an image and instructions on what to point at, it’s capable of returning text-based point coordinates which can be converted to image-based point coordinates and shown on a plot. Source: Allen AI playground with Molmo2 8B model.
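As a rough illustration of turning those text-based points into image coordinates, here’s a small sketch (the XML-style output format and values below are assumptions for demonstration, check the Molmo2 docs for the real output schema):
import re
# Hypothetical model output: points expressed as percentages of the image's width/height
model_output = '<point x="25.0" y="50.0" alt="shrimp">shrimp</point> <point x="75.0" y="25.0" alt="crab">crab</point>'
image_width, image_height = 1000, 800
points = []
for x_pct, y_pct in re.findall(r'x="([\d.]+)" y="([\d.]+)"', model_output):
    # Scale the percentage coordinates to pixel coordinates for plotting on the image
    points.append((float(x_pct) / 100 * image_width, float(y_pct) / 100 * image_height))
print(points)  # [(250.0, 400.0), (750.0, 200.0)] -> ready to draw on the original image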
- EssentialAI release rnj-1, an 8B LLM on par with models such as Qwen3-8B. Interestingly, the model was trained with JAX on TPUs and AMD GPUs.
- Meta release SAM Audio, a segmentation model designed to separate audio tracks from visual feeds. For example, if there is a video of a person walking along a street talking on the phone and a dog is barking in the background, you can prompt the model to select the “person” and then separate the audio tracks to highlight the person talking and ignore the background noise.
- ServiceNow-AI releases Apriel-1.6-15b-Thinker, an open-source VLM which is on par with or better than Gemini 2.5 Flash and Claude Haiku 4.5 on the Artificial Analysis Index (it scores 57 versus 54 for Gemini 2.5 Flash). Read the release blog post for more details.
- Z.ai releases GLM-4.6V-Flash, GLM-4.6V and GLM-4.7 (current best open-source coding model). The two V models enable vision inputs and are very competitive with other models at their size. The GLM-4.7 model brings an incredible leap forward for open-source models in coding capabilities. It performs on par with Claude Sonnet 4.5 as well as GPT 5.1 (high) on several software-related benchmarks.
- Meta releases Segment Anything 3 (SAM 3), a model capable of segmenting and detecting objects in images based on text and image prompt inputs. Bonus: See EfficientSAM3 for an efficient implementation with a 99% vision encoder and 6x smaller text encoder.
- CLaRA is an LLM from Apple which combines retrieval and generation into a single model. For example, the model is jointly trained to compress queries and documents so that when it comes time for RAG (Retrieval Augmented Generation), the retrieval step has already been built in.
- The Qwen team releases a series of image model updates. Qwen-Image-Layered (turn an image into layers) takes an existing image and breaks it into individual layers so they can be altered or edited individually. Qwen-Image-Edit-2511 (edit an image based on a text prompt) improves the consistency of image editing over the previous 2509 version. And Qwen-Image-2512 (generate an image based on a text prompt) improves the human realism, natural details and text rendering over the previous version.
- GLiNERv2 is a text classification, entity extraction and structured data model all in one. For example, you can put in a string of text such as “Daniel Bourke is a Machine Learning Engineer who builds Nutrify, an app for helping people learn about whole foods” and tell it to extract person, job, company and it will output Daniel Bourke, Machine Learning Engineer, Nutrify. The models are fast and are able to run on CPU with GPU usage offering even more speedups. See the example below for usage.
from gliner2 import GLiNER2
# Load model once, use everywhere
extractor = GLiNER2.from_pretrained("fastino/gliner2-base-v1")
# Extract entities in one line
text = "Apple CEO Tim Cook announced iPhone 15 in Cupertino yesterday."
result = extractor.extract_entities(text, ["company", "person", "product", "location"])
print(result)
# {'entities': {'company': ['Apple'], 'person': ['Tim Cook'], 'product': ['iPhone 15'], 'location': ['Cupertino']}}
- Encoder-only Mask Transformer (EoMT) is a plain Vision Transformer (ViT) capable of producing segmentation masks at encoder-only speeds. Typically a segmentation model has multiple different components to produce output segmentation masks. However, since EoMT only uses the ViT backbone to produce output classes and masks, it can achieve similar results to more complex models with much faster inference (up to 4x faster). Read the paper, get the models on Hugging Face (see the sketch after the figure below for an example of loading them).

Example of using EoMT weights from Hugging Face to generate masks on an image of food items. Food image generated with Gemini based on available food classes in the COCO dataset. The bottom figure is from the EoMT paper.
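And a minimal sketch of trying EoMT yourself via the transformers image-segmentation pipeline (treat both the checkpoint name and pipeline support as assumptions to verify against the model card, and swap in your own image path):
from transformers import pipeline
from PIL import Image
# Checkpoint name assumed from the EoMT release on Hugging Face - check the model card first
segmenter = pipeline("image-segmentation", model="tue-mps/coco_panoptic_eomt_large_640")
image = Image.open("food.jpg")  # any local image will do
for result in segmenter(image):
    # Each result contains a predicted class label, an optional score and a PIL mask
    print(result["label"], result.get("score"), result["mask"].size)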
- PleAIs releases Baguettotron and Monad, two small LLMs, as well as SYNTH, a synthetic dataset built from seed documents. Baguettotron is a 321M parameter model and Monad is a 50M parameter language model. Each performs incredibly well for its size compared to other models such as Gemma-3-270M and Qwen-3-600M. They were both trained on the SYNTH dataset, a reasoning-focused dataset of 200B tokens created from 50,000 Wikipedia documents used as seed inputs. This is a really exciting direction for SLMs (small language models); these are the kinds of models that could be everywhere, doing specific tasks with a minimal compute footprint. Read the blog announcement for more.
- Ai2 release the Olmo3 and Olmo3.1 LLMs. These models are on par with Qwen3-32B-Instruct but are completely open, from data to training code and methodologies.
- Mistral release the Mistral 3 series of models. These multi-modal (image and text) models come in 3B, 8B and 14B (Ministral 3) as well as 675B parameter (Mistral Large 3) sizes. Mistral Large 3 performs on par or better than other large open-source models such as Kimi-K2 and DeepSeek 3.1. And the smaller models perform on par or better than the similar-sized Qwen3-VL variants. Notably, the Ministral 3 8B Instruct model outperforms the larger Gemma3 12B Instruct model by a significant margin. All models are available under the Apache 2.0 license.
- Google release FunctionGemma-270M, a model designed to be fine-tuned to call specific functions. For example, imagine an in-car assistant designed to change settings in the car: you could tell it to “adjust the air conditioner to 22C” and the FunctionGemma-270M model could call the “adjust_air_conditioner” function (a rough sketch of this flow is below).
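Here’s what that flow could look like in code (a sketch of the general function-calling pattern; the function, registry and model output below are hypothetical, not FunctionGemma’s actual prompt or output format):
import json
# Hypothetical function the in-car assistant is allowed to call
def adjust_air_conditioner(temperature_c: float) -> str:
    return f"Air conditioner set to {temperature_c}C"
# The fine-tuned model is shown the available functions plus the user request
# "adjust the air conditioner to 22C" and asked to emit a structured call
available_functions = {"adjust_air_conditioner": adjust_air_conditioner}
# Hypothetical structured output from the model
model_output = '{"name": "adjust_air_conditioner", "arguments": {"temperature_c": 22}}'
# The application parses the call and executes the matching function
call = json.loads(model_output)
print(available_functions[call["name"]](**call["arguments"]))  # Air conditioner set to 22C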
- NVIDIA release Nemotron 3 Nano, an LLM with 31.6B total parameters (~3.6B active) which has best-in-class reasoning accuracy and is 4x faster than Nemotron Nano 2 as well as up to 3.3x faster than models such as Qwen3-30B-A3B-Thinking-2507. The model is ready for deployment in commercial settings thanks to NVIDIA’s open model license. Read the blog post for more information.
- Amazon release Chronos-2, a universal time series forecasting model. The model is capable of using in-context learning for solving forecasting tasks with an unlimited number of dimensions in a zero-shot manner. Read the blog post announcement for more information.
- ByteDance releases Dolphin-v2 for advanced document parsing. The model is capable of extracting 21 element categories from a document including section headings, paragraphs, figures, captions, lists, watermarks and more. The document extraction happens in two stages: classification and layout analysis, followed by specific content parsing (this two-stage process enables targeted extraction for both photographed and digital-style documents).
Research
- VL-JEPA explores the use of a Joint Embedding Predictive Architecture (JEPA) for the vision-language space. VL-JEPA achieves similar or better results on video classification and retrieval datasets when compared to SigLIP2 and Perception Encoder.
- The Qwen3-VL technical report was released and it’s a treasure trove of tidbits on how to train a world-class open-source VLM.
- AnyUp is a way to upscale the features of any vision encoder to any resolution. It can run at inference time and does not require fine-tuning for a specific encoder. See the example notebook to run the demo on your own images.

Example of using AnyUp to upscale the output features of a DINOv2 model.
Releases
- Google Colab AI library now available to all users (note: I’m not sure whether this will use your inputs for training or not, just beware if you’re doing anything sensitive).
- Google releases the Gemini 3 Flash preview as the perfect bridge between speed and world-class model performance.
Videos
- Demis Hassabis (CEO of Google DeepMind) speaks with Hannah Fry about 2025’s AI progress, Jagged Intelligence, the AI bubble and what’s next in AI.
See you next month!
What a massive month December was for the ML world, and what a massive year 2025 was too! I'm excited for 2026 and wish you all a Happy New Year.
As always, let me know if there's anything you think should be included in a future post.
In the meantime, keep learning, keep creating, keep dancing.
See you next month,
Daniel
By the way, I'm also an instructor with Zero To Mastery Academy teaching people Machine Learning & AI in the most efficient way possible. You can see a few of our courses below or check out all Zero To Mastery courses.