[February 2026] AI & Machine Learning Monthly Newsletter 🤖

Daniel Bourke

In This Month's Update:

Want to become an AI/ML Engineer?

Our AI/ML Career path takes you from complete beginner (at any age!) to getting hired as a Machine Learning and/or AI Engineer 👇

Get The Full Career Path

74th issue! If you missed them, you can read the previous issues of my monthly A.I. & Machine Learning newsletter here.

Hey everyone!

Daniel here, I’m a machine learning engineer who teaches beginner-friendly machine learning courses.

I also write regularly about machine learning on my own blog as well as make videos on the topic on YouTube.

Since there's a lot going on, I've done my best to keep things short and to the point.

Here's what you might have missed in February 2026 as an A.I. & Machine Learning Engineer... let's get you caught up!

My Work

  • Sunny - Kaggle Competition Entry — My brother and I entered the MedGemma Impact Challenge Kaggle Competition. Our entry was called Sunny, an iOS application which uses a fine-tuned version of MedGemma to help privately track skin health over time. All of our code and models are open-source and you can see the overview video on YouTube.
  • Three new tutorials on learnhuggingface.com — There are three new tutorials on learnhuggingface.com: LLM fine-tuning, VLM fine-tuning and multimodal RAG (retrieval augmented generation with text and images). Stay tuned for the videos/courses to launch on ZTM!
  • RTX 4090 and DGX Spark Benchmarking video — I made a video comparing the NVIDIA DGX Spark to the RTX 4090 (the GPU in my deep learning PC) across various everyday AI tasks such as LLM fine-tuning, object detection model training and LLM inference. In summary, the RTX 4090 has much more raw compute power and memory bandwidth, while the DGX Spark has a much higher memory capacity.
  • Upcoming talk on small LLMs — I’m doing a talk at the Queensland AI Meetup (my home state) on the power and potential of Small Language Models (SLMs) on March 12 2026. It’ll be in person, but I’ll be sure to record it and post it on my YouTube channel.
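For a sense of what the capacity-versus-bandwidth tradeoff in the benchmarking video means in practice, here's a rough back-of-envelope sketch of how many model weights fit in each machine's memory. The 24 GB and 128 GB figures are the commonly quoted specs, and the 20% overhead reserve for activations/KV cache is my own assumption, not a benchmark result:

```python
def max_params_fitting(vram_gb: float, bytes_per_param: float, overhead: float = 0.2) -> float:
    """Rough estimate of the largest model (in billions of parameters)
    whose weights fit in a given amount of memory, reserving a fraction
    of that memory for activations and KV cache."""
    usable_bytes = vram_gb * 1e9 * (1 - overhead)
    return usable_bytes / bytes_per_param / 1e9

# RTX 4090: 24 GB VRAM, DGX Spark: 128 GB unified memory (assumed specs)
for name, gb in [("RTX 4090", 24), ("DGX Spark", 128)]:
    fp16 = max_params_fitting(gb, 2.0)  # 2 bytes per parameter at fp16
    q4 = max_params_fitting(gb, 0.5)    # ~0.5 bytes per parameter at 4-bit
    print(f"{name}: ~{fp16:.0f}B params at fp16, ~{q4:.0f}B at 4-bit")
```

The takeaway matches the video's summary: the Spark can hold far larger models, even though the 4090 chews through the ones that fit much faster.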

From The Internet

  • An AI agent coding skeptic tries AI agent coding, in excessive detail by Max Woolf. Woolf, a data scientist at BuzzFeed and long-time agent skeptic, puts Claude Code and Codex through increasingly ambitious tasks, from API scrapers to porting scikit-learn to Rust. His conclusion: agents work best when you have approximate knowledge of many things with enough domain expertise to know what should and should not work. A highly detailed and honest read.

The main lesson I learnt from working on these projects is that agents work best when you have approximate knowledge of many things with enough domain expertise to know what should and should not work. Opus 4.5 is good enough to let me finally do side projects where I know precisely what I want but not necessarily how to implement it. These specific projects aren’t the Next Big Thing™ that justifies the existence of an industry taking billions of dollars in venture capital, but they make my life better and since they are open-sourced, hopefully they make someone else’s life better. However, I still wanted to push agents to do more impactful things in an area that might be more worth it.


Breakdown of what goes into the context window of a coding agent. Also a note on the illusion of control. Even if you give your agent plenty of context, it’s still not guaranteed to output the correct thing. Source: martinfowler.com.

  • Data is your only moat. A reminder that while models are commoditizing rapidly, proprietary high-quality data remains the most durable competitive advantage for AI-driven businesses.
  • Waymo introduces the Waymo World Model built on Genie 3. Waymo built a generative simulation platform on top of Google DeepMind’s Genie 3 that creates hyper-realistic driving scenarios, including rare events like tornadoes, floods and animals on the road, that their fleet has never encountered. The system generates both camera and lidar data, enabling billions of virtual testing miles before real-world deployment.


Examples of Waymo’s World Model powered by Genie 3 and converted into Waymo-style vision and lidar data. Having the Genie 3 World Model allows you to create scenes that would otherwise rarely happen and be hard to gather data for. Top left: elephant on the road, top right: flooding in a local suburb, bottom left: person in a T-rex costume running on the street, bottom right: changing the time of day for the same scenario.


PaperBanana workflow for iteratively improving an academic image.

Hugging Face Roundup


QED-Nano shows that with the right data mixture and training recipe, you can get outstanding results with far fewer parameters. Right: the training recipe involves RL on long chains of thought for creating math proofs. These chains of thought are generated by a larger model and then distilled into the smaller one.

The National Library of Scotland shows how to go from no labels, to pseudo-labels, to a trained model with high performance using open-source models. An excellent example of iterative bootstrapping with VLMs and detection models. In this specific example, the goal was to detect bounding boxes on library index cards, but the workflow could be extended to many others.


Step-by-step iterative way to create a custom model on a custom dataset. If you have the data, you can bootstrap the labels with a large model such as SAM3 and then train a smaller model to reproduce those labels.
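The bootstrapping loop in the example above can be sketched structurally like this. Note the `big_label`, `train` and `predict` callables are placeholders for the real (expensive and cheap) models, not an actual API:

```python
def bootstrap_labels(unlabeled, big_label, train, predict,
                     seed_size=10, threshold=0.9, rounds=3):
    """Iterative pseudo-labeling: seed a labeled set with an expensive
    model, then let a cheaper trained model absorb the pool, keeping
    only predictions above a confidence threshold each round."""
    labeled = [(x, big_label(x)) for x in unlabeled[:seed_size]]
    pool = list(unlabeled[seed_size:])
    for _ in range(rounds):
        model = train(labeled)              # retrain on everything labeled so far
        remaining = []
        for x in pool:
            label, confidence = predict(model, x)
            if confidence >= threshold:
                labeled.append((x, label))  # promote confident pseudo-labels
            else:
                remaining.append(x)         # defer uncertain examples to later rounds
        pool = remaining
    return train(labeled), labeled
```

The key design choice is the confidence threshold: it keeps the small model from training on its own mistakes, at the cost of needing more rounds to absorb the full pool.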

Mooncake and PyTorch

  • Mooncake joins PyTorch ecosystem. Mooncake, a distributed serving framework for large models, is now an official member of the PyTorch ecosystem, making it easier to deploy and serve large models at scale.

Open Source

Open-Source LLMs and VLMs

  • Qwen3.5 series. Alibaba’s Qwen team released the Qwen3.5 family in February, headlined by a 397B parameter MoE model with only 17B active parameters. The flagship model is natively multimodal, supports 201 languages, and features a hybrid Gated Delta Networks plus MoE architecture that delivers 8.6x faster decoding than Qwen3-Max. Follow-up mid-sized releases (Qwen3.5-35B-A3B, Qwen3.5-122B-A10B and Qwen3.5-27B) show incredible improvements over the previous generation (similar performance with up to 10x fewer parameters). See them on Unsloth for efficient fine-tuning.
  • MiniMax M2.5. MiniMax releases their latest model with strong coding and agentic capabilities. The model demonstrates the ability to plan like a software architect, writing spec documents before generating code. MiniMax reports that M2.5-generated code accounts for 80% of newly committed code within their own company.
  • Ovis2.6 drops with an MoE architecture. A 30B total, 3B active parameter VLM using a Mixture of Experts architecture. Another entry in the efficient-VLM space. Strong performance against similar-sized models, though it might fly a bit under the radar given the Qwen3.5 release.
  • Photoroom open-sources their image generation model. Photoroom shares both the model weights and the journey it took to build a production-quality text-to-image model, including the training decisions and tradeoffs involved.
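To see why the "397B total, 17B active" split in the Qwen3.5 bullet matters, here's the quick arithmetic. The 2-FLOPs-per-active-parameter figure is a common rule of thumb for a forward pass, not an exact number:

```python
# Qwen3.5 flagship figures from the release above: 397B total, 17B active.
TOTAL_B, ACTIVE_B = 397, 17

def active_fraction(total_b: float, active_b: float) -> float:
    """Fraction of weights actually used on each token in an MoE model."""
    return active_b / total_b

def flops_per_token(active_params_b: float) -> float:
    """Rough forward-pass FLOPs per token (in billions):
    ~2 FLOPs per active parameter."""
    return 2 * active_params_b

frac = active_fraction(TOTAL_B, ACTIVE_B)
speedup = flops_per_token(TOTAL_B) / flops_per_token(ACTIVE_B)
print(f"~{frac:.1%} of weights active per token, "
      f"~{speedup:.0f}x fewer FLOPs than a dense 397B model")
```

This is why MoE models can punch well above their inference cost: you pay dense-17B compute per token while drawing on 397B parameters of capacity (you still need memory for all 397B, which is the catch).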

Speech and Audio

Embeddings and Retrieval

  • Perplexity releases open-source diffusion-based embedding models (pplx-embed). Perplexity enters the embedding space with pplx-embed-v1 and pplx-embed-context-v1, available at 0.6B and 4B parameter scales. The models use a novel approach: they take Qwen3 base models and convert them from causal decoders into bidirectional encoders through diffusion-based pretraining. They lead multiple public benchmarks including MTEB and ConTEB, require no instruction prefixes, and ship with native INT8 and binary quantization for up to 32x storage compression. MIT licensed. See the paper.
  • NVIDIA releases ColEmbed V2 for multimodal RAG systems. An updated embedding model from NVIDIA designed specifically for multimodal retrieval-augmented generation pipelines.
  • mmBERT for multilingual text classification. A high-quality text encoder from Johns Hopkins that supports text classification across multiple languages. Worth evaluating if you need a lightweight and fast multilingual classifier.
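The 32x storage compression quoted for pplx-embed falls straight out of binary quantization: float32 stores 32 bits per dimension, a binary embedding keeps only the sign bit. A minimal self-contained sketch of the idea (my own illustration, not Perplexity's implementation):

```python
import struct

def binarize(embedding: list[float]) -> bytes:
    """Pack the sign bit of each dimension into bytes (1 bit per dim)."""
    bits = [1 if v > 0 else 0 for v in embedding]
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)

dim = 1024  # e.g. a 1024-dimensional embedding
emb = [(-1) ** i * 0.1 for i in range(dim)]
float32_bytes = len(struct.pack(f"{dim}f", *emb))  # 4 bytes per dimension
binary_bytes = len(binarize(emb))                  # 1 bit per dimension
print(float32_bytes // binary_bytes)  # → 32
```

Similarity search over binary vectors then becomes Hamming distance (XOR plus popcount), which is also much faster than float dot products, at some cost in retrieval quality.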

Computer Vision

Specialized Models

Papers

Releases

  • Apple adds agentic coding to Xcode 26.3. Apple shipped agentic coding in Xcode 26.3, with direct support for coding agents such as OpenAI Codex and Claude Agent. The practical win is that Xcode can now let those agents build and test projects, search Apple documentation, and handle more complex multi-step coding tasks through MCP-based tooling.
  • Google releases Gemini 3 Deep Think upgrade. A major upgrade to Google’s specialized reasoning mode, developed in partnership with scientists and researchers. The updated Deep Think achieved 48.4% on Humanity’s Last Exam (without tools) and 84.6% on ARC-AGI-2.
  • Google releases Gemini 3.1 Pro. A point-version update to Gemini 3 Pro that more than doubles the ARC-AGI-2 reasoning score to 77.1%, introduces a three-tier thinking system, and adds the ability to generate animated SVGs from text prompts. Available via the Gemini API, Antigravity and Vertex AI.
  • Google releases Nano Banana 2 built on Gemini 3.1 Flash. The next generation of Google’s on-device image generation model, now built on Gemini 3.1 Flash (note: Gemini 3.1 Flash itself is not available yet… but this might mean it’s coming soon?).
  • Gemini adds multimodal tool calling. Gemini models can now invoke tools based on multimodal inputs (text + images), expanding the range of agentic workflows possible with the updated Gemini interactions API.
  • Anthropic releases Claude Sonnet 4.6. Anthropic’s most capable Sonnet model, now the default on claude.ai. The model brings a 1M token context window (beta), improved coding, computer use, and agent planning. Performance that previously required an Opus-class model is now available at Sonnet pricing ($3/$15 per million tokens). Released February 17, just 12 days after Opus 4.6.
  • Google releases CodeWiki, a Gemini-powered documentation creator. A tool that automatically generates and maintains up-to-date documentation for codebases using Gemini.
  • Project Genie. Google DeepMind expands access to Genie, their world model for generating interactive 3D environments from text prompts (see above for how Genie 3 is being used with Waymo to create world models for self-driving cars).
  • Oumi and Lambda partner for end-to-end custom model development. Oumi’s open-source training framework combined with Lambda’s GPU cloud, making custom model development more accessible from data preparation to deployment.
  • Ai2 introduces MolmoSpaces, an open ecosystem for embodied AI. Allen Institute for AI launches an open platform for building and evaluating AI agents that can interact with physical environments.
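As a quick sanity check on the Sonnet 4.6 pricing mentioned above, per-request cost works out like this. A trivial sketch using the quoted $3/$15 per million token rates; always check current pricing before relying on it:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 in_per_m: float = 3.0, out_per_m: float = 15.0) -> float:
    """Cost in USD at the quoted $3 input / $15 output
    per-million-token Sonnet pricing."""
    return input_tokens / 1e6 * in_per_m + output_tokens / 1e6 * out_per_m

# e.g. a 50k-token context producing a 2k-token answer
print(f"${request_cost(50_000, 2_000):.3f}")  # → $0.180
```

Note the 5x output multiplier: for long-generation workloads (agents, code), output tokens dominate the bill even when the prompt is much larger than the response.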

Videos

  • Ashok on building foundational models at Tesla. A behind-the-scenes look at Tesla’s approach to training foundation models for autonomous driving and robotics. Very cool to see how seriously they take safety and how sophisticated their simulations are.
  • Yann LeCun on World Models. Yann LeCun discusses his vision for world models and how they might bridge the gap between current AI capabilities and more general intelligence.
  • Nathan Limbach and Sebastian Raschka on Lex Fridman. A deep conversation covering LLM training, open-source model development and the state of ML research. It’s a long (but excellent) one. I listened to this one over the course of 3-4 days walking to and from training.
  • Elon Musk on Cheeky Pint with Dwarkesh. A wide-ranging interview covering AI, Tesla, xAI and the broader tech landscape.
  • PewDiePie on fine-tuning LLMs. PewDiePie shares his rollercoaster ride of a journey figuring out how to train a coding model. From dataset gathering to hardware malfunctions to finally… (I won’t spoil it :P). All in all a very fun and inspiring story.

See you next month!

What a massive month for the ML world in February!

As always, let me know if there's anything you think should be included in a future post.

Liked something here? Share it with someone.

In the meantime, keep learning, keep creating, keep dancing.

See you next month,

Daniel

www.mrdbourke.com | YouTube

By the way, I'm also an instructor with Zero To Mastery Academy teaching people Machine Learning & AI in the most efficient way possible. You can see a few of our courses below or check out all Zero To Mastery courses.

You might like these courses

More from Zero To Mastery

The No BS Way To Getting A Machine Learning Job
19 min read

Looking to get hired in Machine Learning? Our ML expert tells you how. If you follow his 5 steps, we guarantee you'll land a Machine Learning job. No BS.

6-Step Framework To Tackle Machine Learning Projects (Full Pipeline)
30 min read

Want to apply Machine Learning to your business problems but not sure if it will work or where to start? This 6-step guide makes it easy to get started today.

How to Convince Your Boss to Pay for Your Upskilling
10 min read

Get your company to pay for your tech upskilling. Use this training request email and strategy to make it happen.