78th issue! If you missed them, you can read the previous issues of my monthly A.I. & Machine Learning newsletter here.
Hey everyone!
Daniel here, I’m a machine learning engineer who teaches the following beginner-friendly machine learning courses:
- Complete Machine Learning and Data Science Bootcamp: Zero to Mastery
- TensorFlow for Deep Learning: Zero to Mastery
- PyTorch for Deep Learning: Zero to Mastery
- [NEW] 🤗 Machine Learning with Hugging Face Bootcamp: Zero to Mastery
I also write regularly about machine learning on my own blog as well as make videos on the topic on YouTube.
Since there's a lot going on, the utmost care has been taken to keep things to the point.
Here's what you might have missed in June 2026 as an A.I. & Machine Learning Engineer... let's get you caught up!
My Work
Hey all, a bit of a shorter issue this month as I’m writing this on June 25th rather than the end of the month. This is because I’m going away for most of July on a honeymoon with my wife.
That being said, there will be no July 2026 issue (I’ve promised my wife I won’t be reading ML papers while we’re away :P haha). Anyway, have an awesome July and I’ll be back to normal writing in August 2026!
- The Fine-tuning Small Language Models with Hugging Face Transformers video course is live on ZTM!!! - The Small Language Model (or SLM) fine-tuning tutorial series has finished editing and is officially available on ZTM. Inside, you’ll learn how to fine-tune SLMs to do specific tasks and what datasets you’ll need to get started on your own language modelling problems. You’ll finish the course having trained an SLM of your own, uploaded it to Hugging Face and created a demo app that you can show others.
From The Internet
- Fine-tuning a model for free from one prompt, with TRL and the Google Colab CLI - Sergio Paniego points a coding agent at the TRL repo and Google’s new Colab CLI, then walks away while it provisions a free cloud GPU, runs QLoRA, streams metrics to a trackio dashboard, and pushes the trained adapter to the Hub.
- MTEB Leaderboard v3: From a slow demo to feature-rich leaderboard - The embedding benchmark got rebuilt on FastAPI and Svelte for a far faster experience, with richer filtering, zero-shot annotations, and head-to-head model comparison so you can find the right embedder for your use case.
- Why AI hasn’t replaced software engineers, and won’t - Arvind Narayanan and Sayash Kapoor argue that writing code was never the bottleneck, so AI compresses only the middle of the “decide, execute, deliver” sandwich while the human ends resist automation.
“This pattern — where humans remain heavily involved at both ends of the decide-execute-deliver sandwich, even as AI increasingly automates the middle layer, seems to be broadly applicable to most knowledge work, though it is farthest along in software. After all, complex decision making and accountability are common to most fields. A lack of recognition of this phenomenon has led to many overconfident claims about imminent job losses, such predictions about AI replacing radiologists.”
- Addy Osmani, Director at Google Cloud AI, has a fantastic blog. The following three pieces are worth reading together:
- AI Agent Skills - Skills encode the senior-engineer work that never shows up in the diff (specs, tests, reviews, scope discipline) as agent-actionable workflows with anti-rationalization tables that stop an agent talking itself out of the steps. Skills don’t have to be overcomplicated either. They can be a markdown (.md) file at the root of your repo that the agent reads when it does work. And you can update this file over time.
- AI Harness Engineering - A coding agent is the model plus its harness (prompts, tools, hooks, sandboxes, feedback loops), and most agent failures are configuration problems you can fix today rather than model problems you wait out. So instead of waiting for Model 47.5 to get better, you update the harness around the model you have.
- The new software development life cycle with vibe coding - Osmani’s companion notes to a Google whitepaper put the split at roughly 10% model and 90% harness, with specification quality and verification becoming the real bottlenecks once implementation collapses to minutes.
- AGENTS.md - An open, README-for-agents format now used by 60k+ projects to give coding agents a predictable place for build steps, tests, and conventions. OpenAI’s Codex guide uses the AGENTS.md format.
- Semantic IDs for grocery products at Instacart - Semantic IDs are short codes built on top of embeddings, designed so that similar items cluster together. The idea is that when someone orders parmesan cheese, you don’t necessarily want to recommend more parmesan cheeses. Instead, they may be interested in things like olives, olive oil, bread or salami: things to make a pasta or a cheese platter. Similar items share the first few integers of their code, and the code gets progressively more detailed as you drill down. When Instacart launched generative Semantic IDs, they saw +34% add-to-carts as well as 2.7x more emerging brand products (longer-tail items rather than just the most popular).
- Eugene Yan’s writing is world class. Three recent posts I enjoyed:
- Training an LLM-RecSys Hybrid for Steerable Recs with Semantic IDs - Eugene turns item semantic IDs into vocabulary tokens for a fine-tuned Qwen3-8B, producing a “bilingual” model that recommends from the catalog and can be steered, reason about its picks, and even name bundles through plain chat.
- Product Evals in Three Simple Steps - Build product evals by labelling a small balanced dataset, aligning one binary LLM-evaluator per dimension, and running an eval harness on every config change to tighten the feedback loop.
- How to Work and Compound with AI - Five principles for compounding work with AI: treat context as infrastructure, encode your taste as config, make verification cheap, delegate bigger chunks, and close the loop so each correction improves the next session.
- Case Study: Training your own encoder-free VLM for $43 (similar to Gemma 4) - Hugging Face M4 walks through training a Gemma 4-style encoder-free VLM for around $100 on a single H100, swapping the vision encoder for a lightweight patch embedder that the language model learns to read directly.
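To make the Semantic IDs idea from the Instacart item above a bit more concrete, here's a minimal sketch of coarse-to-fine codes using residual quantization. This is an assumption on my part about how such IDs can be built, not Instacart's actual pipeline. The item embeddings and codebooks are made up and hand-written (in practice the codebooks would be learned), and `semantic_id` is a hypothetical helper name.

```python
from math import dist

# Hypothetical 2-D item embeddings (made up for illustration).
ITEMS = {
    "parmesan": (0.9, 1.1),
    "mozzarella": (1.1, 0.9),
    "olive oil": (1.2, 1.4),
    "dish soap": (-1.0, -1.2),
}

# Hand-written codebooks, one per level (assumption: real systems learn these).
# Level 0 picks a broad region, level 1 refines the leftover (residual) vector.
CODEBOOKS = [
    [(1.0, 1.0), (-1.0, -1.0)],
    [(-0.1, 0.1), (0.1, -0.1), (0.2, 0.4), (0.0, -0.2)],
]


def nearest(vec, codebook):
    """Index of the codebook entry closest to vec (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: dist(vec, codebook[i]))


def semantic_id(vec):
    """Quantize vec level by level, subtracting the chosen entry each time."""
    code = []
    residual = vec
    for codebook in CODEBOOKS:
        idx = nearest(residual, codebook)
        code.append(idx)
        residual = tuple(r - c for r, c in zip(residual, codebook[idx]))
    return tuple(code)


for name, vec in ITEMS.items():
    print(name, semantic_id(vec))
```

In this toy setup the cheeses and the olive oil all share the first integer (0) while dish soap starts with 1, and the second integer separates items within the same region. That shared prefix is the "drill down" behaviour described above.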
Open Source
- Comma AI driver-monitoring update (openpilot 0.11.1) - Comma retrained their phone-detection driver-monitoring model using ground-truth labels generated from a locally hosted VLM’s yes/no token probabilities rather than a hand-labeled classifier, cutting false positives significantly in the field.
- Boogu release Boogu-Image-0.1-Turbo - An Apache-2.0 unified image generation and editing model family (Base, Turbo, Edit) that aims for quality competitive with Nano Banana Pro and GPT-Image-2 on a fraction of the training compute.
- Datalab release Lift - Lift is a 9B model that pulls schema-constrained structured JSON out of any PDF or image you hand it, reaching about 90% field accuracy while running locally.
- Stanford release GPIC - GPIC is a permissively licensed 100M-image corpus (around 28 trillion pixels) captioned by a VLM for training text-to-image models, with safety filtering, dedup, and a benchmarking protocol included. The team labelled all 100M images with Qwen3-VL-4B-Instruct (tags plus short, medium, and long captions), as they found it offered the best speed/quality tradeoff. See the paper for the full captioning prompts. Website, Paper.
- Z.ai release GLM-5.2 - An MIT-licensed flagship model built for long-horizon tasks on a solid 1M-token context, landing as the top open-source model on several long-horizon coding benchmarks and within a few points of Claude Opus 4.8. GLM 5.2 blog from Z.ai.
- Poolside release Laguna M.1 - A 225B-parameter Mixture-of-Experts coding model (23B active per token) under Apache 2.0, built for agentic coding and long-horizon work with a 262K context.
- Microsoft release FastContext - A lightweight repository-exploration subagent that issues parallel read-only searches and returns compact file and line citations, cutting a main coding agent’s token use by up to 60% while improving resolution rates.
- Doclang launches as the open-source AI-native document language - An open standard and reference validator for AI-native documents, now stewarded as a working group under the LF AI & Data Foundation. Website, Spec, Announcement.
- Zyphra release Zonos 2 - An Apache-2.0 Mixture-of-Experts text-to-speech model trained on more than 6 million hours of multilingual speech for expressive, low-latency, high-fidelity voice cloning. Blog post.
- MiniMax release MiniMax-M3 - An open-weight foundation model pairing frontier agentic-coding performance with a 1M-token context and native multimodality, built on MiniMax’s Sparse Attention (MSA) architecture.
- Supra release Supra-50M-Instruct - A tiny 50M-parameter Llama-style instruct model trained from scratch on 20 billion tokens of educational web text that holds its own on a few benchmarks. Perfect for experimenting with a very Small Language Model.
- Cohere release North Mini Code - An Apache-2.0 30B Mixture-of-Experts model (3B active) optimized for code generation, agentic software engineering, and terminal tasks with a 256K context.
- Ideogram release Ideogram 4 as open source - Ideogram’s first open-weight text-to-image model, a 9.3B from-scratch DiT with best-in-class text rendering plus structured JSON control over layout, bounding boxes, and color palette.
- Liquid release LFM2.5-VL-450M-Extract - A 450M edge VLM (SigLIP2 backbone) that extracts user-defined fields from images as JSON, outperforming similar sub-1B models on a structured-extraction benchmark. I liked how they used SigLIP2 100M as the vision backbone.
- Google release Magenta RealTime 2 - An open music model for on-device, low-latency (around 200ms) real-time music generation steered live by text prompts, audio examples, and MIDI.
- NVIDIA launch Nemotron Ultra, 550B parameters - A 550B-parameter (55B active) LatentMoE model with a Mamba-2, MoE, and attention hybrid architecture aimed at frontier reasoning and long-running agentic workflows. The model performs well on benchmarks and is significantly faster than other models in its class. Blog.
- Google release Gemma 4 12B - A unified, encoder-free multimodal model that feeds vision and audio straight into the LLM backbone and runs locally with 16GB of memory while approaching the larger 26B MoE on benchmarks. Tech blog, Developer guide, Hugging Face weights.
- Google releases DiffusionGemma for 4x faster text generation - An experimental 26B MoE (3.8B active) that drafts 256-token blocks in parallel via text diffusion for up to 4x faster local inference, trading some quality for speed on interactive workflows.
- Google releases Gemma 4 QAT models - Quantization-aware-training checkpoints that shrink Gemma 4 for mobile and consumer GPUs (the E2B text model now needs only about 1GB of memory) while preserving quality better than post-training quantization.
- KRLabsOrg release verbatim-rag-modern-bert-v2 - A ModernBERT token classifier that highlights the verbatim source spans answering a query, so RAG responses stay grounded with exact citations and the pipeline can even run without an LLM. GitHub.
- NVIDIA release Cosmos 3 - The first open omni-model for physical AI, unifying world video generation, physical reasoning, and action prediction in a single Mixture-of-Transformers model for robotics and autonomous systems.
- JetBrains release Mellum2 - An Apache-2.0 12B Mixture-of-Experts model (2.5B active) scoped for fast, latency-sensitive jobs like routing, RAG, summarization, and coding sub-agents rather than every task in the stack.
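The Comma AI item above mentions building labels from a VLM's yes/no token probabilities. Here's a minimal sketch of that general idea, assuming you already have the model's raw logits for the "yes" and "no" answer tokens. The function names and the 0.2/0.8 thresholds are hypothetical, and this is not Comma's actual pipeline.

```python
import math


def p_yes(yes_logit: float, no_logit: float) -> float:
    """Softmax restricted to the two answer tokens, returning P(yes)."""
    m = max(yes_logit, no_logit)  # subtract the max for numerical stability
    e_yes = math.exp(yes_logit - m)
    e_no = math.exp(no_logit - m)
    return e_yes / (e_yes + e_no)


def soft_label(yes_logit: float, no_logit: float, low: float = 0.2, high: float = 0.8):
    """Turn the probability into a label, skipping ambiguous frames.

    The thresholds are illustrative assumptions, not values from the source.
    """
    p = p_yes(yes_logit, no_logit)
    if p >= high:
        return 1  # confident "yes" (e.g. phone visible)
    if p <= low:
        return 0  # confident "no"
    return None  # too uncertain to use as a training label


print(soft_label(2.0, 0.0))
print(soft_label(0.1, 0.0))
```

Keeping the probability rather than just the generated text is what lets you drop the uncertain examples instead of training on noisy labels.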
Research
- A Bitter Lesson for Data Filtering - Mohri, Duchi, and Hashimoto run scaling studies showing that with enough compute the best data filter is no data filter at all. Some quotes from the paper I enjoyed:
“We find that sufficiently trained large parameter models not only tolerate low-quality and distractor data, but in fact benefit from nominally ‘poor’ data.”
When talking about sufficiently large models with enough compute:
“Remarkably, these models even benefit from shuffled-word documents, despite only the unigram distribution of the documents remaining intact”
“As a result, data filtering may suffer from the bitter lesson (Sutton, 2019) in which human-designed filters that perform well at the small scale are eventually replaced by simple, no-filter approaches that scale more gracefully with compute.”
- INSID3: Training-Free In-Context Segmentation with DINOv3 - A CVPR 2026 method that does in-context segmentation entirely inside a frozen DINOv3 backbone, with the key trick of identifying and projecting out a positional bias in DINOv3’s features. Paper.
INSID3 allows similar objects/items in images to be segmented via semantic selection.
- Code as Agent Harness - A survey reframing code as the operational substrate of agent infrastructure, organized across the harness interface, harness mechanisms, and scaling from single-agent to multi-agent systems.
- Google Research for passive heart rate monitoring with front-facing camera - Google’s PHRM system passively estimates heart rate and daily resting heart rate from short front-camera face videos during everyday phone use, matching wearable accuracy across all skin tones.
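On the "shuffled-word documents" quote from the data filtering paper above: a quick way to see what "only the unigram distribution remaining intact" means is to shuffle a document's words and compare word counts. This is just an illustration of the concept with a made-up sentence, not the paper's preprocessing code, and `shuffle_words` is a hypothetical helper.

```python
import random
from collections import Counter


def shuffle_words(doc: str, seed: int = 0) -> str:
    """Randomly reorder the whitespace-separated words of a document."""
    words = doc.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)


doc = "the cat sat on the mat while the dog slept"
shuffled = shuffle_words(doc)

print(shuffled)
# Word order is scrambled, but every word appears the same number of times.
print(Counter(doc.split()) == Counter(shuffled.split()))  # True
```

So the model still sees which words occur and how often, but none of the original word order.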
Releases
- Improving health intelligence in ChatGPT (GPT-5.5 Instant) - GPT-5.5 Instant brings frontier-level health responses to all free ChatGPT users, performing comparably to OpenAI’s Thinking models on hard, physician-written health evaluations.
- OpenAI release LifeSciBench - An expert-authored, expert-reviewed benchmark of 750 real-world life-science research tasks spanning seven workflows and seven biological domains, graded against detailed scientific rubrics.
- Mistral OCR 4 - A SOTA document model that returns extracted text alongside bounding boxes, typed-block classification, and inline confidence scores across 170 languages, deployable in a single self-hosted container.
- OpenRouter releases Fusion - Fusion sends one prompt to a panel of models and has a judge model fuse their outputs, letting even a budget panel beat frontier models like GPT-5.5 and Opus 4.8 on deep-research tasks.
- Apple hosted WWDC 2026 - In the keynote Apple introduced a new LLM-powered Siri. The new Siri is powered by Apple’s 3rd Generation of Foundation Models. The new Foundation Models family includes five models co-built with Google, including a 20B sparse on-device model stored in flash memory that activates only 1 to 4 billion parameters per request.
- Microsoft AI release 7 new models - Microsoft AI launched a family of seven in-house MAI models spanning reasoning, coding, image, voice, and transcription, all trained from scratch on clean, traceable data. MAI Code 1 Flash is now available in Copilot.
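The panel-plus-judge pattern from the OpenRouter Fusion item above can be sketched in a few lines. To be clear, this is not OpenRouter's API: the `Model` type, the `fuse` function and the stand-in "models" below are all hypothetical, and the toy judge only counts the drafts it was given so the example runs offline.

```python
from typing import Callable, Sequence

# Hypothetical interface: a "model" is any function from prompt text to answer text.
Model = Callable[[str], str]


def fuse(prompt: str, panel: Sequence[Model], judge: Model) -> str:
    """Send one prompt to every panel model, then ask the judge to combine the drafts."""
    drafts = [model(prompt) for model in panel]
    numbered = "\n".join(f"[{i + 1}] {draft}" for i, draft in enumerate(drafts))
    judge_prompt = (
        f"Question: {prompt}\n"
        f"Candidate answers:\n{numbered}\n"
        "Write one answer that combines the strongest points."
    )
    return judge(judge_prompt)


# Stand-in panel so the sketch runs without any network calls.
panel = [lambda p: "Paris", lambda p: "Paris, France", lambda p: "Lyon"]


def toy_judge(judge_prompt: str) -> str:
    """Toy judge: reports how many numbered drafts it received."""
    n = sum(line.startswith("[") for line in judge_prompt.splitlines())
    return f"fused {n} drafts"


print(fuse("What is the capital of France?", panel, toy_judge))
```

In a real setup you'd swap the lambdas and `toy_judge` for calls to actual models, with the judge being the one that writes the final fused answer.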
Videos
- Inside Apple Intelligence and Xcode: Special Presentation (WWDC26) - Apple’s developer-focused session digs into the latest Apple Intelligence and Xcode features, including the new agentic coding capabilities. I really enjoyed this keynote, and it was good to see people presenting live again. The new stuff they showed off with MLX was especially impressive.
- Run local agentic AI on your Mac with MLX - An Apple WWDC session on running AI agents entirely on-device with MLX, including wiring code agents like OpenCode into Xcode and scaling across multiple Macs.
- The Art & Science of Benchmarking Agents - Vincent Chen of Snorkel AI argues that good benchmarks are bets on where capabilities are heading, drawing a framework from reviewing 120+ grant applications on task quality, diversity, model headroom, and eval methodology. I’ve seen this first hand at Artificial Analysis. If you want a model to improve on something, create a benchmark for it. People love seeing the number go up.
- Microsoft Build event in 25 minutes - The Verge’s fast recap of Microsoft Build 2026, covering agent-first Windows updates, the Surface RTX Spark Dev Box, the 7 (!) new MAI models, and the Project Solara on-device agent.
- VibeThinker 3B - Taking on Giant Models - Sam Witteveen takes a look at WeiboAI’s 3B reasoning model that matches models hundreds of times larger on verifiable math, coding, and STEM benchmarks.
See you next month!
What a massive month for the ML world in June!
As always, let me know if there's anything you think should be included in a future post.
Liked something here? Share it with someone.
In the meantime, keep learning, keep creating, keep dancing.
See you in August,
Daniel
P.S. Just a reminder, there will be no July 2026 issue (I'm away on my honeymoon!). Have an awesome July and I’ll be back to give you an extra awesome issue in August 2026!
By the way, I'm also an instructor with Zero To Mastery Academy teaching people Machine Learning & AI in the most efficient way possible. You can see a few of our courses below or check out all Zero To Mastery courses.