
Machine Learning Monthly Newsletter 💻🤖

Daniel Bourke

22nd (special) issue! If you missed them, you can read the previous issues of the Machine Learning Monthly newsletter here.

Daniel here, I'm 50% of the instructors behind Zero To Mastery's Machine Learning and Data Science Bootcamp course and our new TensorFlow for Deep Learning course! I also write regularly about machine learning on my own blog and make videos on the topic on YouTube.

Welcome to this special edition of Machine Learning Monthly. A 500ish (+/-1000ish, usually +) word post detailing some of the most interesting things on machine learning I've found in the last month.

Why is this a special edition? This month's Machine Learning Monthly Newsletter is easily the longest newsletter yet. We're spending 100% of this month's edition on diving deep into the State of AI Report 2021. It's possibly the largest AI report to come out each year and so I've made a summary of my favourite bits (and there's lots of them).

What you missed in October as a Machine Learning Engineer…

My work 👇

Charlie Walks: A Novel

Charlie works as a machine learning engineer for the largest tech company in the world, but he wants to be a writer. So during the day he writes code and in the evenings he writes words, letters to his nephew Pauly about what he's discovered in his world-generating computer program XK-1.

Nutrify: A machine learning project

I've started working on a full-stack machine learning project to take a photo of food and learn about it. This combines my interests of health and nutrition and machine learning. My goal is to build a data flywheel (collect data, model, improve model, collect data, repeat). Stay tuned for video updates on my YouTube.

Special edition: State of AI Report 2021 Summary 🚩

Although vast, the State of AI Report collects a snapshot of what’s happening in the field of artificial intelligence (AI) every year.

I say snapshot because there’s a lot going on. Even with over 180 slides, there’s still much more the report missed out on.

In saying that, this summary is a summary of a summary.

So there are many things I’ve missed out from the State of AI Report 2021 and the following points are biased by what interests me most.

If you’d like to read the full report and past reports, visit the State of AI website.

The report breaks coverage into five categories:

  1. Research — Technology breakthroughs.
  2. Talent — Supply, demand and concentration of AI talent.
  3. Industry — Areas of commercial application for AI and its business impact.
  4. Politics — Regulation of AI, its economic implications and the emerging geopolitics of AI.
  5. Predictions — What the authors believe will happen going forward and a performance review of past predictions.

I’ve kept these sections the same.

1. Research (Slides 10 — 74)

The research section covers recent technology breakthroughs and their capabilities.

Highlights for me in this section were self-supervised learning, transformer architecture use and language models taking over, more research papers making their code available and a study showing tweeting scholarly work results in up to 3x more citations.

Self-supervised learning is the process of a model going through an unlabelled dataset and identifying underlying patterns on its own. For example, self-supervised learning is used to power large language models, models that learn how different words interact in language. A language model might read all of Wikipedia (and more) and learn that the word ‘dog’ is more likely to appear in the sentence “The ___ jumped over the fence” than the word ‘car’.

A similar process to this technique is now being used for computer vision, except replace learning words with learning the relationship between pixels in an image.
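To make the fill-in-the-blank idea concrete, here's a minimal sketch in plain Python. It is nowhere near a real language model: the five-sentence corpus is made up, and the "model" is just a table of counts. The point it tries to show is that the training signal comes from the text itself, with no human-written labels.

```python
from collections import Counter, defaultdict

# A tiny made-up "unlabelled" corpus (an assumption for illustration only).
corpus = [
    "the dog jumped over the fence",
    "the dog jumped over the log",
    "the cat jumped over the fence",
    "the car drove down the road",
    "the car stopped at the light",
]


def build_context_counts(sentences):
    """Count which words appear between each (previous word, next word) pair.

    Every inner word is hidden in turn and the surrounding words are
    recorded, so the text supplies its own labels.
    """
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i in range(1, len(words) - 1):
            context = (words[i - 1], words[i + 1])
            counts[context][words[i]] += 1
    return counts


def fill_blank(counts, previous_word, next_word):
    """Return candidate words for the blank, most frequent first."""
    return counts[(previous_word, next_word)].most_common()


context_counts = build_context_counts(corpus)

# "the ___ jumped": which words has the model seen in this slot?
candidates = fill_blank(context_counts, "the", "jumped")
print(candidates)
```

With this corpus, 'dog' comes out ahead of 'cat' and 'car' never shows up for "the ___ jumped", simply because of what the counts saw. Real models replace the count table with a neural network and a much larger context, but the setup is similar in spirit.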

  • Slide 12: Facebook AI creates SEER (SElf-supERvised), a billion-parameter self-supervised model trained on 1 billion random unlabelled and uncurated public Instagram images that achieves 84.2% top-1 accuracy on ImageNet.
  • Slide 13: Self-supervised Vision Transformers (SSViT) learn features that supervised models don’t. These features were found to be powerful predictors as well, achieving 78.3% top-1 accuracy using a k-NN classifier. The features learned from self-supervision contain better information about the semantic segmentation (separation of different items in an image). The findings were implemented into a method called self-DIstillation with NO labels (DINO).
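If "top-1 accuracy using a k-NN classifier" sounds abstract, the sketch below shows the general idea under some loud assumptions: the 2-D vectors are made up and stand in for features a frozen model would produce, and the classifier is a plain majority-vote k-NN with Euclidean distance, which may differ from the exact setup used in the research.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_features, train_labels, query, k=3):
    """Predict a label by majority vote among the k nearest training features."""
    distances = sorted(
        (euclidean(feature, query), label)
        for feature, label in zip(train_features, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


def top1_accuracy(predictions, targets):
    """Fraction of samples where the single predicted label is correct."""
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


# Hypothetical features: stand-ins for what a frozen, pre-trained model outputs.
train_features = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (1.0, 0.9), (0.9, 1.1), (1.2, 1.0)]
train_labels = ["cat", "cat", "cat", "dog", "dog", "dog"]
test_features = [(0.1, 0.1), (1.1, 1.0), (0.3, 0.2)]
test_labels = ["cat", "dog", "cat"]

predictions = [knn_predict(train_features, train_labels, q, k=3) for q in test_features]
accuracy = top1_accuracy(predictions, test_labels)
print(predictions, accuracy)
```

The k-NN classifier has no trainable weights of its own, so a high score here says the features themselves already separate the classes well.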

DINO learns the semantic separation of a subject in an image. Source: Facebook AI Blog.

It’s clear with these findings learning from data with little to no labels is becoming more of a reality.

Transformers take over.

I wonder what propelled the transformer architecture more since the Attention Is All You Need paper: HuggingFace releasing every new variant within days or the architecture's performance capabilities? Marketing or product?

  • Slide 11: Vision Transformers (ViT) achieve state of the art (SOTA) in computer vision with 90.45% top-1 accuracy on ImageNet. However, combining convolutions (better inductive bias) and transformers (high ceiling on performance) leads to slightly better results: Convolution and self-Attention Networks (CoAtNets) achieve similar results with 23x less data, or slightly better results (90.88% top-1 accuracy) with scaled-up data.
  • Slide 13: Transformer-based Conformer combines self-attention and convolutions to achieve state of the art on speech recognition (lowest word error rate on LibriSpeech benchmark). Point Transformers also achieve the state of the art for 3D point cloud classification.
  • Slide 15: DeepMind’s Perceiver and Perceiver IO architectures show just how general purpose the building blocks (attention) of the Transformer architecture can be. The Perceiver models can handle a variety of different data types such as language, audio, video, point clouds and games. A highlight is that Perceiver IO matches a Transformer-based BERT baseline on the GLUE language benchmark without the need for tokenization (learns directly from UTF-8 bytes).
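Since attention keeps coming up as the shared building block, here's a minimal sketch of scaled dot-product attention in plain Python. It leaves out things a real Transformer has (learned projections for queries, keys and values, multiple heads, masking), and the three input vectors are made up for illustration.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    highest = max(values)
    exps = [math.exp(v - highest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # How strongly this query matches every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        output = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(output)
    return outputs


# Three made-up 2-D "tokens". In self-attention, queries, keys and values
# all come from the same inputs (here without any learned projections).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = attention(tokens, tokens, tokens)
print(attended)
```

Nothing in the function cares whether the vectors came from words, image patches or audio frames, which is part of why the same block shows up across so many data types.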

Not so fast, Transformers. Convolutional neural networks (CNNs) and multilayer perceptrons (MLPs) are still cool. Turns out with modern training techniques and data preparation procedures, CNNs and MLPs can perform very well.

  • Slide 17: Google researchers found that when pre-trained like Transformers, CNNs perform similarly or better on natural language tasks, raising the question: was it the training or the architecture that got the results? They also found that MLP-Mixer, an architecture composed only of MLPs, can be competitive with other more established vision architectures when applied to data prepared in the same way as vision transformers (images cut into patches).
  • Bonuses: ResNet strikes back is a paper that shows modern training techniques (MixUp, CutMix, LAMB optimizer, etc) can improve a vanilla ResNet architecture's results by ~4% top-1 accuracy on ImageNet. And the Patches Are All You Need? paper shows that using the same data preparation technique as vision transformers (turning images into patches), the simple ConvMixer architecture (that can fit into a tweet) outperforms ViT and MLP-Mixer.
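"Turning images into patches" is simpler than it sounds. The sketch below assumes a single-channel image stored as a list of rows and non-overlapping square patches; real pipelines work on tensors with colour channels and usually follow this step with a learned embedding, which is left out here.

```python
def image_to_patches(image, patch_size):
    """Split a 2-D image (list of rows) into flattened, non-overlapping patches.

    Patches are returned in row-major order, each flattened into a list,
    roughly the sequence of "tokens" a patch-based model receives.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# A made-up 4x4 single-channel "image" with pixel values 0..15.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
patches = image_to_patches(image, patch_size=2)
print(patches)
```

A 4x4 image with 2x2 patches becomes a sequence of four 4-value vectors. The same preparation step can feed a ViT, an MLP-Mixer or a ConvMixer, which is what makes the "is it the architecture or the data preparation?" question interesting.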

It turns out if you prepare your data in a good way, an epic model can be built within the space of a tweet. Source: Andrej Karpathy Twitter.

Again, is it the training, the data preparation or the architecture leading to the results?

DeepMind DeepMind DeepMind. DeepMind produced major research breakthroughs in biology and reinforcement learning.

  • Slide 19: AlphaFold 2 predicts a protein's structure based solely on its amino acid sequence (also known as the ‘protein folding problem’) with atomic accuracy, producing groundbreaking results in a research problem that’s been around for more than 50 years.
  • Slide 20: Don't have the resources of DeepMind? No worries. The University of Washington didn't either. Yet they produced a method called RoseTTAFold which performs on similar levels to AlphaFold 2 but is available open source. I love seeing this!
  • Slide 28: MuZero is a reinforcement learning algorithm that matches the performance of AlphaZero on Go, Chess and Shogi, all without needing to be told the rules.
  • Slide 30: A DeepMind-created game environment called XLand provides many multiplayer games within consistent, human-relatable worlds that allow an agent to continually learn across different, never-seen-before environments. Because of this, the agent is able to show behaviours applicable to many tasks rather than a specialized individual task, a first in RL research.

Example of DeepMind's XLand simulator to help RL agents learn generalizable skills. Source: DeepMind blog

Language models everywhere raise the new challenge of prompt engineering.

One way to view language models is like an instrument. The instrument can play almost an infinite number of sounds but only some of them sound good. And the way to get better sounds depends on how you use (prompt) the instrument.

If you bang an instrument, you get violent sounds but if you play it in a certain way, you get symphonies.

If you give a language model a good prompt, you typically get good results (slide 46), however bad prompts can have equally poor outcomes (slide 47).

  • Slide 41: Codex is a form of the language model GPT-3 with a focus on turning natural language into working computer code. Codex is used to power apps such as GitHub Copilot.
  • Slide 45: Researchers found prompting may be better than fine-tuning for large language models in Natural Language Processing (NLP). In Pre-train, Prompt and Predict they find that prompt-based learning allows a language model (LM) to be pre-trained on massive amounts of raw text and, by defining a new prompting function, the LM is able to perform few-shot or even zero-shot learning with little to no labeled data. For example, trained on social media text, when recognizing the emotion of “I missed the bus today”, the prompt “I felt so ____” may result in the LM filling in the blank with the emotion.
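The "I missed the bus today" example can be sketched as a small prompting function plus a step that picks the best word for the blank. Everything named here is hypothetical: `make_prompt`, `predict_emotion` and especially `toy_fill_blank`, which is a hand-written stand-in so the example runs without a real language model.

```python
def make_prompt(text, template="{text} I felt so {blank}"):
    """Wrap the input in a template that leaves a slot for the model to fill."""
    return template.format(text=text, blank="____")


def predict_emotion(text, fill_blank, answer_words):
    """Pick the answer word the scorer rates highest for the blank.

    `fill_blank` stands in for a language model: any function that takes
    (prompt, candidate_word) and returns a score.
    """
    prompt = make_prompt(text)
    return max(answer_words, key=lambda word: fill_blank(prompt, word))


def toy_fill_blank(prompt, word):
    """A hypothetical, hand-written scorer used only so the example runs."""
    negative_cues = ("missed", "lost", "broke")
    is_negative_text = any(cue in prompt for cue in negative_cues)
    is_negative_word = word in ("sad", "angry")
    return 1.0 if is_negative_text == is_negative_word else 0.0


prompt = make_prompt("I missed the bus today.")
emotion = predict_emotion("I missed the bus today.", toy_fill_blank, ["happy", "sad"])
print(prompt)
print(emotion)
```

The model itself never changes in this setup. Swapping the template or the list of answer words is what turns the same language model into a different classifier, which is the appeal of prompting over fine-tuning.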

Medical models can learn more than doctors but the extra knowledge might be biased. It was found that vision models trained on medical scans can identify a patient's race better than a clinician. Is this helpful or harmful?

  • Slide 55: If an AI system can identify a patient's self-identified race from a medical scan, could this end up amplifying or creating racial disparities? Learned features for race appear to come from all regions of the images, which complicates mitigation. The main issue the paper highlights is that if the model secretly used its knowledge of self-reported race to misclassify patients of a certain race, radiologists would not be able to tell using the same data the model has access to.

Code sharing from machine learning research improves but it could be better.

  • Slide 56: The Papers With Code website shows that 26% of AI research papers published on arXiv have code repositories available, up from 15% last year. However, checking the “hottest papers” in the past 30 days (top papers shared on Twitter until 8 September 2021), only 17% shared a code repository. And 60% of available code repos make use of PyTorch.

If there's no code attached to a machine learning paper and the results aren't reproducible, is the research valid? Source: Daniel Bourke Twitter.

  • Slide 60: Not more data. Better data. Selecting the data your model trains on can have a huge impact on how it performs and trains. A method called “selection via proxy” (SVP) allows 50% of CIFAR-10 (a popular image dataset benchmark) to be removed without impacting accuracy, while also speeding up training time by 40%. The SEALS (Similarity Search for Efficient Active Learning and Search of Rare Concepts) method allows for web-scale active learning (e.g. ~10 billion images). It works by clustering together learned representations of labelled data and then only considering the unlabelled nearest neighbors of those learned representations in each selection round rather than scanning all unlabelled samples.
  • Slide 62: Academic institutions submit their research code more than industry.
  • Slide 63: Share your work on Twitter → get up to 3x more citations. A study composed of a 1-year randomized trial of papers tweeted or not tweeted showed tweeting scholarly work results in significantly more article citations over time. Would Wozniak's work have made it out there without Jobs? Or would anyone have heard of Jesus without St. Paul?
  • Slide 74: There's a new deep learning framework on the loose: JAX. JAX is designed to resemble NumPy except it can run on the GPU. Personally I've found it similar to PyTorch though perhaps more functional rather than class-based (OOP). It's more research focused than production focused so far (as PyTorch was for a while) but it seems likely it'll end up there.
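Going back to Slide 60, the SEALS idea of only looking at unlabelled items near the labelled ones can be sketched roughly as below. This is a loose illustration rather than the published method: the 2-D points are made up, the function names are hypothetical, and the nearest-neighbour search is brute force, whereas a web-scale system would rely on a similarity search index.

```python
import math


def nearest_neighbours(point, pool, k):
    """Return the ids of the k items in `pool` closest to `point`."""
    ranked = sorted(pool, key=lambda item_id: math.dist(point, pool[item_id]))
    return ranked[:k]


def candidate_pool(labelled, unlabelled, k=2):
    """Collect only the unlabelled items that sit near some labelled item."""
    candidates = set()
    for point in labelled.values():
        candidates.update(nearest_neighbours(point, unlabelled, k))
    return candidates


# Made-up 2-D "learned representations" keyed by item id.
labelled = {"a": (0.0, 0.0), "b": (5.0, 5.0)}
unlabelled = {
    "u1": (0.1, 0.2),
    "u2": (0.3, 0.1),
    "u3": (5.1, 4.9),
    "u4": (4.8, 5.2),
    "u5": (9.0, 0.0),
    "u6": (0.0, 9.0),
}

candidates = candidate_pool(labelled, unlabelled, k=2)
print(sorted(candidates))
```

Here the selection round only has to score four of the six unlabelled items. At the scale of billions of images, skipping the far-away items is where the savings would come from.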

2. Talent (Slides 75 — 91)

The talent section reports on the spread of AI talent (AI skills demand and availability) across the world.

A vast concentration of talent is in the US, China and UK (this is similar to many things worldwide). My country, Australia, doesn’t make any of the lists for supply or demand. But the beauty of ML and AI is that much of the best work can be found online. So even though I’m writing these lines from Brisbane, Australia, I very much feel part of the fun.

Countries outside the US and UK are growing in AI.

Institutions and their AI publications over the years. Source: Oecd.ai

Goodbye university tenure, hello stock options.

3. Industry (Slides 92 — 152)

Put research and talent together and what do you get? Ideally, something useful. That’s what the industry section deals with. Where’s all the AI research going?

This year’s report features a lot in the world of drug discovery and despite the efforts from the ML community for COVID-19, many of the applications have fallen through.

AI is powering drug discovery.

  • Slide 93: Exscientia, an AI-first drug company that originated the world’s first 3 AI-designed drugs to enter Phase 1 human testing, IPO’d on the NASDAQ at a >$3B valuation. The company has 4 more drug candidates undergoing submission. AI helps it synthesize 10x fewer compounds to find a candidate and results in a 12-month target-to-hit time vs. the 54-month industry average.
  • Slide 94: Allcyte’s computer vision AI helps to identify the most potent drug for each cancer patient to improve survival. It measures how live cancer cells respond to 140 different third-party anticancer drugs at the cell level and helps decide the best for a specific patient.

From electricity demand forecasting to when’s the ideal time to artificially inseminate (also called AI) a dairy cow, AI is seeing application across a wide range of industries.

  • Slide 98: Intenseye’s computer vision models help to protect employees from workplace injuries. The system is trained to detect over 35 types of employee health and safety incidents. In contrast to human safety inspectors, the system works around the clock and in real-time leading to it detecting 1.8M unsafe acts in 18 months.

Example of Intenseye's computer vision catching an unsafe act in the workplace. Source: Intenseye.com

  • Slide 100: Transformers shone above, and one place where they’re being used in industry is to improve electricity grid demand forecasting. The UK National Grid ESO (Electricity System Operator) teamed up with Open Climate Fix to use a Temporal Fusion Transformer that’s been delivering forecasts since May 2021. The system has more than halved the mean absolute error (MAE) at a 1-hour lead time and reduced MAE at a 24-hour lead time by 14%. Better prediction of electricity demand could lead to lower emissions and better grid stability.
  • Slide 101: Connecterra creates an AI-powered software program and hardware device called Ida that helps to track dairy cows and is able to predict things like ideal reproduction windows, declining health and herd movement. Data is collected through a neck-worn sensor.
  • Slide 102: What does AI say about your gut? ZOE can connect good and bad bacteria to different food sources from gut bacteria metagenomic sequencing of 1,100 people. The model can predict with 0.72 AUC whether someone drinks high amounts or no coffee based on bacteria in their microbiome. The models were trained on UK data but tested on UK and US test sets.
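Two metrics show up in the bullets above: mean absolute error (MAE) for the grid forecasts and AUC for the coffee prediction. Here's a minimal sketch of how each could be computed. All the numbers are invented for illustration and have nothing to do with the real National Grid ESO or ZOE data.

```python
def mean_absolute_error(actual, predicted):
    """Average size of the forecast error, ignoring its direction."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def relative_reduction(old_error, new_error):
    """Fraction by which the error shrank, e.g. 0.5 means halved."""
    return (old_error - new_error) / old_error


def auc(labels, scores):
    """AUC as the chance a random positive is scored above a random negative.

    Ties count as half a win. This pairwise version is fine for tiny
    examples but slow for large datasets.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Made-up electricity demand values and two made-up forecasts.
demand = [30.0, 32.0, 35.0, 31.0]
old_forecast = [34.0, 28.0, 39.0, 27.0]
new_forecast = [32.0, 30.0, 37.0, 29.0]

old_mae = mean_absolute_error(demand, old_forecast)
new_mae = mean_absolute_error(demand, new_forecast)
reduction = relative_reduction(old_mae, new_mae)

# Made-up labels (1 = drinks lots of coffee, 0 = drinks none) and model scores.
coffee_labels = [1, 1, 0, 0, 1, 0]
coffee_scores = [0.9, 0.6, 0.7, 0.2, 0.8, 0.4]
coffee_auc = auc(coffee_labels, coffee_scores)

print(old_mae, new_mae, reduction, coffee_auc)
```

An AUC of 0.5 is what random guessing would get and 1.0 is a perfect ranking, so 0.72 sits in "clearly better than chance, far from perfect" territory.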

Machine learning in production is a go but there are still troubles making it work.

  • Slide 106: Lessons from running machine learning in production are pushing researchers to shift their thinking from model-centric AI to data-centric AI. This comes from the ML community growing increasingly aware of the importance of better data practices and more standardized MLOps (machine learning operations). There are a few resources to help out: the data-centric competition by deeplearning.ai, a data-centric GitHub repo collecting resources and datacentricai.org, a website dedicated to detailing the challenges of data-centric AI.
  • Slide 110: Despite all the efforts and many participants, much of the ML literature for COVID-19 does not reach the threshold of robustness or reproducibility required for clinical practice. 25% of papers using computer vision to detect COVID-19 and pneumonia used the same control dataset to compare adult patients without mentioning it consists of kids aged 1-5.
  • Slide 111: As tooling for data quality grows, ML teams are launching more projects and realising training datasets are no longer a fixed object but a continuously growing corpus of knowledge. Automated labeling and accessible state-of-the-art architectures mean data quantity and quality become the competitive metric for AI-first startups.

COVID highlights how much we all love computers and computers need chips and chips need to be made and it turns out only a handful of companies can do so.

ASML's chip-making machine costs $150M per unit, contains 100,000 parts and ships in four shipping containers. Source: Wired magazine

  • Slide 117: Major semiconductor fabricators around the world commit ~$400B for new capabilities as the global chip market continues to grow. Intel is dedicating over a combined $100B, TSMC $100B and Samsung over $200B.
  • Slide 129: Google’s been slowly infusing AI into more of its business and consumer applications such as Gmail smart reply, an AI-based grammar checker, Google Sheets formula predictions, Maps using AR and new routing optimised for lower fuel usage and CO2 emissions as well as open-sourcing MediaPipe, a cross platform toolkit for integrating vision technology into different devices.

Google Sheets formula predictions using contextual awareness. Source: Google AI blog

  • Slide 133: ClipDrop is an app that uses AI to separate the subject of an image from the background and then allows you to drop the separated image onto your computer ready to use on a website. This is helpful for posting product pictures with clean backgrounds. Their open-source tool cleanup.pictures allows you to remove unwanted items from an image by scribbling over them.

The ClipDrop app in action. Source: https://clipdrop.co/

  • Slides 141 to 152: There's a huge amount of investment going into AI and plenty of AI companies exiting as well (selling to bigger companies).

One thing I found strange was that there was no mention of Tesla in the self-driving section or any of the industry section, considering Tesla has the largest public self-driving fleet in the world and is a leader in the AI-in-production space.

I’d have liked to have seen smaller companies in the mix too.

Like Descript for using AI-powered transcriptions to edit videos and podcasts via text, PictureThis using computer vision for identifying different kinds of plants and comma.ai for creating open-source self-driving car software and shipping devices to allow people to use it.

Though these views are biased by my own interests. As much as I love big companies doing cool things, I like small companies doing cool things more.

4. Politics (Slides 153 — 181)

The politics section looks at the regulation of AI and how it should be used in an economic and ethical sense.

It’s a hard topic because a lot of what’s happening in the AI space is: 1. Moving fast and 2. Unknown, as in, some of the things people are trying to regulate, the engineers creating them aren’t even sure of how they’re doing it.

  • Slide 154: Google fired employee Dr Timnit Gebru after she tried to publish a paper on AI ethics but it didn’t go through. Some double standards here. Promoting ethical AI but firing those who take stands.
  • Slide 155: Transformative AI or AI on the scale of the industrial revolution is predicted to come around 2052. One core assumption is that if researchers are able to train a neural net or other ML model that uses similar computation levels to the human brain, that will likely result in transformative AI.
  • Slide 156: 68% of machine learning researchers surveyed believed AI safety should be prioritised more than at present. But is this an anti question? As in, you wouldn’t say the opposite of it, “AI safety should be less prioritised”. Amongst commercial players, OpenAI, DeepMind, Google and Microsoft are perceived as most likely to shape the development of AI in the public interest. However, since DeepMind is Google and OpenAI is Microsoft, there should be more players (enter Anthropic later).
  • Slide 160: DeepMind tries to gain independence from Google and become more of a non-profit, however Google blocked the move. DeepMind want to create AGI and they’re worried that this shouldn’t be in the hands of a single entity (rightly so).
  • Slide 161: Anthropic is a new AI safety company building reliable, interpretable and steerable AI systems. The new entity has raised $124M and is comprised of researchers from companies such as OpenAI. And because of this they're poised to become the third pillar of AGI research alongside DeepMind and OpenAI. They're hiring.
  • Slide 163 + 164: EleutherAI takes on the challenge of decentralizing power via open source. They’ve created a model the same size as a smaller GPT-3 with equal or better performance and made it available to all. I love love love this. Read their blog for the story of how things started out as a rogue group of hackers in a Discord in July 2020.

EleutherAI started in July 2020 through a Discord server. Since then they've done a whole bunch. Read the story on their blog.

  • Slide 168: Some entities want to regulate AI more. However, regulation in AI is proposed on many things that the scientific community doesn’t yet fully understand, for example, the fairness, interpretability and robustness of AI algorithms, these are all still open research questions. On the economic side of things, the estimated annual compliance costs will be between €1.6B and €3.3B by 2025.
  • Slide 170: The vast majority of data in the US remains unregulated. Some states have different regulation laws. But despite what some large tech companies may think, the US constitution doesn’t provide for the right to privacy. However, how many people care about this? I'm not sure. What does it mean to give away your data? I grew up with the internet, I kind of explicitly knew everything online or on a computer is online and probably accessible.
  • Slides 175 - 180: Military AI is stepping up. Israel claims use of an AI-guided drone swarm in Gaza attacks. The US Air Force builds an AI copilot called µZero based on DeepMind's work on games. Microsoft signs a $22B contract for HoloLens to be used by the military, covering 120,000 headsets as part of an Integrated Visual Augmentation System.

5. Predictions (Slide 183)

The prediction section includes ideas and forecasts from the authors of the report on what’s to come over the next year in the world of AI.

  1. Transformers replace recurrent networks to learn world models with which RL agents surpass human performance in large and rich environments.
  2. ASML’s market cap reaches $500B.
  3. Anthropic publishes on the level of GPT, Dota, AlphaGo.
  4. A wave of consolidation in AI semiconductors with at least one chip company being acquired by a large tech company.
  5. Small transformers + CNN hybrid models match current SOTA on ImageNet top-1 accuracy with 10x fewer parameters.
  6. DeepMind releases a major research breakthrough in the physical sciences.
  7. The JAX framework grows from 1% to 5% of monthly repos created as measured by PapersWithCode.
  8. A new AGI-focused company is formed with significant backing and a roadmap that’s focused on a sector vertical (e.g. developer tools, life science).

My predictions 🔮

I’ll add in a few of my own.

  1. HuggingFace becomes the de facto standard for hosting public models and datasets, and the community will be able to build off these into their own apps.
  2. Chip companies start to realise the fragility of single supply chains, one large company decides to start $100B+ efforts locally.
  3. EleutherAI replicates GPT-3 scale model but more efficient and open source.
  4. As deep learning frameworks and model architectures mature, a new data-creation framework emerges for all things data: labeling, curation, verification, extraction. Though you could argue companies like scale.com already do this.
  5. A large study is published on different aspects of what improves a model: is it the data preparation? Is it the training routine? Is it the model architecture?

Resources 📃

All of the above points were either taken straight from or inspired by the following:

A big thank you to Nathan Benaich and Ian Hogarth for putting together the report this year and every other year.


See you next month!

If you made it this far, well done. I hope you enjoyed this special edition.

If you did, it would be amazing if you shared it with a friend 🙂

And as always, let me know if there's anything you think should be included in a future post.

In the meantime, keep learning, keep creating, keep dancing.

See you next month,

Daniel www.mrdbourke.com | YouTube

By the way, I'm a full-time instructor with Zero To Mastery Academy teaching people Machine Learning in the most efficient way possible. You can see a couple of our courses below or see all Zero To Mastery courses by visiting the courses page.
