Machine Learning Monthly Newsletter 💻🤖

Daniel Bourke

25th issue! If you missed them, you can read the previous issues of the Machine Learning Monthly newsletter here.

Daniel here, I'm 50% of the instructors behind Zero To Mastery's Machine Learning and Data Science Bootcamp course and our new TensorFlow for Deep Learning course! I also write regularly about machine learning on my own blog, as well as make videos on the topic on YouTube.

Welcome to this edition of Machine Learning Monthly. A 500ish (+/-1000ish, usually +) word post detailing some of the most interesting things on machine learning I've found in the last month.

What you missed in January as a Machine Learning Engineer…

My work 👇

  • New ZTM PyTorch course (coming soon) — I’m working on a hands-on introduction to PyTorch course. The skeleton code materials have been 95% completed. I’m currently annotating each notebook with text descriptions and will be working on slides, images and videos for the first three modules during February. In the meantime, the GitHub repo will have almost daily updates.
  • Nutrify web app now does 100 foods (up from 78 last month) — My own personal machine learning project, Nutrify: take a photo of food and learn about it, now classifies 100 different kinds of foods. It does so using a TensorFlow.js model right in the browser (no data ever leaves the device). The website is still basic but this is my main project for 2022. By the end of the year, I want it to be able to detect and provide information for 1,000+ foods. The next step is to display nutrition information for the current 100 foods. I’m currently learning more JavaScript to do so.

From the internet 🕸

Alrighty, let’s get started. There’s a fair bit going on for the first month of the year.

The theme of the month is: building your own projects.

Why?

Because that’s where the real learning happens. Courses teach foundational knowledge, but projects help you build specific knowledge (knowledge that can’t be taught). And there are plenty of resources and ideas in this issue to play with.

Blog posts 📫

1. Google’s AI work and themes from 2021 (a big summary)

Google AI released a huuuuuuge blog post detailing a bunch of their research highlights for 2021 with a few major themes such as:

  • More capable, general-purpose ML models
  • Continued efficiency improvements for ML
  • ML is becoming more personally and communally beneficial
  • Growing benefits of ML in science, health and sustainability
  • Deeper and broader understanding of ML

There’s too much to go through here but one of my favourite sections was the Datasets section towards the end.

Datasets - Google AI 2021

A small subset of the new open-source datasets (and tools) published by Google AI in 2021.

What an absolute treasure trove of potential. You could take one of the datasets here and turn it into a small project to practice your skills.

Such as the Readability Scroll dataset, which measures how readable a piece of text is. Could you build a model to analyze a blog post (or any piece of writing) and tell you how “readable” it is? It looks like these folks are doing just that.

Who knows, you could potentially build a whole startup based on what you learn and build from these datasets. Don’t forget Google’s Dataset Search, a search engine specifically designed for finding datasets.
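As a sketch of what a readability project might start with, here’s a minimal Flesch reading-ease score in plain Python. Note the syllable counter is a naive heuristic of my own; a model trained on a readability dataset would go far beyond this:

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: count groups of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    """Higher scores mean easier to read (standard prose sits around 60-70)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))

short = flesch_reading_ease("The cat sat on the mat.")
dense = flesch_reading_ease("Incomprehensibly multitudinous verbiage obfuscates communication.")
print(short > dense)  # short, simple sentences score as more readable
```

Wrapping something like this (or a proper model) behind a small web form would already be a shareable project.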

2. 5 use cases for AI to help fight world hunger

I love food (you may have noticed this). And I’m lucky enough to have access to high quality foods.

But not everyone does.

Feeding a world of close to 8 billion people is a tough problem.

Can AI help?

The World Food Programme (WFP) seems to think it can.

I stumbled upon the WFP Frontier Innovations Programme during the month and found a plethora of incredible things happening in the space of AI and helping hunger:

  • Optimus is a planning system that takes into account population size, transport routes and the nutritional value of food to help decide the optimal plan for where resources should go.
  • HungerMap (Live) uses analytics and visualization to provide a live world map of where food security problems may be occurring.

HungerMap (Live)

HungerMap (Live) showing a world map of different levels of food security around the world.

  • SKAI uses computer vision to analyze satellite images to see when and where natural disasters might have occurred, which helps determine what kind of aid might be required.
  • Voice to Text AI has been used to conduct nutrition surveys remotely. A pilot project has been investigating fine-tuning an existing model to perform better on less common languages such as Amharic and Somali.
  • MEZA is a proof of concept attempting to digitize paper nutrition records of malnourished children from remote health clinics around the world.

3. How to Keep Learning Machine Learning by Eugene Yan

Eugene is easily one of my favourite writers in the space of data science and ML.

And his latest post provides answers to the kinds of questions I get quite often: “I’ve finished course XYZ, what should I do next?”

To which I usually respond... start working on something fun of your own.

Which is exactly Eugene’s second point.

Do a personal project that stretches you. It’s a practical way to gain hands-on experience outside of work. It could be learning a new programming language, trying a new framework, or building an app. Pick a project that aligns with your interests. This makes it more fun and thus more likely to finish. Also, good projects are scary — if you’re not worried about failing, it’s probably not challenging enough to provide much learning. Personally, I’m motivated to finish projects where I can help others, learn something new, and have fun along the way.

This can be scary to begin with.

Because no one has laid everything out for you.

But that’s exactly why it’s a good learning experience.

You’re forced to figure things out by trial and error.

Stuck for ideas?

Copy something you like. Dig into how it works and then reproduce it yourself. Put your own twist on it.

I’m doing it now with Nutrify. A lot of what’s required for the app I’d like to build, I have no idea how to do yet. But I’m learning. I’m learning about databases, web development, model versioning, data versioning. All bit-by-bit.

Start small, then let it grow as you learn more.

4. Can A.I. help with heartbreak? A visual essay by Pamela Mishkin

If you haven’t already, do yourself a favour and go and check out pudding.cool.

I’ve got no idea where the domain name came from but it’s right. It’s very cool.

The website itself is a collection of data, storytelling, beautiful visuals, satire and more.

But the article in focus is Nothing Breaks Like A.I. Heart.

And I don’t really know how to describe it. That’s how you know it’s good.

It’s part written by GPT-3 / part written by you (the reader, yes you can decide where the story goes) / part written by Pamela... or mostly written by Pamela (I think?!).

It dives into how GPT-3 can be used to generate text but sometimes that text isn’t good.

But much of the time human-created text isn’t good either.

It also incorporates wheels (yes wheels) you can adjust to drive the story in different directions.


Using interactive wheels the narrative changes depending on the answer you select. Sometimes it’s a dead end, sometimes it’s not.

Ultimately, the essay asks: if you’re struggling to find answers after a recent heartbreak, can an A.I. system trained on an internet’s worth of text offer any closure?

Papers & code 📰👩‍💻

I’m loving the research coming out at the moment. It seems the field is really hitting its stride in terms of combining battle-tested methods.

Now instead of just one modality (e.g. vision only), it's becoming more and more useful to combine more than one (e.g. vision and audio).

A few papers caught my eye from Meta AI (Facebook AI):

AV-HuBERT in noisy environments

AV-HuBERT improves drastically upon previous methods in noisy environments, even when using 10x less data.

  • ConvNeXt is an upgraded computer vision model taking convolutional neural nets (CNNs) to the NeXt level. Vision Transformers have looked close to throwing CNNs off the computer vision throne over the past couple of years, but not so fast. It turns out that by using a bag of training tricks (many of which were highlighted in the ResNet Strikes Back paper) and a few architectural design changes, ConvNeXt achieves very favourable results across many computer vision benchmarks.
  • Data2vec is a unified self-supervised learning paradigm across vision, language (text) and speech. By leveraging masking (a technique originally used for NLP models such as BERT) and a teacher-student setup (the teacher sees the full data, the student tries to replicate what the teacher does), data2vec unlocks a new state of the art for self-supervised learning across three modalities at the same time. It’s like the teacher model is playing hide and seek with the student model. I loved the idea so much I did a (rough and live) paper walkthrough on Twitch and then reposted it to YouTube.
  • Training Vision Transformers with Only 2040 Images. We all know how much data deep learning models take. But what if you don’t have the amounts of data large technology companies have access to? Or you work in a field where data is already rare (such as medicine)? This paper explores how vision transformers perform using very small amounts of data. As it turns out, pretty well.
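To make the teacher-student idea behind data2vec concrete, here’s a minimal sketch (my own toy version, not the actual data2vec implementation) of the exponential moving average (EMA) update that keeps the teacher’s weights slowly tracking the student’s:

```python
def ema_update(teacher, student, decay=0.999):
    """Move each teacher parameter a small step towards the student's.

    The teacher sees the full input and provides the targets; the student
    sees a masked input and is trained to predict the teacher's
    representations. The teacher is never trained directly -- its weights
    are an exponential moving average of the student's.
    """
    return [decay * t + (1 - decay) * s for t, s in zip(teacher, student)]

# Toy example: three scalar "parameters" (the student is held fixed here).
teacher = [0.0, 0.0, 0.0]
student = [1.0, 2.0, 3.0]
for _ in range(10):
    teacher = ema_update(teacher, student)
print(teacher)  # the teacher slowly drifts towards the student's values
```

The high decay (0.999) is what makes the teacher a stable, slowly-moving target rather than a copy of the student at every step.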

Bonus: Now this is a big bonus. ALL of the papers above have code available. Another trend I’m loving: more and more research is getting released with code attached, and some even have live demos!

  • Eye-gazing focus as a biomarker of mental fatigue. I stumbled on this paper while reading through Google’s AI summary for 2021. It turns out that tracking where someone’s eyes are focused over time can be a pretty good judge of how focused they are. Using a few minutes of gaze data (looking at a smartphone front-facing camera), the model was able to predict whether or not a person was mentally fatigued (mental fatigue was measured by answers to several questions). A potential app idea here? A desktop webcam monitor (running on-device, of course, for privacy) that lets someone know when their focus is off and that perhaps a break is best.

Podcasts & videos 🔉📺

  • One of the all-time greats of AI joined one of the all-time greats of podcasts. **Lex Fridman interviewed Yann LeCun** (a pioneer of CNNs) about all things life, AI, the future, problem solving and more. I particularly loved the discussions on self-supervised learning. Yann even hinted at the upcoming research mentioned above (data2vec).
  • DeepMind’s podcast is back for a second season, breaking down some of the latest and greatest research in the world of AI by speaking directly with researchers about their work. There are a couple of episodes live already: one about AlphaFold (a machine learning model that predicts how proteins fold) and another on the path to AGI.

Tools 🔨

  • DAGsHub is the GitHub of machine learning and data science projects. Git is good for tracking code but it’s not so good for larger files like models or data; that’s where DVC (Data Version Control) comes in. DVC is an open-source tool like Git, but it’s specifically targeted at tracking data and model changes. And DAGsHub is to DVC what GitHub is to Git. They recently launched their 2.0 version with a bunch of awesome upgrades, including a collaboration with the open-source Label Studio. I’ve only just heard about it but it looks like it’s already an outstanding tool that’s only going to get better.
  • OpenAI’s Text and Code Embeddings API is live. Got text? Want to turn it into numbers using one of the most powerful natural language models in existence? Well, OpenAI has the tool just for you. Their GPT-3 (a very large natural language and code neural network) text and code embeddings API just went live. This means you can now use a version of GPT-3 to encode your text (or code) into embeddings that numerically represent its semantic meaning. The blog post goes through quite a few use cases. This screams project use cases to me! How about an app that uses the OpenAI API to classify different book highlights into different categories? Or got a messy Notion document? How could the OpenAI API and the Notion API help to sort it out?
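Once you’ve got embeddings back from an API like this (each one is just a list of floats), comparing them comes down to cosine similarity. A minimal sketch with made-up 4-dimensional toy vectors (real API embeddings have thousands of dimensions):

```python
import math

def cosine_similarity(a, b):
    """How closely two embedding vectors point in the same direction (1.0 = identical)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up toy embeddings; in practice these would come back from the embeddings API.
woof = [0.9, 0.1, 0.0, 0.2]
canine = [0.8, 0.2, 0.1, 0.3]
meow = [0.1, 0.9, 0.3, 0.0]

print(cosine_similarity(canine, woof) > cosine_similarity(canine, meow))  # True
```

This is the basic move behind semantic search and classification with embeddings: embed everything once, then rank by similarity.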

Word embeddings - OpenAI

Similar word embeddings get grouped together in numerical space, even down to the level of nuance such as “canine companions say” being closer to “woof” than to “meow”.


See you next month!

What a massive month for the ML world in January!

As always, let me know if there's anything you think should be included in a future post.

Liked something here? Tell a friend using those widgets on the left!

In the meantime, keep learning, keep creating, keep dancing.

See you next month, Daniel

www.mrdbourke.com | YouTube

By the way, I'm a full-time instructor with Zero To Mastery Academy teaching people Machine Learning in the most efficient way possible. You can see a couple of our courses below or see all Zero To Mastery courses by visiting the courses page.

More from Zero To Mastery

Python Monthly Newsletter 💻🐍

26th issue of Andrei Neagoie's must-read monthly Python Newsletter: Python Linter project, Bytecode explained, and Wordle with Python. All this and more. Read the full newsletter to get up-to-date with everything you need to know from last month.

ZTM Career Paths: Your Roadmap to a Successful Career in Tech

Whether you’re a beginner or an experienced professional, figuring out the right next step in your career or changing careers altogether can be overwhelming. We created ZTM Career Paths to give you a clear step-by-step roadmap to a successful career.

Top 7 Soft Skills For Developers & How To Learn Them

Your technical skills will get you the interview. But soft skills will get you the job and advance your career. These are the top 7 soft skills all developers and technical people should learn and continue to work on.