24th issue! If you missed them, you can read the previous issues of the Machine Learning Monthly newsletter here.
Daniel here, I'm 50% of the instructors behind Zero To Mastery's Machine Learning and Data Science Bootcamp course and our new TensorFlow for Deep Learning course! I also write regularly about machine learning on my own blog as well as make videos on the topic on YouTube.
Welcome to this edition of Machine Learning Monthly. A 500ish (+/-1000ish, usually +) word post detailing some of the most interesting things on machine learning I've found in the last month.
OpenAI released a new swag of models capable of generating images from text. For example, using a text prompt such as "a dinner plate with a steak, roast potatoes and mushroom sauce on it", the model generated the following:
Image generated by OpenAI's GLIDE model.
Aside from generating images, there's another side of GLIDE capable of inpainting (filling in a masked region of an image). You select a region of an image and the model fills in the region based on a text prompt.
Example of OpenAI's GLIDE inpainting various images.
The GLIDE model is trained on the same dataset as OpenAI's previous DALL-E. The difference is in how the images are created: GLIDE uses text-conditioned diffusion (a diffusion model slowly adds random noise to a sample and then learns how to reverse the process).
The paper found that diffusion-generated images were rated more favourably than those produced by previous methods.
Try out the sample notebooks on the OpenAI GLIDE GitHub.
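To make the "slowly adds random noise" part concrete, here's a minimal sketch of the forward diffusion process in PyTorch (the variable names and noise schedule are my own illustrative choices, not GLIDE's actual code):

```python
import torch

# A toy forward diffusion process: start with a clean image tensor and
# progressively blend in Gaussian noise. A diffusion model is then trained
# to reverse this process, one step at a time.
num_steps = 1000
betas = torch.linspace(1e-4, 0.02, num_steps)        # noise schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)   # fraction of signal kept

def add_noise(x0, t):
    """Return a noised version of image batch x0 at timestep(s) t."""
    noise = torch.randn_like(x0)
    a_bar = alphas_cumprod[t].view(-1, 1, 1, 1)
    return a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise, noise

# Example: noise a batch of 4 fake RGB images, each at a random timestep
x0 = torch.rand(4, 3, 64, 64)
t = torch.randint(0, num_steps, (4,))
noisy_images, target_noise = add_noise(x0, t)
```

During training, the model's job is to predict `target_noise` given `noisy_images` (and, in GLIDE's case, a text prompt), which is what lets it turn pure noise back into an image at generation time.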
Aside: Even Nutrify (the ML project I'm currently working on) classifies the generated image from GLIDE as "Beef". Perhaps GLIDE-style models will turn out to be a way to generate synthetic data.
Nutrify.app picking up on the "Beef" contained in the image generated by OpenAI's GLIDE model. And yes, Nutrify looks plain now and currently only works for 78 foods but there's plenty more to come. Stay tuned.
In a study from JAMA (Journal of the American Medical Association) of 15,307 memory clinic participants, machine learning algorithms (such as Random Forest and XGBoost) outperformed previous detection models at predicting 2-year dementia incidence.
The machine learning models needed only 6 variables (out of a total of 256), such as sex, hearing, weight and posture, to achieve an accuracy of at least 90% and an area under the curve of 0.89.
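For a rough sense of what training a small-variable tabular model like this looks like (with made-up stand-in data and feature names, not the study's actual dataset), here's a scikit-learn sketch:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

# Hypothetical stand-in data: 6 tabular variables similar in spirit to the
# study's (sex, hearing, weight, posture, ...) and a binary label for 2-year
# dementia incidence. The real study used 15,307 participants.
rng = np.random.default_rng(42)
n = 1000
X = pd.DataFrame({
    "sex": rng.integers(0, 2, n),
    "hearing": rng.integers(0, 3, n),
    "weight_kg": rng.normal(70, 12, n),
    "posture": rng.integers(0, 3, n),
    "age": rng.normal(75, 8, n),
    "memory_score": rng.normal(25, 5, n),
})
y = rng.integers(0, 2, n)  # random labels, purely for illustration

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = RandomForestClassifier(n_estimators=200, random_state=42).fit(X_train, y_train)
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```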
My father has dementia and I know how helpful early diagnosis can be, so it's incredible to see two of my worlds, machine learning and health, collide like this.
Colin Raffel is a faculty researcher at HuggingFace. And in his recent blog post, he calls for machine learning models to be built like open-source software.
Usually, models are trained by a single entity (often a large company) and then used in its services or made accessible through weight sharing (and used for transfer learning).
However, Colin paints a picture of building a machine learning model the way open-source software is built, with potentially thousands of people around the world contributing to a single model, just as large open-source projects (like TensorFlow or PyTorch) have hundreds of contributors.
For example, a research facility with limited access to compute could train version 1.0 of a model and share it with others, who could update specific parts of the model; those changes could then be verified before being incorporated back into the original model.
I love this idea because it doesnât make sense to always be training large models from scratch.
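To picture the "verify and incorporate changes" step, here's a toy sketch of how a maintainer might merge a contributor's retrained layers back into a shared base model (the `merge_contribution` helper is purely illustrative, not Colin's proposal or any existing library's API):

```python
import torch

def merge_contribution(base_state, contributed_state, layers_to_update):
    """Merge a contributor's updated weights for specific layers into the
    base model's state dict, leaving every other layer untouched."""
    merged = dict(base_state)
    for name in layers_to_update:
        merged[name] = contributed_state[name]
    return merged

# Toy example: a contributor retrains only the final layer of a shared model.
base = torch.nn.Sequential(torch.nn.Linear(10, 10), torch.nn.Linear(10, 2))
contributor = torch.nn.Sequential(torch.nn.Linear(10, 10), torch.nn.Linear(10, 2))

merged_state = merge_contribution(
    base.state_dict(),
    contributor.state_dict(),
    layers_to_update=["1.weight", "1.bias"],  # only the final layer changed
)
base.load_state_dict(merged_state)  # the "version 1.1" model
```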
One of my favourite machine learning companies joins one of my other favourite machine learning companies.
Mentioned back in the July 2021 edition of Machine Learning Monthly, Gradio is one of the simplest ways to demo your machine learning models.
Using Gradio to create an interactive demo of a food recognition model. Notice the shareable link; these last for 24 hours when you first create the demo and can be used by others. See the example code used to make the demo on Google Colab.
Now Gradio will be incorporated directly into HuggingFace Spaces (a place to host interactive ML demos for free).
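If you haven't used Gradio before, a basic image classification demo takes only a few lines (the `predict` function and its labels below are placeholders for your own model's inference code):

```python
import gradio as gr

# Placeholder prediction function: swap in your own model here.
# For a "label" output, Gradio expects a dict of {class_name: confidence}.
def predict(image):
    return {"pizza": 0.7, "steak": 0.2, "sushi": 0.1}

demo = gr.Interface(
    fn=predict,
    inputs="image",   # an image upload widget
    outputs="label",  # a label + confidence display
    title="Food recognition demo",
)

demo.launch(share=True)  # share=True creates a temporary public link
```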
As a 2022 resolution, all of the models I build this year will be deployed in some way using HuggingFace Spaces.
Erik Bernhardsson is back again with another fantastic article discussing the future of cloud computing.
And since cloud computing is so vital to machine learning (many of the machine learning models I build are trained with cloud resources), I've included the article here.
He forecasts that large cloud vendors (like AWS, GCP and Azure) will continue to provide access to lower levels of compute and storage, instead of providing many different services on top.
And software vendors will build on top of these lower-level pieces of hardware, offering their own custom solutions.
The top row shows what's currently available whereas the bottom row is what Erik predicts might change.
Again, these are predictions but Erik has a fair bit of skin in the game when it comes to building large-scale data services. After all, he did build the first recommendation engine at Spotify.
This is really cool.
I showed this one to my friend so he could show his kids (and himself) and they could watch their drawings come to life.
Researchers from Facebook AI (now Meta AI) developed a method (a series of four machine-learning based steps) to turn 2D humanoid-like drawings into living, dancing animations.
The blog post explains how the model(s) work, but the live demo is where the real fun is.
I tried it out with a drawing of my own, introducing G. Descent:
A drawing of G. Descent, a 2D smiling stick figure.
And with a few steps on the demo page, G. Descent turned into skipping G. Descent:
From 2D drawing to skipping character. There are many more different types of animation you can try such as kickboxing, dancing, waving and hopscotch.
If there's a trend going on in the machine learning world right now, it's the combination of multi-modal data sources (data from more than one source), especially vision data (images) and language (captions, text, labels).
GLIP combines object detection and language awareness by training on a massive dataset of image-text pairs.
The use of language alongside vision helps GLIP achieve 1-shot performance (predicting after using only 1 training image) comparable with a fully-supervised Dynamic Head model.
GLIP trains an object detection model and language grounding model at the same time. Instead of traditional object detection model labels (e.g. one label per box like [dog, cat, mouse]), GLIP reformulates object detection as a grounding task by aligning each region/box to phrases in a text prompt.
Example output of mapping detection regions in an image to a text prompt.
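The grounding idea itself is simple to sketch: score how well each detected region matches each token in the prompt, then assign regions to the best-matching phrases. Here's a toy illustration of that alignment step (not GLIP's actual code):

```python
import torch

# Toy grounding: similarity between region features (from the vision backbone)
# and token embeddings (from the language encoder) via a dot product.
num_regions, num_tokens, dim = 5, 8, 256
region_features = torch.randn(num_regions, dim)
token_embeddings = torch.randn(num_tokens, dim)

# alignment_scores[i, j] = how well region i matches token j
alignment_scores = region_features @ token_embeddings.T

# Each region gets "grounded" to the token it aligns with most strongly
best_token_per_region = alignment_scores.argmax(dim=1)
print(best_token_per_region)
```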
Again in line with this issue's theme of combining text & image data, Microsoft has a new page dedicated to its goal for achieving multimodal intelligence (using multiple sources of data).
Project Florence is Microsoft's new all-encompassing vision framework able to take in image and text data to retrieve images, classify them, detect objects in them, answer visual questions and even detect actions in video.
Microsoft's Florence: A New Foundation for Computer Vision architecture. Source: Microsoft Research Page.
And Project Florence-VL (vision and language) collects all of Microsoft's research in the vision and language space to help power Florence (including GLIP from above).
If you're interested in how the future of combining vision and language looks, be sure to check out Microsoft's Project Florence-VL research page; it's a treasure trove of exciting research.
The dot product is one of the most used operations across many different deep learning architectures.
But why?
I recently discovered a terrific thread on StackExchange explaining (from multiple perspectives) why the dot product gets used so often in neural networks.
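In short, a dot product is both a weighted sum (what a neuron computes before its activation function) and a similarity measure (what attention scores and embedding lookups rely on). A quick illustration:

```python
import numpy as np

# 1. A neuron's pre-activation is a weighted sum of its inputs,
#    which is exactly a dot product between inputs and weights.
inputs = np.array([0.5, -1.2, 3.0])
weights = np.array([0.8, 0.1, -0.4])
pre_activation = np.dot(inputs, weights)  # 0.5*0.8 + (-1.2)*0.1 + 3.0*(-0.4)

# 2. The dot product also measures how aligned two vectors are, which is the
#    basis of attention scores and embedding similarity.
a = np.array([1.0, 0.0])
b = np.array([0.9, 0.1])
c = np.array([-1.0, 0.0])
print(np.dot(a, b))  # large positive = similar directions
print(np.dot(a, c))  # negative = opposite directions
```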
Tip: Find a topic tag on any of the StackExchange or StackOverflow websites, such as "neural-networks" or "pandas", and filter the questions and answers by most frequent or most votes and you'll see a plethora of problems people often run into.
Example of filtering questions on StackExchange for questions tagged with "neural-networks" for most frequently visited. Source: StackExchange CrossValidated page.
This one astounded me.
The rise of transformers is well known. And one of the main components of the transformer model architecture is the attention mechanism.
However, how exactly the transformer architecture achieves such good results is much debated.
A recent paper from Sea AI Lab argues that it's the building blocks around the attention mechanism that give the transformer such performant capabilities, and that the "token mixer" (usually attention or a spatial MLP) can be swapped out for other options and still get excellent results.
In fact, they substituted the attention mechanism with a non-parametric (no learning) pooling layer (yes, a pooling layer) in a vision transformer and achieved results equal to or better than traditional transformer models with less compute.
They call the general architecture the MetaFormer (a transformer model with a specific token mixer layer) and their version of the MetaFormer the PoolFormer, where the token mixer layer is a pooling layer.
MetaFormer architecture design layout compared to various other forms of the MetaFormer such as the traditional transformer (with attention) and the PoolFormer (with pooling as the token mixer). The results of the different architecture setups can be seen on the right with the PoolFormer achieving the best accuracy for the least compute. Source: PoolFormer: MetaFormer is Actually What You Need for Vision paper.
Not only is the paper fantastically written, but the authors also provide a series of ablation studies at the end comparing architecture changes with different setups (such as swapping the GELU activation for ReLU), and the code is all available on GitHub (I love seeing this!).
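To see how simple the swap is, here's a rough PyTorch sketch of a MetaFormer block with average pooling as the token mixer (my simplified reading of the paper's idea, not the official implementation):

```python
import torch
from torch import nn

class PoolingTokenMixer(nn.Module):
    """A non-parametric token mixer: average pooling instead of attention.
    Subtracting the input keeps only the mixing residual."""
    def __init__(self, pool_size=3):
        super().__init__()
        self.pool = nn.AvgPool2d(pool_size, stride=1, padding=pool_size // 2)

    def forward(self, x):  # x: [batch, channels, height, width]
        return self.pool(x) - x

class MetaFormerBlock(nn.Module):
    """General MetaFormer layout: norm -> token mixer -> norm -> channel MLP,
    each wrapped in a residual connection."""
    def __init__(self, dim):
        super().__init__()
        self.norm1 = nn.GroupNorm(1, dim)
        self.token_mixer = PoolingTokenMixer()
        self.norm2 = nn.GroupNorm(1, dim)
        self.mlp = nn.Sequential(
            nn.Conv2d(dim, dim * 4, kernel_size=1),
            nn.GELU(),
            nn.Conv2d(dim * 4, dim, kernel_size=1),
        )

    def forward(self, x):
        x = x + self.token_mixer(self.norm1(x))
        x = x + self.mlp(self.norm2(x))
        return x

# Example: pass a batch of feature maps through one block
block = MetaFormerBlock(dim=64)
print(block(torch.randn(2, 64, 14, 14)).shape)  # torch.Size([2, 64, 14, 14])
```

Swap `PoolingTokenMixer` for an attention layer and you get (roughly) a standard transformer block, which is the paper's point.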
What a massive month for the ML world in December!
As always, let me know if there's anything you think should be included in a future post.
Liked something here? Tell a friend using those widgets on the left!
In the meantime, keep learning, keep creating, keep dancing.
See you next month, Daniel
By the way, I'm a full-time instructor with Zero To Mastery Academy teaching people Machine Learning in the most efficient way possible. You can see a couple of our courses below or see all Zero To Mastery courses by visiting the courses page.