Machine Learning Monthly (August 2020)

8th issue! If you missed them, you can read the previous issues of the Machine Learning Monthly newsletter here.

Hey everyone, Daniel here, I'm 50% of the instructors behind the Complete Machine Learning and Data Science: Zero to Mastery course. I also write regularly about machine learning on my own blog and make videos on the topic on YouTube.

Welcome to the eighth edition of Machine Learning Monthly, a 500ish (+/-1000ish, usually +) word post detailing some of the most interesting things on machine learning I've found in the last month.

Since there's a lot going on, the utmost care has been taken to keep things to the point.

What you missed in August as a Machine Learning Engineer…

My work. Your benefit. 👇

[Article] How I'd start learning machine learning again (3-years in) — Three years ago I started studying machine learning in my bedroom. I know a lot more than when I started but one thing's still clear: how much there still is to learn. This article dives into a few of the things I'd change if I was starting again.
There is now a video version of this article you can check out!

The Best From The Internet 🕸

1. Visual neural networks by vcubingx

What it is: A YouTube video series demonstrating the ins and outs of a neural network (chapter 1 is out now, more to come).

Why it matters: I'm always inspired by someone being able to explain a hard concept in an understandable way. If you've ever wanted a visual understanding of what happens when you write neural network code, check out this series.

2. Using Machine Learning to Detect Deficient Coverage in Colonoscopy Screenings

What it is: Google built a machine learning-based algorithm called C2D2 (Colonoscopy Coverage Deficiency via Depth) to see whether or not a colonoscopy scan covered sufficient portions of the colon.

Why it matters: Colorectal cancer (CRC) results in around 900K deaths per year worldwide. While it can be prevented by removing potentially harmful polyps discovered in the colon through colonoscopy, many of them (an estimated 22-28%) get missed. About a quarter of the missed polyps have the potential to become cancerous. The C2D2 algorithm outperforms physicians by 2.4x on synthetic videos, and 93% of experts agree with the algorithm's scores on real videos. Being able to detect polyps better and earlier could reduce the rate of interval CRC (being diagnosed with CRC within 5 years of a negative colonoscopy).

3. Want to learn Deep Learning? Check out these fastai2 announcements

What it is:

A new book — Deep Learning for Coders with fastai and PyTorch.
A new course — Practical Deep Learning for Coders (covers deep learning and machine learning with fastai2 library).
A new library — fastai v2: a complete rewrite of the original fastai to be faster, easier and more flexible.

Why it matters: Whenever someone asks me how they should start learning deep learning, I recommend fastai. And the past two weeks have been the biggest in its history, releasing a new book, course and library all together.

I remember in my last machine learning engineer role, I watched a fastai lecture, implemented the technique Jeremy (founder of fastai) talked about and, within a few hours, blew the previous results on a problem I'd been working on for a week out of the water.

I've ordered the book and I'll be going through the course in the Summer (Australian Summer 😉: Dec-Jan).

Bonuses:

Podcast with fastai founder Jeremy Howard and Weights & Biases CEO Lukas Biewald about the story of fastai and the future of ML.
Zero to Hero with fastai series by Zachary Mueller: A ground-up approach to using the fastai library and how to use it for your own problems.

4. minGPT by Andrej Karpathy

What it is: A miniature version of the GPT architecture which powers GPT2 and GPT3 (the natural language processing model currently taking the internet by storm, see last month's issue of ML Monthly for more).

Why it matters: Sometimes seeing state-of-the-art architectures and models being released by large companies can be intimidating to those who are still learning the field. In minGPT, Andrej Karpathy shows you how the GPT architecture is implemented and trained using pure PyTorch. The cover photo makes the repo.

minGPT repo cover photo
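
If you're curious what sits at the heart of a GPT-style model, here's a minimal sketch of causal self-attention, the step where each token looks back at the tokens before it. To be clear, this is not code from minGPT (which uses PyTorch): it's a framework-free toy with a single head and no learned weights, and the function names and numbers are my own assumptions for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(queries, keys, values):
    """Single-head causal attention: position t only attends to positions <= t."""
    dim = len(queries[0])
    outputs = []
    for t, query in enumerate(queries):
        # Scaled dot-product scores against the current and earlier positions only.
        scores = [
            sum(q * k for q, k in zip(query, keys[s])) / math.sqrt(dim)
            for s in range(t + 1)
        ]
        weights = softmax(scores)
        # Weighted sum of the value vectors this position is allowed to see.
        outputs.append([
            sum(w * values[s][d] for s, w in enumerate(weights))
            for d in range(len(values[0]))
        ])
    return outputs


# Toy sequence of three 2-dimensional token vectors (made-up numbers).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = causal_self_attention(tokens, tokens, tokens)
```

Because of the causal restriction, the first position can only see itself, and changing a later token never changes the outputs for earlier ones. That property is what lets a GPT-style model be trained to predict the next token.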

5. Less supervised computer vision 🥽

What it is: A few recent resources about using less labelled data for deep learning.

Tutorial on weakly-supervised learning in computer vision from European Conference on Computer Vision (ECCV) 2020.
Learning with limited labels tutorial from Nvidia at ECCV 2020.
Deep Learning with Limited Labelled Data seminar by Colin Raffel.

Why it matters: For the past few years, breakthroughs in AI have come from having access to large databases of labelled data. However, these databases take large amounts of time and resources to create. So it makes sense that there's a lot of interest in research to find ways around requiring these large amounts of labelled data to get desirable results. A few years ago, unsupervised methods would've been laughed out of the room. Now they're starting to compete with some of the best supervised methods, especially in NLP (GPT3 is trained on unlabelled text rather than hand-labelled examples) and it seems computer vision is next.

6. MLOps tutorial series

What it is: MLOps is the combination of DevOps and Machine Learning. This tutorial series will show you how to bring one of the best DevOps practices, continuous integration, to your machine learning projects.

Why it matters: Machine learning is more than building models in notebooks. In fact, many of the most important steps in machine learning happen completely out of notebooks. If you want to get your models into the hands of others, you'll want to start picking up MLOps skills.
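
To make the continuous integration idea a bit more concrete, here's a rough sketch of the kind of check a CI job could run on every push: retrain the model, score it on a fixed holdout set and fail the build if quality drops. It isn't taken from the tutorial series. The tiny threshold "model", the datasets and the `MIN_ACCURACY` bar are all hypothetical stand-ins.

```python
def train_threshold_model(samples):
    """Fit a one-feature classifier: the threshold is the midpoint of the class means."""
    positives = [x for x, label in samples if label == 1]
    negatives = [x for x, label in samples if label == 0]
    return (sum(positives) / len(positives) + sum(negatives) / len(negatives)) / 2


def accuracy(threshold, samples):
    """Fraction of samples where (x > threshold) matches the label."""
    correct = sum(int(x > threshold) == label for x, label in samples)
    return correct / len(samples)


# Hypothetical fixed datasets that would live in the repo alongside the code.
TRAIN = [(0.1, 0), (0.4, 0), (0.5, 0), (1.6, 1), (1.9, 1), (2.2, 1)]
HOLDOUT = [(0.2, 0), (0.7, 0), (1.5, 1), (2.0, 1)]
MIN_ACCURACY = 0.9  # assumed quality bar; pick one that suits your project


def evaluate_pipeline():
    """Train on TRAIN and return accuracy on HOLDOUT."""
    threshold = train_threshold_model(TRAIN)
    return accuracy(threshold, HOLDOUT)


def test_model_meets_quality_bar():
    """A test runner such as pytest would collect this; a failure blocks the merge."""
    score = evaluate_pipeline()
    assert score >= MIN_ACCURACY, f"accuracy {score:.2f} is below {MIN_ACCURACY}"
```

The point of the design is that model quality gets treated like any other test: if a change to the data or code makes the model worse, you find out before it's merged rather than after it's deployed.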

7. How to deploy and host a machine learning model with FastAPI and Heroku

What it is: A step by step guide to deploying a stock price predicting model API using FastAPI (a Python web framework) and Heroku (a cloud computing platform).

Why it matters: Don't let your machine learning models die in a Jupyter Notebook. Get them live and see what happens when you and others interact with them.

8. On-device supermarket product detection

What it is: Using a computer vision powered app to learn more about supermarket products.

Why it matters: Food labels can be confusing at the best of times. Even more so, what if you couldn't see the food labels at all? Google's Lookout app uses computer vision to help solve these problems using on-device machine learning models (important because of poor connection in supermarkets) to help a user understand what they're looking at, displaying information such as nutritional facts, allergens and more.

9. New Nvidia 30 series chips

What it is: Nvidia announced their latest consumer graphics cards, the 30 series, with dramatic improvements over the previous generation. They're due for release from September 17th onwards.

Why it matters: If you're looking into building your own deep learning PC, you're probably going to use Nvidia GPUs. The new 30 series is based on Nvidia's Ampere architecture. In short, it means you're going to be getting more compute power for the money you pay. This is exciting because if you can run quality experiments on a local GPU (e.g. one of Nvidia's 30 series), scaling them up when necessary (e.g. to many GPUs on the cloud) is very easy to do.

See you next month!

As usual, a massive month for the ML world in August.

As always, let me know if there's anything you think should be included in a future post. Liked something here? Send us a tweet.

In the meantime, keep learning, keep creating.

See you next month,

Daniel www.mrdbourke.com | YouTube

By the way, I'm a full time instructor with Zero To Mastery Academy teaching people Machine Learning in the most efficient way possible. You can see a couple of our courses below or see all Zero To Mastery courses by visiting the courses page.