Machine Learning Monthly (January 2020)

Hey everyone! Daniel here. I write regularly about machine learning on my own blog and make videos on the topic on YouTube. I'll be writing this monthly newsletter here as well!

1st issue! That's right, you're right at the beginning of this journey! If there is enough interest, I will keep doing these every month so please share it with your friends!

If it’s your first time here… (otherwise skip this part)

Being a Machine Learning Engineer is a fantastic career option and Machine Learning is now one of the fastest growing job markets (including Data Science). Job opportunities are plentiful, you can work around the world, and you get to solve hard problems. However, it’s hard staying up to date with the ever-evolving ecosystem.

This is where this newsletter comes in. Every month, it’ll contain some of my favourite things from the industry, keeping you up to date and helping you stay sharp without wasting your time.

What you missed in January as a Machine Learning Engineer…

2019 was a massive year for machine learning for Google (unsurprisingly!)

Jeff Dean, SVP of Google Research and Health, posted a great summary of Google's machine learning accomplishments for 2019. Some of my favourites were:

The progress in federated learning (using plenty of smaller computers, like smartphones, to train machine learning models rather than giant data centres). This is important because it starts to let smaller companies improve their machine learning systems without needing Google-scale computing power.
2020 also seems to be the year for health. This is still a tough one, since most machine learning experiments are done in controlled environments and biological systems are anything but controlled (yet). But you can be assured Google is spending big to see where machine learning can help improve healthcare.

At the end of the article, Dean also lays out some of Google's visions for machine learning going forward. Health gets another mention here. "How can we apply computation and machine learning to make advances in important new areas of science?"... such as healthcare and bioinformatics.

Learning machine learning without math

One of the questions I get most often is "how can I learn the math behind machine learning?". The canned response is to say something like: go and study linear algebra, calculus, probability, statistics and computer science. But that isn't really helpful, as one could spend years on each of these and still not know enough (how much is enough?).

Jason Brownlee from Machine Learning Mastery lays out a far more practical approach, one based on curiosity rather than logic. Trying to learn all of the above topics at once is like trying to boil the ocean. Instead, Brownlee advocates starting by boiling a kettle.

Choose a project to work on, something which interests you, and see if you can apply machine learning to it. When you get some small wins and solve a few problems, you'll have no choice but to want to dive deeper. You can then use that curiosity to fuel your understanding of the math.

This is the approach I take. Learning what you need to learn when you need to learn it.

Life is multi-variable, not single-variable

When getting started, it's common practice to apply machine learning to a single source of input data, such as an algorithm looking at a single image and deciding whether or not there is a car in it.

But as Andrej Karpathy, head of AI (artificial intelligence) at Tesla, explains in a recent talk, self-driving cars are very much a multi-task problem. This means, rather than a single image input and a decision being made, a Tesla takes in information from 8 different images, stitches them together and then makes a decision based on the collective.

This, of course, is harder than working from a single input, but it is necessary for a domain such as self-driving cars.

You can imagine this is also the case in many other domains. We make decisions based on information from many input sources. As an example, imagine a doctor trying to prescribe a treatment based only on your age and nothing else. How well would it go?

The 10 (more like ~50) commandments of machine learning

Another gem from Google. In their Best Practices for ML Engineering guidelines, they outline a series of heuristics one can use for approaching a potential machine learning project.

My favourite is #1.

"Don't be afraid to launch a product without machine learning."

As powerful as machine learning is, if a simple rule-based system gets the job done, it should be used instead.

When accuracy doesn't cut it

The standard metric for evaluating classification models is accuracy.

But let's see where that fails.

Let's say 100,000 people board planes every day at airport X. And 1 of them has a disease. If the person with the disease gets on the plane, this could be problematic for the people on board.

So airport X is tasked with building a machine learning classifier to figure out who has the disease based on an eyeball scan at the terminal (remember, this is made up).

A machine learning model which predicted "no disease" for every single person would have an accuracy of 99.999%. Look at all those 9's! Not bad!

But now you start to realise where accuracy comes undone. Damien Martin discusses two metrics better suited to this problem, precision and recall, in an article which may also help you in a future interview. Give the example questions a try, they tripped me up.
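
To make the airport example concrete, here is a minimal sketch in plain Python. It rebuilds the made-up scenario above (100,000 passengers, 1 with the disease, a model that predicts "no disease" for everyone) and computes accuracy, precision and recall from the confusion counts. The function names are my own for illustration, and returning 0.0 when a metric's denominator is zero is an assumed convention, not something taken from Damien Martin's article.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives (1 = disease, 0 = no disease)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that were correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp, fp):
    """Of the people flagged as sick, how many really were?"""
    # Assumed convention: 0.0 when nobody was flagged at all.
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Of the people who really were sick, how many did we catch?"""
    # Assumed convention: 0.0 when there were no sick people to catch.
    return tp / (tp + fn) if (tp + fn) else 0.0


# The made-up airport: 100,000 passengers, exactly 1 has the disease.
passengers = 100_000
y_true = [1] + [0] * (passengers - 1)

# A "model" that predicts "no disease" for every single person.
y_pred = [0] * passengers

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
acc = accuracy(tp, fp, fn, tn)
prec = precision(tp, fp)
rec = recall(tp, fn)

print(f"accuracy:  {acc:.3%}")
print(f"precision: {prec:.3%}")
print(f"recall:    {rec:.3%}")
```

Accuracy comes out at 99.999%, but recall is 0%: the model never catches the one person it was built to find. That gap is exactly what accuracy hides on heavily imbalanced problems like this one.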

Phew! 2020 is almost 10% over already and one thing is for sure: there's plenty going on.

Stay playful, keep learning.

PS. If you have a suggestion you'd like to see in a future edition or some of your own work you'd like to share, let us know. See you next month!

By the way, I'm a full-time instructor with Zero To Mastery Academy teaching people Machine Learning in the most efficient way possible.

Machine Learning Monthly 💻🤖

Daniel Bourke