28th issue! If you missed them, you can read the previous issues of the Machine Learning Monthly newsletter here.
Daniel here, I'm 50% of the instructors behind Zero To Mastery's Machine Learning and Data Science Bootcamp course and our new TensorFlow for Deep Learning course! I also write regularly about machine learning on my own blog, as well as make videos on the topic on YouTube.
Welcome to this edition of Machine Learning Monthly. A 500ish (+/-1000ish, usually +) word post detailing some of the most interesting things on machine learning I've found in the last month.
Eric Jang recently went on a job hunt for new machine learning roles.
He based his decision on a number of parameters.
The main one being: "which company has the largest technological lead towards artificial general intelligence (AGI)?"
And he landed on Halodi Robotics to build humanoid robots leading towards solving AGI in the next ~20 years.
Eric's article sums up his journey and the different pros and cons of joining some of the leading AI companies like Facebook (Meta), Google, Tesla, OpenAI and more.
One of my favourite quotes from the article is that eventually, every (tech) company will become an AGI company.
Meaning that, for technology companies, all roads eventually lead to AGI (Rome).
All roads for large technology companies lead to some form of AGI (whatever that is). Source: All Roads Lead to Rome: The Machine Learning Job Market in 2022 by Eric Jang.
I also liked his point that if you become CEO of a company (your own company), eventually you get taken off the tools, as in you spend more time managing company things rather than creating things.
I've seen friends move from engineering roles to management roles and not be fans.
As for me?
Keep me on the tools.
Large language models (LLMs) are starting to change how computing is done.
With models like GPT-3, LLMs are starting to be integrated into all different kinds of services.
Instead of pressing certain buttons on different apps to do things, in the future it's likely you'll use a natural-language-based interface.
Weβre starting to see examples like this now.
Such as searching your photos app for βphotos of my dog last Aprilβ.
But this is only the beginning.
Russell Kaplan's Tweet thread discusses how LLMs could be used as the new form of compute moat for large technology companies.
As in, you'll use an LLM as an API to build things and then you'll be at the mercy of large companies as to whether or not you can use that LLM.
But I see this as no different from what happens now.
Right now many people leverage the services of the internet to run their businesses.
Their services are provided by similar tech companies to the ones building the LLMs.
So alongside renting compute in the cloud to run your service, you could rent an LLM to generate information for your service.
Kaplan's point on governments eventually needing to create their own supercomputer clusters as a form of national security is thought-provoking too:
https://twitter.com/russelljkaplan/status/1513128014879490050?s=20&t=6oFSQF-Gm8Qk2YTsI1K-1A
As for LLM use cases, imagine being able to describe to Google Slides or Keynote what kind of diagram you'd like: "please create a diagram with two arrows on the left and then make those arrows move around a circle in the middle..."
Or perhaps I just say "put together a list of 10 of the best resources in machine learning from the past month"... and include a "little spicy comment" about each.
I've been creating a new ZTM PyTorch course for the last couple of months.
And I've found a bunch of resources for improving PyTorch code.
Jack's article with tips and tricks is fantastic.
From loading data to model architectures to data operations to inference and validation, it covers almost everything you'll need.
Some of my favourites include:
- Set num_workers in your DataLoader to a positive integer.
- Set pin_memory=True in your DataLoader to save time transferring memory from CPU to GPU.
- Use the @autocast decorator on your forward() to use automatic mixed-precision (this uses lower-precision computing where possible to save time on operations).

For plenty more, see Jack's article.
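Here's a minimal sketch of those three tips together (assuming PyTorch is installed; the dataset and model below are placeholders, not code from Jack's article):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy dataset standing in for your real one
dataset = TensorDataset(torch.randn(64, 3, 32, 32), torch.randint(0, 10, (64,)))

loader = DataLoader(
    dataset,
    batch_size=16,
    num_workers=2,    # >0 loads batches in parallel worker processes
    pin_memory=True,  # speeds up CPU -> GPU memory transfer (when a GPU is present)
)

class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(3 * 32 * 32, 10)

    # autocast runs the forward pass in mixed precision where it is safe to do so
    @torch.autocast(device_type="cuda" if torch.cuda.is_available() else "cpu")
    def forward(self, x):
        return self.layer(x.flatten(start_dim=1))

model = TinyModel()
images, labels = next(iter(loader))
outputs = model(images)
print(outputs.shape)  # torch.Size([16, 10])
```

The speedups from num_workers and pin_memory are most noticeable with large datasets and a GPU; on a tiny CPU-only example like this they won't show up, but the pattern is the same.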
In the theme of large language models, what if every problem started to look like a language problem?
Pix2Seq is a new paradigm by Google AI to treat Object Detection as a sequence problem.
It does so by treating the labels (bounding box coordinates) for an image as a sequence of tokens, much like a language model might treat a sentence.
For example, a label for an image might look like:
[y_min, x_min, y_max, x_max, class_label]
Pix2Seq would see this sequence as "describing" the object of interest.
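As an illustration, here's a hypothetical sketch of this kind of label tokenization (my own simplified version, not the paper's exact scheme): quantize each normalized coordinate into one of a fixed number of discrete bins, then append a class token offset past the coordinate vocabulary.

```python
def bbox_to_tokens(y_min, x_min, y_max, x_max, class_id, bins=1000):
    """Turn one bounding box label into a short sequence of integer tokens."""
    coords = [y_min, x_min, y_max, x_max]  # coordinates normalized to [0, 1]
    coord_tokens = [min(int(c * bins), bins - 1) for c in coords]
    class_token = bins + class_id  # class tokens live after the coordinate bins
    return coord_tokens + [class_token]

# A box covering the top-left quadrant of the image, class 3
print(bbox_to_tokens(0.0, 0.0, 0.5, 0.5, class_id=3))
# [0, 0, 500, 500, 1003]
```

A sequence model can then be trained to emit these tokens one at a time, the same way a language model emits words.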
The Pix2Seq workflow is to perceive an image and then generate a sequence of tokens (bounding box coordinates) for each object in the image. Source: Pix2Seq: A New Language Interface for Object Detection
This exciting new approach to object detection yields competitive or better results than previous methods using more complicated approaches.
You can never have enough Python skills.
And Iβve loved checking out the following resources for a bunch of tips and tricks:
- args and kwargs
- using FastAPI to build web applications with Python

I plan on going through a fair few of these over the next couple of months.

DALL·E 2 was recently released by OpenAI (the blog post/paper but not the model/weights).
It uses a combination of diffusion (learning to return an image to its original form after it has been corrupted with noise) as well as CLIP (a combination of image and language pairs) to do some wild things with images.
It can generate images given a text prompt with an astonishing level of detail:
Using a text-based caption to create an image, mixing old with new... I don't think many 16th-century paintings have cell phones in them. Source: Eric Jang + DALL·E 2.
But thatβs not all...
It can edit images based on language prompts.
And it can create new versions of the same image, such as viewing a painting in several different styles.
This Tweet from David Schnurr shows some more DALL·E 2 creations.
To learn more about DALL·E 2, I'd recommend the following:
Self-supervised learning (SSL) allows a model to learn a representation of data directly from the data itself.
One of the main methods of doing this is by masking out portions of the data and getting a model to fill in the blanks.
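This "fill in the blanks" setup can be sketched in a few lines of PyTorch (assumed details for illustration, not any specific paper's method): hide some patches of the input behind a mask token, then score the model's reconstruction only on the hidden positions.

```python
import torch

torch.manual_seed(0)
patches = torch.randn(16, 32)              # 16 patches, 32 features each
mask = torch.zeros(16, dtype=torch.bool)
mask[:4] = True                            # hide the first 4 patches (random in practice)
mask_token = torch.zeros(32)               # a learnable embedding in practice

corrupted = patches.clone()
corrupted[mask] = mask_token               # replace masked patches with the mask token

model = torch.nn.Linear(32, 32)            # stand-in for a real encoder
reconstruction = model(corrupted)

# The loss is computed on the masked positions only
loss = torch.nn.functional.mse_loss(reconstruction[mask], patches[mask])
print(loss.item() > 0)  # True
```

The key point is that the supervision signal comes from the data itself: no human labels are needed, only the original (unmasked) patches.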
Another method is by using a teacher and student model, known as self-distillation or knowledge distillation. In this setup, a teacher model learns one representation of the image and the student tries to reconstruct the teacher's representation.
Meta AI has created a self-supervised learning demo for images to explore different self-supervised concepts.
On the demo, you can upload an image and have DINO (self-distillation with no labels), a self-supervised learning model, segment the image, retrieve similar patches from other images, or match your image to similar images across a dataset of 5 million images.
Example of using Meta AI's SSL demo to match an image to other images with similar scenarios happening. Source: Custom image on Meta AI's SSL demo page.
Many tricks have come out in the past few years to train computer vision models to state-of-the-art (SOTA) results.
Methods such as cutmix, mixup, learning rate decay, weight decay and many more have contributed to improved results over time.
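As an example of one of these tricks, mixup can be written in a few lines (a simplified sketch; in practice the mixing coefficient lambda is usually sampled from a Beta distribution rather than fixed):

```python
import torch

def mixup(x1, y1, x2, y2, lam=0.7):
    """Return a convex combination of two (image, one-hot label) pairs."""
    x = lam * x1 + (1 - lam) * x2
    y = lam * y1 + (1 - lam) * y2
    return x, y

# Two dummy "images" and their one-hot labels
x1, x2 = torch.ones(3, 4, 4), torch.zeros(3, 4, 4)
y1, y2 = torch.tensor([1.0, 0.0]), torch.tensor([0.0, 1.0])
x, y = mixup(x1, y1, x2, y2, lam=0.7)
print(y)  # tensor([0.7000, 0.3000])
```

Training on blended images with blended labels tends to regularize the model and smooth its decision boundaries.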
But often, with every new model came a new, specific way of training that model.
Having a unique training method for every new model is as tedious as it sounds.
But a new Unified Training Scheme for ImageNet (USI) from the Alibaba research group shows a way to produce SOTA results across a wide range of model architectures (transformers, CNNs, mobile-based architectures) using the same training methodology.
USI uses a knowledge distillation (KD) layer via a student and teacher method with several well-known computer vision training techniques.
The KD layer includes the prediction probabilities from a teacher model as well as the ground truth labels from the data as input to a student model.
This combination leads to more robust learning from the student model than just pure ground truth labels.
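A sketch of what such a knowledge distillation loss could look like in PyTorch (the temperature, weighting, and loss choices here are common conventions, not the exact USI recipe):

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: KL divergence between softened teacher and student distributions
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: standard cross-entropy with the ground-truth labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

student = torch.randn(8, 10)   # student predictions for a batch of 8
teacher = torch.randn(8, 10)   # teacher predictions for the same batch
labels = torch.randint(0, 10, (8,))
loss = kd_loss(student, teacher, labels)
print(loss.item() > 0)  # True
```

Here, alpha balances the soft (teacher) and hard (label) targets, and the temperature T softens both distributions before comparing them.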
This method is not too dissimilar to some of the training methods recently seen in SOTA self-supervised learning architectures.
The universal training scheme for ImageNet leverages knowledge distillation (KD) by combining a teacher and student network to form the final predictions. Source: Solving ImageNet: a Unified Scheme for Training any Backbone to Top Results paper.
Having a universal training scheme allows many researchers to get started with a SOTA training method and know they're getting excellent results out of the box.
With the increase in generative models comes the increased chance of generating synthetic data to train new and existing models.
But should we use generated (synthetic) data to train new and existing models?
Vikash Sehwag's latest article argues yes, we should, because it can (if the generated data is of high enough quality):
I'm rereading the classic post The Bitter Lesson by Rich Sutton (first mentioned in Machine Learning Monthly November 2020).
It's a good reminder given the rise of all the new kinds of models coming out.
The bitter lesson being: over the past 70 years of AI research, general methods that leverage computation have ultimately proven the most effective, and by a large margin.
Meaning many hand-crafted methods for AI over the past few decades have been superseded by methods that leverage randomness the best.
And the ones that leverage randomness the best are usually the ones that leverage compute power the best (because the more compute power you have, the more randomness you can simulate).
This is where methods like self-supervised learning (learning from the data instead of labels) and knowledge distillation (again learning more from the data as well as labels) come into play.
My takeaway from this is: keep your learning method as general as possible and learn from the data itself (a compressed version of reality) as much as possible.
The more you try to handcraft specific things, the more bitter things will taste when a more general method comes along.
What a massive month for the ML world in April!
As always, let me know if there's anything you think should be included in a future post.
In the meantime, keep learning, keep creating, keep dancing.
See you next month, Daniel
By the way, I'm a full-time instructor with Zero To Mastery Academy teaching people Machine Learning in the most efficient way possible. You can see a couple of our courses below or check out all Zero To Mastery courses.