Welcome to Part 2 in my brand new 3-Part series on TensorFlow and Deep Learning.
Sidenote: Technically this 'mini series' is part of my larger 'Introduction to Machine Learning' series, but I went so deep on this particular section, I needed to make it into 3 parts!
Be sure to check out the other parts in this TensorFlow series, as they all lead into each other:
So a quick recap of the series so far.
The goal of this series is to give you an overview of deep learning (and more specifically, transfer learning) when using TensorFlow and Keras.
Even better still?
Rather than just tell you what this all means, I’m also going to walk you through a project that you can follow along with, so you can learn as you go.
The project we’re going to build is called ‘Dog Vision’. It’s a neural network capable of identifying different dog breeds via images.
In the first part of the series, we took the time to set up the project, get our data, explore it, create a training set, and then turn our data into a TensorFlow dataset.
I highly recommend you read Part 1 first if you haven’t already, so that you understand what’s happening.
In this new part of the series (this article you’re reading right now), we’ll take the dataset that we created in Part 1 and use it to build a neural network, so that we can then train (fit) the model on the data.
Then, in the third and final part of this series, we’ll evaluate our model, make predictions, and work through the deployment phases, which are crucial for understanding how to assess and utilize the trained models effectively.
So as you can see, we’re going to work all the way from dataset preparation to model building, training, and evaluation, so this is a complete project walkthrough from start to finish.
Not only will it help you to understand TensorFlow better, but you’ll have hands-on experience and a project for your portfolio by the end of it!
My name is Daniel Bourke, and I'm the resident Machine Learning instructor here at Zero To Mastery.
Originally self-taught, I worked for one of Australia's fastest-growing artificial intelligence agencies, Max Kelsen, and have worked on Machine Learning and data problems across a wide range of industries including healthcare, eCommerce, finance, retail, and more.
I'm also the author of Machine Learning Monthly, write my own blog on my experiments in ML, and run my own YouTube channel - which has hit over 8 Million views.
Sidenote: If you want to deep dive into Machine Learning and learn how to use these tools even further, then check out my complete Machine Learning and Data Science course or watch the first few videos for free.
It’s one of the most popular, highly rated Machine Learning and Data Science bootcamps online, as well as the most modern and up-to-date. Guaranteed.
You'll go from a complete beginner with no prior experience to getting hired as a Machine Learning Engineer this year, so it’s helpful for ML Engineers of all experience levels.
Want a sample of the course? Well, check out the video below:
If you already have a good grasp of Machine Learning and just want to focus on TensorFlow for Deep Learning, I also have a course on that, which you can check out here.
With that out of the way, let’s get into this guide.
Neural networks are one of the most flexible and customizable ‘deep learning’ machine learning models available, and you can create a neural network to fit almost any kind of data.
In fact, the "deep" in deep learning refers to the many layers that can be contained inside a neural network.
A neural network will often follow the structure of:
Input layer -> Middle layers -> Output layer.
The main premise is that data goes in one end and gets manipulated by many small functions, which attempt to learn patterns/weights that represent the data and produce useful outputs.
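To make the "data goes in one end, useful outputs come out the other" idea a little more concrete, here's a minimal sketch of a tiny network's forward pass. It uses NumPy rather than TensorFlow, and the layer sizes and random weights are made up purely for illustration (a real network would learn its weights from data).

```python
import numpy as np

# Toy network: 4 input features -> 3 hidden units -> 2 outputs.
# All sizes and values here are assumptions for illustration only.
rng = np.random.default_rng(seed=42)

x = rng.normal(size=(1, 4))          # input layer: one sample with 4 features

W_hidden = rng.normal(size=(4, 3))   # middle layer weights (randomly initialized)
b_hidden = np.zeros(3)
W_output = rng.normal(size=(3, 2))   # output layer weights (randomly initialized)
b_output = np.zeros(2)

# Middle layer: a weighted sum followed by a ReLU non-linearity
hidden = np.maximum(0, x @ W_hidden + b_hidden)

# Output layer: a weighted sum followed by softmax to get values that sum to 1
logits = hidden @ W_output + b_output
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
probabilities = exp / exp.sum(axis=1, keepdims=True)

print(probabilities.shape)  # (1, 2)
```

Each "small function" here is just a multiply, add, and squash. Deep networks stack many of these on top of each other.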
So which kind of neural network should you use? That's an excellent question, and it’s hard to answer fully because there are so many different options.
However, for the interest of keeping this guide fairly simple, we’re going to focus on two of the most popular modern kinds of neural networks:
Because our problem is in the computer vision space, we're going to use a CNN.
But instead of crafting our own CNN from scratch, we're going to take an existing CNN model and apply it to our own problem, by using a method called ‘transfer learning’.
Transfer learning is the process of getting an existing working model and adjusting it to your own problem. This means you can get better results in less time with less data, without having to build something from scratch.
An existing model may have the following features:
You may be thinking, ok this all sounds incredible, so where can I get pretrained models?
Well the good news is, there are plenty of places to find pretrained models!
How do you choose which to use?
Well, for most new machine learning problems, if you're looking to get good results quickly, you should generally look for a pretrained model similar to your problem and use transfer learning to adapt it to your own domain.
With that in mind, and since we're focused on TensorFlow/Keras, we're going to be using a pretrained model from tf.keras.applications.
More specifically, we're going to take the tf.keras.applications.efficientnet_v2.EfficientNetV2B0() model from the 2021 machine learning paper EfficientNetV2: Smaller Models and Faster Training from Google Research and apply it to our own problem.
This model has been trained on ImageNet1k (1M+ images across 1000 diverse classes), so it has a good baseline understanding of patterns in images across a wide domain. (There is also a larger version called ImageNet22k with 14M+ images across 22,000 categories.)
ImageNet is also the same source we got our images from for the dataset in Part 1, so it’s a win-win situation.
Let’s see if we can adjust those patterns slightly to our dog images.
To do this, we’ll create an instance of it and call it base_model.
Input:
# Create the input shape to our model
INPUT_SHAPE = (*IMG_SIZE, 3)
base_model = tf.keras.applications.efficientnet_v2.EfficientNetV2B0(
    include_top=True, # do we want to include the top layer? (yes for now; it's built for ImageNet's 1000 classes, so we'll want to create our own top layer later)
    include_preprocessing=True, # do we want the network to preprocess our data into the right format for us? (yes)
    weights="imagenet", # do we want the network to come with pretrained weights? (yes)
    input_shape=INPUT_SHAPE # what is the input shape of our data we're going to pass to the network? (224, 224, 3) -> (height, width, colour_channels)
)
Output:
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/efficientnet_v2/efficientnetv2-b0.h5
29403144/29403144 [==============================] - 0s 0us/step
And that’s our base model created! Easy right?
We can find out information about our base model by calling base_model.summary().
Input:
# Note: Uncomment to see full output
# base_model.summary()
Here’s a truncated output of base_model.summary():
Woah! Look at all those layers... this is what the "deep" in deep learning means! A deep number of layers.
How about we count the number of layers?
Input:
# Count the number of layers
print(f"Number of layers in base_model: {len(base_model.layers)}")
Output:
Number of layers in base_model: 273
273 layers!
Wow, there's a lot going on.
Rather than step through each layer and explain what's happening in each layer, I'll leave that for the curious mind to research on their own.
Just know that when starting out deep learning you don't need to know what's happening in every layer in a model to be able to use a model.
For now, let's pay attention to a few things:
So let’s break these down.
One of the most important practical steps in using a deep learning model is checking its input and output shapes.
Two questions to ask:
We ask about shapes because in all deep learning models input and output data comes in the form of tensors.
This goes for text, audio, images and more.
The raw data gets converted to a numerical representation first before being passed to a model.
For example
In our case, our input data has the shape of (32, 224, 224, 3) or (batch_size, height, width, colour_channels).
And our ideal output shape will be (32, 120) or (batch_size, number_of_dog_classes).
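If those shapes feel a bit abstract, here's a small sketch of what they look like in practice. It uses NumPy arrays filled with zeros as stand-ins for real image tensors and model outputs (so the values are placeholders, only the shapes matter), and the name image_batch_demo is just something I've made up for this example.

```python
import numpy as np

BATCH_SIZE = 32       # number of images per batch
IMG_SIZE = (224, 224) # (height, width)
NUM_CLASSES = 120     # number of dog breeds

# One placeholder image: (height, width, colour_channels)
single_image = np.zeros((*IMG_SIZE, 3), dtype=np.float32)

# Stack 32 images along a new first axis to form a batch:
# (batch_size, height, width, colour_channels)
image_batch_demo = np.stack([single_image] * BATCH_SIZE, axis=0)

# The ideal model output has one value per dog breed for every image in the batch:
# (batch_size, number_of_dog_classes)
ideal_output = np.zeros((BATCH_SIZE, NUM_CLASSES), dtype=np.float32)

print(single_image.shape)      # (224, 224, 3)
print(image_batch_demo.shape)  # (32, 224, 224, 3)
print(ideal_output.shape)      # (32, 120)
```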
Your input and output shapes will differ depending on the problem and data you're working with. But as you get deeper into the world of machine learning (and deep learning), you'll find that mismatched input and output shapes are one of the most common sources of errors.
We can check our model's input and output shapes with the .input_shape and .output_shape attributes.
Input:
# Check the input shape of our model
base_model.input_shape
Output:
(None, 224, 224, 3)
Nice! It looks like our model's input shape is where we want it.
This is because the model we chose, tf.keras.applications.efficientnet_v2.EfficientNetV2B0, has been trained on images the same size as our images.
Remember, None in this case is equivalent to a wildcard dimension, meaning it could be any value. Here it stands for the batch size, which we've set to 32.
If our model had a different input shape, we'd have to make sure we processed our images to be the same shape.
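Here's a rough sketch of how that None wildcard behaves when comparing shapes. Note that shape_is_compatible is a hypothetical helper I've written for illustration in plain Python. It is not a Keras function, just a simplified version of the idea.

```python
def shape_is_compatible(expected_shape, actual_shape):
    """Return True if actual_shape matches expected_shape, treating None as 'any size'."""
    # Shapes with a different number of dimensions can never match
    if len(expected_shape) != len(actual_shape):
        return False
    # Every dimension must either be a wildcard (None) or be exactly equal
    return all(
        expected is None or expected == actual
        for expected, actual in zip(expected_shape, actual_shape)
    )

model_input_shape = (None, 224, 224, 3)

print(shape_is_compatible(model_input_shape, (32, 224, 224, 3)))  # True (batch of 32)
print(shape_is_compatible(model_input_shape, (1, 224, 224, 3)))   # True (batch of 1)
print(shape_is_compatible(model_input_shape, (32, 128, 128, 3)))  # False (wrong image size)
print(shape_is_compatible(model_input_shape, (224, 224, 3)))      # False (no batch dimension)
```

In other words, the batch dimension can be any size, but the height, width and colour channels have to match exactly.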
So now let's check the output shape.
Input:
# Check the model's output shape
base_model.output_shape
Output:
(None, 1000)
Hmm, is this what we're after?
No, not really. You see, since we have 120 dog classes in our dataset, we'd ideally like an output shape of (None, 120).
So then why is it by default (None, 1000)?
Well, this is because the model has been trained already on ImageNet, a dataset of 1,000,000+ images with 1000 classes (hence the 1000 in the output shape).
So how can we change this?
Well, let’s recreate a base_model instance, except this time we'll change the classes parameter to 120, like so:
Input:
# Create a base model with 120 output classes
base_model = tf.keras.applications.efficientnet_v2.EfficientNetV2B0(
    include_top=True,
    include_preprocessing=True,
    weights="imagenet",
    input_shape=INPUT_SHAPE,
    classes=len(dog_names)
)
base_model.output_shape
Output:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-62-5e9b29e6f858> in <cell line: 2>()
1 # Create a base model with 120 output classes
----> 2 base_model = tf.keras.applications.efficientnet_v2.EfficientNetV2B0(
3 include_top=True,
4 include_preprocessing=True,
5 weights="imagenet",
/usr/local/lib/python3.10/dist-packages/keras/src/applications/efficientnet_v2.py in EfficientNetV2B0(include_top, weights, input_tensor, input_shape, pooling, classes, classifier_activation, include_preprocessing)
1128 include_preprocessing=True,
1129 ):
-> 1130 return EfficientNetV2(
1131 width_coefficient=1.0,
1132 depth_coefficient=1.0,
/usr/local/lib/python3.10/dist-packages/keras/src/applications/efficientnet_v2.py in EfficientNetV2(width_coefficient, depth_coefficient, default_size, dropout_rate, drop_connect_rate, depth_divisor, min_depth, bn_momentum, activation, blocks_args, model_name, include_top, weights, input_tensor, input_shape, pooling, classes, classifier_activation, include_preprocessing)
932
933 if weights == "imagenet" and include_top and classes != 1000:
--> 934 raise ValueError(
935 "If using `weights` as `'imagenet'` with `include_top`"
936 " as true, `classes` should be 1000"
ValueError: If using `weights` as `'imagenet'` with `include_top` as true, `classes` should be 1000Received: classes=120
Oh no, we get an error!
If we look closer at the error, we’ll see this section:
ValueError: If using weights as 'imagenet' with include_top as true, classes should be 1000 Received: classes=120
So what does this mean?
Well, what this is saying is that if we want to keep using the pretrained 'imagenet' weights (which we do, so that we can leverage the visual patterns/features that the model has already learned on ImageNet) while also keeping include_top=True, then classes has to stay at 1000. To get 120 outputs, we need to change a different parameter of the base_model.
So, what we're going to do is create our own top layers, and we can do this by setting include_top=False.
What this means is that we'll use most of the model's existing layers to extract features and patterns out of our images, but then customize the final few layers to our own problem.
This kind of transfer learning is called feature extraction.
It’s a setup where you use an existing model's pretrained weights to extract features (or patterns) from your own custom data. You can then use those extracted features and further tailor them to your own use case.
For example
Let's go ahead and create an instance of base_model without a top layer.
Input:
# Create a base model with no top
base_model = tf.keras.applications.efficientnet_v2.EfficientNetV2B0(
    include_top=False, # don't include the top layer (we want to make our own top layer)
    include_preprocessing=True,
    weights="imagenet",
    input_shape=INPUT_SHAPE,
)
# Check the output shape
base_model.output_shape
Output:
(None, 7, 7, 1280)
Hmm, so what's going on here with this new output shape?
This still isn't what we want, because we're after (None, 120) for our number of dog classes.
So how about we check the number of layers again?
Input:
# Count the number of layers
print(f"Number of layers in base_model: {len(base_model.layers)}")
Output:
Number of layers in base_model: 270
Looks like our new base_model has fewer layers than our previous one.
This is because we used include_top=False.
This means we've still got 270 base layers to extract features and patterns from our images. However, it also means we get to customize the output layers to our liking.
We'll come back to this shortly.
In traditional programming, the developer explicitly defines the rules or algorithms that manipulate the input data to produce the desired output. However, in machine learning, the process is different and essentially reversed.
In machine learning:
In essence, machine learning involves providing the model with examples of the desired output for given inputs, allowing it to then learn the underlying patterns and rules needed to replicate that relationship.
This process allows the model to generalize from the training data to new data it hasn't seen before.
A model's parameters are the learned rules, and learned is the key word here.
In an ideal setup, we never tell the model what parameters to learn. Instead, it learns them itself by connecting input data to labels in supervised learning and by grouping together similar samples in unsupervised learning.
Note: Parameters are values learned by a model whereas hyperparameters (e.g. batch size) are values set by a human.
Parameters also get referred to as "weights" or "patterns" or "learned features" or "learned representations".
Generally, the more parameters a model has, the more capacity it has to learn. Each layer in a deep learning model has a specific number of parameters, and this number varies depending on which layer you use.
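To give you a feel for where parameter counts come from, here's a small sketch using the standard formulas for two common layer types (a fully connected layer and a 2D convolution). The helper names and the layer sizes are my own choices for illustration. They aren't part of TensorFlow, and they aren't necessarily the layers inside our base model.

```python
def dense_params(num_inputs, num_outputs, use_bias=True):
    """Parameters in a fully connected layer: one weight per input-output pair, plus one bias per output."""
    return num_inputs * num_outputs + (num_outputs if use_bias else 0)

def conv2d_params(kernel_height, kernel_width, in_channels, out_channels, use_bias=True):
    """Parameters in a standard 2D convolution: one kernel per (input channel, output channel) pair, plus one bias per output channel."""
    weights = kernel_height * kernel_width * in_channels * out_channels
    return weights + (out_channels if use_bias else 0)

# Hypothetical layer sizes, for illustration only
print(dense_params(num_inputs=1280, num_outputs=120))  # 153720
print(conv2d_params(kernel_height=3, kernel_width=3, in_channels=3, out_channels=32))  # 896
```

So a single fully connected layer mapping 1280 values to 120 values already holds over 150,000 parameters.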
The benefit of using a preconstructed model and transfer learning is that someone else has done the hard work in finding what combination of layers leads to a good set of parameters (a big thank you to these wonderful people).
We can count the number of parameters in a model/layer via the .count_params() method.
Input:
# Check the number of parameters in our model
base_model.count_params()
Output:
5919312
Wow! Our model has 5,919,312 parameters.
That means each time an image goes through our model, it will be influenced in some small way by 5,919,312 numbers.
Each one of these is a potential learning opportunity (except for parameters that are non-trainable but we'll get to that soon too).
Now, you may be thinking, 5 million+ parameters sounds like a lot, and it is.
However, many modern large scale models, such as GPT-3 (175B) and GPT-4 (200B+? the actual number of parameters was never released) deal in the billions of parameters (Note: this is written in 2024, so if you're reading this in future, parameter counts may be in the trillions).
Generally, more parameters lead to better models. However, there are always trade-offs.
More parameters means more compute power to run the models.
In practice, if you have limited compute power (e.g. a single GPU on Google Colab which is what we used), then it's best to start with smaller models and gradually increase the size when necessary.
We can get the trainable and non-trainable parameters from our model with the trainable_weights and non_trainable_weights attributes. (Remember, parameters are also referred to as weights).
Note: Trainable weights are parameters of the model which are updated by backpropagation during training (they are changed to better match the data), whereas non-trainable weights are parameters of the model which are not updated by backpropagation during training (they are fixed in place).
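Here's a tiny sketch of what that difference means during a single training update. It's plain Python with made-up weight names, values and gradients (real frameworks handle all of this for you), so treat it as an illustration of the idea rather than how TensorFlow implements it.

```python
def sgd_step(weights, gradients, learning_rate=0.5):
    """Apply one gradient descent update, skipping any weight marked as non-trainable."""
    updated = {}
    for name, (value, trainable) in weights.items():
        if trainable:
            # Trainable weights move in the opposite direction of their gradient
            value = value - learning_rate * gradients[name]
        # Non-trainable (frozen) weights are passed through unchanged
        updated[name] = (value, trainable)
    return updated

# Hypothetical weights stored as (value, trainable) pairs
weights = {
    "base_layer_weight": (1.0, False),  # frozen
    "top_layer_weight": (1.0, True),    # trainable
}
gradients = {"base_layer_weight": 1.0, "top_layer_weight": 1.0}

new_weights = sgd_step(weights, gradients)
print(new_weights["base_layer_weight"][0])  # 1.0 (unchanged)
print(new_weights["top_layer_weight"][0])   # 0.5 (updated)
```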
Let's write a function to count the non-trainable and trainable parameters of our model.
Input:
import numpy as np

def count_parameters(model, print_output=True):
    """
    Counts the number of trainable, non-trainable and total parameters of a given model.
    """
    # Each item in model.trainable_weights / model.non_trainable_weights is a weight tensor,
    # so we multiply its dimensions together to count the values it holds
    trainable_parameters = np.sum([np.prod(weight.shape) for weight in model.trainable_weights])
    non_trainable_parameters = np.sum([np.prod(weight.shape) for weight in model.non_trainable_weights])
    total_parameters = trainable_parameters + non_trainable_parameters
    if print_output:
        print(f"Model {model.name} parameter counts:")
        print(f"Total parameters: {total_parameters}")
        print(f"Trainable parameters: {trainable_parameters}")
        print(f"Non-trainable parameters: {non_trainable_parameters}")
    else:
        return total_parameters, trainable_parameters, non_trainable_parameters

count_parameters(model=base_model, print_output=True)
Output:
Model efficientnetv2-b0 parameter counts:
Total parameters: 5919312
Trainable parameters: 5858704
Non-trainable parameters: 60608
Nice! It looks like our function worked, and most of our model's parameters are trainable.
This means they will be tweaked as they see more images of dogs.
However, a standard practice in transfer learning is to freeze the base layers of a model and only train the custom top layers to suit your problem.
Here you can see an example of how we can take a pretrained model and customize it to our own use case.
This kind of transfer learning workflow is often referred to as a feature extracting workflow as the base layers are frozen (not changed during training) and only the top layers are trained.
Note: In this image the EfficientNetB0 architecture is being demonstrated, however we're going to be using the EfficientNetV2B0 architecture which is slightly different. I've simply used the older architecture image from the research paper as a newer one wasn't available.
In other words, keep the patterns an existing model has learned on a similar problem (if they're good) to form a base representation of an input sample and then manipulate that base representation to suit our needs.
So why do this?
Simply because it's faster. The fewer trainable parameters, the faster your model training will be, and the faster your experiments will be.
But how will we know this works?
Well, we're going to run experiments to test it…
Okay, so how do we freeze the parameters of our base_model?
We can set its .trainable attribute to False.
Input:
# Freeze the base model
base_model.trainable = False
base_model.trainable
Output:
False
This means that our base_model is now frozen, so let's check how this affected the number of trainable and non-trainable parameters.
Input:
count_parameters(model=base_model, print_output=True)
Output:
Model efficientnetv2-b0 parameter counts:
Total parameters: 5919312.0
Trainable parameters: 0.0
Non-trainable parameters: 5919312
Beautiful!
All of the parameters in our base_model are now non-trainable (frozen), which means they won't be updated during training.
Sidenote: If you're struggling to follow along, or feeling a little overwhelmed, then make sure to check out my complete Machine Learning and Data Science course. I cover this exact project inside that course.
We've spoken a couple of times about how our base_model is a "feature extractor" or "pattern extractor", but what does this mean?
It means that when a data sample goes through the base_model, its numbers get manipulated into a compressed set of features.
In other words, the layers of the model will each perform a calculation on the sample eventually leading to an output tensor with patterns the model has deemed most important.
This is often referred to as a ‘compressed feature space’, and it's one of the central ideas of deep learning.
For example, we can take a large input (such as an image tensor of shape [224, 224, 3]) and compress it into a smaller output (such as a feature vector of shape [1280]) that captures a useful representation of the input.
Note: A feature vector is also referred to as an embedding. This is basically a compressed representation of a data sample that makes it useful.
The concept of embeddings is not limited to images either. It stretches across all data types (text, images, video, audio + more).
We can see this in action by passing a single image through our base_model.
Input:
# Extract features from a single image using our base model
feature_extraction = base_model(image_batch[0])
feature_extraction
Output:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-69-957d897dc1dc> in <cell line: 2>()
1 # Extract features from a single image using our base model
----> 2 feature_extraction = base_model(image_batch[0])
3 feature_extraction
/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py in error_handler(*args, **kwargs)
68 # To get the full stack trace, call:
69 # `tf.debugging.disable_traceback_filtering()`
---> 70 raise e.with_traceback(filtered_tb) from None
71 finally:
72 del filtered_tb
/usr/local/lib/python3.10/dist-packages/keras/src/engine/input_spec.py in assert_input_compatibility(input_spec, inputs, layer_name)
296 if spec_dim is not None and dim is not None:
297 if spec_dim != dim:
--> 298 raise ValueError(
299 f'Input {input_index} of layer "{layer_name}" is '
300 "incompatible with the layer: "
ValueError: Input 0 of layer "efficientnetv2-b0" is incompatible with the layer: expected shape=(None, 224, 224, 3), found shape=(224, 224, 3)
Oh no, another error!
If we look closer, we can see this section:
Output:
ValueError: Input 0 of layer "efficientnetv2-b0" is incompatible with the layer: expected shape=(None, 224, 224, 3), found shape=(224, 224, 3)
We've stumbled upon one of the most common errors in machine learning, which is shape errors.
In our case, the shape of the data we're trying to put into the model doesn't match the input shape the model is expecting.
Our input data shape is (224, 224, 3), or (height, width, colour_channels), however, our model is expecting (None, 224, 224, 3), or (batch_size, height, width, colour_channels).
We can fix this error by adding a singular batch_size dimension to our input and thus make it (1, 224, 224, 3) (a batch_size of 1 for a single sample).
To do so, we can use tf.expand_dims(input=target_sample, axis=0), where target_sample is our input tensor and axis=0 means we want to add the new dimension at the first position.
Input:
# Current image shape
shape_of_image_without_batch = image_batch[0].shape
# Add a batch dimension to our single image
shape_of_image_with_batch = tf.expand_dims(input=image_batch[0], axis=0).shape
print(f"Shape of image without batch: {shape_of_image_without_batch}")
print(f"Shape of image with batch: {shape_of_image_with_batch}")
Output:
Shape of image without batch: (224, 224, 3)
Shape of image with batch: (1, 224, 224, 3)
Perfect!
Now let's pass this image with a batch dimension to our base_model.
Input:
# Extract features from a single image using our base model
feature_extraction = base_model(tf.expand_dims(image_batch[0], axis=0))
feature_extraction
Output:
<tf.Tensor: shape=(1, 7, 7, 1280), dtype=float32, numpy=
array([[[[-2.19177201e-01, -3.44185606e-02, -1.40321642e-01, ...,
-1.44454449e-01, -2.73809850e-01, -7.41252452e-02],
[-8.69670734e-02, -6.48750067e-02, -2.14546964e-01, ...,
-4.57209721e-02, -2.77900100e-01, -8.20885971e-02],
[-2.76872963e-01, -8.26781020e-02, -3.85153107e-02, ...,
-2.72128999e-01, -2.52802134e-01, -2.28105962e-01],
...,
[-1.01604000e-01, -3.55145968e-02, -2.23027021e-01, ...,
-2.26227805e-01, -8.61771777e-02, -1.60450727e-01],
[-5.87608740e-02, -4.65543661e-03, -1.06193267e-01, ...,
-2.87548676e-02, -9.06914026e-02, -1.82624385e-01],
[-6.27618432e-02, -1.38620799e-03, 1.52704502e-02, ...,
-7.85450079e-03, -1.84584558e-01, -2.62404829e-01]],
[[-2.17334151e-01, -1.10280879e-01, -2.74605274e-01, ...,
-2.22405165e-01, -2.74738282e-01, -1.01998925e-01],
[-1.40700653e-01, -1.66820198e-01, -2.77449101e-01, ...,
2.40375683e-01, -2.77627349e-01, -9.07808691e-02],
[-2.40916476e-01, -2.00582087e-01, -2.38370374e-01, ...,
-8.27576742e-02, -2.78428614e-01, -1.23056054e-01],
...,
[-2.67296195e-01, -5.43131726e-03, -6.44061863e-02, ...,
-3.34720500e-02, -1.55141622e-01, -3.23073938e-02],
[-2.66513556e-01, -2.09966358e-02, -1.50375053e-01, ...,
-6.29274473e-02, -2.69798309e-01, -2.74081439e-01],
[-8.39830115e-02, -1.58605091e-02, -2.78447241e-01, ...,
-1.43555822e-02, -2.77474761e-01, 1.37483165e-01]],
[[-2.15840712e-01, 4.50323820e-01, -7.51058161e-02, ...,
-2.43637279e-01, -2.75048614e-01, -6.00421876e-02],
[-2.39066556e-01, -2.25066260e-01, -4.89832312e-02, ...,
-2.77957618e-01, -1.14677951e-01, -2.69968715e-02],
[-1.60943881e-01, -2.12972730e-01, -1.08622171e-01, ...,
-2.78464079e-01, -1.95970193e-01, -2.92074662e-02],
...,
[-2.67642140e-01, -7.13412274e-10, -2.47387841e-01, ...,
-1.27752789e-03, 1.69062471e+00, -1.07747754e-02],
[-2.69456387e-01, -3.02123808e-05, -2.19904676e-01, ...,
-1.19841937e-02, 6.54936790e-01, 4.92877871e-01],
[-1.83339473e-02, -9.84105989e-02, -2.77752399e-01, ...,
-9.53171253e-02, -2.76987553e-01, -1.81873620e-01]],
...,
[[-6.59235120e-02, -1.64803467e-03, -1.58951283e-01, ...,
-1.34164095e-01, -6.30896613e-02, -7.77927637e-02],
[-1.83377475e-01, -4.98497509e-04, -1.57654762e-01, ...,
-4.48885784e-02, -1.06884383e-01, -2.78372377e-01],
[-2.45749369e-01, -9.95399058e-03, -1.79216102e-01, ...,
-1.02837617e-02, -1.84168354e-01, -1.70697242e-01],
...,
[ 2.22050592e-01, -2.04384560e-04, -1.46467671e-01, ...,
-2.65387502e-02, -1.85434178e-01, -9.71652716e-02],
[ 1.52228832e+00, -3.39617883e-03, -3.22414264e-02, ...,
-1.19287046e-02, -1.46435276e-01, -8.73169452e-02],
[-1.89164400e-01, -5.49114570e-02, -2.05218419e-01, ...,
-1.32163316e-01, -1.48950770e-01, -1.18042991e-01]],
[[-2.16520607e-01, -7.84920622e-03, -1.43650264e-01, ...,
-1.73660204e-01, -4.83706780e-02, -3.76228467e-02],
[-2.78293848e-01, -6.24539470e-03, -2.28590608e-01, ...,
-2.06465453e-01, -1.93291768e-01, -9.23046917e-02],
[-2.40500003e-01, -2.73558766e-01, -1.58736348e-01, ...,
-4.13209312e-02, -2.64240265e-01, -3.26484852e-02],
...,
[-2.31358394e-01, -2.72292078e-01, -6.80670887e-02, ...,
-2.16453914e-02, -2.71368980e-01, -3.88960652e-02],
[-2.45319903e-01, -2.78179497e-01, -6.18890636e-02, ...,
-1.86282583e-02, -2.23804727e-01, -2.72233319e-02],
[-2.31111392e-01, -2.37449735e-01, -5.13911694e-02, ...,
-4.55225781e-02, -2.74753064e-01, -3.51530202e-02]],
[[-3.96142267e-02, -1.39998682e-02, -9.56050456e-02, ...,
-2.33392462e-01, -1.83407709e-01, -4.99856956e-02],
[-2.60713607e-01, -3.96164991e-02, -1.29626304e-01, ...,
-2.78417081e-01, -2.78285533e-01, -7.70441368e-02],
[-8.02241415e-02, -2.30456606e-01, -1.13508031e-01, ...,
-5.45607917e-02, -2.71063268e-01, -2.75666509e-02],
...,
[-9.41052362e-02, -2.42691532e-01, -5.48249595e-02, ...,
-2.13044193e-02, -2.63691694e-01, -9.28506851e-02],
[-9.08804908e-02, -2.40457997e-01, -7.88932368e-02, ...,
-3.80579121e-02, -2.71065891e-01, -4.05692160e-02],
[-1.26358300e-01, -2.17053503e-01, -7.44825602e-02, ...,
-5.66985942e-02, -2.75216103e-01, -6.91162944e-02]]]],
dtype=float32)>
Woah! Look at all those numbers!
After passing through ~270 layers, this is the numerical representation our model has created of our input image.
You might be thinking, okay, there's a lot going on here, how can I possibly understand all of them?
Well, with enough effort, you might. However, these numbers are more for a model/computer to understand than for a human to understand, so don’t worry about them.
Let's not stop there. Let's check the shape of our feature_extraction.
Input:
# Check shape of feature extraction
feature_extraction.shape
Output:
TensorShape([1, 7, 7, 1280])
Ok, it looks like our model has compressed our input image into a lower dimensional feature space.
Note: Feature space (or latent space or embedding space) is a numerical region where pieces of data are represented by tensors of various dimensions. Feature space is hard for humans to imagine because it could be 1000s of dimensions (humans are only good at imagining 3-4 dimensions at max).
But you can think of feature space as an area where numerical representations of similar items will be close together. If the feature space was a grocery store, one breed of dogs may be in one aisle (similar numbers) whereas another breed of dogs may be in the next aisle. You can see an example of a large embedding space representation of 8M Stack Overflow questions on Nomic Atlas.
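To make "close together" a bit more concrete, here's a small sketch that measures closeness with cosine similarity, one common way of comparing embeddings. The three 4-dimensional vectors below are completely made up for illustration (real embeddings from our model would have 1280 values and come from actual images).

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two vectors: near 1.0 means they point the same way, near 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings, invented for this example
labrador_photo_1 = [0.9, 0.1, 0.8, 0.2]
labrador_photo_2 = [0.8, 0.2, 0.9, 0.1]
poodle_photo = [0.1, 0.9, 0.2, 0.8]

same_breed = cosine_similarity(labrador_photo_1, labrador_photo_2)
different_breed = cosine_similarity(labrador_photo_1, poodle_photo)

print(f"Same breed similarity: {same_breed:.2f}")
print(f"Different breed similarity: {different_breed:.2f}")
print(same_breed > different_breed)  # True
```

The two labrador vectors sit in the same "aisle", so they score much higher than the labrador and poodle pair.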
Let's compare the new shape to the input shape.
Input:
num_input_features = 224*224*3
feature_extraction_features = 1*7*7*1280
# Calculate the compression ratio
num_input_features / feature_extraction_features
Output:
2.4
Looks like our model has compressed the numerical representation of our input image by 2.4x so far.
But you might've noticed our feature_extraction is still a tensor, so how about we turn it into a vector and compress the representation even further?
We can do this by taking our feature_extraction tensor and pooling together the inner dimensions.
By pooling, I mean taking the average or the maximum values.
Why?
Well, because a neural network often outputs a large amount of learned feature values but many of them can be insignificant compared to others.
So taking the average or the max across them helps us to compress the representation further while still preserving the most important features.
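This process is often referred to as global pooling, and Keras has a layer for each variant:
Global average pooling: tf.keras.layers.GlobalAveragePooling2D()
Global max pooling: tf.keras.layers.GlobalMaxPooling2D()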
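Before we use the Keras layer, here's a quick sketch of what pooling over the inner dimensions actually does. It uses NumPy and a random tensor (called toy_features here, a name I've made up) as a stand-in for a real feature extraction, so the values are meaningless and only the shapes and operations matter.

```python
import numpy as np

# Random stand-in for a feature extraction of shape (batch_size, height, width, channels)
rng = np.random.default_rng(seed=0)
toy_features = rng.normal(size=(1, 7, 7, 1280)).astype(np.float32)

# Global average pooling: average over the height and width axes (the inner dimensions)
average_pooled = toy_features.mean(axis=(1, 2))

# Global max pooling: take the maximum over the height and width axes instead
max_pooled = toy_features.max(axis=(1, 2))

print(toy_features.shape)    # (1, 7, 7, 1280)
print(average_pooled.shape)  # (1, 1280)
print(max_pooled.shape)      # (1, 1280)
```

Either way, each of the 1280 channels gets squashed from a 7x7 grid (49 values) down to a single number.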
Let's try to apply average pooling to our feature extraction and see what happens.
Input:
# Turn feature extraction into a feature vector
feature_vector = tf.keras.layers.GlobalAveragePooling2D()(feature_extraction) # pass feature_extraction to the pooling layer
feature_vector
Output:
<tf.Tensor: shape=(1, 1280), dtype=float32, numpy=
array([[-0.11521906, -0.04476562, -0.12476546, ..., -0.09118073,
-0.08420841, -0.07769417]], dtype=float32)>
As you can see, we've compressed our feature_extraction tensor into a feature vector. (Notice the new shape of (1, 1280)).
Now if you're not sure what all these numbers mean, that's okay. I don't either. All you need to know is that a feature vector (also called an embedding) is supposed to be a numerical representation that's meaningful to computers.
We'll need to perform a few more transforms on it before it's recognizable to us, so let's check out its shape.
Input:
# Check out the feature vector shape
feature_vector.shape
Output:
TensorShape([1, 1280])
We've reduced the shape of feature_extraction from (1, 7, 7, 1280) to (1, 1280).
What this means is we've gone from a tensor with multiple dimensions to a vector with one dimension of size 1280. Our neural network has performed calculations on our image and it is now represented by 1280 numbers.
This is one of the main goals of deep learning, to reduce higher dimensional information into a lower dimensional but still representative space.
Let's calculate how much we've reduced the dimensionality of our single input image.
Input:
# Compare the reduction
num_input_features = 224*224*3
feature_extraction_features = 1*7*7*1280
feature_vector_features = 1*1280
print(f"Input -> feature extraction reduction factor: {num_input_features / feature_extraction_features}")
print(f"Feature extraction -> feature vector reduction factor: {feature_extraction_features / feature_vector_features}")
print(f"Input -> feature extraction -> feature vector reduction factor: {num_input_features / feature_vector_features}")
Output:
Input -> feature extraction reduction factor: 2.4
Feature extraction -> feature vector reduction factor: 49.0
Input -> feature extraction -> feature vector reduction factor: 117.6
That’s a 117.6x reduction from our original image to its feature vector representation!
But why compress the representation like this?
Because representing our data in a compressed format but still with meaningful numbers (to a computer) means that less computation is required to reuse the patterns.
For example, imagine you had to relearn how to spell words every time you wanted to use them.
Would this be efficient?
Not at all. Instead, you take a while to learn them at the start and then continually reuse this knowledge over time. This is the same with a deep learning model.
It learns representative patterns in data, figures out the ideal connections between inputs and outputs and then reuses them over time in the form of numerical weights.
We've covered a fair bit in the past few sections, so let's practice.
The important takeaway here is that one of the main goals of deep learning is to create a model that is able to take some kind of high dimensional data (e.g. an image tensor, a text tensor, an audio tensor) and extract meaningful patterns in it whilst compressing it to a lower dimensional form (e.g. a feature vector or embedding).
We can then use this lower dimensional form for our specific use cases, and one of the most powerful ways to do this is with transfer learning: taking an existing model from a similar domain to yours and applying it to your own problem.
To practice turning a data sample into a feature vector, let's start by recreating a base_model instance.
This time, we can also add in a pooling layer automatically using pooling="avg" or pooling="max".
Note: I demonstrated the use of the tf.keras.layers.GlobalAveragePooling2D() layer because not all pretrained models have a pooling layer built in.
Input:
# Create a base model with no top and a pooling layer built-in
base_model = tf.keras.applications.efficientnet_v2.EfficientNetV2B0(
include_top=False,
weights="imagenet",
input_shape=INPUT_SHAPE,
pooling="avg", # can also use "max"
include_preprocessing=True,
)
# Check the summary (optional)
# base_model.summary()
# Check the output shape
base_model.output_shape
Output:
(None, 1280)
Boom! We get the same output shape from the base_model as we did when using it with a pooling layer thanks to using pooling="avg".
Let's now freeze these base weights, so they're not trainable.
Input:
# Freeze the base weights
base_model.trainable = False
# Count the parameters
count_parameters(model=base_model, print_output=True)
Output:
Model efficientnetv2-b0 parameter counts:
Total parameters: 5919312.0
Trainable parameters: 0.0
Non-trainable parameters: 5919312
And now we can pass an image through our base model and get a feature vector from it.
Input:
# Get a feature vector of a single image (don't forget to add a batch dimension)
feature_vector_2 = base_model(tf.expand_dims(image_batch[0], axis=0))
feature_vector_2
Output:
<tf.Tensor: shape=(1, 1280), dtype=float32, numpy=
array([[-0.11521906, -0.04476562, -0.12476546, ..., -0.09118073,
-0.08420841, -0.07769417]], dtype=float32)>
Wonderful!
Now is this the same as our original feature_vector?
Well, we can find out by comparing feature_vector and feature_vector_2 and seeing if all of the values are the same with np.all().
Input:
# Compare the two feature vectors
np.all(feature_vector == feature_vector_2)
Output:
True
Perfect, it worked!
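Exact equality happened to hold here, but floating point values produced by two different code paths can differ by tiny rounding amounts. A more forgiving check, sketched below with made-up stand-in vectors (not our real feature vectors), is np.allclose(), which compares values within a tolerance.

```python
import numpy as np

# Two stand-in vectors (made up for illustration) that differ only by tiny noise
vector_a = np.array([[-0.11521906, -0.04476562, -0.12476546]])
vector_b = vector_a + 1e-7

# Exact comparison fails because of the tiny difference
print(np.all(vector_a == vector_b))   # False

# Tolerance-based comparison treats them as equal
print(np.allclose(vector_a, vector_b))  # True
```

If np.all() ever returns False for two vectors you expect to match, it's worth trying np.allclose() before assuming something is broken.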
So now let's put it all together and create a full model for our dog vision problem.
The main steps when creating any kind of deep learning model from scratch are:
1. Define the input layer(s).
2. Define the middle layer(s).
3. Define the output layer(s).
These sound broad because they are. Deep learning models are almost infinitely customizable.
Good news is, thanks to transfer learning, all of our middle layers are defined by base_model (you could argue the input layer is created too).
So now it's up to us to define our input and output layers.
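TensorFlow/Keras have two main ways of connecting layers to form a model:
The Sequential API (tf.keras.Sequential) - Useful for making simple models with one tensor in and one tensor out, not suited for complex models.
The Functional API (tf.keras.Model) - Useful for making more complex models, though it can be used for simple models too.
Let's start with the Sequential model.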
It takes a list of layers and will pass data through them sequentially.
Our base_model will be the input and middle layers and we'll use a tf.keras.layers.Dense() layer as the output (we'll discuss this shortly).
The Sequential API is the most straightforward way to create a model.
And because your model comes in the form of a list of layers from input to middle layers to output, each layer is executed sequentially.
Input:
# Create a sequential model
tf.random.set_seed(42)
sequential_model = tf.keras.Sequential([base_model, # input and middle layers
tf.keras.layers.Dense(units=len(dog_names), # output layer
activation="softmax")])
sequential_model.summary()
Output:
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
efficientnetv2-b0 (Functio (None, 1280) 5919312
nal)
dense (Dense) (None, 120) 153720
=================================================================
Total params: 6073032 (23.17 MB)
Trainable params: 153720 (600.47 KB)
Non-trainable params: 5919312 (22.58 MB)
_________________________________________________________________
Wonderful!
We've now got a model with 6,073,032 parameters. However, only 153,720 of them (the ones in the dense layer) are trainable.
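Where does 153,720 come from? A dense layer has one weight for every input/output pair plus one bias per output. Here's a quick arithmetic check in plain Python, using only the numbers from the summary above.

```python
# Numbers taken from the model summary above
feature_vector_size = 1280     # outputs of the base model
num_classes = 120              # number of dog breeds
base_model_params = 5_919_312  # frozen parameters in the base model

dense_weights = feature_vector_size * num_classes  # one weight per input/output pair
dense_biases = num_classes                         # one bias per output unit
dense_params = dense_weights + dense_biases

print(dense_params)                      # 153720
print(base_model_params + dense_params)  # 6073032
```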
Our dense layer (also called a fully-connected layer or feed-forward layer) takes the outputs of the base_model and performs further calculations on them to map them to our required number of classes (120 for the number of dog breeds).
We can now use activation="softmax" (the Softmax function) to get prediction probabilities. These are values between 0 and 1 which represent how much our model "thinks" a specific image relates to a certain class.
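As a rough sketch of what softmax does, here's a NumPy version applied to three made-up raw scores. This is for illustration only and isn't the exact Keras implementation.

```python
import numpy as np

def softmax(raw_scores: np.ndarray) -> np.ndarray:
    """Turn raw scores into values between 0 and 1 that sum to 1."""
    shifted = raw_scores - np.max(raw_scores)  # subtract the max for numerical stability
    exponentials = np.exp(shifted)
    return exponentials / np.sum(exponentials)

example_scores = np.array([2.0, 1.0, 0.1])  # made-up scores for three classes
probabilities = softmax(example_scores)

print(probabilities)        # the largest score gets the largest probability
print(probabilities.sum())  # sums to 1 (up to floating point rounding)
```

The ordering of the scores is preserved, so the class with the highest raw score also ends up with the highest prediction probability.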
Sidenote: There's another common activation function called Sigmoid, that we could use if we only had two classes, for example, "dog" or "cat".
Confusing, yes, but you'll get used to different functions with practice.
The following table summarizes a few use cases.
Now that our model is built, let's check our input and output shapes.
Input:
# Check the input shape
sequential_model.input_shape
Output:
(None, 224, 224, 3)
Input:
# Check the output shape
sequential_model.output_shape
Output:
(None, 120)
Our sequential model takes in an image tensor of size [None, 224, 224, 3] and outputs a vector of shape [None, 120], where None is the batch size we specify.
Now let's try our sequential model out with a single image input.
Input:
# Get a single image with a batch size of 1
single_image_input = tf.expand_dims(image_batch[0], axis=0)
# Pass the image through our model
single_image_output_sequential = sequential_model(single_image_input)
# Check the output
single_image_output_sequential
Output:
<tf.Tensor: shape=(1, 120), dtype=float32, numpy=
array([[0.00783153, 0.01119391, 0.00476165, 0.0072348 , 0.00766934,
0.00753752, 0.00522398, 0.02337082, 0.00579716, 0.00539333,
0.00549823, 0.01011768, 0.00610076, 0.0109506 , 0.00540159,
0.0079683 , 0.01227358, 0.01056393, 0.00507148, 0.00996652,
0.00604106, 0.00729022, 0.0155036 , 0.00745004, 0.00628229,
0.00796217, 0.00905823, 0.00712278, 0.01243507, 0.006427 ,
0.00602891, 0.01276839, 0.00652441, 0.00842482, 0.01247454,
0.00749902, 0.01086363, 0.007803 , 0.0058652 , 0.00474356,
0.00902809, 0.00715358, 0.00981051, 0.00444271, 0.01031628,
0.00691859, 0.00699083, 0.0065892 , 0.00966169, 0.01177148,
0.00908043, 0.00729699, 0.00496712, 0.00509035, 0.00584058,
0.01068885, 0.00817651, 0.00602052, 0.00901201, 0.01008151,
0.00495409, 0.01285929, 0.00480146, 0.0108622 , 0.01421483,
0.00814719, 0.00910061, 0.00798947, 0.00789293, 0.00636969,
0.00656019, 0.01309155, 0.00754355, 0.00702062, 0.00485884,
0.00958675, 0.01086809, 0.00682202, 0.00923016, 0.00856321,
0.00482627, 0.01234931, 0.01140433, 0.00771413, 0.01140642,
0.00382939, 0.00891482, 0.00409833, 0.00771865, 0.00652135,
0.00668143, 0.00935989, 0.00784146, 0.00751913, 0.00785116,
0.00794632, 0.0079146 , 0.00798953, 0.01011222, 0.01318719,
0.00721227, 0.00736159, 0.01369175, 0.01087009, 0.00510072,
0.00843218, 0.00451756, 0.00966478, 0.01013771, 0.00715721,
0.00367131, 0.00825834, 0.00832634, 0.01225684, 0.00724481,
0.00670675, 0.00536995, 0.01070637, 0.00937007, 0.00998812]],
dtype=float32)>
Nice!
Our model has output a tensor of prediction probabilities in shape [1, 120], with one value for each of our dog classes.
And thanks to the softmax function, all of these values are between 0 and 1 and they should all add up to 1 (or close to it).
Input:
# Sum the output
np.sum(single_image_output_sequential)
Output:
1.0
Beautiful!
So how do we figure out which of the values our model thinks is most likely?
Well, we take the index of the highest value!
We can find the index of the highest value using tf.argmax() or by using np.argmax().
We'll also get the highest value itself (not the index) using np.max().
Note that these values may change every time due to the model/data being randomly initialized. Don't worry too much about them being different; in machine learning, randomness is a good thing.
So let's try.
Input:
# Find the index with the highest value
highest_value_index_sequential_model_output = np.argmax(single_image_output_sequential)
highest_value_sequential_model_output = np.max(single_image_output_sequential)
print(f"Highest value index: {highest_value_index_sequential_model_output} ({dog_names[highest_value_index_sequential_model_output]})")
print(f"Prediction probability: {highest_value_sequential_model_output}")
Output:
Highest value index: 7 (basenji)
Prediction probability: 0.023370817303657532
Hmm. This prediction probability value is quite low.
In this example, the model predicts "basenji" with a very low confidence of about 2.34%. With the highest potential value being 1.0, this indicates that the model is not very confident in its prediction.
Next, let's verify the actual label for our single image to see if the model's prediction was accurate.
Input:
# Check the original label value
print(f"Predicted value: {highest_value_index_sequential_model_output}")
print(f"Actual value: {tf.argmax(label_batch[0]).numpy()}")
Output:
Predicted value: 7
Actual value: 95
Unfortunately, the model predicted the wrong label.
This discrepancy is expected because, although our model has pretrained parameters from ImageNet, the dense layer added at the end is initialized with random parameters. Therefore, initially, our model is essentially guessing the labels.
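For a sense of scale, a model that guessed uniformly at random across 120 breeds would give every class the same probability. Here's a quick calculation comparing that to the prediction probability from the output above.

```python
num_classes = 120
uniform_probability = 1 / num_classes  # probability per class if every breed were equally likely
highest_predicted_probability = 0.023370817303657532  # copied from the output above

ratio = highest_predicted_probability / uniform_probability

print(f"Uniform guess probability: {uniform_probability:.4f}")
print(f"Highest value vs uniform guess: {ratio:.1f}x")
```

So the untrained model's top prediction is only a few times higher than a uniform guess, which fits with the output layer starting from random values.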
To complete the analysis, let's compare the text-based labels from the model's prediction and the original ground truth.
Input:
# Index on class_names with our model's highest prediction probability
sequential_model_predicted_label = class_names[tf.argmax(sequential_model(tf.expand_dims(image_batch[0], axis=0)), axis=1).numpy()[0]]
# Get the truth label
single_image_ground_truth_label = class_names[tf.argmax(label_batch[0])]
# Print predicted and ground truth labels
print(f"Sequential model predicted label: {sequential_model_predicted_label}")
print(f"Ground truth label: {single_image_ground_truth_label}")
Output:
Sequential model predicted label: basenji
Ground truth label: schipperke
Here, the model predicted "basenji," whereas the actual label was "schipperke."
This result confirms that our model's initial predictions are not reliable due to the random initialization of the dense layer's parameters.
So what can we do?
Well, we can try another method for creating a model!
As mentioned before, the Keras Functional API is another method for creating more complex models.
It can include multiple different modeling steps, but it can also be used for simple models, and it's the way we'll construct our Dog Vision models going forward.
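Let's recreate our sequential_model using the Functional API.
We'll follow the same process as mentioned before:
1. Create the input layer(s).
2. Create the middle/hidden layer(s).
3. Create the output layer(s).
4. Connect the inputs and outputs together with tf.keras.Model().
Input: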
# 1. Create input layer
inputs = tf.keras.Input(shape=INPUT_SHAPE)
# 2. Create hidden layer
x = base_model(inputs, training=False)
# 3. Create the output layer
outputs = tf.keras.layers.Dense(units=len(class_names), # one output per class
activation="softmax",
name="output_layer")(x)
# 4. Connect the inputs and outputs together
functional_model = tf.keras.Model(inputs=inputs,
outputs=outputs,
name="functional_model")
# Get a model summary
functional_model.summary()
Output:
Model: "functional_model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_4 (InputLayer) [(None, 224, 224, 3)] 0
efficientnetv2-b0 (Functio (None, 1280) 5919312
nal)
output_layer (Dense) (None, 120) 153720
=================================================================
Total params: 6073032 (23.17 MB)
Trainable params: 153720 (600.47 KB)
Non-trainable params: 5919312 (22.58 MB)
_________________________________________________________________
Our functional model is now created so let's try it out.
It works in the same fashion as our sequential_model. (But hopefully a little better!)
Input:
# Pass a single image through our functional_model
single_image_output_functional = functional_model(single_image_input)
# Find the index with the highest value
highest_value_index_functional_model_output = np.argmax(single_image_output_functional)
highest_value_functional_model_output = np.max(single_image_output_functional)
highest_value_index_functional_model_output, highest_value_functional_model_output
Output:
(69, 0.017855722)
Looks like we got a slightly different value to our sequential_model (or they may be the same if randomness wasn't so random).
Why is this?
Because our functional_model was initialized with a random tf.keras.layers.Dense layer, so the outputs of our functional_model are essentially random as well (neural networks start with random numbers and adjust them to better represent patterns in data).
Not to fear, we'll fix this soon when we train our model.
Right now we've created our model with a few scattered lines of code, so how about we functionize the model creation so we can repeat it later on?
We've created two different kinds of models so far. Each of which uses the same layers, except one was with the Keras Sequential API and the other was with the Keras Functional API.
However, it would be quite tedious to rewrite that modeling code every time we wanted to create a new model right?
So let's create a function called create_model() to replicate the model creation step with the Functional API.
Note: We're focused on the Functional API in this example, since it takes a bit more practice than the Sequential API.
Input:
def create_model(include_top: bool = False,
                 num_classes: int = 1000,
                 input_shape: tuple[int, int, int] = (224, 224, 3),
                 include_preprocessing: bool = True,
                 trainable: bool = False,
                 dropout: float = 0.2,
                 model_name: str = "model") -> tf.keras.Model:
    """
    Create an EfficientNetV2 B0 feature extractor model with a custom classifier layer.

    Args:
        include_top (bool, optional): Whether to include the top (classifier) layers of the model.
        num_classes (int, optional): Number of output classes for the classifier layer.
        input_shape (tuple[int, int, int], optional): Input shape for the model's images (height, width, channels).
        include_preprocessing (bool, optional): Whether to include preprocessing layers for image normalization.
        trainable (bool, optional): Whether to make the base model trainable.
        dropout (float, optional): Dropout rate for the optional dropout layer (only used if the dropout line below is uncommented).
        model_name (str, optional): Name for the created model.

    Returns:
        tf.keras.Model: A TensorFlow Keras model with the specified configuration.
    """
    # Create base model
    base_model = tf.keras.applications.efficientnet_v2.EfficientNetV2B0(
        include_top=include_top,
        weights="imagenet",
        input_shape=input_shape,
        include_preprocessing=include_preprocessing,
        pooling="avg" # Can use this instead of adding tf.keras.layers.GlobalAveragePooling2D() to the model
        # pooling="max" # Can use this instead of adding tf.keras.layers.GlobalMaxPooling2D() to the model
    )

    # Freeze the base model (if necessary)
    base_model.trainable = trainable

    # Create input layer
    inputs = tf.keras.Input(shape=input_shape, name="input_layer")

    # Create model backbone (middle/hidden layers)
    x = base_model(inputs, training=trainable)
    # x = tf.keras.layers.GlobalAveragePooling2D()(x) # note: you should include pooling here if not using `pooling="avg"`
    # x = tf.keras.layers.Dropout(dropout)(x) # optional regularization layer (search "dropout" for more)

    # Create output layer (also known as "classifier" layer)
    outputs = tf.keras.layers.Dense(units=num_classes,
                                    activation="softmax",
                                    name="output_layer")(x)

    # Connect input and output layer
    model = tf.keras.Model(inputs=inputs,
                           outputs=outputs,
                           name=model_name)

    return model
What a beautiful function!
Let's try it out.
Input:
# Create a model
model_0 = create_model(num_classes=len(class_names))
model_0.summary()
Output:
Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_layer (InputLayer) [(None, 224, 224, 3)] 0
efficientnetv2-b0 (Functio (None, 1280) 5919312
nal)
output_layer (Dense) (None, 120) 153720
=================================================================
Total params: 6073032 (23.17 MB)
Trainable params: 153720 (600.47 KB)
Non-trainable params: 5919312 (22.58 MB)
_________________________________________________________________
Woohoo! Looks like it worked!
Now how about we inspect each of the layers and whether they're trainable?
Input:
for layer in model_0.layers:
    print(layer.name, layer.trainable)
Output:
input_layer True
efficientnetv2-b0 False
output_layer True
Nice, looks like our base_model (efficientnetv2-b0) is frozen and not trainable, while our output_layer is trainable.
This means we'll be reusing the patterns learned in the base_model to feed into our output_layer and then customizing those parameters to suit our own problem.
We've seen our model make a couple of predictions on our data, and so far it hasn't done so well. This is expected though, as our model is essentially predicting random class values given an image.
So let's change that by training the final layer on our model to be customized to recognizing images of dogs for our project.
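We can do so via five steps:
1. Create the model.
2. Compile the model.
3. Fit the model.
4. Evaluate the model.
5. Make predictions with the model.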
We'll work through each of these over the next few sections.
We’ve done this already, but in the interest of having this task all in one section, let's create our model using the create_model() function that we made earlier.
Input:
# 1. Create model
model_0 = create_model(num_classes=len(class_names),
model_name="model_0")
model_0.summary()
Output:
Model: "model_0"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_layer (InputLayer) [(None, 224, 224, 3)] 0
efficientnetv2-b0 (Functio (None, 1280) 5919312
nal)
output_layer (Dense) (None, 120) 153720
=================================================================
Total params: 6073032 (23.17 MB)
Trainable params: 153720 (600.47 KB)
Non-trainable params: 5919312 (22.58 MB)
_________________________________________________________________
Sidenote: Remember, if you're struggling to follow along, or feeling a little overwhelmed, then make sure to check out my complete Machine Learning and Data Science course, as I cover this exact project inside that course.
After we've created our model, the next step is to compile it.
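We can compile our model_0 using the tf.keras.Model.compile() method.
There are many options we can pass to the compile() method. However, the main ones we'll be focused on are:
optimizer - How the model should update its internal parameters.
loss - How wrong the model's predictions are.
metrics - Human-readable values for judging how well the model is performing.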
These three settings work together to help improve a model.
An optimizer tells a model how to improve its internal parameters (weights) to hopefully improve a loss value.
In most cases, improving the loss means to minimize it (a loss value is a measure of how wrong your model's predictions are, a perfect model will have a loss value of 0).
It does this through a process called gradient descent.
The gradients needed for gradient descent are calculated through backpropagation, a method that computes the gradient of the loss function with respect to each weight in the model.
Once the gradients have been calculated, the optimizer then tries to update the model weights so that they move in the opposite direction of the gradient (if you go down the gradient of a function, you reduce its value).
If you've never heard of the above processes, that's okay. TensorFlow implements many of them behind the scenes.
For now, the main takeaway is that neural networks learn in the following fashion:
Here’s an example of how a neural network learns.
Note the cyclical nature of the learning. You can think of it as a big game of guess and check, where the guess (hopefully) gets better over time.
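The guess-and-check cycle can be sketched in plain Python with a single weight. Everything here is a toy assumption (a made-up target of 3.0, a squared-error loss and a hand-written gradient). TensorFlow handles the real version of this for millions of weights.

```python
target = 3.0         # the value we'd like our weight to reach (made up)
weight = 0.0         # start with an arbitrary guess
learning_rate = 0.1  # how big each update step is

for step in range(50):
    loss = (weight - target) ** 2       # measure how wrong the guess is
    gradient = 2 * (weight - target)    # slope of the loss with respect to the weight
    weight -= learning_rate * gradient  # step in the opposite direction of the gradient
    if step % 10 == 0:
        print(f"Step {step}: loss={loss:.4f}")

print(f"Final weight: {weight:.4f}")
```

Each pass through the loop is one round of guess, measure, update. The loss shrinks as the weight moves closer to the target.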
I'll leave the intricacies of gradient descent and backpropagation to your own extra-curricular research.
For now, we're going to focus on using the tools TensorFlow has to offer to implement this process instead.
As for optimizer functions, there are two main options to get started:
Why these two?
Because they're the most often used in practice (you can see this via the number of machine learning papers referencing each one on paperswithcode.com).
There are many other optimizers available in the tf.keras.optimizers module too.
The good thing about using a premade optimizer from tf.keras.optimizers is that they usually come with good starting settings. One of the main ones being the learning_rate value.
The learning_rate is one of the most important hyperparameters to set in a neural network training setup.
It determines how large a step the optimizer takes when adjusting your model's weights each iteration. Too low and the model learns very slowly (or appears not to learn at all). Too high and the updates overshoot, so training can become unstable.
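To see the effect of the learning rate in isolation, here's a toy sketch that minimizes w**2 (whose minimum is at 0) with three different learning rates. The function and values are made up for illustration and have nothing to do with our actual model.

```python
def run_gradient_descent(learning_rate: float, steps: int = 20) -> float:
    """Minimize the toy loss w**2 starting from w = 1.0 and return the final w."""
    weight = 1.0
    for _ in range(steps):
        gradient = 2 * weight  # derivative of w**2
        weight -= learning_rate * gradient
    return weight

too_low = run_gradient_descent(learning_rate=0.001)   # barely moves away from 1.0
reasonable = run_gradient_descent(learning_rate=0.1)  # gets close to the minimum at 0
too_high = run_gradient_descent(learning_rate=1.1)    # overshoots and grows in magnitude

print(f"Too low: {too_low:.4f} | Reasonable: {reasonable:.4f} | Too high: {too_high:.4f}")
```

The "right" learning rate depends on the problem, which is why starting from a sensible default and adjusting from there is a common approach.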
By default, TensorFlow sets the learning rate of the Adam optimizer to 0.001 (tf.keras.optimizers.Adam(learning_rate=0.001)), which is a good setting for many problems to get started with.
We can also set this default with the shortcut optimizer="adam".
Input:
# Create optimizer (short version)
optimizer = "adam"
# The above line is the same as below
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
optimizer
Output:
<keras.src.optimizers.adam.Adam at 0x7f3bb4107040>
A loss function measures how wrong your model's predictions are.
Why care?
Well, a model with poor predictions in comparison to the truth data will have a high loss value, whereas a model with perfect predictions (e.g. it gets every prediction correct) will have a loss value of 0.
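Here's a small pure Python sketch of one common classification loss, cross entropy, to show the "high when wrong, near 0 when right" behaviour. The labels and predictions are made up, and the epsilon used to avoid log(0) is my own choice for this sketch.

```python
import math

def categorical_crossentropy(y_true: list[float], y_pred: list[float], epsilon: float = 1e-7) -> float:
    """Cross entropy between a one-hot label and predicted probabilities."""
    # epsilon avoids taking log(0); the value is an assumption for this sketch
    return -sum(t * math.log(max(p, epsilon)) for t, p in zip(y_true, y_pred))

label = [0.0, 1.0, 0.0]                   # one-hot label: the true class is index 1
confident_and_right = [0.01, 0.98, 0.01]  # made-up prediction probabilities
confident_and_wrong = [0.98, 0.01, 0.01]  # made-up prediction probabilities

good_loss = categorical_crossentropy(label, confident_and_right)
bad_loss = categorical_crossentropy(label, confident_and_wrong)

print(f"Good prediction loss: {good_loss:.4f}")
print(f"Bad prediction loss: {bad_loss:.4f}")
```

The closer the predicted probability for the true class gets to 1, the closer the loss gets to 0.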
Different problems have different loss functions.
Some of the most common ones include:
In our case, since we're working with multi-class classification (multiple different dog breeds) and our labels are one-hot encoded, we'll be using tf.keras.losses.CategoricalCrossentropy.
We can leave all of the default parameters as they are as well.
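However, if we didn't have activation="softmax" in the final layer of our model, we'd have to change from_logits=False to from_logits=True, because the loss function would then need to convert the model's raw outputs into prediction probabilities itself (the softmax activation function currently does this conversion for us).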
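If you're curious why both routes work, here's a toy sketch showing that the loss ends up the same whether softmax is applied first or the loss is computed straight from the raw outputs. The raw outputs and class index are made up, and this is a hand-rolled illustration rather than how Keras implements it internally.

```python
import math

logits = [2.0, 1.0, 0.1]  # made-up raw model outputs for three classes
true_class_index = 0      # made-up ground truth class

# Option 1: apply softmax first, then take the negative log of the true class probability
exponentials = [math.exp(value) for value in logits]
probabilities = [value / sum(exponentials) for value in exponentials]
loss_from_probabilities = -math.log(probabilities[true_class_index])

# Option 2: work directly from the raw outputs using the log-sum-exp form
log_sum_exp = math.log(sum(math.exp(value) for value in logits))
loss_from_logits = log_sum_exp - logits[true_class_index]

print(loss_from_probabilities, loss_from_logits)
```

The important part is to apply the conversion exactly once, either in the model's final layer or in the loss function, but not both.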
There are more loss functions than the ones we've discussed and you can see many of them on paperswithcode.com. TensorFlow also has many more loss function implementations available in tf.keras.losses.
For now though, let's check out a single sample of our labels to make sure they're one-hot encoded.
Input:
# Check that our labels are one-hot encoded
label_batch[0]
Output:
<tf.Tensor: shape=(120,), dtype=float32, numpy=
array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0.], dtype=float32)>
Excellent! Looks like our labels are indeed one-hot encoded.
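If one-hot encoding is new to you, here's a tiny pure Python sketch of the idea. The one_hot() helper is hypothetical (written just for this example), and the index 95 is the label index we saw earlier for "schipperke".

```python
def one_hot(index: int, depth: int) -> list[float]:
    """Return a list of zeros with a single 1.0 at the given index."""
    return [1.0 if position == index else 0.0 for position in range(depth)]

encoded_label = one_hot(index=95, depth=120)  # 120 dog breeds, true class at index 95
decoded_index = encoded_label.index(1.0)      # recover the class index from the one-hot vector

print(len(encoded_label), sum(encoded_label), decoded_index)  # 120 1.0 95
```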
Now let's create our loss function as tf.keras.losses.CategoricalCrossentropy(from_logits=False) or "categorical_crossentropy" for short.
We set from_logits=False (this is the default) because our model uses activation="softmax" in the final layer so it's outputting prediction probabilities rather than logits. (Without activation="softmax" the outputs of our model would be referred to as logits, I'll leave this for your own extra-curricular investigation).
Input:
# Create our loss function
loss = tf.keras.losses.CategoricalCrossentropy(from_logits=False) # use from_logits=False if using an activation function in final layer of model (default)
loss
Output:
<keras.src.losses.CategoricalCrossentropy at 0x7f3bb4107430>
The evaluation metric is a human-readable value which is used to see how well your model is performing.
(A slightly confusing concept is that the evaluation metric and loss function can be the same equation).
However, the main difference between a loss function and an evaluation metric is that the loss function will typically be differentiable (there are some exceptions to the rule), whereas the evaluation metric does not have to be.
In the case of regression (predicting a number), your loss function and evaluation metric could be mean squared error (MSE).
Whereas in the case of classification, your loss function will generally be binary cross entropy (for two classes) or categorical cross entropy (for multiple classes) and your evaluation metric(s) could be accuracy, F1-score, precision and/or recall.
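Accuracy is a good example of a metric that's easy to read but not useful for gradient descent: it relies on picking the highest value, so small changes to the prediction probabilities usually don't change it at all. Here's a minimal pure Python sketch with made-up labels and predictions (the accuracy() helper is written just for this example).

```python
def accuracy(y_true: list[list[float]], y_pred: list[list[float]]) -> float:
    """Fraction of samples where the highest predicted class matches the label."""
    correct = 0
    for label, prediction in zip(y_true, y_pred):
        predicted_index = prediction.index(max(prediction))  # index of the highest probability
        true_index = label.index(max(label))                 # index of the 1.0 in the one-hot label
        correct += int(predicted_index == true_index)
    return correct / len(y_true)

labels = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
predictions = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.5, 0.3, 0.2], [0.2, 0.6, 0.2]]

print(accuracy(labels, predictions))  # 0.75
```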
TensorFlow provides many pre-built metrics in the tf.keras.metrics module.
The tf.keras.Model.compile() method expects the metrics parameter input as a list.
Since we're working with a classification problem, let's set up our evaluation metric as accuracy.
Input:
# Create list of evaluation metrics
metrics = ["accuracy"]
We've briefly touched on optimizers, loss functions, gradient descent and backpropagation, which are the backbone of neural network learning.
However, for a more in-depth look at each of these, I recommend checking out the following:
Phew!
We've been through all the main steps in compiling a model:
1. Creating the optimizer.
2. Creating the loss function.
3. Creating the evaluation metrics.
Now let's put everything we've done together and compile our model_0.
First we'll do it with shortcuts (e.g. "accuracy") then we'll do it with specific classes.
Input:
# Compile model with shortcuts (faster to write code but less customizable)
model_0.compile(optimizer="adam",
loss="categorical_crossentropy",
metrics=["accuracy"])
# Compile model with classes (will do the same as above)
model_0.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
loss=tf.keras.losses.CategoricalCrossentropy(from_logits=False),
metrics=["accuracy"])
Now that we have our model created and compiled, it’s time to fit it to the data.
This means we're going to pass all of the data that we have from Part 1 in this series (dog images and their assigned labels) through our model and ask it to try and learn the relationship between the images and the labels.
Fitting the model is step 3 in our list.
We can fit our model_0 instance with the tf.keras.Model.fit() method.
The main parameters of the fit() method we'll be paying attention to are:
x = What data do you want the model to train on?
y = What labels do you want your model to learn to associate with your data?
batch_size = The number of samples your model will look at per gradient update (e.g. 32 samples at a time before updating its internal patterns).
epochs = How many times do you want the model to go through all samples (e.g. epochs=5 means looking at all of the data 5 times)?
validation_data = What data do you want to evaluate your model's learning on?
There are plenty more options in the TensorFlow/Keras documentation for the fit() method. However, these options will be more than enough for us.
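To get a feel for how batch_size and epochs interact, here's a quick back-of-the-envelope calculation. The sample count and batch size are assumptions for illustration (roughly 10% of a 12,000 image training set, and the example batch size of 32 mentioned above), so swap in your own numbers.

```python
import math

num_training_samples = 1200  # assumption: roughly 10% of a 12,000 image training set
batch_size = 32              # assumption: the example batch size mentioned above
epochs = 5

steps_per_epoch = math.ceil(num_training_samples / batch_size)  # the last batch may be smaller
total_gradient_updates = steps_per_epoch * epochs

print(steps_per_epoch)         # 38
print(total_gradient_updates)  # 190
```

With these assumed numbers, the model would make 38 weight updates per epoch and 190 in total.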
In our case, let's keep our experiments quick and set the following:
x=train_10_percent_ds - Since we've crafted a tf.data.Dataset, our x and y values are combined into one. We'll also start by training on 10% of the data for quicker experimentation (if things work on a smaller subset of the data, we can always increase it).
epochs=5 - The more epochs you do, the more opportunities your model has to learn patterns, however, it also prolongs training.
validation_data=test_ds - We'll evaluate the model's learning on the test dataset (samples it's never seen before).
Let's do it!
Time to train our first neural network and bring Dog Vision 🐶👁️ to life!
Note: If you don't have a GPU here, training will likely take a considerably long time.
You can activate a GPU in Google Colab by going to Runtime -> Change runtime type -> Hardware accelerator -> GPU. Note that changing a runtime type will mean you will have to restart your runtime and rerun all of the cells above, but it will take far less time to run.
Input:
# Fit model_0 for 5 epochs
epochs = 5
history_0 = model_0.fit(x=train_10_percent_ds,
epochs=epochs,
validation_data=test_ds)
Output:
Epoch 1/5
38/38 [==============================] - 27s 482ms/step - loss: 3.9758 - accuracy: 0.3000 - val_loss: 3.0500 - val_accuracy: 0.5415
Epoch 2/5
38/38 [==============================] - 14s 379ms/step - loss: 2.0531 - accuracy: 0.8008 - val_loss: 1.8650 - val_accuracy: 0.7041
Epoch 3/5
38/38 [==============================] - 14s 375ms/step - loss: 1.0491 - accuracy: 0.9025 - val_loss: 1.3060 - val_accuracy: 0.7548
Epoch 4/5
38/38 [==============================] - 14s 373ms/step - loss: 0.6138 - accuracy: 0.9483 - val_loss: 1.0317 - val_accuracy: 0.7910
Epoch 5/5
38/38 [==============================] - 14s 373ms/step - loss: 0.4157 - accuracy: 0.9683 - val_loss: 0.8927 - val_accuracy: 0.8044
It looks like our model performed outstandingly well, and achieved a validation accuracy of ~80% after just 5 epochs of training!
This is far better than the original Stanford Dogs paper results of 22% accuracy. (The paper that this project is based on).
How did it perform so much better?
Well, that's the power of transfer learning combined with a series of modern updates to neural network architectures, hardware and training regimes.
And just like that, our neural network has been built and trained.
However, we have to remember that these are just numbers on a page, and we’ll need to evaluate our results in more detail to be sure.
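One simple first check is the gap between training accuracy and validation accuracy, since a large or growing gap can be a sign of overfitting. The sketch below uses the accuracy values copied from the training log above. In your own notebook these values should also be available in history_0.history under the "accuracy" and "val_accuracy" keys.

```python
# Accuracy values copied from the training log above
train_accuracy = [0.3000, 0.8008, 0.9025, 0.9483, 0.9683]
val_accuracy = [0.5415, 0.7041, 0.7548, 0.7910, 0.8044]

# Compare training and validation accuracy epoch by epoch
for epoch, (train, val) in enumerate(zip(train_accuracy, val_accuracy), start=1):
    gap = train - val
    print(f"Epoch {epoch}: train={train:.4f} | val={val:.4f} | gap={gap:+.4f}")

final_gap = train_accuracy[-1] - val_accuracy[-1]
print(f"Final gap: {final_gap:.4f}")
```

By the final epoch the model scores noticeably higher on the training data than on the validation data, which is one of the things worth digging into when evaluating.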
We'll get more in-depth on evaluations in Part 3 of this series. (Coming soon).
In the next part of this guide, we’ll evaluate our trained model and make predictions, so that we can get this project finished and up and running, and you can add this huge project to your portfolio!
Be sure to subscribe via the link below so you don’t miss it.
If you want to deep dive into Machine Learning and learn how to use these tools even further, then check out my complete Machine Learning and Data Science course or watch the first few videos for free.
It’s one of the most popular, highly rated Machine Learning and Data Science bootcamps online, as well as the most modern and up-to-date. Guaranteed.
You'll go from a complete beginner with no prior experience to getting hired as a Machine Learning Engineer this year, so it’s helpful for ML Engineers of all experience levels.
Or, if you already have a good grasp of Machine Learning, and just want to focus on Tensorflow for Deep Learning, I have a course on that also that you can check out here.
When you join as a Zero To Mastery Academy member, you’ll have access to both of these courses, as well as every other course in our training library!
Not only that, but you will also be able to ask me questions, as well as chat to other students and machine learning professionals via our private Discord community.
So go ahead and check those out, and don’t forget to subscribe below so you don’t miss Part 3 of this series on Tensorflow and deep learning!