Welcome to Part 3 in my brand new 3-part series on TensorFlow and Deep Learning.
Sidenote: Technically this 'mini series' is part of my larger 'Introduction to Machine Learning' series, but I went so deep on this particular section, I needed to make it into 3 parts!
Be sure to check out the other parts in this TensorFlow series, as they all lead into each other:
So a quick recap of the series so far.
The goal of this series is to give you an overview of deep learning (and more specifically, transfer learning) using TensorFlow and Keras.
Even better still?
Rather than just tell you what this all means, I’m also going to walk you through a project that you can follow along with, so you can learn as you go.
The project we’re going to build is called ‘Dog Vision’. It’s a neural network capable of identifying different dog breeds via images.
In the first part of the series, we took the time to set up the project, get our data, explore it, create a training set, and then turn our data into a TensorFlow dataset. This is an essential skill for any machine learning project, but a fairly large task, hence why it was a step on its own.
In the second part of the series, we took the dataset we created in Part 1 and used it to build, compile, and fit a neural network on the data.
Finally, in this third part of the series (which you're reading right now), we'll evaluate our model, make predictions, and work through the deployment steps, which are crucial for understanding how to assess and use trained models effectively.
So let’s finish up this project!
My name is Daniel Bourke, and I'm the resident Machine Learning instructor here at Zero To Mastery.
Originally self-taught, I worked for one of Australia's fastest-growing artificial intelligence agencies, Max Kelsen, and have worked on Machine Learning and data problems across a wide range of industries including healthcare, eCommerce, finance, retail, and more.
I'm also the author of Machine Learning Monthly, write my own blog on my experiments in ML, and run my own YouTube channel - which has hit over 8 Million views.
Sidenote: If you want to deep dive into Machine Learning and learn how to use these tools even further, then check out my complete Machine Learning and Data Science course or watch the first few videos for free.
It’s one of the most popular, highly rated Machine Learning and Data Science bootcamps online, as well as the most modern and up-to-date. Guaranteed.
You'll go from a complete beginner with no prior experience to getting hired as a Machine Learning Engineer this year, so it’s helpful for ML Engineers of all experience levels.
Want a sample of the course? Well, check out the video below:
If you already have a good grasp of Machine Learning and just want to focus on TensorFlow for Deep Learning, I have a course on that too, which you can check out here.
With that out of the way, let’s get into this guide.
The next step in our journey is to evaluate our trained model.
There are several ways to do this:
1. Check the metrics output during training (loss and accuracy per epoch)
2. Plot the model's loss curves
3. Make predictions on the test set and compare them to the ground truth labels
We've done the first one, as these metrics were the outputs of our model training. So now we're going to focus on the next two: plotting loss curves and making predictions on the test set.
(Don't worry, we'll get to custom images later on too).
Loss curves visualize how your model's loss value changes over time. An ideal loss curve will start high and move towards zero. A perfect model will have a loss value of zero.
We say loss "curves" (plural) because you can have a loss curve for each dataset: training, validation, and test.
We have a few options here, but one of the simplest is to use the History object returned by the fit method of tf.keras.Model instances.
The good news is that we've already got one from the work we did in Part 2 of this series. It should be saved to history_0 (the model history for model_0).
The History.history attribute contains a record of the training loss values and evaluation metrics for each epoch.
So let's check it out.
Input:
# Inspect History.history attribute for model_0
history_0.history
Output:
{'loss': [3.926330089569092,
1.9898805618286133,
1.0152279138565063,
0.599678099155426,
0.4040333032608032],
'accuracy': [0.32249999046325684,
0.7900000214576721,
0.9058333039283752,
0.9483333230018616,
0.9708333611488342],
'val_loss': [2.996889591217041,
1.8436286449432373,
1.2817054986953735,
1.0173338651657104,
0.8792150616645813],
'val_accuracy': [0.5548951029777527,
0.7062937021255493,
0.7701631784439087,
0.7945221662521362,
0.8107225894927979]}
It works and we've got a history of our model training over time.
Not only that, but it looks like everything is moving in the right direction. Loss is going down whilst accuracy is going up, which is the ideal outcome for our loss curves.
So what now?
Well, how about we adhere to the data explorer's motto of visualize, visualize, visualize! and write a function to plot this data so we can understand it more easily.
We'll call the function plot_model_loss_curves(). It will take a History object as input and then plot the loss and accuracy curves using matplotlib, like so:
Input:
def plot_model_loss_curves(history: tf.keras.callbacks.History) -> None:
    """Takes a History object and plots loss and accuracy curves."""
    # Get the accuracy values
    acc = history.history["accuracy"]
    val_acc = history.history["val_accuracy"]

    # Get the loss values
    loss = history.history["loss"]
    val_loss = history.history["val_loss"]

    # Get the number of epochs
    epochs_range = range(len(acc))

    # Create accuracy curves plot
    plt.figure(figsize=(14, 7))
    plt.subplot(1, 2, 1)
    plt.plot(epochs_range, acc, label="Training Accuracy")
    plt.plot(epochs_range, val_acc, label="Validation Accuracy")
    plt.legend(loc="lower right")
    plt.title("Training and Validation Accuracy")
    plt.xlabel("Epoch")
    plt.ylabel("Accuracy")

    # Create loss curves plot
    plt.subplot(1, 2, 2)
    plt.plot(epochs_range, loss, label="Training Loss")
    plt.plot(epochs_range, val_loss, label="Validation Loss")
    plt.legend(loc="upper right")
    plt.title("Training and Validation Loss")
    plt.xlabel("Epoch")
    plt.ylabel("Loss")

    plt.show()
plot_model_loss_curves(history=history_0)
Output:
Woohoo! Now those are some nice-looking curves.
Our model is doing exactly what we'd like it to do. The accuracy is moving up while the loss is going down. However, you might be wondering why there's a gap between the training and validation loss curves, as ideally, the two lines would closely follow each other.
Well, in our case, the validation loss doesn't decrease as low as the training loss.
This is known as overfitting, which is a common problem in machine learning where a model learns the training data very well but doesn't generalize to other unseen data.
So let me explain...
You can imagine overfitting as an athlete who excels at running on a specific track with consistent conditions.
This athlete can achieve outstanding times as long as the track and weather conditions remain the same. However, when asked to run on a different track with varying conditions, their performance drops significantly because they haven't adapted to diverse scenarios, such as heat or cold, or traction on the track.
On the other hand, underfitting is like an athlete who performs poorly regardless of the track or conditions. They haven't trained adequately, so they can't achieve good results in any situation.
Or, in even simpler terms: one is great as long as conditions are ideal; the other performs poorly regardless of conditions.
The good news is that our model isn't underfitting. In fact, it's performing at ~80% accuracy on unseen data. So the gap between the curves points to an overfitting issue rather than an underfitting one.
Now, there are a lot of different ways to fix overfitting. But one of the best ways is to use more data, and guess what - we've got plenty more!
(Remember, these results were achieved using only 10% of the training data).
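(As a quick aside, more data isn't the only fix. Another common remedy, which we won't use in this project, is data augmentation, where the training images are randomly altered so the model sees more varied examples. Here's a minimal sketch of what that could look like with Keras preprocessing layers, assuming the same (224, 224, 3) image inputs we've been working with.)
# Sketch only (not part of our Dog Vision pipeline): Keras preprocessing layers
# that randomly flip, rotate and zoom images, active during training only.
data_augmentation = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.1),
    tf.keras.layers.RandomZoom(0.1),
], name="data_augmentation")

# These layers could sit right after a model's input layer, for example:
# inputs = tf.keras.Input(shape=(224, 224, 3))
# x = data_augmentation(inputs)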
However, before we train a model with more data, there's another way to quickly evaluate our model on a given dataset just to confirm these results, and that's by using the tf.keras.Model.evaluate() method.
So how about we try it on our model_0?
We'll save the outputs to a model_0_results variable so we can use them later.
Input:
# Evaluate model_0, see: https://www.tensorflow.org/api_docs/python/tf/keras/Model#evaluate
model_0_results = model_0.evaluate(x=test_ds)
model_0_results
Output:
269/269 [==============================] - 13s 47ms/step - loss: 0.8792 - accuracy: 0.8107
[0.8792150616645813, 0.8107225894927979]
As you can see, evaluating our model still shows it's performing at ~80% accuracy despite only seeing 10% of the training data.
We can also get the metrics used by our model with the metrics_names attribute.
Input:
# Get our model's metrics names
model_0.metrics_names
Output:
['loss', 'accuracy']
Time to step it up a notch.
We've trained a model on 10% of the training data to see if it works and it did, so now let's train a model on 100% of the training data and see what happens.
But before we do, what do you think will happen?
If our model was able to perform well on only 10% of the data, how do you think it will go on 100% of the data?
These types of questions are good to think about in the world of machine learning. After all, that's why the machine learner's motto is experiment, experiment, experiment!
So let's follow our three steps from before:
1. Create a model with our create_model() function
2. Compile the model
3. Fit the model on the data
Note: Fitting our model on such a large amount of data will take a long time without a GPU. But, if you're using Google Colab, you can access a GPU via Runtime -> Change runtime type -> Hardware accelerator -> GPU. (See Part 2 again for a full walkthrough of this).
So let's get training!
Input:
# 1. Create model_1 (the next iteration of model_0)
model_1 = create_model(num_classes=len(class_names),
model_name="model_1")
# 2. Compile model
model_1.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
loss="categorical_crossentropy",
metrics=["accuracy"])
# 3. Fit model
epochs=5
history_1 = model_1.fit(x=train_ds,
epochs=epochs,
validation_data=test_ds)
Output:
Epoch 1/5
375/375 [==============================] - 43s 84ms/step - loss: 1.2725 - accuracy: 0.7607 - val_loss: 0.4849 - val_accuracy: 0.8756
Epoch 2/5
375/375 [==============================] - 30s 80ms/step - loss: 0.3667 - accuracy: 0.9013 - val_loss: 0.4041 - val_accuracy: 0.8770
Epoch 3/5
375/375 [==============================] - 30s 79ms/step - loss: 0.2641 - accuracy: 0.9287 - val_loss: 0.3731 - val_accuracy: 0.8832
Epoch 4/5
375/375 [==============================] - 30s 80ms/step - loss: 0.2043 - accuracy: 0.9483 - val_loss: 0.3708 - val_accuracy: 0.8819
Epoch 5/5
375/375 [==============================] - 30s 80ms/step - loss: 0.1606 - accuracy: 0.9633 - val_loss: 0.3753 - val_accuracy: 0.8767
Woah! It looks like all that extra data helped our model quite a bit, it's now performing at close to ~90% accuracy on the test set.
The question now, of course, is how many epochs should we fit for? Well, how about we evaluate our model_1?
Let's start by plotting the loss curves with the data contained within history_1.
Input:
# Plot model_1 loss curves
plot_model_loss_curves(history=history_1)
Output:
Hmm, it looks like our model performed well. However, the validation accuracy and loss seem to have flattened out, whereas the training accuracy and loss kept improving.
Once again, this is a sign of overfitting (i.e. the model is performing much better on the training set than on the validation/test set). However, since our model looks to be performing quite well I'll leave this overfitting problem as a research project for later.
For now, let's evaluate our model on the test dataset using the evaluate() method.
Input:
# Evaluate model_1
model_1_results = model_1.evaluate(test_ds)
Output:
269/269 [==============================] - 12s 46ms/step - loss: 0.3753 - accuracy: 0.8767
Nice!
Looks like that extra data boosted our model's performance on the test set from ~80% to close to ~90%. (Exact numbers here may vary due to the inherent randomness in machine learning models).
Now that we've trained a model, it's time to make predictions with it.
Because that's the whole goal of machine learning. Train a model on existing data, to make predictions on new data.
Our test data is supposed to simulate new data, data our model has never seen before.
We can make predictions with the tf.keras.Model.predict() method, passing it our test_ds (short for test dataset) variable.
Input:
# This will output logits (as long as softmax activation isn't in the model)
test_preds = model_1.predict(test_ds)
# Note: If not using activation="softmax" in last layer of model, may need to turn them into prediction probabilities (easier to understand)
# test_preds = tf.keras.activations.softmax(tf.constant(test_preds), axis=-1)
Output:
269/269 [==============================] - 13s 44ms/step
So now let's inspect our test_preds by first checking its shape.
Input:
test_preds.shape
Output:
(8580, 120)
Okay, it looks like our test_preds variable contains 8580 prediction arrays (one for each test sample), each with 120 values (one for each dog class).
Let's inspect a single test prediction and see what it looks like.
Input:
import random

# Get a random index from across all of the test samples
random.seed(42)
random_test_index = random.randint(0, test_preds.shape[0] - 1)
print(f"[INFO] Random test index: {random_test_index}")
# Inspect a single test prediction sample
random_test_pred_sample = test_preds[random_test_index]
print(f"[INFO] Random test pred sample shape: {random_test_pred_sample.shape}")
print(f"[INFO] Random test pred sample argmax: {tf.argmax(random_test_pred_sample)}")
print(f"[INFO] Random test pred sample label: {dog_names[tf.argmax(random_test_pred_sample)]}")
print(f"[INFO] Random test pred sample max prediction probability: {tf.reduce_max(random_test_pred_sample)}")
print(f"[INFO] Random test pred sample prediction probability values:\n{random_test_pred_sample}")
Output:
[INFO] Random test index: 1824
[INFO] Random test pred sample shape: (120,)
[INFO] Random test pred sample argmax: 24
[INFO] Random test pred sample label: brittany_spaniel
[INFO] Random test pred sample max prediction probability: 0.9248308539390564
[INFO] Random test pred sample prediction probability values:
[3.0155065e-06 4.2946940e-05 3.2878995e-06 3.1306336e-05 1.7298260e-06
1.3368123e-05 2.8498230e-06 6.8758955e-06 2.6828552e-06 4.6089318e-04
9.8374185e-06 1.9263330e-06 7.6487186e-07 6.1217276e-04 1.2198443e-06
5.9309714e-06 2.4797799e-05 2.5847612e-06 4.9912862e-05 3.1809162e-07
1.0326848e-06 2.7293386e-06 2.1035332e-06 5.2793930e-06 9.2483085e-01
2.6070888e-06 1.6410323e-06 1.4008251e-06 2.0515323e-05 2.1309786e-05
1.4602327e-06 3.8456672e-04 7.4974610e-05 4.4831428e-05 5.5091264e-06
2.1345174e-07 2.9732748e-06 5.5520386e-06 8.7954652e-07 1.6277906e-03
5.3978354e-02 9.6090174e-05 9.6672220e-06 4.4037843e-06 2.5557700e-05
6.3994042e-07 1.6738920e-06 4.6715216e-04 4.1448075e-06 6.4118845e-05
2.0398900e-06 3.6135450e-06 4.4963690e-05 2.8406910e-05 3.4689847e-07
6.2964758e-04 9.1336078e-05 5.2363583e-05 1.2731762e-06 2.4212743e-06
1.5872080e-06 6.3476455e-06 6.2880179e-07 6.6757898e-06 1.6635622e-06
4.3550008e-07 2.3698403e-05 1.4149221e-05 3.8156581e-05 1.0464001e-05
5.0107906e-06 1.7395665e-06 2.8848885e-07 4.2622072e-05 3.2712339e-07
1.8591476e-07 2.2874669e-05 7.9814470e-07 2.3121322e-05 1.6275973e-06
4.6186727e-07 7.6188849e-07 3.2468931e-06 3.1449999e-05 2.9600946e-05
3.8992380e-06 2.8564186e-06 4.1459539e-06 6.0877244e-07 2.5443229e-05
5.4467969e-06 5.4184858e-07 2.8361776e-04 9.0548929e-05 8.8840829e-07
9.1714105e-07 1.9990568e-07 1.7958368e-05 7.7042150e-06 2.4126435e-05
1.9759838e-05 8.2941342e-06 2.5857928e-05 6.1904398e-06 1.4601937e-06
1.5800337e-05 6.0928446e-06 5.0209674e-05 1.4067524e-05 2.3544631e-05
1.4134421e-06 9.8844721e-05 9.1535941e-05 2.4448002e-03 5.8540131e-06
1.2547853e-02 1.3779800e-05 8.0164841e-07 2.5093528e-05 3.7180773e-05]
Okay, it looks like each individual sample of our test predictions is a tensor of prediction probabilities.
What does that mean?
Well, in essence, each element is a probability between 0 and 1 representing how confident our model is that the sample belongs to that particular class.
Note: Just because a model's prediction probability for a particular sample is closer to 1 on a certain class (e.g. 0.9999) that doesn't mean that it’s correct.
A prediction can have a high probability but still be incorrect, which we’ll see later on, when I add my own face into the images.
The maximum value of our prediction probabilities tensor is what the model considers the most likely prediction given the specific sample.
So, we can take the index of the maximum value (using tf.argmax) and index on the list of dog names to get the predicted class name.
Note: tf.argmax (or "argmax" for short) gets the index where the maximum value occurs in a tensor along a specified dimension. We can use tf.reduce_max to get the maximum value itself.
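For a quick intuition of how these two work together, here's a tiny standalone example on a made-up tensor of prediction probabilities (the values are purely for illustration):
example_probs = tf.constant([0.1, 0.7, 0.2])  # made-up prediction probabilities
print(tf.argmax(example_probs).numpy())      # 1 -> the index of the highest value
print(tf.reduce_max(example_probs).numpy())  # 0.7 -> the highest value itself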
To make our predictions easier to compare to the test dataset, let's unbundle our test_ds object into two separate arrays called test_ds_images and test_ds_labels.
We can do this by looping through the samples in our test_ds object and appending each to a list (we'll do this with a list comprehension).
Then we can join those lists together into an array with np.concatenate, like so:
Input:
import numpy as np
# Extract test images and labels from test_ds
test_ds_images = np.concatenate([images for images, labels in test_ds], axis=0)
test_ds_labels = np.concatenate([labels for images, labels in test_ds], axis=0)
# How many images and labels do we have?
len(test_ds_images), len(test_ds_labels)
Output:
(8580, 8580)
Perfect! Now we've got a way to compare our predictions on a given image (in test_ds_images) to its appropriate label in test_ds_labels.
This is one of the main reasons we didn't shuffle the test dataset: our predictions tensor has the same indexes as our test_ds_images and test_ds_labels arrays.
This means that if we chose to compare sample number 42, everything would (or at least should) line up.
In fact, let's try just that.
Input:
# Set target index
target_index = 42 # try changing this to another value and seeing how the model performs on other samples
# Get test image
test_image = test_ds_images[target_index]
# Get truth label (index of max in test label)
test_image_truth_label = class_names[tf.argmax(test_ds_labels[target_index])]
# Get prediction probabilities
test_image_pred_probs = test_preds[target_index]
# Get index of class with highest prediction probability
test_image_pred_class = class_names[tf.argmax(test_image_pred_probs)]
# Plot the image
plt.figure(figsize=(5, 4))
plt.imshow(test_image.astype("uint8"))
# Create sample title with prediction probability value
title = f"""True: {test_image_truth_label}
Pred: {test_image_pred_class}
Prob: {np.max(test_image_pred_probs):.2f}"""
# Colour the title based on correctness of pred
plt.title(title,
color="green" if test_image_truth_label == test_image_pred_class else "red")
plt.axis("off");
Output:
Woohoo! Looks like our model got the prediction right. According to the test data, sample number 42 is in fact an Affenpinscher.
So, our model is working for sample 42 at least, but let's check some others. In fact, let's write some code to check multiple images at the same time and see if the model gets those right too.
Input:
# Choose a random 10 indexes from the test data and compare the values
import random
random.seed(42) # try changing the random seed or commenting it out for different values
random_indexes = random.sample(range(len(test_ds_images)), 10)
# Create a plot with multiple subplots
fig, axes = plt.subplots(2, 5, figsize=(15, 7))
# Loop through the axes of the plot
for i, ax in enumerate(axes.flatten()):
    target_index = random_indexes[i]  # get a random index (this is another reason we didn't shuffle the test set)

    # Get relevant target image, label, prediction and prediction probabilities
    test_image = test_ds_images[target_index]
    test_image_truth_label = class_names[tf.argmax(test_ds_labels[target_index])]
    test_image_pred_probs = test_preds[target_index]
    test_image_pred_class = class_names[tf.argmax(test_image_pred_probs)]

    # Plot the image
    ax.imshow(test_image.astype("uint8"))

    # Create sample title
    title = f"""True: {test_image_truth_label}
Pred: {test_image_pred_class}
Prob: {np.max(test_image_pred_probs):.2f}"""

    # Colour the title based on correctness of pred
    ax.set_title(title,
                 color="green" if test_image_truth_label == test_image_pred_class else "red")
    ax.axis("off")
Output:
Looks like our model does quite well, but it doesn't seem to be as accurate on some breeds as others, so let's look into that.
Our model's overall accuracy is ~90%, which is an outstanding result, but what about the accuracy per class?
As in:
- How well does the boxer class perform?
- What about australian_terrier?
If we take a look at the original Stanford Dogs Dataset website, the authors reported the accuracy per class of each of the dog breeds.
Their best-performing class was the african_hunting_dog, which achieved close to 60% accuracy (about ~58% if I'm reading the graph correctly).
How about we try and replicate the same plot with our own results? Then we can see the accuracy for each dog breed, as well as compare it to the original.
So, first of all, let's create a DataFrame with information about our test predictions and test samples.
We'll start by:
1. Getting the argmax of the test predictions as well as the test labels
2. Getting the highest prediction probability of each test prediction
3. Putting it all together into a DataFrame
Like so:
Input:
import pandas as pd

# Get argmax labels of test predictions and test ground truth
test_preds_labels = test_preds.argmax(axis=-1)
test_ds_labels_argmax = test_ds_labels.argmax(axis=-1)
# Get highest prediction probability of test predictions
test_pred_probs_max = tf.reduce_max(test_preds, axis=-1).numpy() # extract NumPy since pandas doesn't handle TensorFlow Tensors
# Create DataFrame of test results
test_results_df = pd.DataFrame({"test_pred_label": test_preds_labels,
"test_pred_prob": test_pred_probs_max,
"test_pred_class_name": [class_names[test_pred_label] for test_pred_label in test_preds_labels],
"test_truth_label": test_ds_labels_argmax,
"test_truth_class_name": [class_names[test_truth_label] for test_truth_label in test_ds_labels_argmax]})
# Create a column whether or not the prediction matches the label
test_results_df["correct"] = test_results_df["test_pred_class_name"] == test_results_df["test_truth_class_name"]
test_results_df.head()
Output:
Now that we have our DataFrame we can perform some further analysis, such as getting the accuracy per class.
We can do so by grouping the test_results_df via the "test_truth_class_name" column and then taking the mean of the "correct" column.
We can then create a new DataFrame based on this view and sort the values by correctness (e.g. the classes with the highest accuracy at the top).
Input:
# Calculate accuracy per class
accuracy_per_class = test_results_df.groupby("test_truth_class_name")["correct"].mean()
# Create new DataFrame to sort classes by accuracy
accuracy_per_class_df = pd.DataFrame(accuracy_per_class).reset_index().sort_values("correct", ascending=False)
accuracy_per_class_df.head()
Output:
Woah! Looks like we've got a fair few dog classes that are 100% accurate or close to it.
That's outstanding!
Now let's recreate the horizontal bar plot used on the original Stanford Dogs research paper page.
Input:
# Let's create a horizontal bar chart to replicate a similar plot to the original Stanford Dogs page
plt.figure(figsize=(10, 17))
plt.barh(y=accuracy_per_class_df["test_truth_class_name"],
width=accuracy_per_class_df["correct"])
plt.xlabel("Accuracy")
plt.ylabel("Class Name")
plt.title("Dog Vision Accuracy per Class")
plt.ylim(-0.5, len(accuracy_per_class_df["test_truth_class_name"]) - 0.5) # Adjust y-axis limits to reduce white space
plt.gca().invert_yaxis() # This will display the first class at the top
plt.tight_layout()
plt.show()
Output:
It looks like our model performs incredibly well across the vast majority of all dog classes.
In fact, when we compare it to the original Stanford Dogs horizontal bar graph, we can see that their best-performing class got close to 60% accuracy, and it's only when we look at our worst-performing classes that we see a handful of classes with just under 60% accuracy.
Not bad at all!
Input:
# Inspecting our worst performing classes (note how only a couple of classes perform at ~55% accuracy or below)
accuracy_per_class_df.tail()
Output:
What an awesome result! We've now replicated, and even vastly improved on, the results of a Stanford research paper.
So now that we've seen how well our model performs, how about we check where it performed poorly, and try to figure out why.
A great way to inspect your model's errors is to find the examples where the prediction had a high probability but was wrong.
These are often called the "most wrong" samples: the model was very confident in its prediction, but the prediction was wrong.
So, let's filter for the top 100 most wrong samples by sorting the incorrect predictions by the "test_pred_prob" column.
Input:
# Get most wrong
top_100_most_wrong = test_results_df[test_results_df["correct"] == 0].sort_values("test_pred_prob", ascending=False)[:100]
top_100_most_wrong.head()
Output:
One way to inspect these most wrong predictions would be to go through the different breeds one by one and see why the model might have confused them, such as comparing miniature_pinscher to doberman (two quite similar-looking dog breeds).
That's a lot of manual work though, so instead, let's get a random 10 samples and plot them to see what they look like.
Input:
# Get 10 random indexes of "most wrong" predictions
top_100_most_wrong.sample(n=10).index
Output:
Index([2001, 1715, 8112, 1642, 5480, 6383, 7363, 4155, 7895, 4105], dtype='int64')
How about we plot these indexes?
Input:
# Choose a random 10 indexes from the test data and compare the values
import random
random_most_wrong_indexes = top_100_most_wrong.sample(n=10).index
# Iterate through test results and plot them
# Note: This is why we don't shuffle the test data, so that it's in original order when we evaluate it.
fig, axes = plt.subplots(2, 5, figsize=(15, 7))
for i, ax in enumerate(axes.flatten()):
    target_index = random_most_wrong_indexes[i]

    # Get relevant target image, label, prediction and prediction probabilities
    test_image = test_ds_images[target_index]
    test_image_truth_label = class_names[tf.argmax(test_ds_labels[target_index])]
    test_image_pred_probs = test_preds[target_index]
    test_image_pred_class = class_names[tf.argmax(test_image_pred_probs)]

    # Plot the image
    ax.imshow(test_image.astype("uint8"))

    # Create sample title
    title = f"""True: {test_image_truth_label}
Pred: {test_image_pred_class}
Prob: {np.max(test_image_pred_probs):.2f}"""

    # Colour the title based on correctness of pred
    ax.set_title(title,
                 color="green" if test_image_truth_label == test_image_pred_class else "red",
                 fontsize=10)
    ax.axis("off")
Output:
Inspecting the "most wrong" examples, it's easy to see where the model got confused. Some of these breeds look very similar - with some of them being miniature versions.
These samples can show us where we might want to collect more data or correct our data's labels.
Before that though, how about we make a confusion matrix for further evaluation?
A confusion matrix helps to visualize the performance of a classification algorithm by comparing the predicted classes to the actual classes (truth vs. predictions).
We can create one using Scikit-Learn's sklearn.metrics.confusion_matrix by passing in our y_true and y_pred values.
Then we can display it using sklearn.metrics.ConfusionMatrixDisplay.
Note: Because we have 120 different classes, running the code below to show the confusion matrix plot may take a minute or so to load, as it's quite a big plot. So be warned!
Input:
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
# Create a confusion matrix
confusion_matrix_dog_preds = confusion_matrix(y_true=test_ds_labels_argmax, # requires all labels to be in same format (e.g. not one-hot)
y_pred=test_preds_labels)
# Create a confusion matrix plot
confusion_matrix_display = ConfusionMatrixDisplay(confusion_matrix=confusion_matrix_dog_preds,
display_labels=class_names)
fig, ax = plt.subplots(figsize=(25, 25))
ax.set_title("Dog Vision Confusion Matrix")
confusion_matrix_display.plot(xticks_rotation="vertical",
cmap="Blues",
colorbar=False,
ax=ax);
Output:
Now that's one big confusion matrix!
It looks like most of the darker blue boxes are down the middle diagonal (where we'd like them to be).
But there are a few instances where the model confuses classes, such as scottish_deerhound and irish_wolfhound.
And looking up those two breeds, we can see that they look visually similar and are actually a common source of confusion.
Honestly, if it wasn't for the height difference, I would think they were the same dog photographed from different angles!
We've covered a lot of ground from loading data to training and evaluating a model. But what if you wanted to use that model somewhere else, such as on a website or in an application?
The first step is saving it to a file.
We can save our model using the tf.keras.Model.save() method, specifying the filepath as well as the save_format parameters.
We'll use filepath="dog_vision_model.keras" as well as save_format="keras" to save our model to the new and versatile .keras format.
Let's save our best-performing model_1.
Note: You may also see models being saved in the SavedModel format as well as the HDF5 format. However, it's recommended to use the newer .keras format. See the TensorFlow documentation on saving and loading a model for more.
Input:
# Save the model to .keras
model_save_path = "dog_vision_model.keras"
model_1.save(filepath=model_save_path,
save_format="keras")
Output:
Model saved!
And we can check it worked by loading it back in using the tf.keras.models.load_model() function.
Input:
# Load the model
loaded_model = tf.keras.models.load_model(filepath=model_save_path)
With the file saved, we can now evaluate our loaded_model to make sure it performs well on the test dataset.
Input:
# Evaluate the loaded model
loaded_model_results = loaded_model.evaluate(test_ds)
Output:
269/269 [==============================] - 15s 47ms/step - loss: 0.3753 - accuracy: 0.8767
How about we check if the loaded_model_results are the same as the model_1_results?
Input:
assert model_1_results == loaded_model_results
Our trained model and loaded model results are the same!
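Note: The assert above checks for exact equality, which works here because we're evaluating the same saved weights on the same data. If you ever run into tiny floating-point differences (for example after re-training), a more forgiving check is np.allclose:
# Tolerance-based comparison in case of small floating-point differences
assert np.allclose(model_1_results, loaded_model_results)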
We could now use our dog_vision_model.keras file in an application to predict a dog breed based on an image.
Note: If you're using Google Colab, remember that if your Google Colab instance gets disconnected, it will delete all local files. So if you want to keep your dog_vision_model.keras file, be sure to download it or copy it to Google Drive.
So how about we see how our model goes on real-world images? Because that's the whole goal of machine learning, right? To see how your model performs in the real world.
So let’s make that happen.
More specifically, let's try our best model on images of my dogs (Bella 🐶 and Seven 7️⃣, yes, Seven is her actual name) and an extra wildcard image of me!
You can download the photos from my GitHub here.
Input:
# Download a set of custom images from GitHub and unzip them
!wget -nc https://github.com/mrdbourke/zero-to-mastery-ml/raw/master/images/dog-photos.zip
!unzip dog-photos.zip
Output:
--2024-04-26 01:43:26-- https://github.com/mrdbourke/zero-to-mastery-ml/raw/master/images/dog-photos.zip
Resolving github.com (github.com)... 140.82.113.4
Connecting to github.com (github.com)|140.82.113.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.githubusercontent.com/mrdbourke/zero-to-mastery-ml/master/images/dog-photos.zip [following]
--2024-04-26 01:43:26-- https://raw.githubusercontent.com/mrdbourke/zero-to-mastery-ml/master/images/dog-photos.zip
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1091355 (1.0M) [application/zip]
Saving to: ‘dog-photos.zip’
dog-photos.zip 100%[===================>] 1.04M --.-KB/s in 0.05s
2024-04-26 01:43:27 (21.6 MB/s) - ‘dog-photos.zip’ saved [1091355/1091355]
Archive: dog-photos.zip
inflating: dog-photo-4.jpeg
inflating: dog-photo-1.jpeg
inflating: dog-photo-2.jpeg
inflating: dog-photo-3.jpeg
We can also inspect our images in the file browser and see that they're under the name dog-photo-*.jpeg.
How about we iterate through them and visualize each one?
Input:
# Create list of paths for custom dog images
custom_image_paths = ["dog-photo-1.jpeg",
"dog-photo-2.jpeg",
"dog-photo-3.jpeg",
"dog-photo-4.jpeg"]
# Iterate through list of dog images and plot each one
fig, axes = plt.subplots(1, 4, figsize=(15, 7))
for i, ax in enumerate(axes.flatten()):
    ax.imshow(plt.imread(custom_image_paths[i]))
    ax.axis("off")
    ax.set_title(custom_image_paths[i])
Output:
What?
The first three photos look all well and good, but we can see dog-photo-4.jpeg is a photo of me in a black hoodie pulling a blue steel face.
Why include a non-dog photo?
I'll tell you why in just a second. For now, let's use our loaded_model to try and make a prediction on the first dog image, dog-photo-1.jpeg.
We can do so with the predict() method.
Input:
# Try and make a prediction on the first dog image
loaded_model.predict("dog-photo-1.jpeg")
Output:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-129-336b90293288> in <cell line: 2>()
1 # Try and make a prediction on the first dog image
----> 2 loaded_model.predict("dog-photo-1.jpeg")
/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py in error_handler(*args, **kwargs)
68 # To get the full stack trace, call:
69 # `tf.debugging.disable_traceback_filtering()`
---> 70 raise e.with_traceback(filtered_tb) from None
71 finally:
72 del filtered_tb
/usr/local/lib/python3.10/dist-packages/tensorflow/python/framework/tensor_shape.py in __getitem__(self, key)
960 else:
961 if self._v2_behavior:
--> 962 return self._dims[key]
963 else:
964 return self.dims[key]
IndexError: tuple index out of range
Oh no, we get an error:
IndexError: tuple index out of range
Why is this?
Well, we can see that the code is trying to get the shape of our image.
However, we didn't pass an image to the predict() method. We only passed a filepath, and our model expects inputs in the same format it was trained on - hence the issue.
So let's load our image and resize it.
We can do so with tf.keras.utils.load_img().
Input:
# Load the image (into PIL format)
custom_image = tf.keras.utils.load_img(
path="dog-photo-1.jpeg",
color_mode="rgb",
target_size=IMG_SIZE, # (224, 224) or (img_height, img_width)
)
type(custom_image), custom_image
Output:
(PIL.Image.Image, <PIL.Image.Image image mode=RGB size=224x224>)
Excellent, we've loaded our first custom image.
But now let's turn our image into a tensor, as our model was trained on image tensors, so it expects image tensors as input.
We can convert our image from PIL format to array format with tf.keras.utils.img_to_array().
Input:
# Turn the image into a tensor
custom_image_tensor = tf.keras.utils.img_to_array(custom_image)
custom_image_tensor.shape
Output:
(224, 224, 3)
Nice! We've got an image tensor of shape (224, 224, 3).
So how about we make a prediction on it?
Input:
# Make a prediction on our custom image tensor
loaded_model.predict(custom_image_tensor)
Output:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-132-bd82d1e41fed> in <cell line: 2>()
1 # Make a prediction on our custom image tensor
----> 2 loaded_model.predict(custom_image_tensor)
/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py in error_handler(*args, **kwargs)
68 # To get the full stack trace, call:
69 # `tf.debugging.disable_traceback_filtering()`
---> 70 raise e.with_traceback(filtered_tb) from None
71 finally:
72 del filtered_tb
/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py in tf__predict_function(iterator)
13 try:
14 do_return = True
---> 15 retval_ = ag__.converted_call(ag__.ld(step_function), (ag__.ld(self), ag__.ld(iterator)), None, fscope)
16 except:
17 do_return = False
ValueError: in user code:
File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 2440, in predict_function *
return step_function(self, iterator)
File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 2425, in step_function **
outputs = model.distribute_strategy.run(run_step, args=(data,))
File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 2413, in run_step **
outputs = model.predict_step(data)
File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/training.py", line 2381, in predict_step
return self(x, training=False)
File "/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py", line 70, in error_handler
raise e.with_traceback(filtered_tb) from None
File "/usr/local/lib/python3.10/dist-packages/keras/src/engine/input_spec.py", line 298, in assert_input_compatibility
raise ValueError(
ValueError: Input 0 of layer "model_1" is incompatible with the layer: expected shape=(None, 224, 224, 3), found shape=(32, 224, 3)
We get another error…
ValueError: Input 0 of layer "model_1" is incompatible with the layer: expected shape=(None, 224, 224, 3), found shape=(32, 224, 3)
So what went wrong?
Well, it looks like our model is expecting a batch size dimension on our input tensor also.
We can get this by either turning the input tensor into a single-element array or by using tf.expand_dims(input, axis=0) to expand the dimension of the tensor on the 0th axis.
Input:
# Option 1: Add batch dimension to custom_image_tensor
print(f"Shape of custom image tensor: {np.array([custom_image_tensor]).shape}")
print(f"Shape of custom image tensor: {tf.expand_dims(custom_image_tensor, axis=0).shape}")
Output:
Shape of custom image tensor: (1, 224, 224, 3)
Shape of custom image tensor: (1, 224, 224, 3)
Wonderful! We've now got a custom image tensor of shape (1, 224, 224, 3), or (batch_size, img_height, img_width, colour_channels).
Let's try and predict with it again now.
Input:
# Get prediction probabilities from our model
pred_probs = loaded_model.predict(tf.expand_dims(custom_image_tensor, axis=0))
pred_probs
Output:
1/1 [==============================] - 2s 2s/step
array([[1.83611644e-06, 3.09535017e-06, 3.86047805e-06, 3.19048486e-05,
1.66974694e-03, 1.27542022e-04, 7.03033629e-06, 1.19856362e-04,
1.01050091e-05, 3.87266744e-04, 6.44192414e-06, 1.67636438e-06,
8.94749770e-04, 5.01931618e-06, 1.60283549e-03, 9.41093604e-05,
4.67637838e-05, 8.51367513e-05, 5.67736897e-05, 6.14693909e-06,
2.67342989e-06, 1.47549901e-04, 4.17501433e-05, 3.90995192e-05,
9.50478498e-05, 1.47656752e-02, 3.08718845e-05, 1.58209339e-04,
8.39364156e-03, 1.17800606e-03, 2.69454729e-04, 1.02170045e-04,
7.42143384e-05, 8.22680071e-04, 1.73064705e-04, 8.98789040e-06,
6.77722392e-06, 2.46034167e-03, 1.21447938e-05, 3.06540052e-04,
1.12927992e-04, 1.30907722e-06, 1.19819895e-04, 3.28008295e-03,
4.22435085e-04, 2.56334723e-04, 6.35078293e-04, 6.96951101e-05,
1.82968670e-05, 6.66733533e-02, 1.65604251e-06, 4.85742465e-04,
3.82422912e-03, 4.36909148e-04, 1.34899176e-06, 4.04351122e-05,
2.30197293e-05, 7.29483800e-05, 1.31009811e-05, 1.30437169e-04,
1.27625071e-05, 3.21804691e-06, 6.78410470e-06, 3.72191658e-03,
9.23305777e-07, 4.05427454e-06, 1.32554891e-02, 8.34832132e-01,
1.84010264e-06, 5.39118366e-04, 2.44915718e-05, 1.35658804e-04,
9.53144918e-04, 3.80869096e-05, 3.43683018e-06, 3.57066506e-06,
2.41459438e-05, 2.93612948e-06, 1.27533756e-04, 2.15716864e-05,
3.21038242e-05, 7.87725276e-06, 1.70349504e-05, 4.27997729e-05,
5.72475437e-06, 1.81680916e-05, 1.28094471e-04, 7.12008550e-05,
8.24760180e-04, 6.14038622e-03, 4.27179504e-03, 3.55221750e-03,
1.20739173e-03, 4.15856484e-04, 1.61429329e-04, 1.58363022e-04,
3.78229856e-06, 1.03004022e-05, 2.00551622e-05, 1.21213234e-04,
2.68000053e-06, 1.00253812e-04, 4.04065868e-05, 9.84299404e-05,
1.29673525e-03, 3.07669543e-05, 1.62672077e-05, 1.17529435e-05,
3.74953932e-04, 4.74653389e-05, 1.00191637e-05, 1.36496616e-04,
3.76833777e-05, 1.55215133e-02, 2.33796614e-04, 1.01105807e-05,
8.56942424e-05, 1.37508148e-04, 3.79100857e-06, 1.04301716e-05]],
dtype=float32)
It worked! Our model outputs a tensor of prediction probabilities.
We can find the predicted label by taking the argmax of the pred_probs tensor, and we can get the predicted class name by indexing on the class_names list using the predicted label.
Input:
# Get the predicted class label
pred_label = tf.argmax(pred_probs, axis=-1).numpy()[0]
# Get the predicted class name
pred_class_name = class_names[pred_label]
print(f"Predicted class label: {pred_label}")
print(f"Predicted class name: {pred_class_name}")
Output:
Predicted class label: 67
Predicted class name: labrador_retriever
It’s looking good and the errors are all gone.
How?
Simply because our model wants to make predictions on data in the same shape and format it was trained on.
So if you trained a model on image tensors with a certain shape and datatype, your model will want to make predictions on the same kind of image tensors with the same shape and datatype.
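A handy way to double-check what the model expects before predicting is to inspect the loaded model's input_shape attribute (the first None is the batch dimension):
# Check the input shape our loaded model expects
print(loaded_model.input_shape)  # e.g. (None, 224, 224, 3)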
Now that it’s all set up correctly, how about we try and make predictions on multiple images?
To do so, let's make a function that replicates the workflow from above.
Input:
def pred_on_custom_image(image_path: str, # Path to the image file
                         model, # Trained TensorFlow model for prediction
                         target_size: tuple[int, int] = (224, 224), # Desired size of the image for input to the model
                         class_names: list = None, # List of class names (optional for plotting)
                         plot: bool = True): # Whether to plot the image and predicted class
    """
    Loads an image, preprocesses it, makes a prediction using a provided model,
    and optionally plots the image with the predicted class.

    Args:
        image_path (str): Path to the image file.
        model: Trained TensorFlow model for prediction.
        target_size (tuple, optional): Desired size of the image for input to the model. Defaults to (224, 224).
        class_names (list, optional): List of class names for plotting. Defaults to None.
        plot (bool, optional): Whether to plot the image and predicted class. Defaults to True.

    Returns:
        str: The predicted class (and the prediction probabilities if plot=False).
    """
    # Prepare and load image
    custom_image = tf.keras.utils.load_img(
        path=image_path,
        color_mode="rgb",
        target_size=target_size,
    )

    # Turn the image into a tensor
    custom_image_tensor = tf.keras.utils.img_to_array(custom_image)

    # Add a batch dimension to the target tensor (e.g. (224, 224, 3) -> (1, 224, 224, 3))
    custom_image_tensor = tf.expand_dims(custom_image_tensor, axis=0)

    # Make a prediction with the target model
    pred_probs = model.predict(custom_image_tensor)
    # pred_probs = tf.keras.activations.softmax(tf.constant(pred_probs))
    pred_class = class_names[tf.argmax(pred_probs, axis=-1).numpy()[0]]

    # Plot if we want
    if not plot:
        return pred_class, pred_probs
    else:
        plt.figure(figsize=(5, 3))
        plt.imshow(plt.imread(image_path))
        plt.title(f"pred: {pred_class}\nprob: {tf.reduce_max(pred_probs):.3f}")
        plt.axis("off")
What a good-looking function!
So now let's try it out on dog-photo-2.jpeg.
Input:
# Make prediction on custom dog photo 2
pred_on_custom_image(image_path="dog-photo-2.jpeg",
model=loaded_model,
class_names=class_names)
Output:
1/1 [==============================] - 0s 27ms/step
Woohoo!!! Our model got it right!
Let's repeat the process for our other custom images.
Input:
# Predict on multiple images
fig, axes = plt.subplots(1, 4, figsize=(15, 7))
for i, ax in enumerate(axes.flatten()):
    image_path = custom_image_paths[i]
    pred_class, pred_probs = pred_on_custom_image(image_path=image_path,
                                                  model=loaded_model,
                                                  class_names=class_names,
                                                  plot=False)
    ax.imshow(plt.imread(image_path))
    ax.set_title(f"pred: {pred_class}\nprob: {tf.reduce_max(pred_probs):.3f}")
    ax.axis("off");
Output:
1/1 [==============================] - 0s 28ms/step
1/1 [==============================] - 0s 26ms/step
1/1 [==============================] - 0s 25ms/step
1/1 [==============================] - 0s 28ms/step
Epic!
Our Dog Vision 🐶👁 model has come to life.
It looks like our model got it right for 3 of our 4 custom dog photos (my dogs Bella and Seven are labrador retrievers, with a potential mix of something else).
But the model seemed to also think the photo of me was a Boston bulldog!
Note: Due to the randomness of machine learning, your result may be different here. If so, please let me know, I'd love to see what other kinds of dogs the model thinks I am 😆.
You might be wondering, why does our model do this? Why does it think I'm a dog?
It's because our model has been trained to always predict a dog breed, no matter what image it receives. So whatever image we pass it, the model will do its best to assign it to one of the 120 dog breeds.
One solution would be to create a filter system.
Is it a dog? If no then skip. If yes, then what type of dog?
For example, for my food prediction app Nutrify, I combined multiple machine learning models to create a workflow.
One model is set up for detecting food (Food Not Food), and another model is set up for identifying what food is in the image (FoodVision, similar to Dog Vision).
This creates a much better user experience if an app is customer facing.
In my Nutrify app for example, taking photos of objects that aren't food and having them identified as food can be a poor customer experience. So it filters them first and doesn’t allow non food items to be added.
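To make the filter idea concrete, here's a minimal sketch of what a two-stage workflow could look like for Dog Vision. Note that is_a_dog_model and the 0.5 threshold are hypothetical placeholders (we haven't built a dog/not-dog model in this project), while dog_breed_model stands in for something like our loaded_model:
def two_stage_predict(image_tensor, is_a_dog_model, dog_breed_model, class_names, dog_threshold=0.5):
    # Stage 1 (hypothetical): a binary dog/not-dog model filters out non-dog images
    dog_prob = float(is_a_dog_model.predict(image_tensor)[0][0])
    if dog_prob < dog_threshold:
        return "not a dog"
    # Stage 2: only if the filter passes, predict the breed (e.g. with our Dog Vision model)
    breed_probs = dog_breed_model.predict(image_tensor)
    return class_names[int(tf.argmax(breed_probs, axis=-1)[0])]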
These are some of the workflows you'll have to think about when you eventually deploy your own machine-learning models.
Because although machine learning models are often very powerful, they aren't perfect. This is why implementing guidelines and checks around them is still a very active area of research.
Some final thoughts to end this project with.
In any machine learning problem, getting a dataset and preparing it so that it is in a usable format will likely be the first and often most important step. Hence why we spent so much time getting the data ready in Part 1 of this series.
It will also always be an ongoing process, as although we've worked with thousands of dog images, our models could still be improved. As we saw going from training with 10% of the data to 100% of the data, one of the best ways to improve a model is with more data.
Also, explore your data early and often.
For most new problems, you should generally look to see if a pre-trained model exists and whether you can adapt it to your use case. Ask yourself: does a model already exist for my problem, and can I adapt it to my own data?
TensorFlow and Keras provide building blocks for neural networks which are powerful machine learning models capable of learning patterns in a wide range of data from text to audio to images and more.
Make sure to take advantage of them when you can.
It's highly unlikely you'll ever get the best-performing model on your first try, and this is ok. Machine learning is very experimental by nature.
It’s the scientific method of finding out all the ways things don’t work, so that we can find the method that does. We just use ML to help us find this out faster.
So always keep this front of mind in any machine learning project: your results are never stationary and can almost always be improved. This includes experimenting on the data, the model, the training setup and the outputs (how does your model work in practice?).
So that concludes our neural network, deep learning, and transfer learning project!
Great work on getting this far - especially if you followed along and built your own project for your portfolio. It’s always important you don’t just read these guides, but put this information into action so that you can better learn from it.
As for what’s next?
Well, as I mentioned up top, technically this 'mini series' is part of my larger 'Introduction to Machine Learning' series. (I just went so deep on this particular section that I needed to make it into 3 parts).
There is one more part to this overall series coming soon that's incredibly relevant to every project: how to communicate and share your work as a Machine Learning Engineer / Data Scientist.
Be sure to subscribe via the link below so you don’t miss it.
If you want to deep dive into Machine Learning and learn how to use these tools even further, then check out my complete Machine Learning and Data Science course or watch the first few videos for free.
It’s one of the most popular, highly rated Machine Learning and Data Science bootcamps online, as well as the most modern and up-to-date. Guaranteed.
You'll go from a complete beginner with no prior experience to getting hired as a Machine Learning Engineer this year, so it’s helpful for ML Engineers of all experience levels.
Or, if you already have a good grasp of Machine Learning and just want to focus on TensorFlow for Deep Learning, I have a course on that too, which you can check out here.
When you join as a Zero To Mastery Academy member, you’ll have access to both of these courses, as well as every other course in our training library!
Not only that, but you will also be able to ask me questions, as well as chat to other students and machine learning professionals via our private Discord community.
So go ahead and check those out, and don’t forget to subscribe below so you don’t miss the final part on this larger series on Machine Learning!