Welcome to part 2 in my 7-part series on Machine Learning and Data Science. Be sure to check out the other parts in the series, as they all lead into each other.

- Part 1: A 6-Step Framework To Tackle Machine Learning Projects (Full Pipeline)
- Part 2: Introduction to NumPy (Which is what you're reading right now)
- Part 3: Introduction to Pandas + Python
- Part 4: Beginners Guide To Matplotlib (With Code Examples)
- Part 5: Introduction to Scikit-Learn
- Part 6: Introduction to Deep Learning with TensorFlow
- Part 7: Communicating & Sharing Your Work as a Machine Learning Engineer / Data Scientist

NumPy stands for Numerical Python and is the backbone of all kinds of scientific and numerical computing in Python.

And because Machine Learning is all about **turning data into numbers and then figuring out the patterns**, NumPy often comes into play.

In this tutorial, we’re going to take a look at numerical data manipulation using NumPy, and focus on the main concepts of NumPy and the `ndarray` datatype.

(You can think of the `ndarray` datatype as a very flexible array of numbers).

More specifically, we'll look at:

- NumPy datatypes & attributes
- Creating arrays
- Viewing arrays & matrices (indexing)
- Manipulating & comparing arrays
- Sorting arrays
- Use cases (examples of turning things into numbers)

After going through this, you'll have the base knowledge of NumPy you need to keep moving forward, so let’s get started.

Sidenote: Even though we’re giving a broad overview of NumPy to get you started, the topics in this post can be a little difficult to comprehend, especially if you’re just starting out. If you want to deep dive into NumPy and learn Machine Learning from scratch, then check out my complete Machine Learning and Data Science course, or watch the first few videos for free.

It’s one of the most popular, highly rated machine learning and data science bootcamps online, as well as the most modern and up-to-date. Guaranteed.

You'll go from a complete beginner with no prior experience to getting hired as a Machine Learning Engineer this year, so it’s helpful for ML Engineers of all experience levels.

My name is Daniel Bourke, and I'm the resident Machine Learning instructor here at Zero To Mastery.

Originally self-taught, I worked for one of Australia's fastest-growing artificial intelligence agencies, Max Kelsen, and have worked on Machine Learning and data problems across a wide range of industries including healthcare, eCommerce, finance, retail, and more.

I'm also the author of Machine Learning Monthly, write my own blog on my experiments in ML, and run my own YouTube channel - which has hit over 7.8 Million views.

Phew!

With all that out of the way, let’s get back into this introduction to NumPy: why it's important, and how the main features work (with code examples).

Let's go…

It’s possible to do numerical calculations using pure Python, but it has its weaknesses. Python starts off pretty fast but once your data gets large, you'll start to notice it slows down considerably.

NumPy doesn’t have these issues, thanks to how it's been built.

Behind the scenes, the NumPy code has been optimized to run using C, a compiled, lower-level programming language, which can do things much faster than Python.

Better still? The benefit of this being behind the scenes is **you don't need to know any C to take advantage of it!** You can write your numerical computations in Python using NumPy and get the added speed benefits.

Sidenote: If you are curious as to what causes this speed benefit, it's a process called vectorization, which performs calculations on whole arrays at once in optimized C code rather than in Python loops, as loops can create potential bottlenecks. A related feature called broadcasting, which we’ll touch on later, lets these vectorized operations work on arrays of different shapes.
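As a rough illustration of the idea, here is the same calculation written as a Python loop and as a single vectorized NumPy expression. This is only a minimal sketch: the `values` array and the variable names are made up for this example and aren't part of the original walkthrough.

```python
import numpy as np

# Hypothetical data: the numbers 0 to 9 as a NumPy array
values = np.arange(10)

# Pure-Python approach: visit every element one at a time
squared_loop = [int(v) ** 2 for v in values]

# Vectorized approach: one expression, the looping happens in compiled code
squared_vectorized = values ** 2

# Both approaches produce the same numbers
print(squared_loop)
print(squared_vectorized.tolist())
```

On an array this small the speed difference is negligible; it tends to show up once arrays hold many thousands of values.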

To get started using NumPy, the first step is to import it.

The most common way (and the method you should use) is to import NumPy in Python under the abbreviation `np`.

```
import numpy as np
# Check the version
print(np.__version__)
```

Simple!

It’s worth noting that if you see the letters `np` used anywhere in machine learning or data science, it's probably referring to the NumPy library.

Now that you have it imported, let’s look at some of the features and aspects of NumPy.

Remember that the main type in NumPy is `ndarray`. This means that even seemingly different kinds of arrays are still `ndarray`s.

Also, an operation you do on one array will work on another.

With that out of the way, let’s take a look at these.

**Input**

```
# 1-dimensional array, also referred to as a vector
a1 = np.array([1, 2, 3])
# 2-dimensional array, also referred to as a matrix
a2 = np.array([[1, 2.0, 3.3],
               [4, 5, 6.5]])
# 3-dimensional array, also referred to as a matrix
a3 = np.array([[[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]],
               [[10, 11, 12],
                [13, 14, 15],
                [16, 17, 18]]])
```

**Input**

`a1.shape, a1.ndim, a1.dtype, a1.size, type(a1)`

**Output**

`((3,), 1, dtype('int64'), 3, numpy.ndarray)`

**Input**

`a2.shape, a2.ndim, a2.dtype, a2.size, type(a2)`

**Output**

`((2, 3), 2, dtype('float64'), 6, numpy.ndarray)`

**Input**

`a3.shape, a3.ndim, a3.dtype, a3.size, type(a3)`

**Output**

`((2, 3, 3), 3, dtype('int64'), 18, numpy.ndarray)`

**Input**

`a1`

**Output**

`array([1, 2, 3])`

**Input**

`a2`

**Output**

```
array([[1. , 2. , 3.3],
[4. , 5. , 6.5]])
```

**Input**

`a3`

**Output**

```
array([[[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9]],
[[10, 11, 12],
[13, 14, 15],
[16, 17, 18]]])
```

**Key terms:**

- **Array** - A list of numbers, which can be multi-dimensional
- **Scalar** - A single number (e.g. `7`)
- **Vector** - A list of numbers with 1 dimension (e.g. `np.array([1, 2, 3])`)
- **Matrix** - A (usually) multi-dimensional list of numbers (e.g. `np.array([[1, 2, 3], [4, 5, 6]])`)
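To make the key terms a little more concrete, here's a small sketch that checks the number of dimensions of each one. The variable names `scalar`, `vector`, and `matrix` are just labels picked for this example.

```python
import numpy as np

scalar = np.array(7)                       # a single number, 0 dimensions
vector = np.array([1, 2, 3])               # 1 dimension
matrix = np.array([[1, 2, 3], [4, 5, 6]])  # 2 dimensions

# .ndim is the number of dimensions, .shape is the size of each dimension
for name, item in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    print(name, item.ndim, item.shape)
```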

Here you can see how NumPy is the backbone of many other libraries. In this example, we're importing pandas, a data analysis library for Python, and creating a `DataFrame` from a NumPy array.

**Input**

```
import pandas as pd
df = pd.DataFrame(np.random.randint(10, size=(5, 3)),
columns=['a', 'b', 'c'])
df
```

**Output**

**Input**

`a2`

**Output**

```
array([[1. , 2. , 3.3],
[4. , 5. , 6.5]])
```

**Input**

```
df2 = pd.DataFrame(a2)
df2
```

**Output**

Here you can see the common commands for creating arrays in NumPy:

- `np.array()`
- `np.ones()`
- `np.zeros()`
- `np.random.rand(5, 3)`
- `np.random.randint(10, size=5)`
- `np.random.seed()` - pseudo-random numbers
- Searching the documentation example (finding `np.unique()` and using it)

Fairly simple. Let’s walk through what they look like, along with common outputs.

**Input**

```
# Create a simple array
simple_array = np.array([1, 2, 3])
simple_array
```

**Output**

`array([1, 2, 3])`

**Input**

```
simple_array = np.array((1, 2, 3))
simple_array, simple_array.dtype
```

**Output**

`(array([1, 2, 3]), dtype('int64'))`

**Input**

```
# Create an array of ones
ones = np.ones((10, 2))
ones
```

**Output**

```
array([[1., 1.],
[1., 1.],
[1., 1.],
[1., 1.],
[1., 1.],
[1., 1.],
[1., 1.],
[1., 1.],
[1., 1.],
[1., 1.]])
```

**Input**

```
# The default datatype is 'float64'
ones.dtype
```

**Output**

`dtype('float64')`

**Input**

```
# You can change the datatype with .astype()
ones.astype(int)
```

**Output**

```
array([[1, 1],
[1, 1],
[1, 1],
[1, 1],
[1, 1],
[1, 1],
[1, 1],
[1, 1],
[1, 1],
[1, 1]])
```

**Input**

```
# Create an array of zeros
zeros = np.zeros((5, 3, 3))
zeros
```

**Output**

```
array([[[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]],
[[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]],
[[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]],
[[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]],
[[0., 0., 0.],
[0., 0., 0.],
[0., 0., 0.]]])
```

**Input**

`zeros.dtype`

**Output**

`dtype('float64')`

**Input**

```
# Create an array within a range of values
range_array = np.arange(0, 10, 2)
range_array
```

**Output**

`array([0, 2, 4, 6, 8])`

**Input**

```
# Random array
random_array = np.random.randint(10, size=(5, 3))
random_array
```

**Output**

```
array([[1, 7, 2],
[7, 0, 2],
[8, 8, 8],
[2, 5, 2],
[4, 8, 6]])
```
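The list of commands earlier mentions finding `np.unique()` in the documentation and using it. Here's a minimal sketch of what that could look like, assuming an array with the same values as the `random_array` output above (typed out by hand here as `random_array_copy`, since your own random numbers will differ).

```python
import numpy as np

# Hand-typed copy of the random_array output shown above
random_array_copy = np.array([[1, 7, 2],
                              [7, 0, 2],
                              [8, 8, 8],
                              [2, 5, 2],
                              [4, 8, 6]])

# np.unique() returns the sorted unique values;
# return_counts=True also returns how often each value appears
unique_values, counts = np.unique(random_array_copy, return_counts=True)

print(unique_values.tolist())  # [0, 1, 2, 4, 5, 6, 7, 8]
print(counts.tolist())         # [1, 1, 4, 1, 1, 1, 2, 4]
```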

**Input**

```
# Random array of floats (between 0 & 1)
np.random.random((5, 3))
```

**Output**

```
array([[0.09607892, 0.034903 , 0.47743753],
[0.51703027, 0.90409121, 0.54436342],
[0.8095754 , 0.60294712, 0.71141937],
[0.50802295, 0.57255717, 0.99090604],
[0.66225284, 0.87588103, 0.25643785]])
```

**Input**

`np.random.random((5, 3))`

**Output**

```
array([[0.42800066, 0.76816054, 0.14858447],
[0.48390262, 0.3708042 , 0.231316 ],
[0.29166801, 0.64327528, 0.18039386],
[0.89010443, 0.51218751, 0.31543512],
[0.38781697, 0.25729731, 0.66219967]])
```

**Input**

```
# Random 5x3 array of floats (between 0 & 1), similar to above
np.random.rand(5, 3)
```

**Output**

```
array([[0.28373526, 0.10074198, 0.24643463],
[0.8268303 , 0.48672847, 0.57633359],
[0.77867161, 0.38490598, 0.53343872],
[0.67396616, 0.15888354, 0.47710898],
[0.92319417, 0.19133444, 0.51837588]])
```

**Input**

`np.random.rand(5, 3)`

**Output**

```
array([[0.73585424, 0.83359732, 0.93900774],
[0.27563836, 0.55971665, 0.26819222],
[0.29253202, 0.64152402, 0.90479721],
[0.6585366 , 0.36165565, 0.37515932],
[0.82890572, 0.54502359, 0.48398256]])
```

`np.random.seed()`

NumPy uses pseudo-random numbers, which means the numbers look random but aren't really; they're predetermined.

For consistency, you might want to keep the random numbers you generate the same throughout experiments.

To do this, you can use `np.random.seed()`.

What this does is it tells NumPy, "*Hey, I want you to create random numbers but keep them aligned with the seed*."

Let's see it.

**Input**

```
# Set random seed to 0
np.random.seed(0)
# Make 'random' numbers
np.random.randint(10, size=(5, 3))
```

**Output**

```
array([[5, 0, 3],
[3, 7, 9],
[3, 5, 2],
[4, 7, 6],
[8, 8, 1]])
```

With `np.random.seed()` set, every time you run the cell above, the same random numbers will be generated, which is awesome.

But what if `np.random.seed()` wasn't set? Well, every time you run the cell below, a new set of numbers will appear.

**Input**

```
# Make more random numbers
np.random.randint(10, size=(5, 3))
```

**Output**

```
array([[6, 7, 7],
[8, 1, 5],
[9, 8, 9],
[4, 3, 0],
[3, 5, 0]])
```

Let's see it in action again. This time we'll stay consistent and set the random seed to 0.

**Input**

```
# Set random seed to same number as above
np.random.seed(0)
# The same random numbers come out
np.random.randint(10, size=(5, 3))
```

**Output**

```
array([[5, 0, 3],
[3, 7, 9],
[3, 5, 2],
[4, 7, 6],
[8, 8, 1]])
```

**So what's happening here?**

Well, because `np.random.seed()` is set to 0, the random numbers are the same as in the earlier cell with `np.random.seed()` set to 0 as well. (Setting `np.random.seed()` is not 100% necessary but it's helpful to keep numbers the same throughout your experiments).

**For example**

Let’s say that you wanted to split your data randomly into training and test sets. Every time you randomly split, you might get different rows in each set.

Likewise, if you shared your work with someone else, they'd get different rows in each set too.

So, setting `np.random.seed()` ensures there's still randomness, but it just makes the randomness repeatable, hence the 'pseudo-random' numbers.
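Here's a minimal sketch of the train/test split idea. The 10-row dataset, the 80/20 split, and the helper function `split_rows()` are all assumptions made up for this example, not something from a real project.

```python
import numpy as np

# Hypothetical dataset: 10 rows, identified by their row numbers 0 to 9
row_ids = np.arange(10)

def split_rows(seed):
    """Shuffle the row numbers with a fixed seed and split them 80/20."""
    np.random.seed(seed)
    shuffled = np.random.permutation(row_ids)
    return shuffled[:8], shuffled[8:]

# Run the "random" split twice with the same seed
train_a, test_a = split_rows(seed=0)
train_b, test_b = split_rows(seed=0)

# Same seed, same rows in each set, every time
print(np.array_equal(train_a, train_b))  # True
print(np.array_equal(test_a, test_b))    # True
```

Anyone who runs this with the same seed should end up with the same rows in their training and test sets.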

**Input**

```
np.random.seed(0)
df = pd.DataFrame(np.random.randint(10, size=(5, 3)))
df
```

**Output**

Remember, because arrays and matrices are both `ndarray`s, they can be viewed in similar ways.

With that in mind, let's check out our 3 arrays again:

**Input**

`a1`

**Output**

`array([1, 2, 3])`

**Input**

`a2`

**Output**

```
array([[1. , 2. , 3.3],
[4. , 5. , 6.5]])
```

**Input**

`a3`

**Output**

```
array([[[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9]],
[[10, 11, 12],
[13, 14, 15],
[16, 17, 18]]])
```

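Array shapes are listed from the outermost dimension to the innermost. For a 2-dimensional array, the format is `(row, column)`. Any extra dimensions are added at the front, for example `(n, row, column)`, where `n` is the number of `(row, column)` matrices.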

**Input**

`a1[0]`

**Output**

`1`

**Input**

`a2[0]`

**Output**

`array([1. , 2. , 3.3])`

**Input**

`a3[0]`

**Output**

```
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
```

**Input**

```
# Get 2nd row (index 1) of a2
a2[1]
```

**Output**

`array([4. , 5. , 6.5])`

**Input**

```
# Get the first 2 values of the first 2 rows of both arrays
a3[:2, :2, :2]
```

**Output**

```
array([[[ 1, 2],
[ 4, 5]],
[[10, 11],
[13, 14]]])
```

This takes a bit of practice, especially when the dimensions get higher. Personally, it usually takes me a little trial and error to try to get certain values, view the output in the notebook, and then try again.

Also, NumPy arrays get printed from outside to inside. This means the number at the end of the shape is the size of the innermost arrays, and the number at the start of the shape is the size of the outermost array.

**Input**

```
a4 = np.random.randint(10, size=(2, 3, 4, 5))
a4
```

**Output**

```
array([[[[6, 7, 7, 8, 1],
[5, 9, 8, 9, 4],
[3, 0, 3, 5, 0],
[2, 3, 8, 1, 3]],
[[3, 3, 7, 0, 1],
[9, 9, 0, 4, 7],
[3, 2, 7, 2, 0],
[0, 4, 5, 5, 6]],
[[8, 4, 1, 4, 9],
[8, 1, 1, 7, 9],
[9, 3, 6, 7, 2],
[0, 3, 5, 9, 4]]],
[[[4, 6, 4, 4, 3],
[4, 4, 8, 4, 3],
[7, 5, 5, 0, 1],
[5, 9, 3, 0, 5]],
[[0, 1, 2, 4, 2],
[0, 3, 2, 0, 7],
[5, 9, 0, 2, 7],
[2, 9, 2, 3, 3]],
[[2, 3, 4, 1, 2],
[9, 1, 4, 6, 8],
[2, 3, 0, 0, 6],
[0, 6, 3, 3, 8]]]])
```

**Input**

`a4.shape`

**Output**

`(2, 3, 4, 5)`

**Input**

```
# Get only the first 4 numbers of each single vector
a4[:, :, :, :4]
```

**Output**

```
array([[[[6, 7, 7, 8],
[5, 9, 8, 9],
[3, 0, 3, 5],
[2, 3, 8, 1]],
[[3, 3, 7, 0],
[9, 9, 0, 4],
[3, 2, 7, 2],
[0, 4, 5, 5]],
[[8, 4, 1, 4],
[8, 1, 1, 7],
[9, 3, 6, 7],
[0, 3, 5, 9]]],
[[[4, 6, 4, 4],
[4, 4, 8, 4],
[7, 5, 5, 0],
[5, 9, 3, 0]],
[[0, 1, 2, 4],
[0, 3, 2, 0],
[5, 9, 0, 2],
[2, 9, 2, 3]],
[[2, 3, 4, 1],
[9, 1, 4, 6],
[2, 3, 0, 0],
[0, 6, 3, 3]]]])
```

**For example**

`a4`'s shape is `(2, 3, 4, 5)`, which means it gets displayed like so:

- Inner most array = size 5
- Next array = size 4
- Next array = size 3
- Outer most array = size 2
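One way to get a feel for this is to index one level at a time and watch the shape shrink. The sketch below uses a stand-in array called `a4_like` (an assumption for this example) with the same shape as `a4`, but filled with the numbers 0 to 119 so the results are predictable.

```python
import numpy as np

# Stand-in for a4: same shape (2, 3, 4, 5), values 0 to 119
a4_like = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)

# Each index removes the outermost remaining dimension
print(a4_like[0].shape)        # (3, 4, 5)
print(a4_like[0, 0].shape)     # (4, 5)
print(a4_like[0, 0, 0].shape)  # (5,)

# Slicing the last dimension keeps every dimension but shortens the innermost one
print(a4_like[:, :, :, :4].shape)  # (2, 3, 4, 4)

# The very last element of the very last inner array
print(a4_like[1, 2, 3, 4])  # 119
```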

Here’s a list of the common commands when manipulating arrays, as well as examples:

- Arithmetic
  - `+`, `-`, `*`, `/`, `//`, `**`, and `%`
  - `np.exp()`
  - `np.log()`
- Dot product - `np.dot()`
- Broadcasting
- Aggregation
  - `np.sum()` - faster than Python's `sum()` for NumPy arrays
  - `np.mean()`
  - `np.std()`
  - `np.var()`
  - `np.min()`
  - `np.max()`
  - `np.argmin()` - find index of minimum value
  - `np.argmax()` - find index of maximum value
  - These work on all `ndarray`s
    - `a4.min(axis=0)` - you can use axis as well
- Reshaping
  - `np.reshape()`
- Transposing
  - `a3.T`
- Comparison operators
  - `>`
  - `<`
  - `<=`
  - `>=`
  - `x != 3`
  - `x == 3`
  - `np.sum(x > 3)`

**Input**

`a1`

**Output**

`array([1, 2, 3])`

**Input**

```
ones = np.ones(3)
ones
```

**Output**

`array([1., 1., 1.])`

**Input**

```
# Add two arrays
a1 + ones
```

**Output**

`array([2., 3., 4.])`

**Input**

```
# Subtract two arrays
a1 - ones
```

**Output**

`array([0., 1., 2.])`

**Input**

```
# Multiply two arrays
a1 * ones
```

**Output**

`array([1., 2., 3.])`

**Input**

```
# Multiply two arrays
a1 * a2
```

**Output**

```
array([[ 1. , 4. , 9.9],
[ 4. , 10. , 19.5]])
```

**Input**

`a1.shape, a2.shape`

**Output**

`((3,), (2, 3))`

**Input**

```
# This will error as the shapes (2, 3) and (2, 3, 3) can't be broadcast together
a2 * a3
```

**Output**

```
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[49], line 2
1 # This will error as the shapes (2, 3) and (2, 3, 3) can't be broadcast together
----> 2 a2 * a3
ValueError: operands could not be broadcast together with shapes (2,3) (2,3,3)
```

**Input**

`a3`

**Output**

```
array([[[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9]],
[[10, 11, 12],
[13, 14, 15],
[16, 17, 18]]])
```

Broadcasting is a feature of NumPy that performs an operation across multiple dimensions of data, without replicating the data. This saves both time and space.

**For example**

If you have a 3x3 array (A) and want to add a 1x3 array (B), NumPy will add the row of (B) to every row of (A).

The rules of broadcasting are:

- If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side
- If the shape of the two arrays does not match in any dimension, the array with a shape equal to 1 in that dimension is stretched to match the other shape
- If in any dimension the sizes disagree and neither is equal to 1, an error is raised

Also, in order to broadcast, the size of the trailing axes for both arrays in an operation must be either the same size or one of them must be one.
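The 3x3 plus 1x3 example above can be sketched like this. The arrays `A` and `B` are made-up values for this illustration, and the second half shows a pair of shapes that breaks the rules.

```python
import numpy as np

A = np.arange(9).reshape(3, 3)  # shape (3, 3): [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
B = np.array([[10, 20, 30]])    # shape (1, 3)

# B's single row is stretched across all 3 rows of A (without copying the data)
result = A + B
print(result.tolist())  # [[10, 21, 32], [13, 24, 35], [16, 27, 38]]

# Shapes (2, 3) and (2, 3, 3): comparing from the right, 3 matches 3,
# but 2 vs. 3 disagree and neither is 1, so NumPy raises an error
try:
    np.ones((2, 3)) + np.ones((2, 3, 3))
    broadcast_ok = True
except ValueError:
    broadcast_ok = False

print(broadcast_ok)  # False
```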

**Input**

`a1`

**Output**

```
array([1, 2, 3])
```

**Input**

`a1.shape`

**Output**

`(3,)`

**Input**

`a2.shape`

**Output**

`(2, 3)`

**Input**

`a2`

**Output**

```
array([[1. , 2. , 3.3],
[4. , 5. , 6.5]])
```

**Input**

`a1 + a2`

**Output**

```
array([[2. , 4. , 6.3],
[5. , 7. , 9.5]])
```

**Input**

`a2 + 2`

**Output**

```
array([[3. , 4. , 5.3],
[6. , 7. , 8.5]])
```

**Input**

```
# Raises an error because there's a shape mismatch (2, 3) vs. (2, 3, 3)
a2 + a3
```

**Output**

```
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[57], line 2
1 # Raises an error because there's a shape mismatch (2, 3) vs. (2, 3, 3)
----> 2 a2 + a3
ValueError: operands could not be broadcast together with shapes (2,3) (2,3,3)
```

**Input**

```
# Divide two arrays
a1 / ones
```

**Output**

`array([1., 2., 3.])`

**Input**

```
# Divide using floor division
a2 // a1
```

**Output**

```
array([[1., 1., 1.],
[4., 2., 2.]])
```

**Input**

```
# Take an array to a power
a1 ** 2
```

**Output**

`array([1, 4, 9])`

**Input**

```
# You can also use np.square()
np.square(a1)
```

**Output**

`array([1, 4, 9])`

**Input**

```
# Modulus divide (what's the remainder)
a1 % 2
```

**Output**

`array([1, 0, 1])`

You can also find the log or exponential of an array using `np.log()`

and `np.exp()`

.

**Input**

```
# Find the log of an array
np.log(a1)
```

**Output**

```
array([0. , 0.69314718, 1.09861229])
```

**Input**

```
# Find the exponential of an array
np.exp(a1)
```

**Output**

`array([ 2.71828183, 7.3890561, 20.08553692])`

Aggregation means bringing things together: combining many values into a single summary value, such as a sum or a mean.

**Input**

`sum(a1)`

**Output**

`6`

**Input**

`np.sum(a1)`

**Output**

`6`

**Tip:** Use NumPy's `np.sum()` on NumPy arrays and Python's `sum()` on Python `list`s.

**Input**

```
massive_array = np.random.random(100000)
massive_array.size, type(massive_array)
```

**Output**

`(100000, numpy.ndarray)`

**Input**

```
%timeit sum(massive_array) # Python sum()
%timeit np.sum(massive_array) # NumPy np.sum()
```

**Output**

```
4.38 ms ± 119 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
20.3 µs ± 110 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
```

Notice `np.sum()` is faster on the NumPy array (`numpy.ndarray`) than Python's `sum()`.

Now let's try it out on a Python list.

**Input**

```
import random
massive_list = [random.randint(0, 10) for i in range(100000)]
len(massive_list), type(massive_list)
```

**Output**

`(100000, list)`

**Input**

`massive_list[:10]`

**Output**

`[0, 4, 5, 9, 7, 0, 1, 7, 8, 1]`

**Input**

```
%timeit sum(massive_list)
%timeit np.sum(massive_list)
```

**Output**

```
598 µs ± 959 ns per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
2.72 ms ± 10.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

Here, NumPy's `np.sum()` is still fast but Python's `sum()` is faster on Python `list`s.

**Input**

`a2`

**Output**

```
array([[1. , 2. , 3.3],
[4. , 5. , 6.5]])
```

**Input**

```
# Find the mean
np.mean(a2)
```

**Output**

`3.6333333333333333`

**Input**

```
# Find the max
np.max(a2)
```

**Output**

`6.5`

**Input**

```
# Find the min
np.min(a2)
```

**Output**

`1.0`

**Input**

```
# Find the standard deviation
np.std(a2)
```

**Output**

`1.8226964152656422`

**Input**

```
# Find the variance
np.var(a2)
```

**Output**

`3.3222222222222224`

**Input**

```
# The standard deviation is the square root of the variance
np.sqrt(np.var(a2))
```

**Output**

```
1.8226964152656422
```

Mean is the same as average. You can find the average of a set of numbers by adding them up and dividing them by how many there are.

Standard deviation is a measure of how spread out numbers are.

The variance is the average of the squared differences from the mean.

To work it out, you:

- Work out the mean
- For each number, subtract the mean and square the result
- Find the average of the squared differences
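The three steps above can be sketched by hand and compared against `np.var()` and `np.std()`. The array `numbers` is just an example chosen because the arithmetic is easy to follow.

```python
import numpy as np

numbers = np.array([2, 4, 6, 8, 10])

# Step 1: work out the mean
mean = numbers.sum() / numbers.size           # 6.0

# Step 2: for each number, subtract the mean and square the result
squared_differences = (numbers - mean) ** 2   # [16., 4., 0., 4., 16.]

# Step 3: find the average of the squared differences
variance = squared_differences.sum() / numbers.size  # 8.0

# The standard deviation is the square root of the variance
std = np.sqrt(variance)

print(variance, np.var(numbers))
print(std, np.std(numbers))
```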

**Input**

```
# Demo of variance
high_var_array = np.array([1, 100, 200, 300, 4000, 5000])
low_var_array = np.array([2, 4, 6, 8, 10])
np.var(high_var_array), np.var(low_var_array)
```

**Output**

`(4296133.472222221, 8.0)`

**Input**

`np.std(high_var_array), np.std(low_var_array)`

**Output**

`(2072.711623024829, 2.8284271247461903)`

**Input**

```
# The standard deviation is the square root of the variance
np.sqrt(np.var(high_var_array))
```

**Output**

`2072.711623024829`

**Input**

```
%matplotlib inline
import matplotlib.pyplot as plt
plt.hist(high_var_array)
plt.show()
```

**Output**

**Input**

```
plt.hist(low_var_array)
plt.show()
```

**Output**

**Input**

`a2`

**Output**

```
array([[1. , 2. , 3.3],
[4. , 5. , 6.5]])
```

**Input**

`a2.shape`

**Output**

`(2, 3)`

**Input**

`a2 + a3`

**Output**

```
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[86], line 1
----> 1 a2 + a3
ValueError: operands could not be broadcast together with shapes (2,3) (2,3,3)
```

**Input**

`a2.reshape(2, 3, 1)`

**Input**

`a2.reshape(2, 3, 1) + a3`
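Here's a minimal sketch of why the reshape fixes the error. With shape `(2, 3)`, `a2` can't be broadcast against `a3`'s `(2, 3, 3)`, but with shape `(2, 3, 1)` the final dimension of size 1 can be stretched to 3. The arrays are recreated here with the same values as above so the example runs on its own.

```python
import numpy as np

a2 = np.array([[1, 2.0, 3.3],
               [4, 5, 6.5]])
a3 = np.arange(1, 19).reshape(2, 3, 3)  # same values as a3 above (1 to 18)

# Same 6 numbers, new shape: each number now sits in its own inner array
a2_reshaped = a2.reshape(2, 3, 1)
print(a2_reshaped.shape)  # (2, 3, 1)

# (2, 3, 1) and (2, 3, 3) broadcast together to (2, 3, 3)
result = a2_reshaped + a3
print(result.shape)  # (2, 3, 3)
print(result[0])     # first (3, 3) block of the result
```

Note that reshaping only works when the total number of elements stays the same (here 2 * 3 * 1 = 6).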

A transpose reverses the order of the axes.

**For example**

An array with shape `(2, 3)` becomes `(3, 2)`.

**Input**

`a2.shape`

**Input**

`a2.T`

**Input**

`a2.transpose()`

**Input**

`a2.T.shape`

For arrays with more than 2 dimensions, the default transpose still reverses the order of all the axes. For a 3-dimensional array, this is the same as swapping the first and last axes.

**For example**

`(5, 3, 3)` -> `(3, 3, 5)`.

**Input**

```
matrix = np.random.random(size=(5, 3, 3))
matrix
```

**Input**

`matrix.shape`

**Input**

`matrix.T`

**Input**

`matrix.T.shape`

**Input**

```
# Check to see if the reverse shape is same as transpose shape
matrix.T.shape == matrix.shape[::-1]
```

**Input**

```
# Check to see if the first and last axes are swapped
matrix.T == matrix.swapaxes(0, -1) # swap first (0) and last (-1) axes
```
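As a quick check on a 4-dimensional array (the name `tensor` and its shape are assumptions picked for this sketch), `.T` reverses every axis, which is not the same as swapping only the first and last axes once you go past 3 dimensions.

```python
import numpy as np

tensor = np.zeros((2, 3, 4, 5))

# .T reverses the order of all the axes
print(tensor.T.shape)                # (5, 4, 3, 2)

# swapaxes(0, -1) only swaps the first and last axes
print(tensor.swapaxes(0, -1).shape)  # (5, 3, 4, 2)

# transpose() also accepts an explicit axis order
print(tensor.transpose(0, 2, 1, 3).shape)  # (2, 4, 3, 5)
```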

Sidenote: You can see more advanced forms of transposing in the NumPy documentation under `numpy.transpose`.

The two main rules to remember for dot product:

- The **inner dimensions** must match:
  - `(3, 2) @ (3, 2)` won't work
  - `(2, 3) @ (3, 2)` will work
  - `(3, 2) @ (2, 3)` will work
- The resulting matrix has the shape of the **outer dimensions**:
  - `(2, 3) @ (3, 2)` -> `(2, 2)`
  - `(3, 2) @ (2, 3)` -> `(3, 3)`

Important: In NumPy, `np.dot()` and `@` can be used to achieve the same result for 1-2 dimension arrays. However, their behavior begins to differ in arrays with 3+ dimensions.
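To see that difference for yourself, here's a small sketch comparing the two on 2-dimensional and 3-dimensional arrays. The arrays (`m1`, `m2`, `a`, `b`) are made up for this illustration.

```python
import numpy as np

# 2-dimensional arrays: np.dot() and @ give the same result
m1 = np.arange(6).reshape(2, 3)
m2 = np.arange(6).reshape(3, 2)
print(np.array_equal(m1 @ m2, np.dot(m1, m2)))  # True
print((m1 @ m2).shape)                          # (2, 2), the outer dimensions

# 3-dimensional arrays: the results have different shapes
a = np.ones((2, 3, 4))
b = np.ones((2, 4, 5))

# @ treats the first axis as a batch of separate (3, 4) @ (4, 5) products
print((a @ b).shape)       # (2, 3, 5)

# np.dot() combines every matrix in a with every matrix in b
print(np.dot(a, b).shape)  # (2, 3, 2, 5)
```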

**Input**

```
np.random.seed(0)
mat1 = np.random.randint(10, size=(3, 3))
mat2 = np.random.randint(10, size=(3, 2))
mat1.shape, mat2.shape
```

**Input**

`mat1`

**Input**

`mat2`

**Input**

`np.dot(mat1, mat2)`

**Input**

```
# Can also achieve np.dot() with "@"
# (however, they may behave differently at 3D+ arrays)
mat1 @ mat2
```

**Input**

```
np.random.seed(0)
mat3 = np.random.randint(10, size=(4,3))
mat4 = np.random.randint(10, size=(4,3))
mat3
```

**Input**

`mat4`

**Input**

```
# This will fail as the inner dimensions of the matrices do not match
np.dot(mat3, mat4)
```

**Input**

`mat3.T.shape`

**Input**

```
# Dot product
np.dot(mat3.T, mat4)
```

**Input**

```
# Element-wise multiplication, also known as Hadamard product
mat3 * mat4
```

So let’s look at an example of this in practice, with multiple types of nut butter sales (Almond, Peanut, and Cashew).

**Input**

```
np.random.seed(0)
sales_amounts = np.random.randint(20, size=(5, 3))
sales_amounts
```

**Input**

```
weekly_sales = pd.DataFrame(sales_amounts,
index=["Mon", "Tues", "Wed", "Thurs", "Fri"],
columns=["Almond butter", "Peanut butter", "Cashew butter"])
weekly_sales
```

**Input**

```
prices = np.array([10, 8, 12])
prices
```

**Output**

`array([10, 8, 12])`

**Input**

```
butter_prices = pd.DataFrame(prices.reshape(1, 3),
index=["Price"],
columns=["Almond butter", "Peanut butter", "Cashew butter"])
butter_prices.shape
```

**Output**

```
(1, 3)
```

**Input**

`weekly_sales.shape`

**Output**

```
(5, 3)
```

**Input**

```
# Find the total amount of sales for a whole day
total_sales = prices.dot(sales_amounts)
total_sales
```

**Output**

```
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[90], line 2
1 # Find the total amount of sales for a whole day
----> 2 total_sales = prices.dot(sales_amounts)
3 total_sales
ValueError: shapes (3,) and (5,3) not aligned: 3 (dim 0) != 5 (dim 0)
```

The shapes aren't aligned: `prices` has shape `(3,)` and `sales_amounts` has shape `(5, 3)`, and we need the middle two numbers (the inner dimensions) to be the same.

**Input**

`prices`

**Output**

`array([10, 8, 12])`

**Input**

`sales_amounts.T.shape`

**Output**

```
(3, 5)
```

**Input**

```
# To make the middle numbers the same, we can transpose
total_sales = prices.dot(sales_amounts.T)
total_sales
```

**Output**

```
array([240, 138, 458, 232, 142])
```

**Input**

`butter_prices.shape, weekly_sales.shape`

**Output**

```
((1, 3), (5, 3))
```

**Input**

```
daily_sales = butter_prices.dot(weekly_sales.T)
daily_sales
```


**Input**

```
# Need to transpose again
weekly_sales["Total"] = daily_sales.T
weekly_sales
```

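If you'd like to check the dot product arithmetic by hand, here's a smaller, self-contained sketch of the same idea. The sales figures below are hypothetical numbers typed in for this example (2 days instead of 5), so they won't match the random values generated above.

```python
import numpy as np

# Hypothetical jars sold (rows: Mon, Tues; columns: almond, peanut, cashew)
sales_amounts = np.array([[2, 7, 1],
                          [9, 4, 16]])
prices = np.array([10, 8, 12])  # price per jar, same column order

# (3,) dot (2, 3) fails: the inner numbers, 3 and 2, don't match
try:
    prices.dot(sales_amounts)
    shapes_aligned = True
except ValueError:
    shapes_aligned = False

# (3,) dot (3, 2) works: the inner numbers are both 3
total_sales = prices.dot(sales_amounts.T)

print(shapes_aligned)        # False
print(total_sales.tolist())  # [88, 314]
```

Monday's total is 2 * 10 + 7 * 8 + 1 * 12 = 88, which is exactly the multiply-then-add a dot product does for each day.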

Finding out if one array is larger, smaller or equal to another.

**Input**

`a1`

**Output**

`array([1, 2, 3])`

**Input**

`a2`

**Output**

```
array([[1. , 2. , 3.3],
[4. , 5. , 6.5]])
```

**Input**

`a1 > a2`

**Output**

```
array([[False, False, False],
[False, False, False]])
```

**Input**

`a1 >= a2`

**Output**

```
array([[ True, True, False],
[False, False, False]])
```

**Input**

`a1 > 5`

**Output**

`array([False, False, False])`

**Input**

`a1 == a1`

**Output**

`array([ True, True, True])`

**Input**

`a1 == a2`

**Output**

```
array([[ True, True, False],
[False, False, False]])
```
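Comparisons return arrays of `True` and `False` values, which you can count or use to filter an array. Here's a minimal sketch, where the array `x` is made up for this example:

```python
import numpy as np

x = np.array([1, 5, 3, 8, 2])

# A comparison produces a boolean array of the same shape
mask = x > 3
print(mask.tolist())       # [False, True, False, True, False]

# True counts as 1, so summing counts how many values pass the test
print(int(np.sum(x > 3)))  # 2

# A boolean array can also be used to select the matching values
print(x[mask].tolist())    # [5, 8]
```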

- `np.sort()` - sort values in a specified dimension of an array
- `np.argsort()` - return the indices to sort the array on a given axis
- `np.argmax()` - return the index/indices which gives the highest value(s) along an axis
- `np.argmin()` - return the index/indices which gives the lowest value(s) along an axis

**Input**

`random_array`

**Output**

```
array([[1, 7, 2],
[7, 0, 2],
[8, 8, 8],
[2, 5, 2],
[4, 8, 6]])
```

**Input**

`np.sort(random_array)`

**Output**

```
array([[1, 2, 7],
[0, 2, 7],
[8, 8, 8],
[2, 2, 5],
[4, 6, 8]])
```

**Input**

`np.argsort(random_array)`

**Output**

```
array([[0, 2, 1],
[1, 2, 0],
[0, 1, 2],
[0, 2, 1],
[0, 2, 1]])
```

**Input**

`a1`

**Output**

`array([1, 2, 3])`

**Input**

```
# Return the indices that would sort an array
np.argsort(a1)
```

**Output**

`array([0, 1, 2])`

**Input**

```
# No axis
np.argmin(a1)
```

**Output**

`0`

**Input**

`random_array`

**Output**

```
array([[1, 7, 2],
[7, 0, 2],
[8, 8, 8],
[2, 5, 2],
[4, 8, 6]])
```

**Input**

```
# Index of the largest value in each row (across the horizontal)
np.argmax(random_array, axis=1)
```

**Output**

`array([1, 0, 0, 1, 1])`

**Input**

```
# Index of the smallest value in each column (down the vertical)
np.argmin(random_array, axis=0)
```

**Output**

`array([0, 1, 0])`
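To tie sorting and `axis` together, here's a short sketch showing that the indices from `np.argsort()` can be used to put an array in order, and how `axis` changes what gets sorted. The arrays `scores` and `grid` are assumptions made up for this example.

```python
import numpy as np

scores = np.array([30, 10, 20])

# argsort gives the positions of the values, from smallest to largest
order = np.argsort(scores)
print(order.tolist())          # [1, 2, 0]

# Indexing with those positions gives the sorted array
print(scores[order].tolist())  # [10, 20, 30]

grid = np.array([[1, 7, 2],
                 [7, 0, 2]])

# axis=1 sorts within each row, axis=0 sorts within each column
print(np.sort(grid, axis=1).tolist())  # [[1, 2, 7], [0, 2, 7]]
print(np.sort(grid, axis=0).tolist())  # [[1, 0, 2], [7, 7, 2]]
```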

So let’s look at a possible use case, such as turning an image into a NumPy array.

Why focus on this?

Remember right at the beginning, we said how Machine Learning is all about **turning data into numbers and then figuring out the patterns?**

Well, we can use the NumPy array to find patterns in the image, and in turn, use those patterns to figure out what's actually in the image.

Pretty smart eh? This is what happens in modern computer vision algorithms, and it’s how your iPhone recognizes faces or photos of your cat in its image recaps!

We’re going to use 3 examples here to show how this works.

So, let's start off with this beautiful image of a panda:

**Input**

```
from matplotlib.image import imread
panda = imread('../images/numpy-panda.jpeg')
print(type(panda))
```

**Output**

`<class 'numpy.ndarray'>`

**Input**

`panda.shape`

**Output**

`(2330, 3500, 3)`

**Input**

`panda`

**Output**

```
array([[[0.05490196, 0.10588235, 0.06666667],
[0.05490196, 0.10588235, 0.06666667],
[0.05490196, 0.10588235, 0.06666667],
...,
[0.16470589, 0.12941177, 0.09411765],
[0.16470589, 0.12941177, 0.09411765],
[0.16470589, 0.12941177, 0.09411765]],
[[0.05490196, 0.10588235, 0.06666667],
[0.05490196, 0.10588235, 0.06666667],
[0.05490196, 0.10588235, 0.06666667],
...,
[0.16470589, 0.12941177, 0.09411765],
[0.16470589, 0.12941177, 0.09411765],
[0.16470589, 0.12941177, 0.09411765]],
[[0.05490196, 0.10588235, 0.06666667],
[0.05490196, 0.10588235, 0.06666667],
[0.05490196, 0.10588235, 0.06666667],
...,
[0.16470589, 0.12941177, 0.09411765],
[0.16470589, 0.12941177, 0.09411765],
[0.16470589, 0.12941177, 0.09411765]],
...,
[[0.13333334, 0.07450981, 0.05490196],
[0.12156863, 0.0627451 , 0.04313726],
[0.10980392, 0.05098039, 0.03137255],
...,
[0.02745098, 0.02745098, 0.03529412],
[0.02745098, 0.02745098, 0.03529412],
[0.02745098, 0.02745098, 0.03529412]],
[[0.13333334, 0.07450981, 0.05490196],
[0.12156863, 0.0627451 , 0.04313726],
[0.12156863, 0.0627451 , 0.04313726],
...,
[0.02352941, 0.02352941, 0.03137255],
[0.02352941, 0.02352941, 0.03137255],
[0.02352941, 0.02352941, 0.03137255]],
[[0.13333334, 0.07450981, 0.05490196],
[0.12156863, 0.0627451 , 0.04313726],
[0.12156863, 0.0627451 , 0.04313726],
...,
[0.02352941, 0.02352941, 0.03137255],
[0.02352941, 0.02352941, 0.03137255],
[0.02352941, 0.02352941, 0.03137255]]], dtype=float32)
```

**Input**

```
car = imread("../images/numpy-car-photo.png")
car.shape
```

**Output**

`(431, 575, 4)`

**Input**

`car[:,:,:3].shape`

**Output**

`(431, 575, 3)`
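The shapes above follow the pattern `(height, width, colour channels)`, where 3 channels usually means red, green, and blue, and a 4th channel is usually alpha (transparency). Since you may not have the same image files, here's a sketch using a tiny synthetic image (`fake_image`, an assumption for this example) to show what slicing with `[:, :, :3]` does.

```python
import numpy as np

# Synthetic stand-in for an RGBA image: 4 pixels high, 6 pixels wide, 4 channels
fake_image = np.zeros((4, 6, 4), dtype=np.float32)
fake_image[:, :, 0] = 1.0  # red channel fully on
fake_image[:, :, 3] = 1.0  # alpha channel fully opaque

# Keep every row and column, but only the first 3 channels (drop alpha)
rgb_only = fake_image[:, :, :3]
print(rgb_only.shape)  # (4, 6, 3)

# A single pixel is just a small array of channel values
print(fake_image[0, 0].tolist())  # [1.0, 0.0, 0.0, 1.0]

# Aggregations work on images too: average value of each colour channel
print(rgb_only.mean(axis=(0, 1)).tolist())  # [1.0, 0.0, 0.0]
```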

**Input**

```
dog = imread("../images/numpy-dog-photo.png")
dog.shape
```

**Output**

`(432, 575, 4)`

**Input**

`dog`

**Output**

```
array([[[0.70980394, 0.80784315, 0.88235295, 1. ],
[0.72156864, 0.8117647 , 0.8862745 , 1. ],
[0.7411765 , 0.8156863 , 0.8862745 , 1. ],
...,
[0.49803922, 0.6862745 , 0.8392157 , 1. ],
[0.49411765, 0.68235296, 0.8392157 , 1. ],
[0.49411765, 0.68235296, 0.8352941 , 1. ]],
[[0.69411767, 0.8039216 , 0.8862745 , 1. ],
[0.7019608 , 0.8039216 , 0.88235295, 1. ],
[0.7058824 , 0.80784315, 0.88235295, 1. ],
...,
[0.5019608 , 0.6862745 , 0.84705883, 1. ],
[0.49411765, 0.68235296, 0.84313726, 1. ],
[0.49411765, 0.68235296, 0.8392157 , 1. ]],
[[0.6901961 , 0.8 , 0.88235295, 1. ],
[0.69803923, 0.8039216 , 0.88235295, 1. ],
[0.7058824 , 0.80784315, 0.88235295, 1. ],
...,
[0.5019608 , 0.6862745 , 0.84705883, 1. ],
[0.49803922, 0.6862745 , 0.84313726, 1. ],
[0.49803922, 0.6862745 , 0.84313726, 1. ]],
...,
[[0.9098039 , 0.81960785, 0.654902 , 1. ],
[0.8352941 , 0.7490196 , 0.6509804 , 1. ],
[0.72156864, 0.6313726 , 0.5372549 , 1. ],
...,
[0.01568628, 0.07058824, 0.02352941, 1. ],
[0.03921569, 0.09411765, 0.03529412, 1. ],
[0.03921569, 0.09019608, 0.05490196, 1. ]],
[[0.9137255 , 0.83137256, 0.6784314 , 1. ],
[0.8117647 , 0.7294118 , 0.627451 , 1. ],
[0.65882355, 0.5686275 , 0.47843137, 1. ],
...,
[0.00392157, 0.05490196, 0.03529412, 1. ],
[0.03137255, 0.09019608, 0.05490196, 1. ],
[0.04705882, 0.10588235, 0.06666667, 1. ]],
[[0.9137255 , 0.83137256, 0.68235296, 1. ],
[0.76862746, 0.68235296, 0.5882353 , 1. ],
[0.59607846, 0.5058824 , 0.44313726, 1. ],
...,
[0.03921569, 0.10196079, 0.07058824, 1. ],
[0.02745098, 0.08235294, 0.05882353, 1. ],
[0.05098039, 0.11372549, 0.07058824, 1. ]]], dtype=float32)
```

Don’t worry if this was all a lot to take in. Like I said up top, although this is an introduction to NumPy, how it works, and common commands, **don’t expect to understand it all right away**.

If you get stuck or think of something you'd like to do that this article doesn't cover, don't fear!

The recommended steps you take are:

- **Try it** - Since NumPy is very friendly, your first step should be to use what you know and try to figure out the answer to your own question (getting it wrong is part of the process). If in doubt, run your code
- **Search for it** - If trying it on your own doesn't work, since someone else has probably tried to do something similar, try searching for your problem in the following places (either via a search engine or directly):
  - NumPy documentation - The ground truth for everything NumPy, this resource covers all of the NumPy functionality
  - The Zero To Mastery Machine Learning Discord channel - if you’re a member of ZTM, jump into the Discord and ask me, other students, and current ML engineers any NumPy questions you have
  - ChatGPT - ChatGPT is very good at explaining code, however, it can make mistakes. Make sure that you verify the code it writes first before using it. A great hack for this is to ask ChatGPT to explain the code. Try asking "Can you explain the following code for me? {your code here}" and then continue with follow-up questions from there.

Remember: You don't have to learn all of the functions by heart to begin with. What's most important is continually asking yourself, "What am I trying to do with the data?". Start by answering that question and then practice finding the code that does it.
