This notebook was created by Jean de Dieu Nyandwi for the love of the machine learning community. For any feedback, errors, or suggestions, he can be reached by email (johnjw7084 at gmail dot com), Twitter, or LinkedIn.
Intro to NumPy for Data Computations¶
This lab covers data computations with NumPy. NumPy is a scientific computing library for Python that makes mathematical computations on arrays fast and easy.
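In this lab, you will learn to:
- Create and generate arrays
- Select data by indexing and slicing
- Perform basic operations on arrays
- Compute basic statistics
- Manipulate arrays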
If you are using Google Colab, you do not need to install NumPy. You only have to import it like this:
import numpy as np
If you are using local Jupyter notebooks, make sure you have it installed already.
1. Creating an Array in NumPy¶
An array can be a vector or a matrix. A vector is a one-dimensional array, and a matrix is a two-dimensional array. NumPy arrays can also have three or more dimensions.
## Importing numpy
import numpy as np
## Creating a simple 1 dimensional array: vector
np.array([1,2,3,4,5])
array([1, 2, 3, 4, 5])
## Creating 2 dimensional array: matrix
np.array([(1,2,3,4,5), (6,7,8,9,10)])
array([[ 1, 2, 3, 4, 5], [ 6, 7, 8, 9, 10]])
## Creating an array from a list
num_list = [1,2,3,4,5]
np.array(num_list)
array([1, 2, 3, 4, 5])
print(np.array(num_list))
[1 2 3 4 5]
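The difference between a vector and a matrix can be checked on the array itself. Below is a minimal sketch that uses the `ndim`, `shape`, and `dtype` attributes of NumPy arrays; these attributes are not used in the cells above, and the variable names `vector` and `matrix` are only illustrative.

```python
import numpy as np

vector = np.array([1, 2, 3, 4, 5])
matrix = np.array([(1, 2, 3, 4, 5), (6, 7, 8, 9, 10)])

# ndim is the number of dimensions, shape is the size along each dimension
print(vector.ndim, vector.shape)   # 1 (5,)
print(matrix.ndim, matrix.shape)   # 2 (2, 5)

# All elements of an array share one data type
print(vector.dtype.kind)           # i (integer)
```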
1.1 Generating Array¶
NumPy offers various options to generate an array depending on particular need, such as:
- Generating identity array
- Generating zero array of a given size
- Generating ones array with a given size
- Generating an array in a given range
- Generating an array with random values
## Generating an identity array
identity_array = np.identity(4)
print(identity_array)
[[1. 0. 0. 0.] [0. 1. 0. 0.] [0. 0. 1. 0.] [0. 0. 0. 1.]]
## np.eye also generates an identity matrix (1s on the diagonal, 0s elsewhere)
np.eye(4)
array([[1., 0., 0., 0.], [0., 1., 0., 0.], [0., 0., 1., 0.], [0., 0., 0., 1.]])
# You can multiply with any constant
np.eye(4) * 7
array([[7., 0., 0., 0.], [0., 7., 0., 0.], [0., 0., 7., 0.], [0., 0., 0., 7.]])
# Generating zero array of a given size
# 1 dimensional zero array
np.zeros(5)
array([0., 0., 0., 0., 0.])
# Creating a two dimensional array: pass a tuple with the number of rows and columns
#np.zeros((rows, columns))
np.zeros((5,6))
array([[0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0.], [0., 0., 0., 0., 0., 0.]])
# Generating ones array of a given size
# 1 dimensional one array
np.ones(5)
array([1., 1., 1., 1., 1.])
# Creating a two dimensional ones array: pass a tuple with the number of rows and columns
# np.ones((rows, columns))
np.ones((5,6))
array([[1., 1., 1., 1., 1., 1.], [1., 1., 1., 1., 1., 1.], [1., 1., 1., 1., 1., 1.], [1., 1., 1., 1., 1., 1.], [1., 1., 1., 1., 1., 1.]])
## Generating an array in a given range or interval
np.arange(0,5)
array([0, 1, 2, 3, 4])
## If you want to control the step size
np.arange(0,20,2)
array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18])
## You can also use linspace to generate evenly spaced numbers in a given interval
np.linspace(0,20,5)
array([ 0., 5., 10., 15., 20.])
np.linspace(0,100,5)
array([ 0., 25., 50., 75., 100.])
np.linspace(0,10,10)
array([ 0. , 1.11111111, 2.22222222, 3.33333333, 4.44444444, 5.55555556, 6.66666667, 7.77777778, 8.88888889, 10. ])
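`np.arange` and `np.linspace` look similar but are controlled differently: `arange` takes a step size and leaves out the stop value, while `linspace` takes the number of points and includes the stop value by default. The sketch below compares them; the `endpoint` argument of `np.linspace` is not used in the cells above, and the names `a`, `b`, and `c` are illustrative.

```python
import numpy as np

# arange: fixed step, stop value excluded
a = np.arange(0, 20, 5)
print(a.tolist())   # [0, 5, 10, 15]

# linspace: fixed number of points, stop value included by default
b = np.linspace(0, 20, 5)
print(b.tolist())   # [0.0, 5.0, 10.0, 15.0, 20.0]

# endpoint=False makes linspace leave out the stop value as well
c = np.linspace(0, 20, 4, endpoint=False)
print(c.tolist())   # [0.0, 5.0, 10.0, 15.0]
```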
## Generating an array with random values
# Create a 1D array with 4 random numbers
np.random.rand(4)
array([0.68944519, 0.25872307, 0.7565542 , 0.68606423])
# Running it again will not give the same values
np.random.rand(4)
array([0.41979127, 0.83292096, 0.50330078, 0.17331376])
np.random.rand(4,5)
array([[0.88627071, 0.55624758, 0.97198928, 0.74128787, 0.02940347], [0.05604389, 0.22823893, 0.52886436, 0.91998249, 0.01327729], [0.74984196, 0.00163448, 0.08632411, 0.08515202, 0.70213274], [0.67293052, 0.18162822, 0.38745748, 0.42938446, 0.56581595]])
### Generate one random integer in a given range
np.random.randint(5,50)
27
### Generate 10 random integers in a given range
np.random.randint(5,50,10)
array([ 6, 44, 15, 7, 34, 32, 38, 20, 16, 27])
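## Random seed: makes the same random values come out on every run
## This example uses Python's built-in random module; np.random.seed(10) does the same for np.random functions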
import random
random.seed(10)
random.randint(5,50)
41
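Seeding also works within NumPy itself. The sketch below assumes NumPy 1.17 or later and uses `np.random.default_rng`, NumPy's generator interface, which is not used in the cells above. The names `rng_a`, `rng_b`, `first`, and `second` are illustrative. The actual numbers depend on the NumPy version, so only the equality is shown.

```python
import numpy as np

# Two generators created with the same seed produce the same sequence
rng_a = np.random.default_rng(10)
rng_b = np.random.default_rng(10)

# 10 random integers from 5 (included) to 50 (excluded)
first = rng_a.integers(5, 50, size=10)
second = rng_b.integers(5, 50, size=10)

print(np.array_equal(first, second))   # True
```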
2. Data Selection: Indexing and slicing an Array¶
Indexing: Selecting individual elements from the array
Slicing: Selecting a group of elements from the array.
2.1 1D Array Indexing and Selection¶
# Creating a 1 dimensional vector
array_1d = np.array([1,2,3,4,5])
## Indexing: selecting an element from an array
array_1d[1]
2
array_1d[-1]
5
# Slicing: returning a group of elements from an array (the stop index is excluded)
array_1d[2:4]
array([3, 4])
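A slice has the general form `start:stop:step`, and any of the three parts can be left out. The sketch below shows a few common variations on the same `array_1d`; the step and negative-index forms are additions that do not appear in the cells above.

```python
import numpy as np

array_1d = np.array([1, 2, 3, 4, 5])

# start:stop:step, where the stop index is excluded
print(array_1d[:3].tolist())     # [1, 2, 3]        first three elements
print(array_1d[::2].tolist())    # [1, 3, 5]        every second element
print(array_1d[-2:].tolist())    # [4, 5]           last two elements
print(array_1d[::-1].tolist())   # [5, 4, 3, 2, 1]  reversed
```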
2.2 2D Array Indexing and Selection¶
## Indexing 2D array
array_2d = np.array([[1,2,3],[4,5,6],[7,8,9]])
## Selecting individual element
## array_2d[row][column]
## let's select 5..that is row 1, column 1 (we start from 0!!)
array_2d[1][1]
5
# let's select 9..that is row 2, column 2
array_2d[2][2]
9
## Selecting whole row
#array_2d[row]
array_2d[1]
array([4, 5, 6])
## Selecting group of elements in 2D array
## array_2d[rows, columns]..You select rows and columns
## Let's select the first two rows
## Rows :2 selects rows 0 and 1 (the row at index 2 is excluded).
## Columns : selects all columns.
array_2d[:2,:]
array([[1, 2, 3], [4, 5, 6]])
## Selecting the first two rows and the first two columns
array_2d[:2,0:2]
array([[1, 2], [4, 5]])
## array_2d[:2,:] is the same as
array_2d[0:2,:]
array([[1, 2, 3], [4, 5, 6]])
## This returns all rows and all columns, so it is the same as the original array
array_2d[0:3,:]
array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
## return the third row (index 2)
array_2d[2,:]
array([7, 8, 9])
## return the third column (index 2)
array_2d[:,2]
array([3, 6, 9])
## return the last two columns
array_2d[:,1:3]
array([[2, 3], [5, 6], [8, 9]])
## return the first column
array_2d[:,0]
array([1, 4, 7])
## return the first row
array_2d[0,:]
array([1, 2, 3])
Indexing or selecting from a 2D array may seem confusing, but after trying it a few times you get the idea. If you select an entire row, you get that row's value from every column, not every value in those columns. The same holds the other way around when you select an entire column.
As shown below, we are selecting the first row, and all columns are selected (:).
array_2d[0,:]
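Two details about 2D selection are easy to check in code: `array_2d[row][column]` and `array_2d[row, column]` pick the same element, and an integer index removes a dimension while a slice keeps it. The sketch below illustrates both points; the shape checks are an addition to the cells above.

```python
import numpy as np

array_2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Chained indexing and comma indexing select the same element
print(array_2d[1][1] == array_2d[1, 1])   # True

# An integer index drops that dimension; a slice keeps it
print(array_2d[0, :].shape)     # (3,)    1D: the first row
print(array_2d[0:1, :].shape)   # (1, 3)  2D: still a matrix with one row
print(array_2d[:, 0].shape)     # (3,)    1D: the first column
```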
2.3 Conditional selection¶
You can use a condition to select values in an array. Let's use comparison operators to select the values.
## Let's create an array
arr = np.array(([1,2,3],[4,5,6],[7,8,9]))
## Select all elements in an array which are less than 6
arr[arr < 6]
array([1, 2, 3, 4, 5])
## Select all elements in an array which are greater than 6
arr[arr > 6]
array([7, 8, 9])
## Select all even numbers in an array
arr[arr % 2 == 0]
array([2, 4, 6, 8])
## Select all odd numbers in an array
arr[arr % 2 != 0]
array([1, 3, 5, 7, 9])
## You can also have multiple conditions
## Among the odd numbers, return the values which are greater than or equal to 5
arr[(arr % 2 != 0) & (arr >= 5)]
array([5, 7, 9])
## A condition on its own returns a boolean array: True where the condition is met, False elsewhere
arr > 5
array([[False, False, False], [False, False, True], [ True, True, True]])
## We do not have 0 in our array
arr == 0
array([[False, False, False], [False, False, False], [False, False, False]])
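The boolean array produced by a condition can be stored and reused as a mask. The sketch below shows counting, combining, negating, and assigning with a mask. The `|` and `~` operators and the assignment step are additions to the cells above, and the names `mask` and `clipped` are illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

mask = arr > 5                 # boolean array with the same shape as arr
print(mask.sum())              # 4, since each True counts as 1

# | is element-wise OR, ~ is element-wise NOT
print(arr[(arr < 3) | (arr > 7)].tolist())   # [1, 2, 8, 9]
print(arr[~mask].tolist())                   # [1, 2, 3, 4, 5]

# A mask can also be used to assign values (done on a copy here)
clipped = arr.copy()
clipped[mask] = 5
print(clipped.tolist())        # [[1, 2, 3], [4, 5, 5], [5, 5, 5]]
```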
3. Basic Array Operations¶
3.1 Arithmetic operations¶
# Let's create two arrays
arr1 = np.arange(0,5)
arr2 = np.arange(6,11)
## Addition
arr1 + arr2
array([ 6, 8, 10, 12, 14])
## Subtraction
arr2 - arr1
array([6, 6, 6, 6, 6])
## Multiplication
arr1 * arr2
array([ 0, 7, 16, 27, 40])
## Division
arr1 / arr2
array([0. , 0.14285714, 0.25 , 0.33333333, 0.4 ])
## Squaring
arr1 ** 2
array([ 0, 1, 4, 9, 16])
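The arithmetic operators above work element by element: the first element of one array is combined with the first element of the other, and so on. The sketch below checks this against a plain Python loop and also shows that a single number is applied to every element. The name `looped` is illustrative.

```python
import numpy as np

arr1 = np.arange(0, 5)    # values 0, 1, 2, 3, 4
arr2 = np.arange(6, 11)   # values 6, 7, 8, 9, 10

# Element-wise addition matches a plain Python loop over pairs
looped = [int(a) + int(b) for a, b in zip(arr1, arr2)]
print((arr1 + arr2).tolist() == looped)   # True

# A single number is applied to every element
print((arr1 * 10).tolist())   # [0, 10, 20, 30, 40]
print((arr1 + 1).tolist())    # [1, 2, 3, 4, 5]
```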
3.2 Universal functions¶
NumPy universal functions (ufuncs) compute math, trigonometric, logical, and comparison operations element by element, such as sin, cos, tan, exponent (exp), log, square, greater, less, etc.
## creating two arrays
arr1 = np.arange(0,5)
arr2 = np.arange(6,11)
## Calculating the sum of two arrays
np.add(arr1, arr2)
array([ 6, 8, 10, 12, 14])
## Calculating the product of two arrays
np.multiply(arr1, arr2)
array([ 0, 7, 16, 27, 40])
## Calculating the difference between two arrays
np.subtract(arr1, arr2)
array([-6, -6, -6, -6, -6])
## Calculating the division of two arrays
np.divide(arr1, arr2)
array([0. , 0.14285714, 0.25 , 0.33333333, 0.4 ])
## Calculating the sin of arr1
np.sin(arr1)
array([ 0. , 0.84147098, 0.90929743, 0.14112001, -0.7568025 ])
## Note: np.sin expects angles in radians, so these values are read as radians, not degrees
np.sin([0,45,90,180])
array([ 0. , 0.85090352, 0.89399666, -0.80115264])
## Calculating the cosine of arr 1
np.cos(arr1)
array([ 1. , 0.54030231, -0.41614684, -0.9899925 , -0.65364362])
## Note: np.cos also expects radians, so these values are read as radians, not degrees
np.cos([0,45,90,180])
array([ 1. , 0.52532199, -0.44807362, -0.59846007])
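NumPy's trigonometric functions work in radians, which is why the values above do not match the familiar sine and cosine of 45, 90, and 180 degrees. If your angles are in degrees, one option is to convert them first. The sketch below uses `np.deg2rad`, which is not used in the cells above; the angle values and variable names are illustrative.

```python
import numpy as np

degrees = np.array([0, 30, 90, 180])
radians = np.deg2rad(degrees)   # same as degrees * np.pi / 180

sines = np.sin(radians)         # approximately [0, 0.5, 1, 0]
cosines = np.cos(radians)       # approximately [1, 0.866, 0, -1]

# Rounding hides tiny floating-point errors (for example 1.2e-16 instead of 0)
print(np.round(sines, 3))
print(np.round(cosines, 3))
```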
## Calculating the tangent(tan) of the array
np.tan(arr2)
array([-0.29100619, 0.87144798, -6.79971146, -0.45231566, 0.64836083])
## Calculating the natural logarithm (log) of the array
np.log(arr2)
array([1.79175947, 1.94591015, 2.07944154, 2.19722458, 2.30258509])
## Calculating the exponential (exp, or e^x) of the array
np.exp(arr2)
array([ 403.42879349, 1096.63315843, 2980.95798704, 8103.08392758, 22026.46579481])
## Calculating the power of the array
## Each element of arr1 is raised to the matching element of arr2: 0^6=0, 1^7=1, 2^8=256, etc.
np.power(arr1, arr2)
array([ 0, 1, 256, 19683, 1048576])
## Comparison operations return True or False for each element
## Every element of arr1 is less than the matching element of arr2, so greater returns False
np.greater(arr1, arr2)
array([False, False, False, False, False])
## Every element of arr1 is less than the matching element of arr2, so less returns True
np.less(arr1, arr2)
array([ True, True, True, True, True])
4. Basic Statistics¶
With NumPy, we can compute basic statistics such as the standard deviation (std), variance (var), mean, median, minimum value, and maximum value of an array.
More about NumPy statistics: https://numpy.org/doc/stable/reference/routines.statistics.html#order-statistics
## Creating an array
arr = np.arange(0,5)
arr
array([0, 1, 2, 3, 4])
4.1 Standard Deviation¶
## calculating the standard deviation of the array
## Std measures how much the elements of the array deviate from the mean of the array
np.std(arr)
1.4142135623730951
arr2 = np.array([[3,4], [5,6]])
np.std(arr2)
1.118033988749895
## Specifying the axis
## By default, the std is computed over all values of the array as if it were flattened
## axis=0 computes it down each column, axis=1 along each row
np.std(arr2, axis=0)
array([1., 1.])
np.std(arr2, axis=1)
array([0.5, 0.5])
4.2 Variance¶
## Calculating the Variance (var)
arr = np.arange(0,5)
np.var(arr)
2.0
np.var(arr2)
1.25
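To see what `np.var` and `np.std` compute, both can be written out by hand: the variance is the mean of the squared deviations from the mean, and the standard deviation is its square root. The sketch below does this for the same `arr` using NumPy's default settings; the names `manual_var` and `manual_std` are illustrative.

```python
import numpy as np

arr = np.arange(0, 5)   # values 0, 1, 2, 3, 4

# Variance: mean of the squared deviations from the mean
manual_var = np.mean((arr - np.mean(arr)) ** 2)
# Standard deviation: square root of the variance
manual_std = np.sqrt(manual_var)

print(manual_var)   # 2.0
print(manual_std)   # 1.4142135623730951

print(np.isclose(manual_var, np.var(arr)))   # True
print(np.isclose(manual_std, np.std(arr)))   # True
```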
4.3 Mean¶
## Calculating the mean of the array
np.mean(arr)
2.0
## np.average gives the same result as np.mean when no weights are passed
np.average(arr)
2.0
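`np.average` differs from `np.mean` in that it accepts a `weights` argument, which lets some values count more than others. The sketch below uses made-up weights purely for illustration; the name `weights_example` is hypothetical.

```python
import numpy as np

arr = np.arange(0, 5)   # values 0, 1, 2, 3, 4

# Without weights, np.average equals np.mean
print(np.average(arr) == np.mean(arr))   # True

# With weights: sum(value * weight) / sum(weights)
weights_example = np.array([0, 0, 0, 1, 1])    # hypothetical weights
weighted = np.average(arr, weights=weights_example)
print(weighted)   # 3.5, which is (3 + 4) / 2
```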
4.4 Median¶
## Calculating the median of the array
np.median(arr)
2.0
4.5 Minimum and Maximum¶
## Calculating the minimum value
np.min(arr)
0
## Calculating the maximum value
np.max(arr)
4
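Like `np.std`, the minimum and maximum can be computed along an axis of a 2D array. The sketch below reuses the small 2x2 array from the standard deviation section; passing `axis` to `np.min` and `np.max` is an addition to the cells above.

```python
import numpy as np

arr2 = np.array([[3, 4], [5, 6]])

# Without axis, the whole array is considered
print(np.min(arr2))                    # 3
print(np.max(arr2))                    # 6

# axis=0 works down each column, axis=1 along each row
print(np.max(arr2, axis=0).tolist())   # [5, 6]  maximum of each column
print(np.min(arr2, axis=1).tolist())   # [3, 5]  minimum of each row
```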
5. Data Manipulation¶
Data manipulation is an important step in a Machine Learning project. Let's look at some of the NumPy methods and functions which are useful in data manipulation.
5.1 Shape of the array¶
## Creating an array
arr1 = np.arange(0,10)
arr2 = np.array(([1,2,3],[4,5,6],[7,8,9]))
arr1
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
arr2
array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
np.shape(arr1)
(10,)
np.shape(arr2)
(3, 3)
arr2.shape
(3, 3)
5.2 Shaping the Array¶
np.reshape(array_name, (rows, columns))
or array_name.reshape(rows, columns)
changes the shape of the array. The rows and columns of the new shape have to conform with the number of elements in the array. Otherwise, it won't work. For example, you can convert a (3,3) array into (1,9), but you can't convert it into (5,5).
### arr1 has shape (10,): a 1 dimensional array with 10 elements. Let's reshape it into (5,2)
### The new shape is passed as the second argument
np.reshape(arr1, (5,2))
array([[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]])
## This would also work
arr1.reshape(5,2)
array([[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]])
arr2_reshaped = arr2.reshape(9,1)
arr2_reshaped.T
array([[1, 2, 3, 4, 5, 6, 7, 8, 9]])
arr2_reshaped.reshape(3,3)
array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
## np.resize can also be used to change the shape of the array into a specific size
## Unlike reshape, it repeats the data if the new size is larger than the original
np.resize(arr2, (1,9))
array([[1, 2, 3, 4, 5, 6, 7, 8, 9]])
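The rule that the new shape must match the number of elements can be seen directly. The sketch below shows `-1`, which asks NumPy to work out one dimension from the total size, and what happens when the shape does not fit. Both are additions to the cells above, and the name `reshape_failed` is illustrative.

```python
import numpy as np

arr2 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])   # 9 elements

# -1 asks NumPy to work out that dimension from the total size
print(arr2.reshape(1, -1).shape)   # (1, 9)
print(arr2.reshape(-1).shape)      # (9,)

# 9 elements cannot fill a (5, 5) array, so reshape raises ValueError
try:
    arr2.reshape(5, 5)
    reshape_failed = False
except ValueError:
    reshape_failed = True
print(reshape_failed)              # True
```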
5.3 Copying array¶
arr1 = np.arange(0,10)
arr1
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
arr1_copy = arr1.copy()
arr1_copy
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
## Copying the values of one array into the other
## Let's copy array 2 into 1 --they have the same shape
arr1 = np.arange(0,6)
arr2 = np.arange(6,12)
## arr1 is destination, arr2 is source
np.copyto(arr1, arr2)
arr1
array([ 6, 7, 8, 9, 10, 11])
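Copying matters because plain assignment and slicing do not duplicate the data. The sketch below shows the difference between an alias, a slice (a view), and a real copy. The variable names `original`, `alias`, `view`, and `independent` are illustrative.

```python
import numpy as np

original = np.arange(0, 5)

alias = original                # same array object, no data copied
view = original[:3]             # a slice is a view on the same data
independent = original.copy()   # a separate array with its own data

original[0] = 99

print(alias[0])         # 99, changed together with original
print(view[0])          # 99, changed together with original
print(independent[0])   # 0, the copy is unaffected
```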
5.4 Joining arrays¶
### Creating two arrays
arr1 = np.array([[1,2,3],[4,5,6],[7,8,9]])
arr2 = np.array([[10,11,12]])
## Joining them
np.concatenate((arr1, arr2))
array([[ 1, 2, 3], [ 4, 5, 6], [ 7, 8, 9], [10, 11, 12]])
## Transposing arr2
## arr2.T is transpose operation
np.concatenate((arr1, arr2.T), axis=1)
array([[ 1, 2, 3, 10], [ 4, 5, 6, 11], [ 7, 8, 9, 12]])
### Setting axis to None flattens the arrays before joining them
np.concatenate((arr1, arr2), axis=None)
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
### Joining two 1D arrays into a 2D array: Stacking
# Column stacking
arr1 = np.arange(0,6)
arr2 = np.arange(6,12)
np.column_stack((arr1, arr2))
array([[ 0, 6], [ 1, 7], [ 2, 8], [ 3, 9], [ 4, 10], [ 5, 11]])
## Row stacking
## np.vstack is used here because np.row_stack is a deprecated alias of it in NumPy 2.0+
np.vstack((arr1, arr2))
array([[ 0, 1, 2, 3, 4, 5], [ 6, 7, 8, 9, 10, 11]])
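The stacking functions are related to each other, and checking the shapes makes the relationship clear. The sketch below compares row stacking, column stacking, and end-to-end joining. `np.vstack` and `np.hstack` are standard NumPy functions, and the names `rows` and `cols` are illustrative.

```python
import numpy as np

arr1 = np.arange(0, 6)
arr2 = np.arange(6, 12)

# Stacking 1D arrays as rows gives shape (2, 6)
rows = np.vstack((arr1, arr2))
# Stacking them as columns gives shape (6, 2), the transpose of the above
cols = np.column_stack((arr1, arr2))

print(rows.shape, cols.shape)         # (2, 6) (6, 2)
print(np.array_equal(cols, rows.T))   # True

# hstack joins 1D arrays end to end, like concatenate
joined = np.hstack((arr1, arr2))
print(np.array_equal(joined, np.concatenate((arr1, arr2))))   # True
```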
5.5 Splitting arrays¶
arr1 = np.arange(0,6)
arr1
array([0, 1, 2, 3, 4, 5])
### Splitting the array into two arrays
np.split(arr1, 2)
[array([0, 1, 2]), array([3, 4, 5])]
### Splitting the array into three arrays
np.split(arr1, 3)
[array([0, 1]), array([2, 3]), array([4, 5])]
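`np.split` only works when the array divides into equal parts. The sketch below shows what happens otherwise, and two alternatives: `np.array_split`, which allows unequal parts, and passing a list of indices to `np.split`. Both are additions to the cells above, and the names `split_failed`, `parts`, `left`, and `right` are illustrative.

```python
import numpy as np

arr1 = np.arange(0, 6)

# np.split needs equal parts: 6 elements cannot be split into 4
try:
    np.split(arr1, 4)
    split_failed = False
except ValueError:
    split_failed = True
print(split_failed)   # True

# np.array_split allows parts of unequal size
parts = np.array_split(arr1, 4)
print([p.tolist() for p in parts])   # [[0, 1], [2, 3], [4], [5]]

# A list of indices splits the array at those positions
left, right = np.split(arr1, [2])
print(left.tolist(), right.tolist())   # [0, 1] [2, 3, 4, 5]
```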
5.6 Adding and repeating elements in an array¶
arr1 = np.arange(0,6)
arr1
array([0, 1, 2, 3, 4, 5])
## Adding the values at the end of the array
np.append(arr1,7)
array([0, 1, 2, 3, 4, 5, 7])
### Given an array, can you repeat it multiple times?
### np.tile repeats the whole array, np.repeat repeats each element
arr = np.array([[1,2,3]])
np.tile(arr, 3)
array([[1, 2, 3, 1, 2, 3, 1, 2, 3]])
np.repeat(arr,3)
array([1, 1, 1, 2, 2, 2, 3, 3, 3])
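Both functions can also work along a chosen direction. The sketch below passes a tuple to `np.tile` and an `axis` to `np.repeat`; these arguments are additions to the cells above.

```python
import numpy as np

arr = np.array([[1, 2, 3]])

# tile repeats the whole array; a tuple gives the repetitions per axis
print(np.tile(arr, (2, 1)).tolist())        # [[1, 2, 3], [1, 2, 3]]

# repeat repeats each element; without axis the result is flattened
print(np.repeat(arr, 2).tolist())           # [1, 1, 2, 2, 3, 3]
print(np.repeat(arr, 2, axis=0).tolist())   # [[1, 2, 3], [1, 2, 3]]
print(np.repeat(arr, 2, axis=1).tolist())   # [[1, 1, 2, 2, 3, 3]]
```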
5.7 Sorting elements in an array¶
arr = np.array([[1,2,3,4,5,3,2,1,3,5,6,7,7,5,9,5]])
np.sort(arr)
array([[1, 1, 2, 2, 3, 3, 3, 4, 5, 5, 5, 5, 6, 7, 7, 9]])
## Finding the unique elements in an array
arr = np.array([[1,2,3,4,5,3,2,1,3,5,6,7,7,5,9,5]])
np.unique(arr)
array([1, 2, 3, 4, 5, 6, 7, 9])
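Two common follow-ups to sorting and finding unique values are sorting in descending order and counting how often each value appears. The sketch below shows one way to do each on a 1D version of the same data. The `return_counts` argument of `np.unique` is not used in the cells above, and the names `descending`, `values`, and `counts` are illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 3, 2, 1, 3, 5, 6, 7, 7, 5, 9, 5])

# Descending order: sort ascending, then reverse
descending = np.sort(arr)[::-1]
print(descending.tolist())

# return_counts=True also reports how often each unique value occurs
values, counts = np.unique(arr, return_counts=True)
print(values.tolist())   # [1, 2, 3, 4, 5, 6, 7, 9]
print(counts.tolist())   # [2, 2, 3, 1, 4, 1, 2, 1]
```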
5.8 Reversing an array¶
## You can also flip the array
arr = np.array([[1,2,3],[4,5,6],[7,8,9]])
arr
array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
## Up/down flipping
np.flipud(arr)
array([[7, 8, 9], [4, 5, 6], [1, 2, 3]])
## left/right flipping
np.fliplr(arr)
array([[3, 2, 1], [6, 5, 4], [9, 8, 7]])
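Flipping is the same as slicing with a step of -1 along the chosen direction. The sketch below checks this and also uses `np.flip`, a more general function that is not used in the cells above.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# flipud reverses the order of the rows, fliplr the order of the columns
print(np.array_equal(np.flipud(arr), arr[::-1, :]))   # True
print(np.array_equal(np.fliplr(arr), arr[:, ::-1]))   # True

# np.flip with an axis does the same; without an axis it flips both
print(np.array_equal(np.flip(arr, axis=0), np.flipud(arr)))   # True
print(np.flip(arr).tolist())   # [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
```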
That's it for NumPy. In this lab, you learned how to create an array, select data from it, perform basic operations and statistics, and manipulate an array.
In the next lab, we will learn about Pandas, another important tool used for real world data manipulation.