NumPy (short for ‘Numerical Python’ or ‘Numeric Python’) is one of the most essential packages for fast mathematical computation on arrays and matrices in Python. It is also quite useful when dealing with multi-dimensional data. It offers tools for integrating C, C++ and FORTRAN code, and it provides numerous functions for Fourier transforms (FT) and linear algebra.

## Why NumPy instead of lists?

One might wonder why NumPy arrays should be preferred when we can create lists holding elements of the same data type. If this question also occurs to you, the following reasons may convince you:

- NumPy arrays have contiguous memory allocation. Thus the same data stored as a list requires more space than when stored as an array.
- They are faster to work with and hence more efficient than lists.
- They are more convenient to deal with.
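The memory claim above can be checked roughly as follows. This is only a sketch: the names `values`, `arr` and `list_bytes` are illustrative, and the list figure is an estimate that adds the size of the list object to the sizes of the integer objects it points to.

```python
import sys
import numpy as np

values = list(range(1000))
arr = np.array(values, dtype=np.int64)

# An int64 array stores 8 bytes per element in one contiguous buffer.
print(arr.nbytes)                      # 8000

# Rough estimate for the list: the list object plus each int object.
list_bytes = sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)
print(list_bytes > arr.nbytes)         # True on CPython
```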

## NumPy vs. Pandas

Pandas is built on top of NumPy. In other words, NumPy is required by pandas to make it work. So pandas is not an alternative to NumPy. Instead, pandas offers additional methods and a more streamlined way of working with numerical and tabular data in Python.

**Importing numpy**

Firstly you need to import the numpy library. Importing numpy can be done by running the following command:

import numpy as np

It is a general approach to import numpy with alias as ‘np’. If alias is not provided then to access the functions from numpy we shall write **numpy.function**. To make it easier an alias ‘np’ is introduced so we can write **np.function**. Some of the common functions of numpy are listed below –

Functions | Tasks |
---|---|
array | Create a NumPy array |
ndim | Number of dimensions of the array |
shape | Size of the array along each dimension (number of rows and columns) |
size | Total number of elements in the array |
dtype | Type of the elements in the array, e.g. int64, float64 |
reshape | Returns the array in a new shape without changing the original array |
resize | Changes the shape of the original array in place |
arange | Create a sequence of numbers in an array |
itemsize | Size in bytes of each item |
diag | Create a diagonal matrix |
vstack | Stack arrays vertically |
hstack | Stack arrays horizontally |

**1D array**

Using numpy an array is created by using **np.array**:

a = np.array([15,25,14,78,96])

```
a
Output: array([15, 25, 14, 78, 96])
print(a)
Output: [15 25 14 78 96]
```

Notice that in np.array **square brackets are present**. Absence of square bracket introduces an error. To print the array we can use **print(a)**.

**Changing the datatype**

np.array( ) has an additional parameter of **dtype** through which one can define whether the elements are integers or floating points or complex numbers.

a.dtype

Initially datatype of ‘a’ was **‘int32’** which on modifying becomes **‘float64’**.

**int32** refers to an integer (a number without a decimal point) stored in 32 bits, so its value can lie between -2147483648 and 2147483647. Similarly, int16 covers the range -32768 to 32767. **float64** refers to a number with a decimal part.
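The code that changes the datatype is not shown above. A minimal sketch, reusing the values of `a` and with the names `a_int`, `a_float` and `a_back` chosen only for illustration, might look like this:

```python
import numpy as np

# Integer input: NumPy picks a platform-dependent default integer type
# (int32 on some Windows builds, int64 on most other platforms).
a_int = np.array([15, 25, 14, 78, 96])
print(a_int.dtype)

# Passing dtype at creation time stores the same values as floats.
a_float = np.array([15, 25, 14, 78, 96], dtype=np.float64)
print(a_float.dtype)   # float64

# astype() returns a converted copy and leaves the original untouched.
a_back = a_float.astype(np.int32)
print(a_back.dtype)    # int32
```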

**Creating the sequence of numbers**

If you want to create a sequence of numbers then using **np.arange,** we can get our sequence. To get the sequence of numbers from 20 to 29 we run the following command.

b = np.arange(start = 20,stop = 30, step = 1)

```
array([20, 21, 22, 23, 24, 25, 26, 27, 28, 29])
```

**In np.arange the end point is always excluded.**

**np.arange** provides an option of step which defines the difference between 2 consecutive numbers. If step is not provided then it takes the value 1 by default.

Suppose we want to create an **arithmetic progression** with initial term 20 and common difference 2, upto 30; 30 being excluded.

c = np.arange(20,30,2) #30 is excluded.

```
array([20, 22, 24, 26, 28])
```

It is to be taken care that in np.arange( ) the stop argument is always excluded.

**Indexing in arrays**

It is important to note that **Python indexing starts from 0**. The syntax of indexing is as follows –

- **x[start:end:step]**: Elements in array x from start through the end (but the end is excluded); the default step value is 1.
- **x[start:end]**: Elements in array x from start through the end (but the end is excluded)
- **x[start:]**: Elements from start through the end of the array
- **x[:end]**: Elements from the beginning through the end (but the end is excluded)

If we want to extract 3rd element we write the index as 2 as it starts from 0.

x = np.arange(10)

```
x
Output: [0 1 2 3 4 5 6 7 8 9]
x[2]
Output: 2
x[2:5]
Output: array([2, 3, 4])
x[::2]
Output: array([0, 2, 4, 6, 8])
x[1::2]
Output: array([1, 3, 5, 7, 9])
```

Note that in **x[2:5] elements starting from 2nd index up to 5th index(exclusive) are selected**.

If we want to set every third element, from the start up to (but excluding) index 7, to 123, we write:

x[:7:3] = 123

```
array([123, 1, 2, 123, 4, 5, 123, 7, 8, 9])
```

To reverse a given array we write:

x = np.arange(10)

`array([9, 8, 7, 6, 5, 4, 3, 2, 1, 0])`

Note that the above command does not modify the original array.
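The reversing command itself is missing from the snippet above. One common way to get the output shown is a slice with a step of -1; the name `reversed_x` is only illustrative:

```python
import numpy as np

x = np.arange(10)

# A step of -1 walks the array from the last element to the first.
reversed_x = x[::-1]
print(reversed_x.tolist())  # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]

# The slice is a separate array object; x keeps its original order.
print(x.tolist())           # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```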

**Reshaping the arrays**

To reshape the array we can use **reshape( ).**

f = np.arange(101,113)

`array([101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112])`

Note that **reshape()** does not alter the shape of the original array. Thus to modify the original array we can use **resize( )**

f.resize(3,4)

```
array([[101, 102, 103, 104],
[105, 106, 107, 108],
[109, 110, 111, 112]])
```
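The difference between reshape( ) and resize( ) described above can be sketched as follows. The names `f_2d` and `f_inplace` are illustrative and not part of the original example.

```python
import numpy as np

f = np.arange(101, 113)

# reshape() returns a reshaped array; f itself keeps its 1D shape.
f_2d = f.reshape(3, 4)
print(f_2d.shape)        # (3, 4)
print(f.shape)           # (12,)

# resize() works in place on the array it is called on.
f_inplace = np.arange(101, 113)
f_inplace.resize(3, 4)
print(f_inplace.shape)   # (3, 4)
```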

If a dimension is given as -1 in a reshaping, it is calculated automatically from the other dimensions, provided that the total number of elements in the array is a multiple of the product of the given dimensions.

f.reshape(3,-1)

```
array([[101, 102, 103, 104],
[105, 106, 107, 108],
[109, 110, 111, 112]])
```

In the above code we only specified that we want 3 rows. NumPy automatically calculates the size of the other dimension, i.e. 4 columns.
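A short sketch of how -1 is resolved, including a case where no valid size exists. The failing shape (5, -1) is a made-up example.

```python
import numpy as np

f = np.arange(101, 113)          # 12 elements

print(f.reshape(3, -1).shape)    # (3, 4)
print(f.reshape(-1, 6).shape)    # (2, 6)

# 12 is not a multiple of 5, so no valid second dimension exists.
try:
    f.reshape(5, -1)
except ValueError as err:
    print("reshape failed:", err)
```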

**Missing Data**

The missing data is represented by NaN (acronym for Not a Number). You can use the command **np.nan**

val = np.array([15,10,

np.nan

```
val.sum()
Out: nan
```

To ignore missing values, you can use **np.nansum(val)**, which returns 45.0.

To check whether an array contains missing values, you can use the function **isnan( )**

np.isnan(val)
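The definition of `val` is cut off above, so the following sketch uses assumed values. Only the leading `15, 10` and the `np.nan` entry come from the text; the remaining numbers are hypothetical and were picked so that the non-missing values add up to 45.

```python
import numpy as np

# Hypothetical array: values after np.nan are assumptions, not the original data.
val = np.array([15, 10, np.nan, 3, 2, 5, 6, 4])

print(val.sum())               # nan, because NaN propagates through the sum
print(np.nansum(val))          # 45.0
print(np.isnan(val).tolist())  # True only at index 2
```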

**2D arrays**

A 2D array in numpy can be created in the following manner:

g = np.array([(10,20,30),(40,50,60)])

The dimension, total number of elements and shape can be ascertained by **ndim, size** and **shape** respectively:

g.ndim

```
g.ndim
Output: 2
g.size
Output: 6
g.shape
Output: (2, 3)
```

**Creating some usual matrices**

numpy provides the utility to create some usual matrices which are commonly used for linear algebra.

To create a matrix of all zeros of 2 rows and 4 columns we can use **np.zeros( )**:

np.zeros( (2,4) )

```
array([[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.]])
```

Here the dtype can also be specified. For a zero matrix the default dtype is ‘float’. To change it to integer we write **‘dtype = np.int16’**

np.zeros([2,4],dtype=np.int16)

```
array([[0, 0, 0, 0],
[0, 0, 0, 0]], dtype=int16)
```

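To get a matrix whose entries are not initialized we write **np.empty.** The values are whatever happens to be in memory at that moment; they are not random numbers drawn from 0 to 1.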

np.empty( (2,3) )

```
array([[ 2.16443571e-312, 2.20687562e-312, 2.24931554e-312],
[ 2.29175545e-312, 2.33419537e-312, 2.37663529e-312]])
```

Note: The results may vary every time you run **np.empty.**
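If actual random numbers between 0 and 1 are wanted, a random generator is the usual tool. A minimal sketch, with an arbitrary seed and the illustrative names `scratch`, `rng` and `samples`:

```python
import numpy as np

# np.empty only allocates memory; the contents are arbitrary leftovers.
scratch = np.empty((2, 3))
print(scratch.shape)                  # (2, 3)

# For random numbers in [0, 1), draw from a random generator instead.
rng = np.random.default_rng(seed=0)   # the seed value is arbitrary
samples = rng.random((2, 3))
print(samples)
```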

To create a matrix of unity we write **np.ones( )**. We can create a 3 * 3 matrix of all ones by:

np.ones([3,3])

```
array([[ 1., 1., 1.],
[ 1., 1., 1.],
[ 1., 1., 1.]])
```

To create a diagonal matrix we can write **np.diag( )**. To create a diagonal matrix where the diagonal elements are 14,15,16 and 17 we write:

np.diag([14,15,16,17])

```
array([[14, 0, 0, 0],
[ 0, 15, 0, 0],
[ 0, 0, 16, 0],
[ 0, 0, 0, 17]])
```

To create an identity matrix we can use **np.eye( )** .

np.eye(5, dtype="int")

```
array([[1, 0, 0, 0, 0],
[0, 1, 0, 0, 0],
[0, 0, 1, 0, 0],
[0, 0, 0, 1, 0],
[0, 0, 0, 0, 1]])
```

By default the datatype in np.eye( ) is ‘float’ thus we write dtype = “int” to convert it to integers.

**Reshaping 2D arrays**

To get a flattened 1D array we can use **ravel( )**

g = np.array([(10,20,30),(40,50,60)])

`array([10, 20, 30, 40, 50, 60])`

To change the shape of 2D array we can use **reshape**. Writing -1 will calculate the other dimension automatically and does not modify the original array.

g.reshape(3,-1) # returns the array with a modified shape

` (2, 3)`
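The ravel( ) call and the shape check are not shown in the snippets above. A sketch of what they presumably looked like, with `flat` and `reshaped` as illustrative names:

```python
import numpy as np

g = np.array([(10, 20, 30), (40, 50, 60)])

flat = g.ravel()                 # 1D array of all elements, row by row
print(flat.tolist())             # [10, 20, 30, 40, 50, 60]

reshaped = g.reshape(3, -1)      # new 3x2 array; -1 is resolved to 2
print(reshaped.shape)            # (3, 2)
print(g.shape)                   # (2, 3), g is unchanged
```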

Similar to 1D arrays, using **resize( )** will modify the shape in the original array.

g.resize((3,2))

```
array([[10, 20],
[30, 40],
[50, 60]])
```

**Time for some matrix algebra**

Let us create some arrays A,b and B and they will be used for this section:
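The definitions of A, B and b are missing here. The values below are a reconstruction inferred from the outputs printed later in this section (transpose, sum, product and solution), so treat them as an assumption rather than the original code.

```python
import numpy as np

# Reconstructed from the outputs shown in this section, not from the
# original definitions.
A = np.array([[2, 0, 1],
              [4, 3, 8],
              [7, 6, 9]])
B = np.array([[10, 20, 30],
              [40, 50, 60],
              [70, 80, 90]])
b = np.array([1, 101, 14])

print(np.trace(A))              # 14
print(A.dot(B)[0].tolist())     # [90, 120, 150]
print(np.linalg.solve(A, b))    # roughly [-13.92, -24.69, 28.85]
```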

In order to get the transpose, trace and inverse we use **A.transpose( ) , np.trace( )** and **np.linalg.inv( )** respectively.

A.T #transpose

```
A.transpose() #transpose
Output:
array([[2, 4, 7],
[0, 3, 6],
[1, 8, 9]])
np.trace(A) # trace
Output: 14
np.linalg.inv(A) #Inverse
Output:
array([[ 0.53846154, -0.15384615, 0.07692308],
[-0.51282051, -0.28205128, 0.30769231],
[-0.07692308, 0.30769231, -0.15384615]])
```

Note that **transpose does not modify the original array.**

**Matrix addition and subtraction** can be done in the usual way:

A+B

```
A+B
Output:
array([[12, 20, 31],
[44, 53, 68],
[77, 86, 99]])
A-B
Output:
array([[ -8, -20, -29],
[-36, -47, -52],
[-63, -74, -81]])
```

**Matrix multiplication** of A and B can be accomplished by **A.dot(B)**, where A is the matrix on the left-hand side and B is the matrix on the right-hand side.

A.dot(B)

```
array([[ 90, 120, 150],
[ 720, 870, 1020],
[ 940, 1160, 1380]])
```

To solve the system of linear equations: Ax = b we use **np.linalg.solve( )**

np.linalg.solve(A,b)

`array([-13.92307692, -24.69230769, 28.84615385])`

The eigenvalues and eigenvectors can be calculated using **np.linalg.eig( )**

np.linalg.eig(A)

```
(array([ 14.0874236 , 1.62072127, -1.70814487]),
array([[-0.06599631, -0.78226966, -0.14996331],
[-0.59939873, 0.54774477, -0.81748379],
[-0.7977253 , 0.29669824, 0.55608566]]))
```

The first array holds the eigenvalues, and the second is the matrix of eigenvectors, where each column is the eigenvector for the corresponding eigenvalue.
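The column-wise pairing of eigenvalues and eigenvectors can be checked numerically. The matrix A below is inferred from the transpose output earlier in this section, so it is an assumption about the original definition.

```python
import numpy as np

# A as inferred from the transpose output shown earlier.
A = np.array([[2, 0, 1],
              [4, 3, 8],
              [7, 6, 9]])

eigenvalues, eigenvectors = np.linalg.eig(A)

# Column k of `eigenvectors` pairs with eigenvalues[k]: A v = lambda v.
for k in range(len(eigenvalues)):
    v = eigenvectors[:, k]
    print(np.allclose(A @ v, eigenvalues[k] * v))  # True
```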

**Some Mathematics functions**

We can have various trigonometric functions like sin, cosine etc. using numpy:

B = np.array([[0,-20,36],[40,50,1]])

```
array([[ 0. , -0.91294525, -0.99177885],
[ 0.74511316, -0.26237485, 0.84147098]])
```

The resultant is the matrix of all sin( ) elements.
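The call that produces this output is missing from the snippet. It is most likely `np.sin(B)`, which matches the printed values; `sin_B` is an illustrative name.

```python
import numpy as np

B = np.array([[0, -20, 36], [40, 50, 1]])

# np.sin is applied element by element; inputs are read as radians.
sin_B = np.sin(B)
print(sin_B)
```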

To raise each element to a power we use the `**` operator:

B**2

```
array([[ 0, 400, 1296],
[1600, 2500, 1]], dtype=int32)
```

We get the matrix of the square of all elements of B.

To check whether the elements of a matrix satisfy a condition, we write the condition directly. For instance, to check if the elements of B are more than 25 we write:

B>25

```
array([[False, False, True],
[ True, True, False]], dtype=bool)
```

We get a matrix of Booleans where True indicates that the corresponding element is greater than 25 and False indicates that the condition is not satisfied.

In a similar manner **np.absolute, np.sqrt** and **np.exp** return the matrices of absolute numbers, square roots and exponentials respectively.

np.absolute(B)

Now we consider a matrix A of shape 3*3:

A = np.arange(1,10).reshape(3,3)

```
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
```

To find the sum, minimum, maximum, mean, standard deviation and variance respectively we use the following commands:

A.sum()

```
A.sum()
Output: 45
A.min()
Output: 1
A.max()
Output: 9
A.mean()
Output: 5.0
A.std() #Standard deviation
Output: 2.5819888974716112
A.var()
Output: 6.666666666666667
```

In order to obtain the index of the minimum and maximum elements we use **argmin( )** and **argmax( )** respectively.

A.argmin()

```
A.argmin()
Output: 0
A.argmax()
Output: 8
```

If we wish to find the above statistics for each row or column then we need to specify the axis:

A.sum(axis=0)

```
A.sum(axis=0) # sum of each column, it will move in downward direction
Output: array([12, 15, 18])
A.mean(axis = 0)
Output: array([ 4., 5., 6.])
A.std(axis = 0)
Output: array([ 2.44948974, 2.44948974, 2.44948974])
A.argmin(axis = 0)
Output: array([0, 0, 0], dtype=int64)
```

By defining **axis = 0,** calculations will move in downward direction i.e. it will give the statistics for each column. To find the min and index of maximum element for each row, we need to move in right-wise direction so we write **axis = 1**:

A.min(axis=1)

```
A.min(axis=1) # min of each row, it will move in rightwise direction
Output: array([1, 4, 7])
A.argmax(axis = 1)
Output: array([2, 2, 2], dtype=int64)
```

To find the cumulative sum along each row we use **cumsum( )**

A.cumsum(axis=1)

```
array([[ 1, 3, 6],
[ 4, 9, 15],
[ 7, 15, 24]], dtype=int32)
```

**Creating 3D arrays**

Numpy also provides the facility to create 3D arrays. A 3D array can be created as:

X = np.array( [[[ 1, 2,3],

X contains two 2D arrays, thus the shape is (2, 2, 3). The total number of elements is 12.
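The definition of X is cut off above. The version below is a reconstruction: the values 1 to 12 and the shape (2, 2, 3) are taken from the outputs shown in this section.

```python
import numpy as np

# Reconstruction based on the ravel() and sum() outputs in this section.
X = np.array([[[1, 2, 3],
               [4, 5, 6]],
              [[7, 8, 9],
               [10, 11, 12]]])

print(X.shape)  # (2, 2, 3)
print(X.size)   # 12
```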

To calculate the sum along a particular axis we use the axis parameter as follows:

X.sum(axis = 0)

```
X.sum(axis = 0)
Output:
array([[ 8, 10, 12],
[14, 16, 18]])
X.sum(axis = 1)
Output:
array([[ 5, 7, 9],
[17, 19, 21]])
X.sum(axis = 2)
Output:
array([[ 6, 15],
[24, 33]])
```

axis = 0 returns the sum of the corresponding elements of each 2D array. axis = 1 returns the sum of elements in each column in each matrix while axis = 2 returns the sum of each row in each matrix.

X.ravel()

`array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])`

ravel( ) writes all the elements in a single array.

Consider a 3D array:

X = np.array( [[[ 1, 2,3],

To extract the 2nd matrix we write:

X[1,...] # same as X[1,:,:] or X[1]

```
array([[ 7, 8, 9],
[10, 11, 12]])
```

Remember python indexing starts from 0 that is why we wrote 1 to extract the 2nd 2D array.

To extract the first element from all the rows we write:

X[...,0] # same as X[:,:,0]

```
array([[ 1, 4],
[ 7, 10]])
```

**Find out position of elements that satisfy a given condition**

a = np.array([8, 3, 7, 0, 4, 2, 5, 2])

`array([0, 2, 6])`

**np.where** locates the positions in the array where element of array is greater than 4.
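The np.where call itself is not shown above. A sketch of the likely command, with `positions` as an illustrative name:

```python
import numpy as np

a = np.array([8, 3, 7, 0, 4, 2, 5, 2])

# With only a condition, np.where returns a tuple holding one index array
# per dimension; a is 1D, so the tuple has a single entry.
positions = np.where(a > 4)
print(positions[0].tolist())   # [0, 2, 6]
print(a[positions].tolist())   # [8, 7, 5]
```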

**Indexing with Arrays of Indices**

Consider a 1D array.

x = np.arange(11,35,2)

`array([11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33])`

We form a 1D array i which subsets the elements of x as follows:

i = np.array( [0,1,5,3,7,9 ] )

`array([11, 13, 21, 17, 25, 29])`

In a similar manner we create a 2D array j of indices to subset x.

j = np.array( [ [ 0, 1], [ 6, 2 ] ] )

```
array([[11, 13],
[23, 15]])
```
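The indexing expressions that produce the two outputs above are not shown; they are presumably `x[i]` and `x[j]`, as in this sketch:

```python
import numpy as np

x = np.arange(11, 35, 2)

i = np.array([0, 1, 5, 3, 7, 9])
print(x[i].tolist())     # [11, 13, 21, 17, 25, 29]

# The result takes the shape of the index array, here 2x2.
j = np.array([[0, 1], [6, 2]])
print(x[j].tolist())     # [[11, 13], [23, 15]]
```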

Similarly we can create both i and j as 2D arrays of indices for x

x = np.arange(15).reshape(3,5)

To get the ith index in row and jth index for columns we write:

x[i,j] # i and j must have equal shape

```
array([[ 1, 6],
[12, 0]])
```

To extract ith index from 3rd column we write:

x[i,2]

```
array([[ 2, 7],
[12, 2]])
```

For each row if we want to find the jth index we write:

x[:,j]

```
array([[[ 1, 1],
[ 2, 0]],
[[ 6, 6],
[ 7, 5]],
[[11, 11],
[12, 10]]])
```

The result stacks one 2x2 block per row of x: the jth entries of the 1st row, then of the 2nd row, then of the 3rd row.
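The 2D index arrays i and j used above are not defined in the text. The values below are inferred from the three outputs, so they are an assumption about the original definitions.

```python
import numpy as np

x = np.arange(15).reshape(3, 5)

# Inferred from the outputs above; the original definitions are not shown.
i = np.array([[0, 1], [2, 0]])   # row indices
j = np.array([[1, 1], [2, 0]])   # column indices

print(x[i, j].tolist())   # pairs i and j element by element: [[1, 6], [12, 0]]
print(x[i, 2].tolist())   # column 2 of the rows listed in i: [[2, 7], [12, 2]]
print(x[:, j].shape)      # (3, 2, 2)
```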

You can also use indexing with arrays to assign the values:

x = np.arange(10)

`array([0, 0, 0, 3, 0, 0, 6, 7, 0, 9])`

0 is assigned to 4th, 5th, 8th, 1st and 2nd indices of x.
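The assignment statement is missing from the snippet. Based on the indices named in the sentence above, it was probably along these lines:

```python
import numpy as np

x = np.arange(10)

# Index list taken from the sentence above: 4, 5, 8, 1 and 2.
x[[4, 5, 8, 1, 2]] = 0
print(x.tolist())   # [0, 0, 0, 3, 0, 0, 6, 7, 0, 9]
```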

When the list of indices contains repetitions then it assigns the last value to that index:

x = np.arange(10)

`array([ 0, 1, 300, 400, 200, 5, 6, 7, 8, 9])`

Notice that for the 5th element(i.e. 4th index) the value assigned is 200, not 100.
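The assignment that produces this output is not shown. The index and value lists below are assumptions that reproduce it, with index 4 paired first with 100 and then with 200:

```python
import numpy as np

x = np.arange(10)

# Assumed lists: index 4 appears twice, so the later value (200) remains.
x[[2, 3, 4, 4]] = [300, 400, 100, 200]
print(x.tolist())   # [0, 1, 300, 400, 200, 5, 6, 7, 8, 9]
```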

Caution: When the += operator is used with repeated indices, the operation is carried out only once for each repeated index.

x = np.arange(10)

`array([0, 2, 2, 3, 4, 5, 6, 8, 8, 9])`

Although indices 1 and 7 are repeated, they are incremented only once.
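A sketch of this behaviour, using an assumed index list in which 1 appears three times and 7 twice. The second half shows `np.add.at`, which is not part of the original text but is one way to make repeated indices accumulate.

```python
import numpy as np

x = np.arange(10)
x[[1, 1, 1, 7, 7]] += 1            # assumed index list
print(x.tolist())                  # [0, 2, 2, 3, 4, 5, 6, 8, 8, 9]

# np.add.at applies the addition once per occurrence of each index.
y = np.arange(10)
np.add.at(y, [1, 1, 1, 7, 7], 1)
print(y.tolist())                  # [0, 4, 2, 3, 4, 5, 6, 9, 8, 9]
```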

**Indexing with Boolean Arrays**

We create a 2D array and store our condition in b. Where the condition is true the entry is True, otherwise it is False.

a = np.arange(12).reshape(3,4)

```
array([[False, False, False, False],
[False, True, True, True],
[ True, True, True, True]], dtype=bool)
```

Note that ‘b’ is a Boolean with same shape as that of ‘a’.
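The line that defines b is missing above. Judging by the Boolean output, the condition was most likely `a > 4`:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
b = a > 4           # elementwise comparison gives a Boolean array

print(b.shape)      # (3, 4)
print(b.dtype)      # bool
print(int(b.sum())) # 7 elements satisfy the condition
```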

To select the elements from ‘a’ which adhere to condition ‘b’ we write:

a[b]

`array([ 5, 6, 7, 8, 9, 10, 11])`

The result is a 1D array with the selected elements; ‘a’ itself is unchanged.

This property can be very useful in assignments:

a[b] = 0

```
array([[0, 1, 2, 3],
[4, 0, 0, 0],
[0, 0, 0, 0]])
```

All elements of ‘a’ higher than 4 become 0

As done in integer indexing we can use indexing via Booleans:

Let x be the original matrix and ‘y’ and ‘z’ be the arrays of Booleans to select the rows and columns.

x = np.arange(15).reshape(3,5)

We write the x[y,:] which will select only those rows where y is True.

x[y,:] # selecting rows

Writing x[:,z] will select only those columns where z is True.

x[:,z] # selecting columns

```
x[y,:] # selecting rows
Output:
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
x[y] # same thing
Output:
array([[0, 1, 2, 3, 4],
[5, 6, 7, 8, 9]])
x[:,z] # selecting columns
Output:
array([[ 0, 1, 3],
[ 5, 6, 8],
[10, 11, 13]])
```
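The Boolean arrays y and z are not defined in the text. The values below are inferred from the selected rows and columns, so they are an assumption:

```python
import numpy as np

x = np.arange(15).reshape(3, 5)

# Inferred from the outputs above: one flag per row and one per column.
y = np.array([True, True, False])
z = np.array([True, True, False, True, False])

print(x[y, :].tolist())   # rows 0 and 1
print(x[:, z].tolist())   # columns 0, 1 and 3
```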

**Statistics on Pandas DataFrame**

Let’s create dummy data frame for illustration :

```
import pandas as pd

np.random.seed(234)
mydata = pd.DataFrame({"x1" : np.random.randint(low=1, high=100, size=10),
                       "x2" : range(10)
                       })
```

**1. Calculate mean of each column of data frame**

np.mean(mydata, axis=0)

**2. Calculate median of each column of data frame**

np.median(mydata, axis=0)

**axis = 0** means the median function would be run on each column. axis = 1 implies the function to be run on each row.

**Stacking various arrays**

Let us consider 2 arrays A and B:

A = np.array([[10,20,30],[40,50,60]])

To join them vertically we use **np.vstack( ).**

np.vstack((A,B)) #Stacking vertically

```
array([[ 10, 20, 30],
[ 40, 50, 60],
[100, 200, 300],
[400, 500, 600]])
```

To join them horizontally we use **np.hstack( ).**

np.hstack((A,B)) #Stacking horizontally

```
array([[ 10, 20, 30, 100, 200, 300],
[ 40, 50, 60, 400, 500, 600]])
```
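The definition of B is missing above. The version below is inferred from the stacked outputs and is therefore an assumption:

```python
import numpy as np

A = np.array([[10, 20, 30], [40, 50, 60]])
# Inferred from the stacked outputs; B's definition is not shown in the text.
B = np.array([[100, 200, 300], [400, 500, 600]])

print(np.vstack((A, B)).shape)   # (4, 3): rows of B go below rows of A
print(np.hstack((A, B)).shape)   # (2, 6): columns of B go to the right of A
```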

**newaxis** helps in transforming a 1D array into a 2D column vector.

from numpy import newaxis

```
array([[ 4.],
[ 1.]])
```

The function **np.column_stack( )** stacks 1D arrays as columns into a 2D array. It is equivalent to hstack only for 2D arrays:

np.column_stack((a[:,newaxis],b[:,newaxis]))

```
np.column_stack((a[:,newaxis],b[:,newaxis]))
Output:
array([[ 4., 2.],
[ 1., 8.]])
np.hstack((a[:,newaxis],b[:,newaxis]))
Output:
array([[ 4., 2.],
[ 1., 8.]])
```
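The arrays a and b are not defined in the text; the values below are inferred from the outputs. The sketch also shows why column_stack and hstack differ for plain 1D inputs.

```python
import numpy as np
from numpy import newaxis

# Inferred from the outputs above.
a = np.array([4., 1.])
b = np.array([2., 8.])

print(a[:, newaxis].shape)                # (2, 1)
print(np.column_stack((a, b)).tolist())   # [[4.0, 2.0], [1.0, 8.0]]
print(np.hstack((a, b)).tolist())         # [4.0, 1.0, 2.0, 8.0]
```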

**Splitting the arrays**

Consider an array ‘z’ of 15 elements:

z = np.arange(1,16)

Using **np.hsplit( )** one can split the arrays

np.hsplit(z,5) # Split z into 5 arrays

```
[array([1, 2, 3]),
array([4, 5, 6]),
array([7, 8, 9]),
array([10, 11, 12]),
array([13, 14, 15])]
```

It splits ‘z’ into 5 arrays of equal length.

On passing 2 elements we get:

np.hsplit(z,(3,5))

```
[array([1, 2, 3]),
array([4, 5]),
array([ 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])]
```

It splits ‘z’ after the third and the fifth element.

For 2D arrays np.hsplit( ) works as follows:

A = np.arange(1,31).reshape(3,10)

```
[array([[ 1, 2],
[11, 12],
[21, 22]]), array([[ 3, 4],
[13, 14],
[23, 24]]), array([[ 5, 6],
[15, 16],
[25, 26]]), array([[ 7, 8],
[17, 18],
[27, 28]]), array([[ 9, 10],
[19, 20],
[29, 30]])]
```

In the above command A gets split into 5 arrays of same shape.

To split after the third and the fifth column we write:

np.hsplit(A,(3,5))

```
[array([[ 1, 2, 3],
[11, 12, 13],
[21, 22, 23]]), array([[ 4, 5],
[14, 15],
[24, 25]]), array([[ 6, 7, 8, 9, 10],
[16, 17, 18, 19, 20],
[26, 27, 28, 29, 30]])]
```
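The first 2D split above is shown without its command, which is presumably `np.hsplit(A, 5)`. A compact sketch of both splits, printing only the shapes (`parts` and `pieces` are illustrative names):

```python
import numpy as np

A = np.arange(1, 31).reshape(3, 10)

parts = np.hsplit(A, 5)             # 5 equal blocks of 2 columns each
print([p.shape for p in parts])     # [(3, 2), (3, 2), (3, 2), (3, 2), (3, 2)]

pieces = np.hsplit(A, (3, 5))       # cut before column index 3 and index 5
print([p.shape for p in pieces])    # [(3, 3), (3, 2), (3, 5)]
```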

**Copying**

Consider an array x

x = np.arange(1,16)

We assign x to y and then check ‘y is x’, which is True because both names refer to the same array object.

y = x

Let us change the shape of y

y.shape = 3,5

Note that it alters the shape of x

`(3, 5)`
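The checks behind these statements are not shown. A sketch of what they presumably were:

```python
import numpy as np

x = np.arange(1, 16)
y = x                 # no copy is made; y is another name for x

print(y is x)         # True
y.shape = 3, 5
print(x.shape)        # (3, 5)
```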

**Creating a view of the data**

Let us store z as a **view** of x by:

z = x.view()

`False`

Thus z is not x.

Changing the shape of z

z.shape = 5,3

Creating a view does not alter the shape of x

`(3, 5)`

Changing an element in z

z[0,0] = 1234

Note that the value in x also gets altered:

```
array([[1234, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10],
[ 11, 12, 13, 14, 15]])
```

Thus changing the shape of a view does not affect the original array, but changing values through the view does affect the original data.
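The view behaviour can be reproduced end to end as follows. `np.shares_memory` is not used in the original text; it is added here only to make the shared buffer visible.

```python
import numpy as np

x = np.arange(1, 16).reshape(3, 5)
z = x.view()

print(z is x)                    # False: z is a different array object
print(np.shares_memory(z, x))    # True: both use the same data buffer

z.shape = 5, 3
print(x.shape)                   # (3, 5): reshaping the view leaves x alone

z[0, 0] = 1234
print(x[0, 0])                   # 1234: the write shows up in x
```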

Creating a copy of the data:

Now let us create z as a copy of x:

z = x.copy()

Note that z is not x

Changing the value in z

z[0,0] = 9999

No alterations are made in x.

```
array([[1234, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10],
[ 11, 12, 13, 14, 15]])
```
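For comparison, the same experiment with a copy. This sketch starts from a fresh x, so the first element is 1 rather than 1234.

```python
import numpy as np

x = np.arange(1, 16).reshape(3, 5)
z = x.copy()

print(z is x)                    # False
print(np.shares_memory(z, x))    # False: the copy owns its own data

z[0, 0] = 9999
print(x[0, 0])                   # 1: x is untouched
```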

pandas may sometimes give a ‘SettingWithCopyWarning’ because it cannot tell whether a new dataframe (created as a subset of another dataframe) is a view or a copy. In such situations, make it explicit whether you are working with a copy or a view; otherwise later assignments may not have the effect you expect.

## Exercises : Numpy

### 1. How to extract even numbers from array?

arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

**Desired Output :** array([0, 2, 4, 6, 8])

arr[arr % 2 == 0]

### 2. How to find out the position where elements of x and y are same

x = np.array([5,6,7,8,3,4])

y = np.array([5,3,4,5,2,4])

**Desired Output :** array([0, 5])

np.where(x == y)

### 3. How to standardize values so that it lies between 0 and 1

k = np.array([5,3,4,5,2,4])

**Hint :** (k - min(k)) / (max(k) - min(k))

kmax, kmin = k.max(), k.min()

k_new = (k - kmin)/(kmax - kmin)

### 4. How to calculate the percentile scores of an array

p = np.array([15,10, 3,2,5,6,4])

np.percentile(p, q=[5, 95])

### 5. Print the number of missing values in an array

p = np.array([5,10, np.nan, 3, 2, 5, 6, np.nan])

print("Number of missing values =", np.isnan(p).sum())