NumPy Arrays

Introduction

NumPy (Numerical Python) is an open source Python library that’s widely used in science and engineering. It provides a homogeneous, N-dimensional array data structure (numpy.ndarray), along with a large library of functions that operate efficiently on it. It has also become a dependency of many packages in machine learning (like scikit-learn, TensorFlow, Keras), visualisation (Matplotlib, Seaborn, Altair), and many other areas. Long story short: if you keep programming in Python, you will most likely run into numpy.

Today, we’re going to learn the basics of numpy and try to understand why it became so popular.

Technically, you can do everything with plain Python. After all, this is programming: there’s rarely a single right solution.

However, in the real world, computations on native Python structures become slow and inefficient as data grows. numpy provides optimized, vectorized operations backed by low-level C implementations, which makes it the proper way to scale element-wise computations. Whether you’re handling large datasets, performing scientific calculations, or working on machine learning, you’ll usually get faster execution, lower memory usage, and cleaner code than with native Python structures (see the example below).

import numpy as np   # %time is an IPython/Jupyter magic; timings will differ on other machines
N = 1_000_000

# initialization
%time v1_l = [1 for i in range(N)]                  # 25.1 ms
%time v1_n = np.ones(N)                             # 0.8 ms
print('\n')

# square
%time v2_l = [i**2 for i in range(N)]               # 38.7 ms
# NOT vectorized: builds a Python list first, then converts it (np.arange(N)**2 is the vectorized way)
%time v2_n = np.array([i**2 for i in range(N)])     # 59.9 ms
print('\n')

# maximum
%time max(v1_l)                                     # 6.44 ms
%time np.max(v1_n)                                  # 0.17 ms

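Additionally, numpy is simply more expressive. Isn’t it natural to expect that multiplying an array by 2 multiplies every value in it by 2? Python lists behave differently:

import numpy as np

list_example = [1, 2, 3]
print(list_example + list_example)      # [1, 2, 3, 1, 2, 3]: concatenation
print(list_example * 2)                 # [1, 2, 3, 1, 2, 3]: repetition

array_example = np.array([1, 2, 3])
print(array_example + array_example)    # [2 4 6]: element-wise addition
print(array_example * 2)                # [2 4 6]: every element doubled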

And remember, friends: the whole point of numpy is to avoid manipulating individual array elements by hand. So, if you don’t want your code to be unexpectedly slow:

DON’T LOOP OVER NUMPY ARRAYS ELEMENT BY ELEMENT!

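# dot product
import numpy as np

N = 1_000_000
arr1 = np.random.rand(N)
arr2 = np.random.rand(N)

%time products = [arr1[i]*arr2[i] for i in range(N)]   # 124 ms (element-wise products only)
with_loops = sum(products)                              # the dot product still needs a final sum
%time with_numpy = np.dot(arr1, arr2)                   # 0.4 ms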
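The loop above handles one pair of Python objects at a time. A minimal sketch, assuming a small arbitrary array size and a seeded generator, showing that the loop, an element-wise product followed by a sum, and `np.dot` all compute the same dot product:

```python
import numpy as np

rng = np.random.default_rng(0)      # seeded generator so the run is reproducible
arr1 = rng.random(1_000)
arr2 = rng.random(1_000)

# loop version: multiply pair by pair, then add everything up in Python
loop_dot = sum(arr1[i] * arr2[i] for i in range(len(arr1)))

# vectorized versions: the looping happens inside NumPy's compiled code
vec_dot = np.sum(arr1 * arr2)
np_dot = np.dot(arr1, arr2)

# floating-point sums may differ in the last bits, so compare with a tolerance
print(np.isclose(loop_dot, vec_dot), np.isclose(vec_dot, np_dot))   # True True
```

All three agree; only the vectorized ones scale well to large arrays.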
Basics

NumPy’s main object is the homogeneous, multidimensional array. Let’s decompose this definition:

# load the package (if ModuleNotFoundError: pip install numpy)
import numpy as np

# the most basic way to create an array is to pass a list
nothing = np.array([])
something = np.array([1, 2, 3, 4, 5])   # all elements are int
print(something, type(something))       # [1 2 3 4 5] <class 'numpy.ndarray'>

# np.array takes the data as a *single* first argument: wrap all rows in one outer list
try:
    np.array([1, 2, 3], [4, 5, 6])      # the second list is interpreted as the `dtype` argument
except TypeError as e:
    print("Error:", e)                  # Error: Field elements must be 2- or 3-tuples, got '4'
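"Homogeneous" means that every element of an array shares one data type. A short sketch of what happens when the input list mixes types: numpy converts everything to a common type. The exact integer dtype (`int64` vs `int32`) can vary by platform, so the sketch checks only the dtype kind:

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed_numbers = np.array([1, 2.5, 3])      # int + float -> everything becomes float
mixed_types = np.array([1, "two", 3.0])    # numbers + string -> everything becomes a string

print(ints.dtype.kind)            # i  (signed integer)
print(mixed_numbers)              # [1.  2.5 3. ]
print(mixed_numbers.dtype)        # float64
print(mixed_types)                # ['1' 'two' '3.0']
print(mixed_types.dtype.kind)     # U  (Unicode string)

# you can also request a type explicitly
as_float = np.array([1, 2, 3], dtype=float)
print(as_float)                   # [1. 2. 3.]
```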

Let’s focus on multidimensionality. The picture below shows a model of a 3D array (we humans simply can’t visualise more than three dimensions):

[Figure: model of a 3D array]

(src: geeksforgeeks.org)

# 2D array (note: variable names can't start with a digit, so `2d_array` isn't allowed)
two_d_array = np.array(
    [[1, 2, 3], 
     [4, 5, 6], 
     [7, 8, 9]]
    )
print(two_d_array, '\n', type(two_d_array), '\n') #  <class 'numpy.ndarray'> 
# here and below '\n's are for better output readability

# 3D array 
three_d_array = np.array(
    [
        [
            [1, 2, 3, 4],
            [5, 6, 7, 8],
            [9, 10, 11, 12]
        ],
        [
            [13, 14, 15, 16],
            [17, 18, 19, 20],
            [21, 22, 23, 24]
        ]
    ]
    )
print(three_d_array, '\n', type(three_d_array)) #  <class 'numpy.ndarray'> 
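Multidimensional arrays must be "rectangular": every row along an axis needs the same length. A sketch of what happens with a ragged nested list; recent NumPy versions (1.24 and later, to my knowledge) raise a `ValueError`, while older versions created an `object` array with a deprecation warning, so the sketch handles both:

```python
import numpy as np

rectangular = np.array([[1, 2, 3], [4, 5, 6]])
print(rectangular.shape)                     # (2, 3): 2 rows, 3 columns

try:
    ragged = np.array([[1, 2, 3], [4, 5]])   # second row is one element short
    print("Created:", ragged.dtype)          # only reached on older NumPy versions
except ValueError as e:
    print("Error:", e)
```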

Sometimes we don’t have (or don’t need) data to fill an ndarray yet, but we know what shape the array should have. To create basic arrays of a given shape, we can use the following functions:

# shape as an argument

zeros = np.zeros((2,2,2))           # [[[0. 0.]
print(f"Zeros array:\n{zeros}\n")   #   [0. 0.]]
                                    #
                                    #  [[0. 0.]
                                    #   [0. 0.]]]

ones = np.ones((4, 5))              # [[1. 1. 1. 1. 1.]
print(f"Ones array:\n{ones}\n")     #  [1. 1. 1. 1. 1.]
                                    #  [1. 1. 1. 1. 1.]
                                    #  [1. 1. 1. 1. 1.]]

# numpy has a predefined constant for Pi!
filled = np.full((3,3), np.pi)              # [[3.14159265 3.14159265 3.14159265]
print(f"Filled array with Pi:\n{filled}\n") #  [3.14159265 3.14159265 3.14159265]
                                            #  [3.14159265 3.14159265 3.14159265]]

# sequence of numbers

arange_seq = np.arange(0, 10, 1)                # start, stop, step (analogue of `range`)
print(f"Arange sequence:\n{arange_seq}\n")      # [0 1 2 3 4 5 6 7 8 9]

linspace_seq = np.linspace(0, 1, 11)            # start, stop (included), number of elements (steps = elements - 1)
print(f"Linspace sequence:\n{linspace_seq}\n")  # [0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1. ]
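The two functions differ in whether `stop` is included. A small sketch with a non-integer step; the NumPy docs recommend `linspace` in this case, since floating-point rounding can make the length of an `arange` result hard to predict:

```python
import numpy as np

# arange excludes `stop`; linspace includes it by default
quarter_steps = np.arange(0, 1, 0.25)
quarter_points = np.linspace(0, 1, 5)
print(quarter_steps)      # [0.   0.25 0.5  0.75]
print(quarter_points)     # [0.   0.25 0.5  0.75 1.  ]

# linspace can also leave the endpoint out
no_end = np.linspace(0, 1, 4, endpoint=False)
print(no_end)             # [0.   0.25 0.5  0.75]
```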

You can also copy existing arrays. Note that assignment doesn’t create a new array: it just creates a new name that points to the already existing object:

new_filled = filled         # no new object is created
new_filled[0, 0] = 0        # change the first element in the new array
print(new_filled is filled) # True: both names refer to the same ndarray, so filled[0, 0] is now 0 as well

copy_arange_seq = arange_seq.copy()     # new object is created 
copy_arange_seq[0] = 1000              # change the first element in the new array
print(copy_arange_seq is arange_seq)    # False: two different ndarray objects
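A related pitfall: basic slicing also avoids copying and returns a *view* that shares memory with the original. A minimal sketch using `np.shares_memory` to check which arrays share data:

```python
import numpy as np

original = np.arange(5)          # [0 1 2 3 4]
view = original[1:4]             # basic slicing returns a view, not a copy
view[0] = 100                    # writes through to `original`
print(original)                  # [  0 100   2   3   4]
print(np.shares_memory(original, view))          # True

independent = original[1:4].copy()               # explicit copy: owns its memory
independent[0] = -1
print(original)                  # [  0 100   2   3   4] (unchanged)
print(np.shares_memory(original, independent))   # False
```

Calling `.copy()` on a slice is the usual way to get data you can modify safely.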

You can also build new arrays by adding, removing, and sorting elements. Note that these functions return a new array instead of changing the original in place:

arange_seq_1 = np.arange(1, 10, 2)                # [1 3 5 7 9]
arange_seq_2 = np.arange(0, 11, 2)                # [0 2 4 6 8 10]

# concatenation: 1D arrays are joined end to end
total_seq = np.concatenate((arange_seq_1, arange_seq_2)) 
print(f"Concatenated sequence:\n{total_seq}\n")             # [1 3 5 7 9 0 2 4 6 8 10]

# dropping values 
total_seq = np.delete(total_seq, -1) # drop the last element
print(f"Sequence with dropped last value:\n{total_seq}\n")  # [1 3 5 7 9 0 2 4 6 8]

# sort sequence
sorted_seq = np.sort(total_seq)  # returns a sorted copy; N-D arrays are sorted along the last axis by default
print(f"Sorted sequence:\n{sorted_seq}\n")                  # [0 1 2 3 4 5 6 7 8 9]
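Besides concatenation, single values can be added with `np.append` (at the end) or `np.insert` (at a given index). A short sketch; like `np.delete`, both return a new array:

```python
import numpy as np

seq = np.array([1, 3, 5])

appended = np.append(seq, 7)      # add a value at the end
inserted = np.insert(seq, 1, 2)   # insert the value 2 before index 1
print(appended)                   # [1 3 5 7]
print(inserted)                   # [1 2 3 5]
print(seq)                        # [1 3 5]: the original is unchanged
```

Because every call allocates a new array, growing an array one element at a time inside a loop is slow. It is usually cheaper to collect values in a Python list and convert once with `np.array`.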
Indexes and slices

In general, indexing ndarrays works just like indexing (nested) lists:

# Indexing and slicing for 1D array works just like with lists:
arange_seq = np.arange(0, 10, 1)                        # [0 1 2 3 4 5 6 7 8 9]    
print("First element:", arange_seq[0], end = '\n\n')    # 0
print("Last element:", arange_seq[-1], end = '\n\n')    # 9
print("Elements from 3rd to 5th:",                      # start index included, stop index excluded
      arange_seq[2:5], end = '\n\n')                    # [2 3 4]

# Multidimensional arrays can have one index per axis:
two_d_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print("First row of 2D array <- [0]:", two_d_array[0], sep = '\n', end = '\n\n')                 # [1 2 3]
print("First element of the first row <- [0, 0]:", two_d_array[0, 0], sep = '\n', end = '\n\n')   # 1 (same as [0][0])

# bonus: getting only the last column (`...` stands for "all preceding axes")
print(f"Last column of 2D array:\n{two_d_array[..., -1]}")     # [3 6 9]
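Slices can be given per axis too, separated by commas. A small sketch on the same 3×3 array, showing a column, a sub-block, and a stepped slice:

```python
import numpy as np

two_d_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

first_column = two_d_array[:, 0]        # every row, column 0
top_right = two_d_array[:2, 1:]         # rows 0-1, columns 1 onwards
every_other = two_d_array[::2, ::2]     # step 2 along both axes

print(first_column)                     # [1 4 7]
print(top_right)                        # [[2 3]
                                        #  [5 6]]
print(every_other)                      # [[1 3]
                                        #  [7 9]]
```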

You can also index with a boolean array, usually called a mask (hence “masking”):

print(f'This is original array:\n{two_d_array}\n')
print(f'This is the mask for even numbers:\n{two_d_array % 2 == 0}\n')
# [[False  True False]
#  [ True False  True]
#  [False  True False]]


boolean_mask = two_d_array % 2 == 0    # True where the value is even
print(f'This is the array filtered by the mask:\n{two_d_array[boolean_mask]}\n') # [2 4 6 8]
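Masks can be combined with `&` (and), `|` (or) and `~` (not). Python’s `and`/`or` don’t work element-wise on arrays, and the parentheses are needed because `&` binds more tightly than comparisons. A sketch that also uses a mask on the left-hand side of an assignment:

```python
import numpy as np

two_d_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# even values greater than 3
even_and_big = two_d_array[(two_d_array % 2 == 0) & (two_d_array > 3)]
print(even_and_big)                   # [4 6 8]

# masked assignment: replace every value above 5 with 0 (on a copy)
clipped = two_d_array.copy()
clipped[clipped > 5] = 0
print(clipped)                        # [[1 2 3]
                                      #  [4 5 0]
                                      #  [0 0 0]]
```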
Attributes

Several attributes are useful for getting information about an array’s dimensions, size, and element data type:

print(three_d_array, "\n")
# [[[ 1  2  3  4]
#   [ 5  6  7  8]
#   [ 9 10 11 12]]

#  [[13 14 15 16]
#   [17 18 19 20]
#   [21 22 23 24]]] 

three_d_shape = three_d_array.shape
print(".shape: Shape of the array -", three_d_shape)        # (2, 3, 4): 2 matrices, each with 3 rows and 4 columns

three_d_ndim = three_d_array.ndim
print(".ndim: Number of dimensions -", three_d_ndim)        # 3

three_d_size = three_d_array.size
print(".size: Total number of elements -", three_d_size)    # 24 (2 * 3 * 4, the product of the shape)

three_d_dtype = three_d_array.dtype
print(".dtype: Data type of elements -", three_d_dtype)     # int64 (the default integer type is platform-dependent)

three_d_itemsize = three_d_array.itemsize
print(".itemsize: Size of each element in bytes -", three_d_itemsize) # 8 bytes for int64

three_d_data = three_d_array.data
print(".data: Buffer (memoryview) over the array's data -", three_d_data)  # <memory at 0x115d2a4d0>; the address varies
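These attributes are related: the memory taken by the elements (`.nbytes`) equals `.size * .itemsize`. A sketch that also uses `astype` to convert to a smaller type; `int8` is chosen here only because the values 0 to 23 fit in its range of -128 to 127:

```python
import numpy as np

arr = np.arange(24).reshape(2, 3, 4)

# total bytes used by the elements = number of elements * bytes per element
print(arr.nbytes == arr.size * arr.itemsize)      # True

# astype returns a converted copy with a different dtype
small = arr.astype(np.int8)
print(small.dtype, small.itemsize, small.nbytes)  # int8 1 24
```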
Reshaping arrays

For linear algebra, it’s important to be able to transform your array to match the needs of the intended operation. .reshape() gives a new shape to an array without changing its data. Just remember that the reshaped array must have the same number of elements as the original array:

# as we know that three_d_array.size = 24:
three_to_two = three_d_array.reshape(6, 4) # reshaping to the 2D-array of (6, 4) shape
print("Reshaped 3d:\n", three_to_two, end='\n\n')

# you can also transpose arrays with .T (rows become columns)
print("Transposed reshaped 3d:\n", three_to_two.T)
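Instead of computing every dimension by hand, one of them can be given as `-1` and numpy infers it from the total size. A sketch on a fresh 24-element array, including the error raised when the sizes don’t match:

```python
import numpy as np

arr = np.arange(24)                     # 24 elements: 0..23

as_matrix = arr.reshape(-1, 4)          # -1 means "infer this dimension": 24 / 4 = 6
print(as_matrix.shape)                  # (6, 4)
print(as_matrix.T.shape)                # (4, 6)

flat = as_matrix.reshape(-1)            # back to 1D
print(flat.shape)                       # (24,)

try:
    arr.reshape(5, 5)                   # 25 elements requested, but only 24 exist
except ValueError as e:
    print("Error:", e)                  # Error: cannot reshape array of size 24 into shape (5,5)
```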
Operations with arrays

When working with numpy arrays, it’s important to understand how operations differ from standard Python containers. Unlike lists, numpy supports vectorized operations: calculations on an ndarray apply element-wise without explicit loops. There are generally three categories of operations: basic algebra, aggregate operations, and element-wise operations:

# basic algebra
# addition
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
print("Addition:\n", a + b, end='\n\n')     # [[ 6  8]
                                            #  [10 12]]

# subtraction
print("Subtraction:\n", a - b, end='\n\n')  # [[-4 -4]
                                            #  [-4 -4]]

# multiplication (element-wise)
print("Multiplication:\n", a * b, end='\n\n')   # [[ 5 12]
                                                #  [21 32]]
# aggregate operations
print(f"Sum of all values: {three_d_array.sum()}")  # 300
print(f"Minimum value: {three_d_array.min()}")      # 1
print(f"Maximum value: {three_d_array.max()}")      # 24

print(f"Mean of values: {three_d_array.mean()}")                # 12.5
print(f"Median of values: {np.median(three_d_array)}")          # 12.5 
print(f"Standard deviation of values: {three_d_array.std()}")   # 6.922...
# bonus: sorting arrays
unsorted = np.array(
    [[6, 5, 4], 
     [3, 2, 1]]
)

# Axis along which to sort. Default is -1, which means sort along the last axis.
print("Original array:")
print(unsorted)

unsorted.sort(axis=1)      # sorts in place, within each row
print("\nArray sorted along the second axis (within each row):")
print(unsorted)            # [[4 5 6]
                           #  [1 2 3]]
# element-wise operations
print(f"Element-wise division by 10:\n{three_d_array / 10}\n") # [[[0.1 0.2 0.3 0.4] ...
print(f"Element-wise addition of 10:\n{three_d_array + 10}\n") # [[[11 12 13 14] ...

# pairwise operations (two arrays of the same shape)
print(f"Pairwise multiplication:\n{three_d_array * three_d_array}\n") # [[[  1   4   9  16] ...
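The aggregate functions above reduce the whole array to a single number. They also accept an `axis` argument that reduces along one axis only. A sketch on a 3×3 array:

```python
import numpy as np

two_d_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(two_d_array.sum())            # 45: all elements
print(two_d_array.sum(axis=0))      # [12 15 18]: collapse the rows, one value per column
print(two_d_array.sum(axis=1))      # [ 6 15 24]: collapse the columns, one value per row
print(two_d_array.max(axis=1))      # [3 6 9]: largest value in each row
print(two_d_array.mean(axis=0))     # [4. 5. 6.]
```

A handy way to remember it: the axis you pass is the one that disappears from the result’s shape.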

You can also perform matrix multiplication (row-by-column dot products) with the @ operator:

(src: algebra1course)

another = np.array([[1, 0],
                    [-1, 2],
                    [2, 1]])

print(f'Matrix product of \n\n{two_d_array} \n\nand \n\n{another} \n\nis \n\n{two_d_array @ another}')
# [[ 5  7]
#  [11 16]
#  [17 25]]
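It’s easy to mix up `*` and `@`. A sketch reusing the 2×2 arrays from the basic algebra example: `*` multiplies matching positions, while `@` (equivalent to `np.matmul`, and to `np.dot` for 2D arrays; the two differ for higher dimensions) performs matrix multiplication, which requires shapes `(n, k) @ (k, m)` and gives `(n, m)`:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

elementwise = a * b          # multiplies matching positions
matrix = a @ b               # row-by-column products
print(elementwise)           # [[ 5 12]
                             #  [21 32]]
print(matrix)                # [[19 22]
                             #  [43 50]]
print(np.array_equal(matrix, np.matmul(a, b)), np.array_equal(matrix, np.dot(a, b)))  # True True

# inner dimensions must match: (3, 2) @ (2, 4) -> (3, 4)
product_shape = (np.ones((3, 2)) @ np.ones((2, 4))).shape
print(product_shape)         # (3, 4)
```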
Final Notes

These notes should give you a basic understanding of what numpy can do. It’s a very important tool for any large-scale calculation, and it’s used across many different fields.

Here, we covered only the basics of np.ndarray, but there’s a lot more to learn about the package. For that, I suggest visiting the extensive and well-written NumPy documentation website.