NumPy Arrays
- What is a NumPy array?
- Indexes and slices
- Attributes of np.ndarray
- Reshaping arrays
- Operations with arrays
Introduction
NumPy (Numerical Python) is an open source Python library that’s widely used in science and engineering. It allows programmers to work with multidimensional array data structures, known as homogeneous, N-dimensional arrays (or numpy.ndarray), as well as with a large library of functions that operate efficiently on these data structures. It has also become a dependency for a lot of packages in machine learning (like scikit-learn, TensorFlow, Keras), visualisation (Matplotlib, Seaborn, Altair), and many others. Long story short: if you continue programming in Python, you will most probably meet numpy at some point.
Today, we’re going to learn the basics of numpy and try to understand why it became so popular.
Technically, you can do everything with Python alone. After all, we’re in programming: there’s no such thing as a single right solution.
However, in the real world, as data grows, computations become slow and inefficient. numpy provides optimized, vectorized operations that leverage low-level C implementations, making it the proper way to scale element-wise computations. Whether you’re handling large datasets, performing scientific calculations, or working in machine learning, you’ll get faster execution (in most cases), lower memory usage, and cleaner code compared to native Python structures (see the example below).
import numpy as np # needed for the np calls below; timings will differ between machines
N = 1_000_000
# initialization
%time v1_l = [1 for i in range(N)] # 25.1 ms
%time v1_n = np.ones(N) # 0.8 ms
print('\n')
# square
%time v2_l = [i**2 for i in range(N)] # 38.7 ms
%time v2_n = np.array([i**2 for i in range(N)]) # 59.9 ms (slower: the Python list is built first, then converted)
print('\n')
# maximum
%time max(v1_l) # 6.44 ms
%time np.max(v1_n) # 0.17 ms
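Outside IPython or Jupyter the `%time` magic is not available. As a minimal sketch, the squaring comparison can be repeated with the standard-library `timeit` module. The function names and the smaller `N` are illustrative choices, and no timings are quoted because they vary by machine. The NumPy version here squares `np.arange(N)` directly, so the work stays vectorized instead of going through a Python list first.

```python
import timeit

import numpy as np

N = 100_000  # smaller than in the example above so the sketch runs quickly


def square_with_list():
    # pure Python: one multiplication per loop iteration
    return [i**2 for i in range(N)]


def square_with_numpy():
    # vectorized: the loop runs inside NumPy's compiled code
    return np.arange(N, dtype=np.int64) ** 2


list_seconds = timeit.timeit(square_with_list, number=10)
numpy_seconds = timeit.timeit(square_with_numpy, number=10)
print(f"list comprehension: {list_seconds:.4f} s for 10 runs")
print(f"np.arange(N) ** 2:  {numpy_seconds:.4f} s for 10 runs")

# both approaches produce the same values
same_values = np.array_equal(np.array(square_with_list()), square_with_numpy())
print(same_values)  # True
```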
Additionally, it’s just more expressive. Wouldn’t you agree that doubling an array should naturally double every value in it, rather than repeat the whole container?
list_example = [1, 2, 3]
print(list_example + list_example) # [1, 2, 3, 1, 2, 3]
array_example = np.array([1, 2, 3])
print(array_example + array_example) # [2 4 6]
And remember, friends: the whole point of numpy is to avoid manual manipulation of individual elements in arrays. So, if you don’t want to face unexpected results:
DON’T USE NUMPY ARRAYS WITHIN LOOPS!
# dot product
N = 1_000_000
arr1 = np.random.rand(N)
arr2 = np.random.rand(N)
%time with_loops = [arr1[i]*arr2[i] for i in range(N)] # 124 ms (only the element-wise products; sum(with_loops) is still needed for the dot product)
%time with_numpy = np.dot(arr1,arr2) # 0.4 ms
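To check that the loop and `np.dot` really compute the same quantity, here is a small sketch on shorter arrays. The seeded generator and the variable names are assumptions made only so the example is repeatable.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded only so the sketch is repeatable
arr1 = rng.random(1_000)
arr2 = rng.random(1_000)

# loop version: multiply pairs one by one, then add everything up
loop_result = sum(arr1[i] * arr2[i] for i in range(len(arr1)))

# vectorized versions: no explicit Python loop
dot_result = np.dot(arr1, arr2)
sum_result = (arr1 * arr2).sum()

print(np.isclose(loop_result, dot_result))  # True
print(np.isclose(sum_result, dot_result))   # True
```

The results are compared with `np.isclose` rather than `==` because floating-point sums can differ in the last digits depending on the order of the additions.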
Basics
NumPy’s main object is the array that is multidimensional and homogeneous. Let’s decompose this definition:
- array - in computer programming, an array is a structure for storing and retrieving data. We often talk about an array as if it were a grid in space, with each cell storing one element of the data. In Python, we call it a sequence (but honestly these terms are interchangeable). Previously covered datatypes, such as list and tuple, can also be considered sequences (did you know that Python has its own array.array type?..).
- multidimensional - the idea behind numpy arrays is generalized to an arbitrary number of dimensions.
- homogeneous - all elements of the array must be of the same data type.
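The homogeneity requirement is easiest to see when the input mixes types. In the following sketch (variable names are illustrative) NumPy picks one common data type for all elements instead of keeping each value as it was.

```python
import numpy as np

ints = np.array([1, 2, 3])
mixed = np.array([1, 2.5, 3])         # the integers are upcast to float
with_text = np.array([1, "two", 3])   # everything becomes a string

print(ints.dtype.kind)       # i (signed integer)
print(mixed.dtype)           # float64
print(with_text.dtype.kind)  # U (unicode string)
print(with_text)             # ['1' 'two' '3']
```

A plain list would have kept the three original types side by side, which is exactly what an ndarray does not allow.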
# load the package (if ModuleNotFoundError: pip install numpy)
import numpy as np
# the most basic way to create an array is to pass a list
nothing = np.array([])
something = np.array([1, 2, 3, 4, 5]) # all elements are int
print(something, type(something)) # [1 2 3 4 5] <class 'numpy.ndarray'>
# note that there always has to be a *single* object within the np.array function
try:
    np.array([1, 2, 3], [4, 5, 6]) # 2 lists
except TypeError as e:
    print("Error:", e) # Error: Field elements must be 2- or 3-tuples, got '4'
Let’s dwell on multidimensionality. For example, the picture below shows a model of a 3D array (only because we, as human beings, are unable to visualise more than three dimensions):
- If it were a 1D array, we would only have the dark elements of a single color (e.g. 000, 001, 002, an array with 3 values), meaning that we can put them in a simple non-nested data container.
- With a 2D array we cover all values in the blue square, making 3 arrays with 3 values each. Data tables are a common example of two-dimensional arrays (but more on that later).
- By adding a 3rd dimension, we expand our model to a volumetric cube, as it is now 3 arrays of 3 arrays of 3 values each.
- And we can add more and more dimensions…

(src: geeksforgeeks.org)
# 2D array (remember: variable names can't start with a digit!)
two_d_array = np.array(
    [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
)
print(two_d_array, '\n', type(two_d_array), '\n') # <class 'numpy.ndarray'>
# here and below '\n's are for better output readability
# 3D array
three_d_array = np.array(
    [
        [
            [1, 2, 3, 4],
            [5, 6, 7, 8],
            [9, 10, 11, 12]
        ],
        [
            [13, 14, 15, 16],
            [17, 18, 19, 20],
            [21, 22, 23, 24]
        ]
    ]
)
print(three_d_array, '\n', type(three_d_array)) # <class 'numpy.ndarray'>
Sometimes we don’t have or need data to fill an ndarray, but we do know what shape of array we need. To create basic arrays with a given shape, we can use the following functions:
# shape as an argument
zeros = np.zeros((2,2,2))         # [[[0. 0.]
print(f"Zeros array:\n{zeros}\n") #   [0. 0.]]
                                  #
                                  #  [[0. 0.]
                                  #   [0. 0.]]]
ones = np.ones((4, 5)) # [[1. 1. 1. 1. 1.]
print(f"Ones array:\n{ones}\n") # [1. 1. 1. 1. 1.]
# [1. 1. 1. 1. 1.]
# [1. 1. 1. 1. 1.]]
# there's a predefined constant for Pi: np.pi
filled = np.full((3,3), np.pi) # [[3.14159265 3.14159265 3.14159265]
print(f"Filled array with Pi:\n{filled}\n") # [3.14159265 3.14159265 3.14159265]
# [3.14159265 3.14159265 3.14159265]]
# sequence of numbers
arange_seq = np.arange(0, 10, 1) # start, stop, step (analogue of `range`)
print(f"Arange sequence:\n{arange_seq}\n") # [0 1 2 3 4 5 6 7 8 9]
linspace_seq = np.linspace(0, 1, 11) # start, stop, number of elements (steps = elements - 1)
print(f"Linspace sequence:\n{linspace_seq}\n") # [0. 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1. ]
You can also copy existing arrays. Note that assignment doesn’t create a new array per se: it creates a new variable that points to the already existing object:
new_filled = filled # no new object is created
new_filled[0, 0] = 0 # change the first element in the new array
print(new_filled is filled) # True: two names for the same ndarray object, even though we changed the value
copy_arange_seq = arange_seq.copy() # new object is created
copy_arange_seq[0] = 1000 # change the first element in the new array
print(copy_arange_seq is arange_seq) # False: two different ndarray objects
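The practical consequence shows up in the values, not just in the `is` check. A small sketch with illustrative names: a change made through the alias is visible in the original, while a change made to the copy is not. `np.shares_memory` is used here as an extra check that the data buffers are the same or different.

```python
import numpy as np

original = np.arange(5)        # [0 1 2 3 4]
alias = original               # same object, second name
independent = original.copy()  # new object with its own data

alias[0] = 100       # also changes `original`
independent[1] = -1  # leaves `original` alone

print(original)     # [100   1   2   3   4]
print(independent)  # [ 0 -1  2  3  4]
print(np.shares_memory(original, alias))        # True
print(np.shares_memory(original, independent))  # False
```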
And, of course, you can edit existing values by adding, removing, and sorting elements:
arange_seq_1 = np.arange(1, 10, 2) # [1 3 5 7 9]
arange_seq_2 = np.arange(0, 11, 2) # [0 2 4 6 8 10]
# concatenation - flattened
total_seq = np.concatenate((arange_seq_1, arange_seq_2))
print(f"Concatenated sequence:\n{total_seq}\n") # [1 3 5 7 9 0 2 4 6 8 10]
# dropping values
total_seq = np.delete(total_seq, -1) # drop the last element
print(f"Sequence with dropped last value:\n{total_seq}\n") # [1 3 5 7 9 0 2 4 6 8]
# sort sequence
sorted_seq = np.sort(total_seq) # different sorting functions are available for different shapes of arrays
print(f"Sorted sequence:\n{sorted_seq}\n") # [0 1 2 3 4 5 6 7 8 9]
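The same functions also work on multidimensional arrays when an `axis` is given, and they return new arrays rather than changing their input. The sketch below uses small illustrative arrays to show both points.

```python
import numpy as np

top = np.array([[1, 2], [3, 4]])
bottom = np.array([[5, 6]])

stacked = np.concatenate((top, bottom), axis=0)  # append a row
print(stacked.shape)  # (3, 2)

without_first_row = np.delete(stacked, 0, axis=0)  # returns a new array
print(without_first_row.tolist())  # [[3, 4], [5, 6]]
print(stacked.shape)  # (3, 2): the input is unchanged

# np.sort returns a sorted copy; the original keeps its order
values = np.array([3, 1, 2])
sorted_copy = np.sort(values)
print(values.tolist(), sorted_copy.tolist())  # [3, 1, 2] [1, 2, 3]
```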
Indexes and slices
In the general case, indexing ndarrays works just like indexing (nested) lists:
# Indexing and slicing for 1D array works just like with lists:
arange_seq = np.arange(0, 10, 1) # [0 1 2 3 4 5 6 7 8 9]
print("First element:", arange_seq[0], end = '\n\n') # 0
print("Last element:", arange_seq[-1], end = '\n\n') # 9
print("Elements from 3rd to 5th:", # including both
arange_seq[2:5], end = '\n\n') # [2 3 4]
# Multidimensional arrays can have one index per axis:
two_d_array = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print("First row of 2D array <- [0]:", two_d_array[0], sep = '\n', end = '\n\n') # [1 2 3]
print("First element of the first row <- [0, 0]:", two_d_array[0, 0], sep = '\n', end = '\n\n') # 1
# bonus: getting only last values
print(f"Last column of 2D array:\n{two_d_array[..., -1]}") # [3 6 9]
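Slices can be combined across axes as well, which nested lists do not support directly. A short sketch on a 3x3 array (the name `grid` is illustrative):

```python
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(grid[:, 0].tolist())       # [1, 4, 7]  first column
print(grid[0:2, 1:].tolist())    # [[2, 3], [5, 6]]  rows 0-1, columns 1-2
print(grid[::-1, 0].tolist())    # [7, 4, 1]  first column, rows reversed
print(grid[1][2] == grid[1, 2])  # True: both select row 1, column 2
```

The `[row, column]` form is usually preferred over chained `[row][column]` because it is a single indexing operation.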
You can also index with a boolean array, which is usually called a mask (and the technique, masking):
print(f'This is original array:\n{two_d_array}\n')
print(f'This is mask for even numbers:\n{two_d_array % 2 ==0 }\n')
# [[False True False]
# [ True False True]
# [False True False]]
boolean_mask = two_d_array % 2 == 0 # True where the value is even
print(f'This is the array filtered by the mask:\n{two_d_array[boolean_mask]}\n') # [2 4 6 8]
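Masks can be combined and can also appear on the left-hand side of an assignment. The following is a minimal sketch with illustrative names and thresholds.

```python
import numpy as np

values = np.arange(1, 10)  # [1 2 3 4 5 6 7 8 9]

# combine conditions with & (and) or | (or); the parentheses are required
mask = (values % 2 == 0) & (values > 4)
print(values[mask].tolist())  # [6, 8]

# a mask on the left-hand side assigns only to the selected elements
clipped = values.copy()
clipped[clipped > 5] = 5
print(clipped.tolist())  # [1, 2, 3, 4, 5, 5, 5, 5, 5]
```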
Attributes
Several attributes are useful for getting information on an array’s dimensions, size, and data type:
print(three_d_array, "\n")
# [[[ 1 2 3 4]
# [ 5 6 7 8]
# [ 9 10 11 12]]
# [[13 14 15 16]
# [17 18 19 20]
# [21 22 23 24]]]
three_d_shape = three_d_array.shape
print(".shape: Shape of the array -", three_d_shape) # (2, 3, 4) - this tuple means 2 matrices, 3 rows, 4 columns
three_d_ndim = three_d_array.ndim
print(".ndim: Number of dimensions -", three_d_ndim) # 3
three_d_size = three_d_array.size
print(".size: Total number of elements -", three_d_size) # 24 (matrices * rows * columns)
three_d_dtype = three_d_array.dtype
print(".dtype: Data type of elements -", three_d_dtype) # int64 (the default integer type can differ by platform)
three_d_itemsize = three_d_array.itemsize
print(".itemsize: Size of each element in bytes -", three_d_itemsize) # 8 bytes for int64
three_d_data = three_d_array.data
print(".data: Buffer object pointing to the array's data -", three_d_data) # <memory at 0x115d2a4d0>
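These attributes are related: the memory taken by the elements is `size * itemsize`, which NumPy also exposes as `.nbytes`. The sketch below, with illustrative names and explicitly chosen dtypes, shows how the data type drives memory use.

```python
import numpy as np

small = np.arange(24, dtype=np.int8).reshape(2, 3, 4)
large = small.astype(np.int64)  # same values, wider integer type

for arr in (small, large):
    print(arr.dtype, arr.itemsize, arr.size, arr.nbytes)
# int8 1 24 24
# int64 8 24 192

print(large.nbytes == large.size * large.itemsize)  # True
```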
Reshaping arrays
An important skill for algebraic calculations is being able to transform your array to match the needs of the intended operation. Using .reshape() will give a new shape to an array without changing the data. Just remember that when you use the reshape method, the array you want to produce needs to have the same number of elements as the original array:
# as we know that three_d_array.size = 24:
three_to_two = three_d_array.reshape(6, 4) # reshaping to the 2D-array of (6, 4) shape
print("Reshaped 3d:\n", three_to_two, end='\n\n')
# also, you can transpose arrays
print("Transposed reshaped 3d:\n", three_to_two.T)
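Two details of `.reshape()` are worth a quick sketch (names are illustrative): one dimension can be given as `-1` so that NumPy infers it from the total size, and a target shape with the wrong number of elements raises a `ValueError`.

```python
import numpy as np

values = np.arange(24)

auto = values.reshape(4, -1)  # -1 lets NumPy work out the remaining dimension
print(auto.shape)    # (4, 6)
print(auto.T.shape)  # (6, 4)

try:
    values.reshape(5, 5)  # 25 slots for 24 elements
except ValueError as e:
    print("Error:", e)
```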
Operations with arrays
When working with numpy arrays, it’s important to understand how operations differ from standard Python containers. Unlike lists, NumPy supports vectorized operations, meaning calculations on an ndarray apply element-wise without needing explicit loops. There are generally three categories of operations:
- Aggregate operations (e.g., np.sum(array), np.mean(array)) compute a single result from multiple elements. This category is especially useful to get descriptive statistics on the array.
- Element-wise operations (e.g., array + 2, array * array2) apply operations to each element individually.
- Pairwise operations (e.g., array1 + array2) work when arrays have the same shape or are broadcastable.
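"Broadcastable" means that NumPy can stretch the smaller array along the missing or length-1 dimensions to match the larger one. A minimal sketch, with array names and values chosen only for illustration:

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])  # shape (2, 3)
row = np.array([10, 20, 30])               # shape (3,)
column = np.array([[100], [200]])          # shape (2, 1)

print((matrix + row).tolist())     # [[11, 22, 33], [14, 25, 36]]
print((matrix + column).tolist())  # [[101, 102, 103], [204, 205, 206]]

try:
    matrix + np.array([1, 2])  # shapes (2, 3) and (2,) do not line up
except ValueError as e:
    print("Error:", e)
```

Shapes are compared from the last dimension backwards: each pair of dimensions must either be equal or contain a 1.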
# basic algebra
# addition
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])
print("Addition:\n", a + b, end='\n\n') # [[ 6 8]
# [10 12]]
# subtraction
print("Subtraction:\n", a - b, end='\n\n') # [[-4 -4]
# [-4 -4]]
# multiplication (element-wise)
print("Multiplication:\n", a * b, end='\n\n') # [[ 5 12]
# [21 32]]
# aggregate operations
print(f"Sum of all values: {three_d_array.sum()}") # 300
print(f"Minimum value: {three_d_array.min()}") # 1
print(f"Maximum value: {three_d_array.max()}") # 24
print(f"Mean of values: {three_d_array.mean()}") # 12.5
print(f"Median of values: {np.median(three_d_array)}") # 12.5
print(f"Standard deviation of values: {three_d_array.std()}") # 6.922...
# bonus: sorting arrays
unsorted = np.array(
[[6, 5, 4],
[3, 2, 1]]
)
# Axis along which to sort. Default is -1, which means sort along the last axis.
print("Original array:")
print(unsorted)
unsorted.sort(axis=1)
print("\nArray sorted along the second axis (within each row):")
print(unsorted)
# element-wise operations
print(f"Element-wise division by 10:\n{three_d_array / 10}\n") # [[[0.1, 0.2, 0.3,..
print(f"Element-wise addition by 10:\n{three_d_array + 10}\n") # [[[11, 12, 13,..
# pairwise operations
print(f"Pairwise multiplication:\n{three_d_array * three_d_array}\n") # [[[1, 4, 9,..
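Aggregate operations do not have to collapse the whole array into one number. With the `axis` argument they reduce along one dimension only, as in this small sketch (the `table` array is illustrative):

```python
import numpy as np

table = np.array([[1, 2, 3], [4, 5, 6]])

print(table.sum())                  # 21
print(table.sum(axis=0).tolist())   # [5, 7, 9]  one sum per column
print(table.sum(axis=1).tolist())   # [6, 15]    one sum per row
print(table.mean(axis=1).tolist())  # [2.0, 5.0] one mean per row
```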
You can also compute the matrix (dot) product:

(src: algebra1course)
another = np.array([[1, 0],
[-1, 2],
[2, 1]])
print(f'Dot product of \n\n{two_d_array} \n\nand \n\n{another} \n\nis \n\n{two_d_array @ another}')
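The shape rule behind the matrix product is that an (m, n) array times an (n, k) array gives an (m, k) array, so the inner dimensions have to match. A small sketch with illustrative names that can be checked by hand:

```python
import numpy as np

left = np.array([[1, 2, 3], [4, 5, 6]])      # shape (2, 3)
right = np.array([[1, 0], [-1, 2], [2, 1]])  # shape (3, 2)

product = left @ right
print(product.shape)     # (2, 2)
print(product.tolist())  # [[5, 7], [11, 16]]

# for 2D arrays, np.dot gives the same result as the @ operator
print(np.array_equal(product, np.dot(left, right)))  # True

try:
    left @ left  # (2, 3) @ (2, 3): inner dimensions 3 and 2 differ
except ValueError as e:
    print("Error:", e)
```

For example, the top-left entry is 1*1 + 2*(-1) + 3*2 = 5.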
Final Notes
These notes should give you a basic understanding of what is possible with numpy. It’s a very important tool for any large-scale calculations, and it’s used across different fields.
Here, we covered only the basics of np.ndarray, but there’s a lot more to learn about the package. For this purpose, I suggest visiting the extensive and well-written NumPy documentation website.