python-course.eu

3. Numpy Data Objects, dtype

By Bernd Klein. Last modified: 07 May 2025.


dtype

Chapter: Data Type dtype in NumPy

NumPy, the fundamental package for numerical computing in Python, relies heavily on efficient storage and manipulation of data. At the heart of this efficiency is the concept of dtype—short for data type. Every NumPy array has a dtype that describes the type of elements it contains, such as integers, floating-point numbers, booleans, or even user-defined types.

Understanding dtype is critical not only for performance optimization but also for ensuring the correctness of computations. In this chapter, we explore how NumPy uses dtype to manage memory, how different data types behave, how to inspect and convert them, and how custom data types can be created for advanced use cases.

The data type object 'dtype' is an instance of the numpy.dtype class and can be created with numpy.dtype. We have already encountered it in the previous chapters of our NumPy tutorial:

import numpy as np

arr = np.array([1, 2, 3])
print(arr.dtype)

OUTPUT:

int64

We also learned how to create arrays with a specific dtype. In the previous example, we let NumPy make the decision, and it chose int64. This default depends on the platform and the NumPy version (on Windows with NumPy versions before 2.0, for example, the default integer type was int32), so you are well advised to specify the dtype explicitly to ensure consistency and portability. The following example explicitly requests float32:

arr = np.array([1, 2, 3], dtype=np.float32)
print(arr)
print(arr.dtype)

OUTPUT:

[1. 2. 3.]
float32
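
As a small illustration of why the choice of dtype matters for memory, the following sketch compares the storage needed for the same values in two integer types. The array names are made up for this example; itemsize, nbytes and astype are standard ndarray attributes and methods.

```python
import numpy as np

# The same 1000 values stored with two different integer types
a64 = np.arange(1000, dtype=np.int64)
a16 = a64.astype(np.int16)   # astype returns a converted copy

print(a64.dtype, a64.itemsize, a64.nbytes)   # 8 bytes per element
print(a16.dtype, a16.itemsize, a16.nbytes)   # 2 bytes per element
```

The conversion is only safe here because all values fit into the int16 range from -32768 to 32767.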

So far, our example arrays have used only fundamental numeric data types like int and float, and every element of an array had the same simple type. dtype objects, however, can also be constructed by combining fundamental data types.

With the aid of dtype, we can create structured arrays. (NumPy also provides the closely related record arrays, np.recarray, which additionally allow fields to be accessed as attributes.) Structured arrays let us use different data types for different columns within a single array. This structure resembles that of an Excel spreadsheet or a CSV file, where each column can hold a different type of data.

This makes it possible to define and manage complex data like the one in the following table using a custom dtype:

Country         Density (per km²)   Area (km²)   Population
Netherlands                   544        33720   18,346,819
Belgium                       383        30510   11,700,000
United Kingdom                287       243610   69,800,000
Germany                       241       348560   84,075,075
Liechtenstein                 238          160       38,080
Italy                         197       301230   59,400,000
Switzerland                   219        41290    9,050,000
Luxembourg                    253         2586      654,000
France                        122       547030   66,700,000
Austria                       109        83858    9,140,000
Greece                         81       131940   10,700,000
Ireland                        77        70280    5,400,000
Sweden                         26       449964   10,300,000
Finland                        18       338424    6,100,000
Norway                         15       385252    5,800,000

Before we work with a complex data structure like the one shown above, let’s first introduce dtype using a very simple example. We define a data type based on int16 and refer to it as i16. (Admittedly, this isn’t a very descriptive name, but we’ll use it just for this example.) The elements of a list named lst are then converted to the i16 type to create a two-dimensional array called A.

import numpy as np

i16 = np.dtype(np.int16)
print(i16)

lst = [ [3.4, 8.7, 9.9], 
        [1.1, -7.8, -0.7],
        [4.1, 12.3, 4.8] ]

A = np.array(lst, dtype=i16)

print(A)

OUTPUT:

int16
[[ 3  8  9]
 [ 1 -7  0]
 [ 4 12  4]]

In the previous example we merely introduced a new name for a basic data type. This has nothing to do with the structured arrays mentioned in the introduction of this chapter.

The example shows how to create a NumPy array with a specific data type (int16) even when the original data contains floats. NumPy converts the float values to integers by truncating toward zero (8.7 becomes 8 and -7.8 becomes -7) rather than by rounding.

It's a simple but powerful way to learn how dtype influences what actually gets stored in a NumPy array.
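
If rounding is wanted instead of truncation, the values can be rounded before the conversion. The following is a minimal sketch using some of the values from the example above; the variable names are chosen for illustration only.

```python
import numpy as np

values = np.array([3.4, 8.7, 9.9, -7.8, -0.7])

truncated = values.astype(np.int16)          # drops the fractional part
rounded = np.rint(values).astype(np.int16)   # rounds to the nearest integer first

print(truncated)
print(rounded)
```

Note the difference for the negative values: truncation turns -0.7 into 0, while rounding turns it into -1.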

Structured Arrays

ndarrays are homogeneous data objects, i.e. all elements of an array have to be of the same data type. A structured dtype, on the other hand, allows us to define a separate data type for each column (field).

Now we will take the first step towards implementing the table of European countries with their population, area and population density. We create a structured array with a single 'density' field. The data type is defined as np.dtype([('density', np.int32)]), and we assign it to the variable density_dtype for convenience. We then use this data type to define the array densities, which holds three sample density values.

import numpy as np

# Define a structured data type with one field: 'density' as int32
density_dtype = np.dtype([('density', np.int32)])

# Create a structured array using the custom dtype
densities = np.array([(393,), (337,), (256,)], dtype=density_dtype)

# Print the structured array
print("Structured array:")
print(densities)

OUTPUT:

Structured array:
[(393,) (337,) (256,)]

Let's have a look at the internal representation:

print("\nThe internal representation:")
print(repr(densities))

OUTPUT:

The internal representation:
array([(393,), (337,), (256,)], dtype=[('density', '<i4')])

We can access the content of the density column by indexing densities with the key 'density'. It looks like accessing a dictionary in Python:

print(densities['density'])

OUTPUT:

[393 337 256]

You may wonder why we used np.int32 in our dtype definition, yet the internal representation shows '<i4'.

This is because NumPy allows you to define data types in two equivalent ways:

  1. Using NumPy's explicit type objects like np.int32
  2. Using string codes like 'i4' (which stands for 4-byte integer)

So, we could have defined our dtype like this as well:

density_dtype = np.dtype([('density', 'i4')])

densities = np.array([(393,), (337,), (256,)],
                     dtype=density_dtype)
print(densities)

OUTPUT:

[(393,) (337,) (256,)]

The 'i' in 'i4' stands for integer, and the 4 means it occupies 4 bytes (32 bits).
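
The equivalence of the two notations can be checked directly. This short sketch compares a few string codes with the corresponding NumPy type objects; the selection of codes is just an example.

```python
import numpy as np

# Each pair describes the same data type
print(np.dtype('i4') == np.dtype(np.int32))     # True
print(np.dtype('f8') == np.dtype(np.float64))   # True
print(np.dtype('u1') == np.dtype(np.uint8))     # True

# itemsize reports the number of bytes per element
print(np.dtype('i4').itemsize, np.dtype('f8').itemsize)   # 4 8
```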

But what about the less-than sign (<) you may have noticed in the internal representation, like '<i4'?

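The < is a byte order indicator. NumPy uses the following characters:

  • '<' means little-endian (least significant byte first)
  • '>' means big-endian (most significant byte first)
  • '=' means the native byte order of the machine
  • '|' means not applicable (for example, for single-byte types)

We can inspect the byte order of a dtype through its byteorder attribute. On a little-endian machine, an explicitly little-endian dtype is reported as native ('='):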

# little-endian ordering
dt = np.dtype('<d')
print(dt.name, dt.byteorder, dt.itemsize)

# big-endian ordering
dt = np.dtype('>d')  
print(dt.name, dt.byteorder, dt.itemsize)

# native byte ordering
dt = np.dtype('d') 
print(dt.name, dt.byteorder, dt.itemsize)

OUTPUT:

float64 = 8
float64 > 8
float64 = 8

We can see the impact of byte order by defining a float64 (double precision) in different ways:

import numpy as np

# Native byte order (depends on system)
native = np.dtype('f8')

# Explicit little-endian
little_endian = np.dtype('<f8')

# Explicit big-endian
big_endian = np.dtype('>f8')

print("Native byte order dtype:", native)
print("Little-endian dtype:    ", little_endian)
print("Big-endian dtype:       ", big_endian)

OUTPUT:

Native byte order dtype: float64
Little-endian dtype:     float64
Big-endian dtype:        >f8
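
The byte order only changes how a value is laid out in memory, not the value itself. A minimal sketch that makes this visible with a 4-byte integer (the array names are illustrative):

```python
import numpy as np

little = np.array([1], dtype='<i4')
big = np.array([1], dtype='>i4')

# The raw bytes are stored in opposite order ...
print(little.tobytes())   # b'\x01\x00\x00\x00'
print(big.tobytes())      # b'\x00\x00\x00\x01'

# ... but both arrays hold the same number
print(little[0] == big[0])   # True
```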

🧠 Understanding Tuples vs. Lists in Structured Arrays

Another detail in our earlier density array might seem confusing: we defined the array using a list of one-element tuples.
This might make you wonder: Can we use tuples and lists interchangeably in this context?

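The answer is: not quite.

In NumPy structured arrays, a tuple represents a single record (one row with a value for each field), whereas a list represents a sequence of elements along an array dimension.

So, in our example, each tuple represents one country's data (like a row in a table), and the outer list is the collection of all those rows.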

You could think of it like this:

Tuples define the structure of one unit of data;
Lists define the dimension or shape of the array.
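
The difference can be observed by passing the same numbers once as tuples and once as inner lists. This is a sketch with a hypothetical two-field dtype called pair_dtype. With tuples we get one record per tuple. With inner lists, NumPy is expected to treat the inner level as a further array dimension, so each single number is used to fill all fields of a record.

```python
import numpy as np

# Hypothetical dtype with two integer fields, used only for this comparison
pair_dtype = np.dtype([('a', 'i4'), ('b', 'i4')])

from_tuples = np.array([(1, 2), (3, 4)], dtype=pair_dtype)
from_lists = np.array([[1, 2], [3, 4]], dtype=pair_dtype)

print(from_tuples.shape)   # one dimension: two records
print(from_tuples)

print(from_lists.shape)    # the inner lists became a second dimension
print(from_lists)
```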

Now, let’s extend our data structure to include more fields: country name, density, area, and population.

import numpy as np

# Define a structured dtype
dt = np.dtype([
    ('country', 'S20'), 
    ('density', 'i4'), 
    ('area', 'i4'), 
    ('population', 'i4')
])

population_table_2025 = np.array([
    ('Netherlands', 544, 33720, 18_346_819),
    ('Belgium', 383, 30510, 11_700_000),
    ('United Kingdom', 287, 243610, 69_800_000),
    ('Germany', 241, 348560, 84_075_075),
    ('Liechtenstein', 238, 160, 38_080),
    ('Italy', 197, 301230, 59_400_000),
    ('Switzerland', 219, 41290, 9_050_000),
    ('Luxembourg', 253, 2586, 654_000),
    ('France', 122, 547030, 66_700_000),
    ('Austria', 109, 83858, 9_140_000),
    ('Greece', 81, 131940, 10_700_000),
    ('Ireland', 77, 70280, 5_400_000),
    ('Sweden', 26, 449964, 10_300_000),
    ('Finland', 18, 338424, 6_100_000),
    ('Norway', 15, 385252, 5_800_000)
], dtype=dt)
# data 5th of May 2025, 
# based on Worldometer’s elaboration of the latest United Nations data

# Print the first 4 entries
print(population_table_2025[:4])

OUTPUT:

[(b'Netherlands', 544,  33720, 18346819)
 (b'Belgium', 383,  30510, 11700000)
 (b'United Kingdom', 287, 243610, 69800000)
 (b'Germany', 241, 348560, 84075075)]

We can access every column individually:

print(population_table_2025['density'])
print(population_table_2025['country'])
print(population_table_2025['area'][2:5])

OUTPUT:

[544 383 287 241 238 197 219 253 122 109  81  77  26  18  15]
[b'Netherlands' b'Belgium' b'United Kingdom' b'Germany' b'Liechtenstein'
 b'Italy' b'Switzerland' b'Luxembourg' b'France' b'Austria' b'Greece'
 b'Ireland' b'Sweden' b'Finland' b'Norway']
[243610 348560    160]
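
Because each column is an ordinary NumPy array, it can also be used to filter or sort whole records. The following sketch works on a small sample with only two fields and four rows taken from the table; the names sample, dense and by_density are chosen for illustration.

```python
import numpy as np

dt = np.dtype([('country', 'U20'), ('density', 'i4')])
sample = np.array([('Netherlands', 544),
                   ('Belgium', 383),
                   ('Germany', 241),
                   ('Norway', 15)], dtype=dt)

# Boolean mask built from one column, applied to the whole records
dense = sample[sample['density'] > 250]
print(dense['country'])

# Sort the records by the 'density' field in ascending order
by_density = np.sort(sample, order='density')
print(by_density['country'])
```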

Unicode Strings in Array

You may have noticed that the strings in our previous array are prefixed with a lower case "b". This means that we have created byte strings with the definition "('country', 'S20')". To get Unicode strings we replace it with the definition "('country', 'U20')". We will now redefine our population table:

dt = np.dtype([('country', 'U20'), 
               ('density', 'i4'), 
               ('area', 'i4'), 
               ('population', 'i4')])


population_table_2025 = np.array([
    ('Netherlands', 544, 33720, 18_346_819),
    ('Belgium', 383, 30510, 11_700_000),
    ('United Kingdom', 287, 243610, 69_800_000),
    ('Germany', 241, 348560, 84_075_075),
    ('Liechtenstein', 238, 160, 38_080),
    ('Italy', 197, 301230, 59_400_000),
    ('Switzerland', 219, 41290, 9_050_000),
    ('Luxembourg', 253, 2586, 654_000),
    ('France', 122, 547030, 66_700_000),
    ('Austria', 109, 83858, 9_140_000),
    ('Greece', 81, 131940, 10_700_000),
    ('Ireland', 77, 70280, 5_400_000),
    ('Sweden', 26, 449964, 10_300_000),
    ('Finland', 18, 338424, 6_100_000),
    ('Norway', 15, 385252, 5_800_000)
], dtype=dt)

print(population_table_2025[:4])

OUTPUT:

[('Netherlands', 544,  33720, 18346819) ('Belgium', 383,  30510, 11700000)
 ('United Kingdom', 287, 243610, 69800000)
 ('Germany', 241, 348560, 84075075)]
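
The number in 'S20' and 'U20' is the maximum string length in characters, not the size in bytes. The short sketch below shows the storage cost of the two types and what happens to a string that is longer than the declared length; the variable name short is illustrative.

```python
import numpy as np

print(np.dtype('S20').itemsize)   # 20: one byte per character
print(np.dtype('U20').itemsize)   # 80: four bytes per character

# Strings longer than the declared length are silently truncated
short = np.array(['Liechtenstein'], dtype='U5')
print(short[0])   # Liech
```

It is therefore worth choosing the length generously enough for the longest expected value.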

Input and Output of Structured Arrays

In most applications it will be necessary to save the data from a program into a file. We will write our previously created array population_table_2025 to a file with the function savetxt. You will find a detailed introduction to this topic in our chapter Reading and Writing Data Files.

np.savetxt("population_table_2025.csv",
           population_table_2025,
           fmt="%s;%d;%d;%d",           
           delimiter=";")

It is highly probable that you will need to read in the previously written file at a later date. This can be achieved with the function genfromtxt.

dt = np.dtype([('country', 'U20'), 
               ('density', 'i4'), 
               ('area', 'i4'), 
               ('population', 'i4')])

x = np.genfromtxt("population_table_2025.csv",
               dtype=dt,
               delimiter=";")

print(x)

OUTPUT:

[('Netherlands', 544,  33720, 18346819) ('Belgium', 383,  30510, 11700000)
 ('United Kingdom', 287, 243610, 69800000)
 ('Germany', 241, 348560, 84075075)
 ('Liechtenstein', 238,    160,    38080) ('Italy', 197, 301230, 59400000)
 ('Switzerland', 219,  41290,  9050000)
 ('Luxembourg', 253,   2586,   654000) ('France', 122, 547030, 66700000)
 ('Austria', 109,  83858,  9140000) ('Greece',  81, 131940, 10700000)
 ('Ireland',  77,  70280,  5400000) ('Sweden',  26, 449964, 10300000)
 ('Finland',  18, 338424,  6100000) ('Norway',  15, 385252,  5800000)]
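
To convince ourselves that nothing is lost on the way, we can write a small table and read it back in one go. This is a self-contained sketch that uses a temporary directory and a made-up file name (sample_table.csv) so that it does not touch the file written above. Passing encoding="utf-8" explicitly is a choice made for this example.

```python
import os
import tempfile

import numpy as np

dt = np.dtype([('country', 'U20'), ('density', 'i4')])
table = np.array([('Netherlands', 544), ('United Kingdom', 287)], dtype=dt)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample_table.csv")   # hypothetical file name
    np.savetxt(path, table, fmt="%s;%d", encoding="utf-8")
    loaded = np.genfromtxt(path, dtype=dt, delimiter=";", encoding="utf-8")

# Compare the two arrays field by field
same = all((loaded[name] == table[name]).all() for name in dt.names)
print(same)
```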

Operations

To demonstrate how structured NumPy arrays with the same shape and fields can be compared, we will create a new array containing the population data from 1995:

import numpy as np

# Define the structured dtype
dt = np.dtype([
    ('country', 'U20'),
    ('density', 'i4'),
    ('area', 'i4'),
    ('population', 'i4')
])

# 1995 population data
population_table_1995 = np.array([
    ('Netherlands', 462, 33720, 15_565_032),
    ('Belgium', 332, 30510, 10_137_265),
    ('United Kingdom', 239, 243610, 58_154_634),
    ('Germany', 235, 348560, 82_019_890),
    ('Liechtenstein', 193, 160, 30_886),
    ('Italy', 189, 301230, 56_885_126),
    ('Switzerland', 171, 41290, 7_040_477),
    ('Luxembourg', 158, 2586, 408_227),
    ('France', 106, 547030, 58_192_203),
    ('Austria', 95, 83858, 7_943_489),
    ('Greece', 80, 131940, 10_519_234),
    ('Ireland', 51, 70280, 3_610_697),
    ('Sweden', 20, 449964, 8_826_301),
    ('Finland', 15, 338424, 5_107_922),
    ('Norway', 11, 385252, 4_358_992)
], dtype=dt)

Let's check if the area is the same:

area_changed = population_table_1995['area'] != population_table_2025['area']
print(area_changed)

OUTPUT:

[False False False False False False False False False False False False
 False False False]

If you see only False values in the output, it means that the area has not changed for any of the countries — which is expected, as national land area typically remains stable over time.

However, if you were working with a much larger dataset, it would be impractical to inspect the results visually. In such cases, you can use the boolean array area_changed to filter and return only the countries where the area value differs. In our example, this returns an empty array, indicating that no country's area has changed:

print(population_table_2025['country'][area_changed])

OUTPUT:

[]

Much more interesting is the change in population between 1995 and 2025.

Checking which countries' populations have changed returns, unsurprisingly, all of the countries:

population_changed = population_table_1995['population'] != population_table_2025['population']
print(population_table_2025['country'][population_changed])

OUTPUT:

['Netherlands' 'Belgium' 'United Kingdom' 'Germany' 'Liechtenstein'
 'Italy' 'Switzerland' 'Luxembourg' 'France' 'Austria' 'Greece' 'Ireland'
 'Sweden' 'Finland' 'Norway']

And to inspect how much the population has changed, we can compute the difference:

population_diff = population_table_2025['population'] - population_table_1995['population']
print(population_diff)

OUTPUT:

[ 2781787  1562735 11645366  2055185     7194  2514874  2009523   245773
  8507797  1196511   180766  1789303  1473699   992078  1441008]

To view the countries alongside their population differences:

for country, diff in zip(population_table_2025['country'], population_diff):
    print(f"{country:20} {diff:>12,}")

OUTPUT:

Netherlands             2,781,787
Belgium                 1,562,735
United Kingdom         11,645,366
Germany                 2,055,185
Liechtenstein               7,194
Italy                   2,514,874
Switzerland             2,009,523
Luxembourg                245,773
France                  8,507,797
Austria                 1,196,511
Greece                    180,766
Ireland                 1,789,303
Sweden                  1,473,699
Finland                   992,078
Norway                  1,441,008
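
The differences can also be used to rank the countries. The following sketch uses three of the values printed above; sample_countries and sample_diff are illustrative names, not part of the tables defined earlier.

```python
import numpy as np

sample_countries = np.array(['Netherlands', 'United Kingdom', 'Liechtenstein'])
sample_diff = np.array([2_781_787, 11_645_366, 7_194])

# Position of the largest absolute growth
largest = np.argmax(sample_diff)
print(sample_countries[largest], sample_diff[largest])

# Indices that sort the differences in descending order
order = np.argsort(sample_diff)[::-1]
print(sample_countries[order])
```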
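
We can also store the population difference as an additional field. Since the fields of a structured array are fixed by its dtype, we define a new dtype and build a new array:

import numpy as np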

# Define new dtype with an additional field for population difference
dt_with_diff = np.dtype([
    ('country', 'U20'),
    ('density', 'i4'),
    ('area', 'i4'),
    ('population', 'i4'),
    ('population_diff', 'i4')  # New field
])

# Compute the difference
population_diff = population_table_2025['population'] - population_table_1995['population']

# Build the new structured array
population_table_with_diff = np.array([
    (
        country,
        density,
        area,
        pop,
        diff
    )
    for country, density, area, pop, diff in zip(
        population_table_2025['country'],
        population_table_2025['density'],
        population_table_2025['area'],
        population_table_2025['population'],
        population_diff
    )
], dtype=dt_with_diff)

# Print result
population_table_with_diff

OUTPUT:

array([('Netherlands', 544,  33720, 18346819,  2781787),
       ('Belgium', 383,  30510, 11700000,  1562735),
       ('United Kingdom', 287, 243610, 69800000, 11645366),
       ('Germany', 241, 348560, 84075075,  2055185),
       ('Liechtenstein', 238,    160,    38080,     7194),
       ('Italy', 197, 301230, 59400000,  2514874),
       ('Switzerland', 219,  41290,  9050000,  2009523),
       ('Luxembourg', 253,   2586,   654000,   245773),
       ('France', 122, 547030, 66700000,  8507797),
       ('Austria', 109,  83858,  9140000,  1196511),
       ('Greece',  81, 131940, 10700000,   180766),
       ('Ireland',  77,  70280,  5400000,  1789303),
       ('Sweden',  26, 449964, 10300000,  1473699),
       ('Finland',  18, 338424,  6100000,   992078),
       ('Norway',  15, 385252,  5800000,  1441008)],
      dtype=[('country', '<U20'), ('density', '<i4'), ('area', '<i4'), ('population', '<i4'), ('population_diff', '<i4')])
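
As an alternative to building the rows in a Python loop, the new array can be allocated first and then filled column by column. This is only a sketch of that idea on a two-row sample; the names old, extended, old_dt and new_dt are assumptions made for this example.

```python
import numpy as np

old_dt = np.dtype([('country', 'U20'), ('population', 'i4')])
new_dt = np.dtype([('country', 'U20'),
                   ('population', 'i4'),
                   ('population_diff', 'i4')])   # additional field

old = np.array([('Netherlands', 18_346_819),
                ('Belgium', 11_700_000)], dtype=old_dt)
diff = np.array([2_781_787, 1_562_735], dtype='i4')

# Allocate the target array, then copy the existing fields one by one
extended = np.empty(old.shape, dtype=new_dt)
for name in old_dt.names:
    extended[name] = old[name]
extended['population_diff'] = diff

print(extended)
```

Each assignment copies a whole column at once, which tends to scale better than creating one tuple per row.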

Exercises:

Before you go on, you may want to work through some exercises to deepen your understanding of the material covered so far.

  1. Exercise:

    Define a structured array with two columns. The first column contains the product ID, which can be defined as an int32. The second column shall contain the price for the product. How can you print out the column with the product IDs, the first row and the price for the third article of this structured array?

  2. Exercise:

    Figure out a data type definition for time records with entries for hours, minutes and seconds.

  3. Exercise:

    Previously in this chapter of our NumPy tutorial, we created a new structured NumPy array that included the absolute population growth from 1995 to 2025 by adding a new column. Now, extend this idea further: Create a new structured array based on the 2025 and 1995 data, and add two new fields:

    • One for the absolute growth in population (population_diff)
    • One for the percentage growth relative to 1995 (population_growth)

Solutions:

Solution to the first exercise:

import numpy as np

mytype = [('productID', np.int32), ('price', np.float64)]

stock = np.array([(34765, 603.76), 
                  (45765, 439.93),
                  (99661, 344.19),
                  (12129, 129.39)], dtype=mytype)

print(stock[0])               # the first row
print(stock["productID"])     # the column with the product IDs
print(stock[2]["price"])      # the price of the third article
print(stock)

OUTPUT:

(34765, 603.76)
[34765 45765 99661 12129]
344.19
[(34765, 603.76) (45765, 439.93) (99661, 344.19) (12129, 129.39)]

Solution to the second exercise:

time_type = np.dtype( [('h', int), ('min', int), ('sec', int)])

times = np.array([(11, 38, 5), 
                  (14, 56, 0),
                  (3, 9, 1)], dtype=time_type)
print(times)
print(times[0])
# reset the first time record:
times[0] = (11, 42, 17)
print(times[0])

OUTPUT:

[(11, 38, 5) (14, 56, 0) ( 3,  9, 1)]
(11, 38, 5)
(11, 42, 17)
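
Since hours, minutes and seconds are always small non-negative numbers, a one-byte unsigned integer per field would be sufficient. The following sketch compares the record size of such a compact dtype with one built from 64-bit integers; the names wide, compact and times_compact are illustrative.

```python
import numpy as np

wide = np.dtype([('h', np.int64), ('min', np.int64), ('sec', np.int64)])
compact = np.dtype([('h', 'u1'), ('min', 'u1'), ('sec', 'u1')])

print(wide.itemsize)      # 24 bytes per record
print(compact.itemsize)   # 3 bytes per record

times_compact = np.array([(11, 38, 5), (14, 56, 0)], dtype=compact)
print(times_compact['min'])
```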

We will increase the complexity of our previous example by adding temperatures to the records.

time_type = np.dtype([('time', [('h', int), 
                                ('min', int), 
                                ('sec', int)]),
                      ('temperature', float)])

times = np.array( [((11, 42, 17), 20.8), ((13, 19, 3), 23.2) ], dtype=time_type)
print(times)
print(times['time'])
print(times['time']['h'])
print(times['temperature'])

OUTPUT:

[((11, 42, 17), 20.8) ((13, 19,  3), 23.2)]
[(11, 42, 17) (13, 19,  3)]
[11 13]
[20.8 23.2]
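
The nested fields behave like ordinary arrays, so we can calculate with them. As a small sketch, the following lines repeat the definition from above and convert each time record into seconds since midnight; the variable names t and seconds are chosen for this example.

```python
import numpy as np

time_type = np.dtype([('time', [('h', int), ('min', int), ('sec', int)]),
                      ('temperature', float)])
times = np.array([((11, 42, 17), 20.8), ((13, 19, 3), 23.2)], dtype=time_type)

t = times['time']
seconds = t['h'] * 3600 + t['min'] * 60 + t['sec']
print(seconds)   # [42137 47943]
```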

Let's apply this to "real" data from a file:

This exercise is closer to real-life examples. Usually, we have to create or get the data for our structured array from a database or a file. We will use the list that we created in our chapter on file management. The list has been saved with the aid of pickle.dump in the file cities_and_times.pkl.

So the first task consists in unpickling our data:

import pickle

# Open the file in binary read mode; the with statement closes it again
with open("../data/cities_and_times.pkl", "rb") as fh:
    cities_and_times = pickle.load(fh)
print(cities_and_times[:30])

OUTPUT:

[('Amsterdam', 'Sun', (8, 52)), ('Anchorage', 'Sat', (23, 52)), ('Ankara', 'Sun', (10, 52)), ('Athens', 'Sun', (9, 52)), ('Atlanta', 'Sun', (2, 52)), ('Auckland', 'Sun', (20, 52)), ('Barcelona', 'Sun', (8, 52)), ('Beirut', 'Sun', (9, 52)), ('Berlin', 'Sun', (8, 52)), ('Boston', 'Sun', (2, 52)), ('Brasilia', 'Sun', (5, 52)), ('Brussels', 'Sun', (8, 52)), ('Bucharest', 'Sun', (9, 52)), ('Budapest', 'Sun', (8, 52)), ('Cairo', 'Sun', (9, 52)), ('Calgary', 'Sun', (1, 52)), ('Cape Town', 'Sun', (9, 52)), ('Casablanca', 'Sun', (7, 52)), ('Chicago', 'Sun', (1, 52)), ('Columbus', 'Sun', (2, 52)), ('Copenhagen', 'Sun', (8, 52)), ('Dallas', 'Sun', (1, 52)), ('Denver', 'Sun', (1, 52)), ('Detroit', 'Sun', (2, 52)), ('Dubai', 'Sun', (11, 52)), ('Dublin', 'Sun', (7, 52)), ('Edmonton', 'Sun', (1, 52)), ('Frankfurt', 'Sun', (8, 52)), ('Halifax', 'Sun', (3, 52)), ('Helsinki', 'Sun', (9, 52))]

Turning our data into a structured array:

time_type = np.dtype([('city', 'U30'), ('day', 'U3'), ('time', [('h', int), ('min', int)])])

times = np.array( cities_and_times , dtype=time_type)
print(times['time'])
print(times['city'])
x = times[27]
x[0]

OUTPUT:

[( 8, 52) (23, 52) (10, 52) ( 9, 52) ( 2, 52) (20, 52) ( 8, 52) ( 9, 52)
 ( 8, 52) ( 2, 52) ( 5, 52) ( 8, 52) ( 9, 52) ( 8, 52) ( 9, 52) ( 1, 52)
 ( 9, 52) ( 7, 52) ( 1, 52) ( 2, 52) ( 8, 52) ( 1, 52) ( 1, 52) ( 2, 52)
 (11, 52) ( 7, 52) ( 1, 52) ( 8, 52) ( 3, 52) ( 9, 52) ( 1, 52) ( 2, 52)
 (10, 52) ( 9, 52) ( 9, 52) (13, 37) (10, 52) ( 0, 52) ( 7, 52) ( 7, 52)
 ( 0, 52) ( 8, 52) (18, 52) ( 2, 52) ( 1, 52) ( 2, 52) (10, 52) ( 1, 52)
 ( 2, 52) ( 8, 52) ( 2, 52) ( 8, 52) ( 2, 52) ( 0, 52) ( 8, 52) ( 7, 52)
 (10, 52) ( 8, 52) ( 1, 52) ( 0, 52) ( 1, 52) ( 4, 52) ( 0, 52) (15, 52)
 (15, 52) ( 8, 52) (18, 52) ( 5, 52) (16, 52) ( 2, 52) ( 0, 52) ( 8, 52)
 ( 8, 52) ( 2, 52) ( 1, 52) ( 8, 52)]
['Amsterdam' 'Anchorage' 'Ankara' 'Athens' 'Atlanta' 'Auckland'
 'Barcelona' 'Beirut' 'Berlin' 'Boston' 'Brasilia' 'Brussels' 'Bucharest'
 'Budapest' 'Cairo' 'Calgary' 'Cape Town' 'Casablanca' 'Chicago'
 'Columbus' 'Copenhagen' 'Dallas' 'Denver' 'Detroit' 'Dubai' 'Dublin'
 'Edmonton' 'Frankfurt' 'Halifax' 'Helsinki' 'Houston' 'Indianapolis'
 'Istanbul' 'Jerusalem' 'Johannesburg' 'Kathmandu' 'Kuwait City'
 'Las Vegas' 'Lisbon' 'London' 'Los Angeles' 'Madrid' 'Melbourne' 'Miami'
 'Minneapolis' 'Montreal' 'Moscow' 'New Orleans' 'New York' 'Oslo'
 'Ottawa' 'Paris' 'Philadelphia' 'Phoenix' 'Prague' 'Reykjavik' 'Riyadh'
 'Rome' 'Salt Lake City' 'San Francisco' 'San Salvador' 'Santiago'
 'Seattle' 'Shanghai' 'Singapore' 'Stockholm' 'Sydney' 'São Paulo' 'Tokyo'
 'Toronto' 'Vancouver' 'Vienna' 'Warsaw' 'Washington DC' 'Winnipeg'
 'Zurich']
np.str_('Frankfurt')

Solution to the third exercise:

import numpy as np

# Define a structured dtype with both absolute and percentage difference fields
dt_with_diff_and_growth = np.dtype([
    ('country', 'U20'),
    ('density', 'i4'),
    ('area', 'i4'),
    ('population', 'i4'),
    ('population_diff', 'i4'),     # Absolute change
    ('population_growth', 'f4')    # Percent change
])

# Calculate values
population_1995 = population_table_1995['population']
population_2025 = population_table_2025['population']

population_diff = population_2025 - population_1995
population_growth = np.round((population_diff / population_1995) * 100, 1)  # in %

# Build the new structured array
population_table_enriched = np.array([
    (
        country,
        density,
        area,
        pop,
        diff,
        growth
    )
    for country, density, area, pop, diff, growth in zip(
        population_table_2025['country'],
        population_table_2025['density'],
        population_table_2025['area'],
        population_2025,
        population_diff,
        population_growth
    )
], dtype=dt_with_diff_and_growth)

# Preview the first few rows
print(population_table_enriched[:4])

OUTPUT:

[('Netherlands', 544,  33720, 18346819,  2781787, 17.9)
 ('Belgium', 383,  30510, 11700000,  1562735, 15.4)
 ('United Kingdom', 287, 243610, 69800000, 11645366, 20. )
 ('Germany', 241, 348560, 84075075,  2055185,  2.5)]
