3. Numpy Data Objects, dtype
By Bernd Klein. Last modified: 07 May 2025.
Data Type dtype in NumPy
NumPy, the fundamental package for numerical computing in Python, relies heavily on efficient storage and manipulation of data. At the heart of this efficiency is the concept of dtype, short for data type. Every NumPy array has a dtype that describes the type of elements it contains, such as integers, floating-point numbers, booleans, or even user-defined types.
Understanding dtype is critical not only for performance optimization but also for ensuring the correctness of computations. In this chapter, we explore how NumPy uses dtype to manage memory, how different data types behave, how to inspect and convert them, and how custom data types can be created for advanced use cases.
The data type object dtype is an instance of the numpy.dtype class and can be created with numpy.dtype. We have already met it in the previous chapters of our NumPy tutorial, where every array reported its type through the dtype attribute:
import numpy as np
arr = np.array([1, 2, 3])
print(arr.dtype)
OUTPUT:
int64
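To see that the dtype attribute really is a numpy.dtype object, and that the same object can be built directly with np.dtype, we can inspect a few of its attributes. This is a small sketch; it uses int32 explicitly so that the result does not depend on the platform, and name, kind and itemsize are standard attributes of numpy.dtype.

```python
import numpy as np

arr = np.array([1, 2, 3], dtype=np.int32)

# The dtype attribute of an array is an instance of numpy.dtype
print(isinstance(arr.dtype, np.dtype))   # True

# The same data type object can be created directly with np.dtype
dt = np.dtype(np.int32)
print(dt == arr.dtype)                   # True

# A few attributes describing the type
print(dt.name)       # int32
print(dt.kind)       # i (signed integer)
print(dt.itemsize)   # 4 (bytes per element)
```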
We have also learned how to create arrays with a specific dtype. In the previous example, we let NumPy make the decision, and it chose int64. This default is platform dependent: on Windows with NumPy versions before 2.0, for example, the default integer type is int32. You are therefore well advised to specify the dtype explicitly to ensure consistency and portability. If we want single-precision floating-point numbers, for example, we write:
arr = np.array([1, 2, 3], dtype=np.float32)
print(arr)
print(arr.dtype)
OUTPUT:
[1. 2. 3.]
float32
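If an array already exists, we cannot change its dtype in place, but astype returns a converted array (a copy by default) and leaves the original untouched. A minimal sketch:

```python
import numpy as np

arr = np.array([1, 2, 3])          # integer dtype chosen by NumPy (platform dependent)
arr32 = arr.astype(np.float32)     # new array with the requested dtype

print(arr32)                       # [1. 2. 3.]
print(arr32.dtype)                 # float32
print(arr.dtype == arr32.dtype)    # False: the original array keeps its integer dtype
```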
So far, our examples of NumPy arrays have only used fundamental numeric data types like int and float, and each array contained elements of a single, homogeneous type. dtype objects, however, can also be constructed by combining fundamental data types.
With the aid of dtype, we can create structured arrays. (NumPy additionally provides record arrays, np.recarray, a subclass of structured arrays whose fields can also be accessed as attributes.) Structured arrays allow us to use different data types for different columns within a single array. This resembles an Excel spreadsheet or a CSV file, where each column can hold a different type of data.
This makes it possible to define and manage complex data like the one in the following table using a custom dtype:
Country | Population Density (per km²) | Area (km²) | Population |
---|---|---|---|
Netherlands | 544 | 33720 | 18,346,819 |
Belgium | 383 | 30510 | 11,700,000 |
United Kingdom | 287 | 243610 | 69,800,000 |
Germany | 241 | 348560 | 84,075,075 |
Liechtenstein | 238 | 160 | 38,080 |
Italy | 197 | 301230 | 59,400,000 |
Switzerland | 219 | 41290 | 9,050,000 |
Luxembourg | 253 | 2586 | 654,000 |
France | 122 | 547030 | 66,700,000 |
Austria | 109 | 83858 | 9,140,000 |
Greece | 81 | 131940 | 10,700,000 |
Ireland | 77 | 70280 | 5,400,000 |
Sweden | 26 | 449964 | 10,300,000 |
Finland | 18 | 338424 | 6,100,000 |
Norway | 15 | 385252 | 5,800,000 |
Before we work with a complex data structure like the one shown above, let's first introduce dtype using a very simple example. We define a data type based on int16 and refer to it as i16. (Admittedly, this isn't a very descriptive name, but we'll use it just for this example.) The elements of a list named lst are then converted to the i16 type to create a two-dimensional array called A.
import numpy as np
i16 = np.dtype(np.int16)
print(i16)
lst = [ [3.4, 8.7, 9.9],
[1.1, -7.8, -0.7],
[4.1, 12.3, 4.8] ]
A = np.array(lst, dtype=i16)
print(A)
OUTPUT:
int16
[[ 3  8  9]
 [ 1 -7  0]
 [ 4 12  4]]
We introduced a new name for a basic data type in the previous example. This has nothing to do with the structured arrays, which we mentioned in the introduction of this chapter of our dtype tutorial.
This example shows how to create a NumPy array with a specific data type (int16), even when the original data contains floats.
It demonstrates that NumPy converts the float values to integers by truncating them toward zero (so -7.8 becomes -7 and -0.7 becomes 0). This helps you:
- Control memory usage
- Understand how dtype affects data representation
- See how data types impact array behavior
It's a simple but powerful way to learn how dtype influences what actually gets stored in a NumPy array.
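Both points, memory usage and truncation toward zero, can be checked directly. The following sketch reuses the list from above; itemsize and nbytes are standard ndarray attributes, and np.floor is used only to contrast truncation with rounding down.

```python
import numpy as np

lst = [[3.4, 8.7, 9.9],
       [1.1, -7.8, -0.7],
       [4.1, 12.3, 4.8]]

A16 = np.array(lst, dtype=np.int16)
A64 = np.array(lst, dtype=np.float64)

# Memory usage: 9 elements times the bytes per element
print(A16.itemsize, A16.nbytes)   # 2 18
print(A64.itemsize, A64.nbytes)   # 8 72

# Converting to int16 truncates toward zero ...
print(A16[1])                                # [ 1 -7  0]
# ... which differs from rounding down for negative values
print(np.floor(A64[1]).astype(np.int16))     # [ 1 -8 -1]
```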
Structured Arrays
ndarrays are homogeneous data objects, i.e. all elements of an array have to be of the same data type. A structured dtype, however, allows us to define a separate data type for each column (field). The array stays homogeneous in the sense that every element has the same compound dtype.
Now we will take the first step towards implementing the table with European countries and the information on population, area and population density.
We create a structured array with a single 'density' column. The data type is defined as np.dtype([('density', np.int32)]). For convenience, we assign this data type to the variable density_dtype and then use it to create an array containing three density values.
import numpy as np
# Define a structured data type with one field: 'density' as int32
density_dtype = np.dtype([('density', np.int32)])
# Create a structured array using the custom dtype
densities = np.array([(393,), (337,), (256,)], dtype=density_dtype)
# Print the structured array
print("Structured array:")
print(densities)
OUTPUT:
Structured array:
[(393,) (337,) (256,)]
Let's have a look at the internal representation:
print("\nThe internal representation:")
print(repr(densities))
OUTPUT:

The internal representation:
array([(393,), (337,), (256,)], dtype=[('density', '<i4')])
We can access the content of the density column by indexing densities with the key 'density'. It looks like accessing a dictionary in Python:
print(densities['density'])
OUTPUT:
[393 337 256]
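Besides indexing by field name, a structured dtype can describe itself. The sketch below, reusing the density example, prints the field names, the field layout and the size of one record. It also shows that a field selected by name is a view into the array, so modifying it modifies the original data.

```python
import numpy as np

density_dtype = np.dtype([('density', np.int32)])
densities = np.array([(393,), (337,), (256,)], dtype=density_dtype)

print(density_dtype.names)     # ('density',)
print(density_dtype.fields)    # field name -> (dtype, byte offset)
print(density_dtype.itemsize)  # 4 bytes per record

# A single record can also be indexed by field name
first = densities[0]
print(first['density'])        # 393

# densities['density'] is a view, so this changes the array itself
densities['density'] += 10
print(densities['density'])    # [403 347 266]
```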
You may wonder why we used np.int32 in our dtype definition, yet the internal representation shows '<i4'.
This is because NumPy allows you to define data types in two equivalent ways:
- Using NumPy's explicit type objects like np.int32
- Using string codes like 'i4' (which stands for 4-byte integer)
So, we could have defined our dtype like this as well:
density_dtype = np.dtype([('density', 'i4')])
densities = np.array([(393,), (337,), (256,)],
dtype=density_dtype)
print(densities)
OUTPUT:
[(393,) (337,) (256,)]
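We can confirm that both notations describe the same dtype by comparing them. The sketch also lists a few more type codes, each made of a kind letter and a byte count. Note that an explicit '<' prefix only equals the native dtype on a little-endian machine, which is why sys.byteorder is printed alongside.

```python
import sys
import numpy as np

# The same single-field structured dtype, written with a type object and a type code
dt_object = np.dtype([('density', np.int32)])
dt_code = np.dtype([('density', 'i4')])
print(dt_object == dt_code)    # True

# An explicit '<' prefix matches the native dtype only on little-endian machines
dt_little = np.dtype([('density', '<i4')])
print(dt_little == dt_object, sys.byteorder)

# More type codes: kind letter followed by the number of bytes
for code in ['i1', 'i2', 'i8', 'u1', 'f4', 'f8']:
    print(code, np.dtype(code).name)
```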
The 'i' in 'i4' stands for integer, and the 4 means it occupies 4 bytes (32 bits).
But what about the less-than sign (<) you may have noticed in the internal representation, like '<i4'?
The < is a byte order indicator:
- < means little-endian encoding (least significant byte first)
- > means big-endian encoding (most significant byte first)
- No prefix means native byte order, which depends on your machine architecture (typically little-endian on most systems)
The following code shows how these prefixes are reflected in the name, byteorder and itemsize attributes of a float64 ('d') dtype:
# little-endian ordering
dt = np.dtype('<d')
print(dt.name, dt.byteorder, dt.itemsize)
# big-endian ordering
dt = np.dtype('>d')
print(dt.name, dt.byteorder, dt.itemsize)
# native byte ordering
dt = np.dtype('d')
print(dt.name, dt.byteorder, dt.itemsize)
OUTPUT:
float64 = 8
float64 > 8
float64 = 8
We can see the impact of byte order by defining a float64 (double precision) in different ways:
import numpy as np
# Native byte order (depends on system)
native = np.dtype('f8')
# Explicit little-endian
little_endian = np.dtype('<f8')
# Explicit big-endian
big_endian = np.dtype('>f8')
print("Native byte order dtype:", native)
print("Little-endian dtype: ", little_endian)
print("Big-endian dtype: ", big_endian)
OUTPUT:
Native byte order dtype: float64
Little-endian dtype:  float64
Big-endian dtype:  >f8
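Two details in these outputs may be surprising: a dtype created with '<' reports its byteorder as '=' and prints as plain float64, while '>f8' keeps its prefix. On a little-endian machine, '<' is the native order, which NumPy reports as '='; only non-native byte orders are spelled out. The sketch below checks the machine's byte order with sys.byteorder and converts big-endian data (as it might come from a file written on another system) to native order with astype. The sample values are made up for illustration.

```python
import sys
import numpy as np

print(sys.byteorder)                 # 'little' on most current machines

big = np.array([1.5, 2.5, 3.5], dtype='>f8')
print(big.dtype.isnative)            # False on a little-endian machine

# The values are identical; only the byte layout in memory differs
native = big.astype(np.float64)
print(native.dtype.isnative)         # True
print(np.array_equal(big, native))   # True
print(big.tobytes() == native.tobytes())  # False on a little-endian machine

# dtype.newbyteorder() returns a dtype with the swapped byte order
print(np.dtype('<f8').newbyteorder())     # >f8 on a little-endian machine
```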
🧠 Understanding Tuples vs. Lists in Structured Arrays
Another detail in our earlier density array might seem confusing: we defined the array using a list of one-element tuples.
This might make you wonder: Can we use tuples and lists interchangeably in this context?
The answer is: not quite.
In NumPy structured arrays:
- Tuples are used to define individual records, that is, the atomic elements that match the structure (dtype).
- Lists serve as the container that holds multiple records; they define the array's shape or dimension.
So, in our example, each tuple represents one country's data (like a row in a table), and the outer list is the collection of all those rows.
You could think of it like this:
Tuples define the structure of one unit of data;
Lists define the dimension or shape of the array.
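The different roles of tuples and lists become visible in the shape of the resulting array. In this sketch, a flat list of tuples yields a one-dimensional array of records, a nested list of tuples yields a two-dimensional array of records, and with a plain integer dtype the inner lists turn into an extra dimension instead. The numbers are taken from the table above.

```python
import numpy as np

dt = np.dtype([('density', 'i4'), ('area', 'i4')])

# A list of tuples: each tuple is one record, the list gives a 1-D array
records = np.array([(544, 33720), (383, 30510)], dtype=dt)
print(records.shape)   # (2,)

# Nested lists of tuples: a 2-D array of records
grid = np.array([[(544, 33720), (383, 30510)],
                 [(287, 243610), (241, 348560)]], dtype=dt)
print(grid.shape)      # (2, 2)
print(grid['area'])

# With a plain (non-structured) dtype, inner lists become a second axis
plain = np.array([[544, 33720], [383, 30510]], dtype='i4')
print(plain.shape)     # (2, 2)
```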
Now, let’s extend our data structure to include more fields: country name, density, area, and population.
import numpy as np
# Define a structured dtype
dt = np.dtype([
('country', 'S20'),
('density', 'i4'),
('area', 'i4'),
('population', 'i4')
])
population_table_2025 = np.array([
('Netherlands', 544, 33720, 18_346_819),
('Belgium', 383, 30510, 11_700_000),
('United Kingdom', 287, 243610, 69_800_000),
('Germany', 241, 348560, 84_075_075),
('Liechtenstein', 238, 160, 38_080),
('Italy', 197, 301230, 59_400_000),
('Switzerland', 219, 41290, 9_050_000),
('Luxembourg', 253, 2586, 654_000),
('France', 122, 547030, 66_700_000),
('Austria', 109, 83858, 9_140_000),
('Greece', 81, 131940, 10_700_000),
('Ireland', 77, 70280, 5_400_000),
('Sweden', 26, 449964, 10_300_000),
('Finland', 18, 338424, 6_100_000),
('Norway', 15, 385252, 5_800_000)
], dtype=dt)
# data 5th of May 2025,
# based on Worldometer’s elaboration of the latest United Nations data
# Print the first 4 entries
print(population_table_2025[:4])
OUTPUT:
[(b'Netherlands', 544, 33720, 18346819) (b'Belgium', 383, 30510, 11700000) (b'United Kingdom', 287, 243610, 69800000) (b'Germany', 241, 348560, 84075075)]
We can access every column individually:
print(population_table_2025['density'])
print(population_table_2025['country'])
print(population_table_2025['area'][2:5])
OUTPUT:
[544 383 287 241 238 197 219 253 122 109  81  77  26  18  15]
[b'Netherlands' b'Belgium' b'United Kingdom' b'Germany' b'Liechtenstein' b'Italy' b'Switzerland' b'Luxembourg' b'France' b'Austria' b'Greece' b'Ireland' b'Sweden' b'Finland' b'Norway']
[243610 348560    160]
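Field access combines naturally with boolean masks and sorting. The sketch below uses a subset of the rows from the table, with a 'U20' country field so the names print without the b prefix. A mask built from one field selects whole records, and np.sort with its order parameter sorts records by a named field.

```python
import numpy as np

dt = np.dtype([('country', 'U20'), ('density', 'i4'),
               ('area', 'i4'), ('population', 'i4')])

# A subset of the rows from the table above
table = np.array([
    ('Netherlands', 544, 33720, 18_346_819),
    ('Belgium', 383, 30510, 11_700_000),
    ('United Kingdom', 287, 243610, 69_800_000),
    ('Germany', 241, 348560, 84_075_075),
    ('Liechtenstein', 238, 160, 38_080),
], dtype=dt)

# A boolean mask on one field selects whole records
dense = table[table['density'] > 250]
print(dense['country'])

# np.sort with order= sorts the records by the given field
by_area = np.sort(table, order='area')
print(by_area['country'])

# Several fields can be selected at once with a list of names
print(table[['country', 'population']][:2])
```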
Unicode Strings in Array
Some of you may have noticed that the strings in our previous array are prefixed with a lowercase "b". This means that the definition "('country', 'S20')" created byte strings. To get Unicode strings, we replace it with the definition "('country', 'U20')". We will redefine our population table now:
dt = np.dtype([('country', 'U20'),
('density', 'i4'),
('area', 'i4'),
('population', 'i4')])
population_table_2025 = np.array([
('Netherlands', 544, 33720, 18_346_819),
('Belgium', 383, 30510, 11_700_000),
('United Kingdom', 287, 243610, 69_800_000),
('Germany', 241, 348560, 84_075_075),
('Liechtenstein', 238, 160, 38_080),
('Italy', 197, 301230, 59_400_000),
('Switzerland', 219, 41290, 9_050_000),
('Luxembourg', 253, 2586, 654_000),
('France', 122, 547030, 66_700_000),
('Austria', 109, 83858, 9_140_000),
('Greece', 81, 131940, 10_700_000),
('Ireland', 77, 70280, 5_400_000),
('Sweden', 26, 449964, 10_300_000),
('Finland', 18, 338424, 6_100_000),
('Norway', 15, 385252, 5_800_000)
], dtype=dt)
print(population_table_2025[:4])
OUTPUT:
[('Netherlands', 544, 33720, 18346819) ('Belgium', 383, 30510, 11700000) ('United Kingdom', 287, 243610, 69800000) ('Germany', 241, 348560, 84075075)]
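The choice between 'S' and 'U' also affects memory and conversions. This sketch shows that 'U' stores four bytes per character (NumPy uses UTF-32 internally), how byte strings can be decoded, and that strings longer than the declared length are silently cut off. The long country name is only an illustration.

```python
import numpy as np

print(np.dtype('S20').itemsize)   # 20: one byte per character
print(np.dtype('U20').itemsize)   # 80: four bytes per character

# A single byte string can be decoded to a Python str
name = np.array([b'Netherlands'], dtype='S20')[0]
print(name.decode('ascii'))       # Netherlands

# np.char.decode converts a whole 'S' array into a 'U' array
countries = np.array([b'Belgium', b'Germany'], dtype='S20')
print(np.char.decode(countries, 'ascii'))   # ['Belgium' 'Germany']

# Strings longer than the declared length are silently truncated
long_name = np.array(['United Kingdom of Great Britain'], dtype='U20')
print(long_name[0])               # United Kingdom of Gr
```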
Input and Output of Structured Arrays
In most applications, it will be necessary to save the data from a program into a file. We will write our previously created structured array population_table_2025 to a file with the function savetxt. You will find a detailed introduction to this topic in our chapter Reading and Writing Data Files.
np.savetxt("population_table_2025.csv",
           population_table_2025,
           fmt="%s;%d;%d;%d",
           delimiter=";",
           encoding="utf-8")   # write the text file as UTF-8
It is highly probable that you will need to read in the previously written file at a later date. This can be achieved with the function genfromtxt.
dt = np.dtype([('country', 'U20'),
('density', 'i4'),
('area', 'i4'),
('population', 'i4')])
x = np.genfromtxt("population_table_2025.csv",
                  dtype=dt,
                  delimiter=";",
                  encoding="utf-8")   # decode the file as UTF-8 text for the 'U20' field
print(x)
OUTPUT:
[('Netherlands', 544, 33720, 18346819) ('Belgium', 383, 30510, 11700000) ('United Kingdom', 287, 243610, 69800000) ('Germany', 241, 348560, 84075075) ('Liechtenstein', 238, 160, 38080) ('Italy', 197, 301230, 59400000) ('Switzerland', 219, 41290, 9050000) ('Luxembourg', 253, 2586, 654000) ('France', 122, 547030, 66700000) ('Austria', 109, 83858, 9140000) ('Greece', 81, 131940, 10700000) ('Ireland', 77, 70280, 5400000) ('Sweden', 26, 449964, 10300000) ('Finland', 18, 338424, 6100000) ('Norway', 15, 385252, 5800000)]
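With a text file, we have to restate the dtype when reading it back. If the data only needs to be read by NumPy again, the binary .npy format written by np.save stores the structured dtype together with the data. The sketch below writes to an in-memory io.BytesIO buffer so that it runs without touching the file system; with a filename such as "population_table_2025.npy" (a hypothetical name) it works the same way.

```python
import io
import numpy as np

dt = np.dtype([('country', 'U20'), ('density', 'i4'),
               ('area', 'i4'), ('population', 'i4')])
table = np.array([('Netherlands', 544, 33720, 18_346_819),
                  ('Belgium', 383, 30510, 11_700_000)], dtype=dt)

buffer = io.BytesIO()
np.save(buffer, table)   # writes a header (including the dtype) plus the raw data
buffer.seek(0)           # rewind before reading

loaded = np.load(buffer)
print(loaded.dtype == table.dtype)   # True: field names and types are preserved
print(loaded)
```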
Operations
To demonstrate how structured NumPy arrays with the same shape and fields can be compared, we will create a new array containing the population data from 1995:
import numpy as np
# Define the structured dtype
dt = np.dtype([
('country', 'U20'),
('density', 'i4'),
('area', 'i4'),
('population', 'i4')
])
# 1995 population data
population_table_1995 = np.array([
('Netherlands', 462, 33720, 15_565_032),
('Belgium', 332, 30510, 10_137_265),
('United Kingdom', 239, 243610, 58_154_634),
('Germany', 235, 348560, 82_019_890),
('Liechtenstein', 193, 160, 30_886),
('Italy', 189, 301230, 56_885_126),
('Switzerland', 171, 41290, 7_040_477),
('Luxembourg', 158, 2586, 408_227),
('France', 106, 547030, 58_192_203),
('Austria', 95, 83858, 7_943_489),
('Greece', 80, 131940, 10_519_234),
('Ireland', 51, 70280, 3_610_697),
('Sweden', 20, 449964, 8_826_301),
('Finland', 15, 338424, 5_107_922),
('Norway', 11, 385252, 4_358_992)
], dtype=dt)
Let's check if the area is the same:
area_changed = population_table_1995['area'] != population_table_2025['area']
print(area_changed)
OUTPUT:
[False False False False False False False False False False False False False False False]
If you see only False values in the output, the area has not changed for any of the countries, which is expected, as national land area typically remains stable over time.
However, with a much larger dataset it would be impractical to inspect the results visually. In such cases, you can use the boolean array area_changed to filter and return only the countries whose area value differs. In our example, this returns an empty array, indicating that no country's area has changed:
print(population_table_2025['country'][area_changed])
OUTPUT:
[]
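For large tables, a single answer or a list of row positions is often more useful than a full boolean array. The sketch below uses three rows from the 1995 and 2025 tables; np.any, np.array_equal, np.flatnonzero and np.count_nonzero are standard NumPy functions.

```python
import numpy as np

# Three rows taken from the 1995 and 2025 tables above
countries = np.array(['Netherlands', 'Belgium', 'Liechtenstein'])
area_1995 = np.array([33720, 30510, 160], dtype='i4')
area_2025 = np.array([33720, 30510, 160], dtype='i4')
population_1995 = np.array([15_565_032, 10_137_265, 30_886], dtype='i4')
population_2025 = np.array([18_346_819, 11_700_000, 38_080], dtype='i4')

area_changed = area_1995 != area_2025
population_changed = population_1995 != population_2025

# One answer instead of one boolean per row
print(np.any(area_changed))                   # False
print(np.array_equal(area_1995, area_2025))   # True

# Positions and number of rows where the mask is True
print(np.flatnonzero(area_changed))           # []
print(np.flatnonzero(population_changed))     # [0 1 2]
print(np.count_nonzero(population_changed))   # 3
print(countries[population_changed])
```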
Much more interesting is the change in population between 1995 and 2025.
Checking which countries' populations have changed returns all of them, because every population differs between 1995 and 2025:
population_changed = population_table_1995['population'] != population_table_2025['population']
print(population_table_2025['country'][population_changed])
OUTPUT:
['Netherlands' 'Belgium' 'United Kingdom' 'Germany' 'Liechtenstein' 'Italy' 'Switzerland' 'Luxembourg' 'France' 'Austria' 'Greece' 'Ireland' 'Sweden' 'Finland' 'Norway']
And to inspect how much the population has changed, we can compute the difference:
population_diff = population_table_2025['population'] - population_table_1995['population']
print(population_diff)
OUTPUT:
[ 2781787 1562735 11645366 2055185 7194 2514874 2009523 245773 8507797 1196511 180766 1789303 1473699 992078 1441008]
To view the countries alongside their population differences:
for country, diff in zip(population_table_2025['country'], population_diff):
    print(f"{country:20} {diff:>12,}")
OUTPUT:
Netherlands             2,781,787
Belgium                 1,562,735
United Kingdom         11,645,366
Germany                 2,055,185
Liechtenstein               7,194
Italy                   2,514,874
Switzerland             2,009,523
Luxembourg                245,773
France                  8,507,797
Austria                 1,196,511
Greece                    180,766
Ireland                 1,789,303
Sweden                  1,473,699
Finland                   992,078
Norway                  1,441,008
To keep these differences together with the rest of the data, we can build a new structured array with an additional field population_diff:
import numpy as np
# Define new dtype with an additional field for population difference
dt_with_diff = np.dtype([
('country', 'U20'),
('density', 'i4'),
('area', 'i4'),
('population', 'i4'),
('population_diff', 'i4') # New field
])
# Compute the difference
population_diff = population_table_2025['population'] - population_table_1995['population']
# Build the new structured array
population_table_with_diff = np.array([
(
country,
density,
area,
pop,
diff
)
for country, density, area, pop, diff in zip(
population_table_2025['country'],
population_table_2025['density'],
population_table_2025['area'],
population_table_2025['population'],
population_diff
)
], dtype=dt_with_diff)
# Print result
print(repr(population_table_with_diff))
OUTPUT:
array([('Netherlands', 544, 33720, 18346819, 2781787), ('Belgium', 383, 30510, 11700000, 1562735), ('United Kingdom', 287, 243610, 69800000, 11645366), ('Germany', 241, 348560, 84075075, 2055185), ('Liechtenstein', 238, 160, 38080, 7194), ('Italy', 197, 301230, 59400000, 2514874), ('Switzerland', 219, 41290, 9050000, 2009523), ('Luxembourg', 253, 2586, 654000, 245773), ('France', 122, 547030, 66700000, 8507797), ('Austria', 109, 83858, 9140000, 1196511), ('Greece', 81, 131940, 10700000, 180766), ('Ireland', 77, 70280, 5400000, 1789303), ('Sweden', 26, 449964, 10300000, 1473699), ('Finland', 18, 338424, 6100000, 992078), ('Norway', 15, 385252, 5800000, 1441008)], dtype=[('country', '<U20'), ('density', '<i4'), ('area', '<i4'), ('population', '<i4'), ('population_diff', '<i4')])
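Instead of rebuilding every record with a list comprehension, we can also allocate an empty array with the extended dtype and copy the existing fields one by one. The following is a minimal sketch using two rows of the data; dtype.descr returns the field list of an existing dtype, so the new dtype can be derived from the old one. NumPy also ships helper functions for this kind of task in numpy.lib.recfunctions (for example append_fields).

```python
import numpy as np

dt = np.dtype([('country', 'U20'), ('density', 'i4'),
               ('area', 'i4'), ('population', 'i4')])
old = np.array([('Netherlands', 544, 33720, 18_346_819),
                ('Belgium', 383, 30510, 11_700_000)], dtype=dt)
population_1995 = np.array([15_565_032, 10_137_265], dtype='i4')

# Extend the existing field list by one new field
new_dt = np.dtype(dt.descr + [('population_diff', 'i4')])

# Allocate the new array and copy the existing fields
new = np.zeros(old.shape, dtype=new_dt)
for name in old.dtype.names:
    new[name] = old[name]

# Fill the new field
new['population_diff'] = old['population'] - population_1995
print(new)
```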
Exercises:
Before you go on, you may want to take some time to do the following exercises to deepen your understanding of what you have learned so far.
Exercise 1:
Define a structured array with two columns. The first column contains the product ID, which can be defined as an int32. The second column shall contain the price for the product. How can you print out the column with the product IDs, the first row and the price for the third article of this structured array?
Exercise 2:
Figure out a data type definition for time records with entries for hours, minutes and seconds.
Exercise 3:
Previously in this chapter of our NumPy tutorial, we created a new structured NumPy array that included the absolute population growth from 1995 to 2025 by adding a new column. Now, extend this idea further: Create a new structured array based on the 2025 and 1995 data, and add two new fields:
- One for the absolute growth in population (population_diff)
- One for the percentage growth relative to 1995 (population_growth)
Solutions:
Solution to the first exercise:
import numpy as np
mytype = [('productID', np.int32), ('price', np.float64)]
stock = np.array([(34765, 603.76),
(45765, 439.93),
(99661, 344.19),
(12129, 129.39)], dtype=mytype)
print(stock[0])            # first row
print(stock["productID"])  # column with the product IDs
print(stock[2]["price"])   # price of the third article
print(stock)
OUTPUT:
(34765, 603.76)
[34765 45765 99661 12129]
344.19
[(34765, 603.76) (45765, 439.93) (99661, 344.19) (12129, 129.39)]
Solution to the second exercise:
time_type = np.dtype( [('h', int), ('min', int), ('sec', int)])
times = np.array([(11, 38, 5),
(14, 56, 0),
(3, 9, 1)], dtype=time_type)
print(times)
print(times[0])
# reset the first time record:
times[0] = (11, 42, 17)
print(times[0])
OUTPUT:
[(11, 38, 5) (14, 56, 0) ( 3, 9, 1)]
(11, 38, 5)
(11, 42, 17)
We will increase the complexity of our previous example by adding temperatures to the records.
time_type = np.dtype([('time', [('h', int),
                                ('min', int),
                                ('sec', int)]),
                      ('temperature', float)])
times = np.array( [((11, 42, 17), 20.8), ((13, 19, 3), 23.2) ], dtype=time_type)
print(times)
print(times['time'])
print(times['time']['h'])
print(times['temperature'])
OUTPUT:
[((11, 42, 17), 20.8) ((13, 19, 3), 23.2)]
[(11, 42, 17) (13, 19, 3)]
[11 13]
[20.8 23.2]
Let's apply this to "real" data from a file:
This exercise is closer to real-life examples. Usually, we have to create or get the data for our structured array from a database or a file. We will use the list that we created in our chapter on file I/O, File Management. The list was saved with pickle.dump in the file cities_and_times.pkl.
So the first task consists of unpickling our data:
import pickle
# open the file in binary read mode; the with statement closes it afterwards
with open("../data/cities_and_times.pkl", "rb") as fh:
    cities_and_times = pickle.load(fh)
print(cities_and_times[:30])
OUTPUT:
[('Amsterdam', 'Sun', (8, 52)), ('Anchorage', 'Sat', (23, 52)), ('Ankara', 'Sun', (10, 52)), ('Athens', 'Sun', (9, 52)), ('Atlanta', 'Sun', (2, 52)), ('Auckland', 'Sun', (20, 52)), ('Barcelona', 'Sun', (8, 52)), ('Beirut', 'Sun', (9, 52)), ('Berlin', 'Sun', (8, 52)), ('Boston', 'Sun', (2, 52)), ('Brasilia', 'Sun', (5, 52)), ('Brussels', 'Sun', (8, 52)), ('Bucharest', 'Sun', (9, 52)), ('Budapest', 'Sun', (8, 52)), ('Cairo', 'Sun', (9, 52)), ('Calgary', 'Sun', (1, 52)), ('Cape Town', 'Sun', (9, 52)), ('Casablanca', 'Sun', (7, 52)), ('Chicago', 'Sun', (1, 52)), ('Columbus', 'Sun', (2, 52)), ('Copenhagen', 'Sun', (8, 52)), ('Dallas', 'Sun', (1, 52)), ('Denver', 'Sun', (1, 52)), ('Detroit', 'Sun', (2, 52)), ('Dubai', 'Sun', (11, 52)), ('Dublin', 'Sun', (7, 52)), ('Edmonton', 'Sun', (1, 52)), ('Frankfurt', 'Sun', (8, 52)), ('Halifax', 'Sun', (3, 52)), ('Helsinki', 'Sun', (9, 52))]
Turning our data into a structured array:
time_type = np.dtype([('city', 'U30'), ('day', 'U3'), ('time', [('h', int), ('min', int)])])
times = np.array( cities_and_times , dtype=time_type)
print(times['time'])
print(times['city'])
x = times[27]
print(repr(x[0]))
OUTPUT:
[( 8, 52) (23, 52) (10, 52) ( 9, 52) ( 2, 52) (20, 52) ( 8, 52) ( 9, 52) ( 8, 52) ( 2, 52) ( 5, 52) ( 8, 52) ( 9, 52) ( 8, 52) ( 9, 52) ( 1, 52) ( 9, 52) ( 7, 52) ( 1, 52) ( 2, 52) ( 8, 52) ( 1, 52) ( 1, 52) ( 2, 52) (11, 52) ( 7, 52) ( 1, 52) ( 8, 52) ( 3, 52) ( 9, 52) ( 1, 52) ( 2, 52) (10, 52) ( 9, 52) ( 9, 52) (13, 37) (10, 52) ( 0, 52) ( 7, 52) ( 7, 52) ( 0, 52) ( 8, 52) (18, 52) ( 2, 52) ( 1, 52) ( 2, 52) (10, 52) ( 1, 52) ( 2, 52) ( 8, 52) ( 2, 52) ( 8, 52) ( 2, 52) ( 0, 52) ( 8, 52) ( 7, 52) (10, 52) ( 8, 52) ( 1, 52) ( 0, 52) ( 1, 52) ( 4, 52) ( 0, 52) (15, 52) (15, 52) ( 8, 52) (18, 52) ( 5, 52) (16, 52) ( 2, 52) ( 0, 52) ( 8, 52) ( 8, 52) ( 2, 52) ( 1, 52) ( 8, 52)] ['Amsterdam' 'Anchorage' 'Ankara' 'Athens' 'Atlanta' 'Auckland' 'Barcelona' 'Beirut' 'Berlin' 'Boston' 'Brasilia' 'Brussels' 'Bucharest' 'Budapest' 'Cairo' 'Calgary' 'Cape Town' 'Casablanca' 'Chicago' 'Columbus' 'Copenhagen' 'Dallas' 'Denver' 'Detroit' 'Dubai' 'Dublin' 'Edmonton' 'Frankfurt' 'Halifax' 'Helsinki' 'Houston' 'Indianapolis' 'Istanbul' 'Jerusalem' 'Johannesburg' 'Kathmandu' 'Kuwait City' 'Las Vegas' 'Lisbon' 'London' 'Los Angeles' 'Madrid' 'Melbourne' 'Miami' 'Minneapolis' 'Montreal' 'Moscow' 'New Orleans' 'New York' 'Oslo' 'Ottawa' 'Paris' 'Philadelphia' 'Phoenix' 'Prague' 'Reykjavik' 'Riyadh' 'Rome' 'Salt Lake City' 'San Francisco' 'San Salvador' 'Santiago' 'Seattle' 'Shanghai' 'Singapore' 'Stockholm' 'Sydney' 'São Paulo' 'Tokyo' 'Toronto' 'Vancouver' 'Vienna' 'Warsaw' 'Washington DC' 'Winnipeg' 'Zurich'] np.str_('Frankfurt')
Solution to the third exercise:
import numpy as np
# Define a structured dtype with both absolute and percentage difference fields
dt_with_diff_and_growth = np.dtype([
('country', 'U20'),
('density', 'i4'),
('area', 'i4'),
('population', 'i4'),
('population_diff', 'i4'), # Absolute change
('population_growth', 'f4') # Percent change
])
# Calculate values
population_1995 = population_table_1995['population']
population_2025 = population_table_2025['population']
population_diff = population_2025 - population_1995
population_growth = np.round((population_diff / population_1995) * 100, 1) # in %
# Build the new structured array
population_table_enriched = np.array([
(
country,
density,
area,
pop,
diff,
growth
)
for country, density, area, pop, diff, growth in zip(
population_table_2025['country'],
population_table_2025['density'],
population_table_2025['area'],
population_2025,
population_diff,
population_growth
)
], dtype=dt_with_diff_and_growth)
# Preview the first few rows
print(population_table_enriched[:4])
OUTPUT:
[('Netherlands', 544, 33720, 18346819, 2781787, 17.9) ('Belgium', 383, 30510, 11700000, 1562735, 15.4) ('United Kingdom', 287, 243610, 69800000, 11645366, 20. ) ('Germany', 241, 348560, 84075075, 2055185, 2.5)]