Context
Today I'm visualising data by country for the EU and EEA using a 'tilemap' layout to represent each country as its own square in a grid (similar to this map of the USA).
Of course, because countries differ so much in size and don't sit naturally in a grid-like arrangement, there is a fair bit of interpretation and subjectivity in how to fit them onto a grid!
After some experimentation I ended up with a 10x10 grid, with each cell either empty or containing a country. The countries I used were the 27 EU countries plus Iceland, Liechtenstein and Norway (EEA countries that are not part of the EU), Switzerland (part of the single market but not in the EU or EEA) and the UK. I worked this out mainly on paper and then translated into code.
Below is an example of a resulting plot. Clearly there are some liberties taken here with geography compared to an actual map, although the countries are generally in the right relationship to each other.
Reference data
I created a dictionary keyed by country names and containing the coordinates on a 10x10 grid along with the 2-letter abbreviation for the country to show on the resulting plot.
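The dictionary itself isn't reproduced in this post, so here is a minimal sketch of the shape the later code expects: each value holds the row, the column and the abbreviation, in that order. The country names and coordinates below are illustrative placeholders rather than the actual layout, and check_layout is a hypothetical helper added only to catch off-grid or clashing cells.

```python
# Shape of the reference data: country name -> (row, column, abbreviation).
# NOTE: names and coordinates are illustrative placeholders, not the real layout.
countries = {
    'Iceland': (0, 0, 'IS'),
    'Norway': (0, 4, 'NO'),
    'Ireland': (2, 0, 'IE'),
    'United Kingdom': (2, 1, 'UK'),
    'Germany': (3, 3, 'DE'),
    'France': (4, 1, 'FR'),
}

GRID_SIZE = 10


def check_layout(layout, size=GRID_SIZE):
    """Raise ValueError if a country is off the grid or two countries share a cell."""
    seen = {}
    for name, (row, col, _abbr) in layout.items():
        if not (0 <= row < size and 0 <= col < size):
            raise ValueError(f'{name} is outside the {size}x{size} grid')
        if (row, col) in seen:
            raise ValueError(f'{name} and {seen[(row, col)]} share cell {(row, col)}')
        seen[(row, col)] = name
    return True


check_layout(countries)
```

Working the layout out on paper first makes clashes easy to miss, so a check like this can be worth running each time the dictionary changes.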
Constructing the plot
I created a csv file with the relevant data, using headers for country and for the 'value' field. Since this is not an exercise in data wrangling I created the files in a clean format knowing that the countries relate to the countries in my reference data without any 'cleansing' needed. Obviously with most datasets there will be some kind of pre-processing required as it is quite unusual to be given data in exactly the state it's needed...
The first dataset is length of waterways in kilometres by country. Data is sourced from Wikipedia and used under a Creative Commons Attribution-ShareAlike Licence 3.0.
In the first part of the code I:
- Download the csv data, read it into a pandas DataFrame and determine the minimum and maximum values in the data (to be used later in applying a color scale).
- Create a "grid" (numpy array) representing a 10x10 layout, and pre-fill it with np.nan values. These default values will be overwritten with a data value if a country appears in that cell.
- Loop over the countries, adding their data values to the "grid" where applicable (i.e. in the spot for that country). There may not be data available for all countries, so in that case I add a value that is "minimum of the data - 1" into the grid. This is so that I can handle "data not available" cases separately from cells where there is no country plotted.
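To make the three cell states concrete, here is a toy version of those steps on a 3x3 grid. The layout and values are made up for illustration; only the NaN pre-fill and the "minimum - 1" sentinel mirror the approach described above.

```python
import numpy as np

# Hypothetical mini layout: name -> (row, column, abbreviation)
toy_countries = {
    'A-land': (0, 0, 'AA'),
    'B-land': (1, 1, 'BB'),
    'C-land': (2, 2, 'CC'),
}
# Made-up values; C-land deliberately has no data
toy_values = {'A-land': 120.0, 'B-land': 45.0}

d_min, d_max = min(toy_values.values()), max(toy_values.values())
no_data = d_min - 1  # sentinel just below the real minimum

grid = np.full((3, 3), np.nan)  # NaN means "no country in this cell"
for name, (row, col, _abbr) in toy_countries.items():
    # Use the real value when there is one, otherwise the sentinel
    grid[row, col] = toy_values.get(name, no_data)

print(grid)
```

After the loop each cell is in one of three states: a real value, the sentinel (a country without data), or NaN (no country at all).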
In the code below, I'm making use of a custom colormap (code included in the full notebook at the bottom) but also show how an inbuilt colormap could be used.
from matplotlib import pyplot as plt
import matplotlib as mpl
import numpy as np
import pandas as pd
import requests
import io
# Download data and read into a dataframe
f = requests.get('https://gist.githubusercontent.com/catslovedata/316cddf9a4069ff4d2abdef1502ce99e/raw/90527f4d52a79ae2882500ea3ef707441022a21d/20230114B.csv')
df = pd.read_csv(io.BytesIO(f.content), sep=',', index_col='country')
# Store off min and max
d_min, d_max = min(df['waterways_length']), max(df['waterways_length'])
# Populate grid to hold the heatmap
a = np.full((10, 10), np.nan)
for k, v in countries.items():
  a[v[0], v[1]] = df.at[k, 'waterways_length'] if k in df.index else d_min - 1  # d_min - 1 sits below the data minimum so set_under() can style "no data" cells
# Create the visuals
cmap = custom_cmap.copy()
# cmap = mpl.colormaps['viridis'].copy() # if not using a custom cmap (mpl.cm.get_cmap was removed in Matplotlib 3.9)
cmap.set_under('#cccccc', alpha=0.2) # Values below vmin (here, the "no data" cells) will be shown in this color
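As a quick check of how the two special cases end up colored, the sketch below asks a colormap directly what it returns for an in-range value, for the "no data" sentinel and for an empty (NaN) cell. It uses the inbuilt viridis colormap and a made-up data range rather than the custom colormap and the real data, so treat it as an illustration only.

```python
import numpy as np
import matplotlib as mpl
from matplotlib.colors import Normalize

cmap = mpl.colormaps['viridis'].copy()
cmap.set_under('#cccccc', alpha=0.2)  # color for values below vmin

d_min, d_max = 10.0, 50.0  # made-up data range
norm = Normalize(vmin=d_min, vmax=d_max)

in_range = cmap(norm(30.0))      # an ordinary viridis color
no_data = cmap(norm(d_min - 1))  # below vmin, so the set_under() color
empty = cmap(np.nan)             # NaN maps to the "bad" color, transparent by default

print(in_range)
print(no_data)
print(empty)
```

This is why the two cases can be styled separately: the sentinel picks up the pale grey from set_under(), while NaN cells are simply left transparent unless set_bad() is called.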
Now I can plot the data as a "heatmap" using imshow. The plot is fed with the data min and max values, and applies the color map accordingly. I also add a color bar, and a text string showing the name (2-letter abbreviation) for the country.
plt.figure(figsize=(8,8))
im = plt.imshow(a, cmap=cmap, interpolation=None, vmin=d_min, vmax=d_max)  # 'a' is the grid populated above
plt.colorbar(im)
# Now add the text
for k, v in countries.items():
  plt.text(v[1],v[0],v[2],horizontalalignment='center', verticalalignment='center', color='white' if k in df.index else '#999', fontsize=12, fontname='Open Sans', fontweight='400', alpha=0.9)
plt.axis(False)
plt.title('Countries by total waterways length (km)\n(source: https://en.wikipedia.org/wiki/List_of_countries_by_waterways_length)', y=1.2, fontsize=20, fontname='Open Sans', fontweight='300', color='#222222', alpha=0.9)
... resulting in a plot as below.
Similarly, I've plotted some data on number of millionaires as a percentage of the population. Data is sourced from Wikipedia and used under a Creative Commons Attribution-ShareAlike Licence 3.0.
This example also shows a data value, as well as the label, being plotted.
The available figures were more patchy here, so there are more countries showing 'no data available'.
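One way to build the combined label is a small helper that returns the abbreviation alone when there is no data, and the abbreviation plus a formatted value otherwise. This is a sketch: the helper name cell_label, the two-line layout and the one-decimal percentage format are assumptions, not necessarily what the notebook does.

```python
def cell_label(abbr, value=None, fmt='{:.1f}%'):
    """Return the text for one tile: the abbreviation, plus the value if there is one."""
    if value is None:
        return abbr
    return f'{abbr}\n{fmt.format(value)}'


# Made-up values for illustration
print(cell_label('DE', 3.4))
print(cell_label('LI'))
```

The result can be passed as the text argument of plt.text in place of the bare abbreviation, keeping the same column and row coordinates as before.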
Jupyter notebook
The complete Jupyter notebook for the above can be found here (GitHub Gist).