Plotting European capitals, centroids and the distance between them using geopandas

Does the capital of a country tend to be geographically close to the center? (Answer: not really.)

Context

I was curious about how far the capital city tends to be from the geographical 'center' of a country. Here in the UK, our capital (London) is firmly in the south-east, and as a result the south-east region is by far the most prosperous and economically active. This got me thinking about whether a similar thing is true in other countries: are their capitals more 'central', or is most of the country similarly distant from the capital?

I've used geopandas and a small amount of shapely and matplotlib to plot a map of the EU, EEA (Norway, Iceland and Liechtenstein), Switzerland and the UK (32 countries in total). The map shows the location of each capital city along with the geographical 'centroid' of the country, with a line connecting the two to show the distance more clearly.

Concepts covered in this walkthrough:

  • Read a GeoJSON file into a geopandas data frame and manipulate its POLYGON and MULTIPOLYGON content
  • Calculate centroids (the 'center' of a geographical shape) with geopandas, re-projecting between coordinate systems to do so
  • Plot country shapes on a map
  • Plot geographic points on a map
  • Draw LINESTRING lines connecting points in a geopandas data frame and plot them on a map

Final result:

Reference data

First I've created a dictionary with the countries, their capital city and the geographical coordinates (latitude and longitude) of the capital. All of the coordinates were obtained via Wikipedia and sense-checked by viewing on a map.
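A minimal sketch of what such a dictionary might look like. The key and field names (capital, lat, lon) are my assumptions for illustration, and only three of the 32 countries are shown.

```python
# Illustrative subset only: the full dictionary covers all 32 countries.
# The field names ("capital", "lat", "lon") are assumptions, not
# necessarily the structure used in the original notebook.
capitals = {
    "United Kingdom": {"capital": "London", "lat": 51.5074, "lon": -0.1278},
    "France": {"capital": "Paris", "lat": 48.8566, "lon": 2.3522},
    "Germany": {"capital": "Berlin", "lat": 52.5200, "lon": 13.4050},
}

# Basic sense check: every entry must hold a valid latitude and longitude.
for country, info in capitals.items():
    assert -90 <= info["lat"] <= 90, country
    assert -180 <= info["lon"] <= 180, country
```

Keeping latitude and longitude as separate named fields makes it harder to swap them by accident later, when they are turned into points.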

The ordering of the countries may seem a little arbitrary. I have taken it from my previous post on visualising geographical data on a tilemap, where the ordering is broadly left to right, top to bottom when viewed on a grid.

Importing and manipulating the country shape data (GeoJSON) and calculating the centroids

Here I import the data from the Country Polygons as GeoJSON data package at Datahub. Using geopandas read_file to consume the results of an HTTP request directly, I extract just the data for the required countries (the reference dataset covers every country in the world), set the index to the name of the country, and set the geometry column.
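The filtering and indexing steps can be sketched roughly as follows. To keep the example runnable offline, it builds a tiny stand-in dataset with GeoDataFrame.from_features instead of downloading the real file. The country names are made up, and the "ADMIN" property name is an assumption about the dataset's schema, so check the actual column names after loading.

```python
import geopandas as gpd

# Tiny stand-in for the downloaded GeoJSON: two rectangular "countries".
# With the real data package this would instead be gpd.read_file(...) on
# the GeoJSON source. "ADMIN" as the country-name property is an assumption.
features = [
    {
        "type": "Feature",
        "properties": {"ADMIN": "Exampleland"},
        "geometry": {
            "type": "Polygon",
            "coordinates": [[[0, 50], [10, 50], [10, 60], [0, 60], [0, 50]]],
        },
    },
    {
        "type": "Feature",
        "properties": {"ADMIN": "Farawayland"},
        "geometry": {
            "type": "Polygon",
            "coordinates": [[[100, 0], [110, 0], [110, 10], [100, 10], [100, 0]]],
        },
    },
]
world = gpd.GeoDataFrame.from_features(features, crs="EPSG:4326")

# Keep only the required countries, index by country name, and name the
# geometry column "shape" (the column name used later in this post).
wanted = ["Exampleland"]
eu_and_eea = (
    world[world["ADMIN"].isin(wanted)]
    .set_index("ADMIN")
    .rename_geometry("shape")
)
```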

Some countries, such as the Netherlands, have overseas territories that are geographically very far from the 'mainland' of the country. It wouldn't make sense to include those when calculating the centroid, since the result would be a location somewhere between the two, probably in the middle of the sea! So I've carried out some data wrangling to include only a subset of the areas in these cases (no political statement should be inferred from this).

Using the at accessor on the data frame, I set the geometry to a subset of the MULTIPOLYGON data using e.g. eu_and_eea.at['Netherlands', 'shape'] = eu_and_eea.at['Netherlands', 'shape'][3:12]. The explode() method splits a MULTIPOLYGON into its constituent parts, one per row, which makes it possible to see which positions in the list to keep.
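Here is a sketch of the same idea on a made-up country with one far-away territory. One caveat I'm fairly confident of: slicing a MultiPolygon directly, as in the snippet above, only works on Shapely 1.x. From Shapely 2.0 the parts have to be reached through .geoms, which is what this version does. The country and its coordinates are hypothetical.

```python
import geopandas as gpd
from shapely.geometry import MultiPolygon, box

# Hypothetical country: one distant territory plus two "mainland" squares.
overseas = box(-69, 12, -68, 13)
mainland_a = box(4, 51, 5, 52)
mainland_b = box(5, 52, 6, 53)
shapes = gpd.GeoSeries(
    [MultiPolygon([overseas, mainland_a, mainland_b])],
    index=["Exampleland"],
    crs="EPSG:4326",
)
countries = gpd.GeoDataFrame({"shape": shapes}, geometry="shape")

# explode() gives one row per part, useful for finding the positions to keep.
parts = countries.explode(index_parts=True)

# Keep parts 1 and 2 (the mainland). list(.geoms) works on Shapely 1.8 and 2.x.
full_shape = countries.at["Exampleland", "shape"]
countries.at["Exampleland", "shape"] = MultiPolygon(list(full_shape.geoms)[1:3])
```

After the assignment the bounding box of the shape no longer stretches across the Atlantic, so a centroid calculated from it lands on the mainland.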

Last in this section, I use the .centroid property of a geoseries to find the center point of the (adjusted if necessary) shape of each country. To calculate the centroids correctly I first need to re-project the data with to_crs() into a projected coordinate system, and then re-project the resulting points back to latitude and longitude. Calculating the centroid directly on latitude and longitude values treats them as flat coordinates and ignores the curvature of the earth.
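A minimal sketch of the re-project, calculate, re-project-back pattern. The choice of EPSG:3035 (ETRS89 / LAEA Europe, an equal-area projection covering Europe) is my assumption for illustration and may differ from the projection used in the notebook. The rectangle is a made-up shape.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical country: a 10 x 10 degree rectangle in latitude/longitude.
shapes = gpd.GeoSeries(
    [box(0, 50, 10, 60)], index=["Exampleland"], crs="EPSG:4326"
)

# 1. Re-project to a projected CRS (EPSG:3035 is an assumed choice).
# 2. Calculate the centroid in that planar coordinate system.
# 3. Re-project the centroid points back to latitude/longitude for plotting.
centroids = shapes.to_crs(epsg=3035).centroid.to_crs(epsg=4326)

centroid = centroids.loc["Exampleland"]
print(f"Centroid: lon={centroid.x:.3f}, lat={centroid.y:.3f}")
```

Calling .centroid directly on the latitude/longitude data also returns a result, but geopandas emits a warning that it is likely to be incorrect for a geographic CRS.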

Combining the two datasets and creating a linestring between the capital and the centroid

The capitals data, held as a dictionary, is now imported into a pandas dataframe. Geopandas points_from_xy() is then used to combine the raw longitude and latitude values (x and y respectively) into a POINT type representing the capital.
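A sketch of that conversion, using the same assumed field names (capital, lat, lon) and a three-country subset. The main thing to watch is the argument order: points_from_xy() takes x (longitude) first, then y (latitude). The column name capital_point is also my own choice.

```python
import geopandas as gpd
import pandas as pd

# Illustrative subset; the field names are assumptions.
capitals = {
    "United Kingdom": {"capital": "London", "lat": 51.5074, "lon": -0.1278},
    "France": {"capital": "Paris", "lat": 48.8566, "lon": 2.3522},
    "Germany": {"capital": "Berlin", "lat": 52.5200, "lon": 13.4050},
}

# One row per country, indexed by country name.
capitals_df = pd.DataFrame.from_dict(capitals, orient="index")

# x is longitude and y is latitude, so "lon" must come first.
capitals_gdf = gpd.GeoDataFrame(
    capitals_df,
    geometry=gpd.points_from_xy(capitals_df["lon"], capitals_df["lat"]),
    crs="EPSG:4326",
).rename_geometry("capital_point")
```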

I then join the two datasets (country shapes and the capitals) into all_data, and create a new column with a LineString (really just a series of points that make up a line) drawn between the capital and the centroid.
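The join and the line construction might look something like this. The two countries, their coordinates and the column names (centroid, capital_point, line) are all placeholders for illustration, not values from the real dataset.

```python
import geopandas as gpd
from shapely.geometry import LineString, Point

index = ["Exampleland", "Otherland"]

# Placeholder centroids, indexed by country name.
countries = gpd.GeoDataFrame(
    {"centroid": gpd.GeoSeries([Point(5, 55), Point(20, 45)], index=index)},
    geometry="centroid",
    crs="EPSG:4326",
)

# Placeholder capitals with the same index, so join() can align the rows.
capitals = gpd.GeoDataFrame(
    {
        "capital": ["Example City", "Other Town"],
        "capital_point": gpd.GeoSeries([Point(8, 51), Point(21, 44)], index=index),
    },
    index=index,
    geometry="capital_point",
    crs="EPSG:4326",
)

all_data = countries.join(capitals)

# One two-point LineString per country: capital first, then centroid.
all_data["line"] = gpd.GeoSeries(
    [
        LineString([(cap.x, cap.y), (cen.x, cen.y)])
        for cap, cen in zip(all_data["capital_point"], all_data["centroid"])
    ],
    index=all_data.index,
    crs="EPSG:4326",
)
```

The resulting data frame holds several geometry columns at once; only one of them is the active geometry at any time, which matters when plotting.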

Drawing the map

Now that all the preparatory steps are in place, I'm ready to draw the map! Geopandas has its own plot() methods, which call matplotlib under the hood. plot() can be called directly on a geometry column of a geo-dataframe, whether it holds country shapes (polygons and multipolygons), lines or points.

Below I plot the following, setting the z-order to ensure that they appear in the correct "layered" order as I wanted the line to sit behind the points (which it doesn't naturally do even when giving the plot() calls in the 'correct' sequence).

  • country shapes
  • centroids with square marker and color
  • capitals with square marker and color

Then I need to call set_geometry() on the dataframe to make the LineString column (created above) the active geometry, before plotting the lines.

I then add a legend and a title, and turn off the axes for the plot (the 'x' and 'y' axes are the coordinates, but it doesn't add anything to keep them here).
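Putting the plotting steps together, a rough sketch could look like this. It uses a single made-up country, and the colors, marker sizes, title and column names are arbitrary choices of mine. The zorder values are what control the layering: shapes at the bottom, then lines, then points on top.

```python
import geopandas as gpd
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt
from shapely.geometry import LineString, Point, box

# Hypothetical single-country dataset with all four geometry columns.
index = ["Exampleland"]
all_data = gpd.GeoDataFrame(
    {
        "shape": gpd.GeoSeries([box(0, 50, 10, 60)], index=index),
        "centroid": gpd.GeoSeries([Point(5, 55)], index=index),
        "capital_point": gpd.GeoSeries([Point(8, 51)], index=index),
        "line": gpd.GeoSeries([LineString([(8, 51), (5, 55)])], index=index),
    },
    geometry="shape",
    crs="EPSG:4326",
)

fig, ax = plt.subplots(figsize=(8, 8))

# Layer 1: country shapes at the bottom.
all_data["shape"].plot(ax=ax, color="lightgrey", edgecolor="white", zorder=1)

# Layer 2: connecting lines, with the LineString column as active geometry.
all_data.set_geometry("line").plot(ax=ax, color="black", linewidth=1, zorder=2)

# Layer 3: centroids and capitals on top of the lines.
all_data["centroid"].plot(
    ax=ax, marker="s", color="tab:blue", markersize=25, label="Centroid", zorder=3
)
all_data["capital_point"].plot(
    ax=ax, marker="s", color="tab:red", markersize=25, label="Capital", zorder=3
)

ax.legend(loc="upper left")
ax.set_title("Capitals and centroids")
ax.set_axis_off()  # the axes would only show raw coordinates
fig.savefig("capitals_and_centroids.png", dpi=150, bbox_inches="tight")
```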

The annotations (text for the capital city name) were generally straightforward, with the exception of Vienna (Austria) and Bratislava (Slovakia), which are so close together that I had to adjust the placement of their labels slightly so they didn't overlap. (Airlines have already discovered this, and there are flights into 'Vienna' which actually land at Bratislava, which is only about 30 miles away. I suppose this isn't much different from 'London' Luton Airport, which is a similar distance from London!)
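One way to handle this kind of label clash is a default text offset with per-city overrides, sketched below with plain matplotlib. The offset values are guesses that would need tuning by eye, and the three-city subset is only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt

# (longitude, latitude) for three neighbouring capitals.
capital_coords = {
    "Prague": (14.4378, 50.0755),
    "Vienna": (16.3738, 48.2082),
    "Bratislava": (17.1077, 48.1486),
}

# Label offsets in points relative to the marker. The overrides are
# hypothetical values chosen to push the two close labels apart.
default_offset = (4, 4)
offset_overrides = {"Vienna": (-38, 4), "Bratislava": (4, -10)}

fig, ax = plt.subplots()
ax.scatter(
    [lon for lon, lat in capital_coords.values()],
    [lat for lon, lat in capital_coords.values()],
    marker="s",
)

labels = [
    ax.annotate(
        name,
        xy=xy,
        xytext=offset_overrides.get(name, default_offset),
        textcoords="offset points",  # offset is in points, not degrees
        fontsize=8,
    )
    for name, xy in capital_coords.items()
]
```

Using offsets in points rather than in map coordinates keeps the spacing between marker and label the same regardless of the figure size.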

Interestingly, many of the capital cities are far from the center of the country. For some countries (Norway, Sweden and Finland) this seems to make sense: their northern parts are so remote that the 'hub' of the country ends up closer to the rest of Europe. Vilnius (Lithuania) is near the border with Belarus. London is one of the "furthest" from its centroid in proportion to the size of the country.

Jupyter notebook

The complete Jupyter notebook for the above can be found here (GitHub Gist).