Spatial data visualization in python

Although it is much more convenient to use software dedicated for GIS, like ArcGIS or QGIS, for spatial data visualization, but ability to display spatial data within your code (especially if you are working with notebooks) might be very handy. Currently there are tens of geo-spatial python libraries, and here you can find a nice overview. I just took one, which seems to be relatively easy for simple map display use cases. My choice was Folium.

In most of the cases, functionality that I need is to display some vector data on the map or highlight some location with using marker, so let’s start from some simple task.

Titanic, this tragedy was investigated by almost each Data Scientist beginner (here you can find my Kaggle Titanic project). By now, it is clear, that the best chance to survive had kinds in the first class, but that is not a topic of this post. Here, we will display Titanic track before it sank.

In this example I will use vector data stored in GeoJSON format, which looks like this:

#GeoJSON data format
"""{ "type": "LineString", 
    "coordinates": [
        [-1.422394023261522,50.915326603354451],[-1.2069179833123727,50.761806565381434],...
    ...,[-47.474265452339353,44.197223433400623],[-49.93827045144009,41.749277801296536]
    ]
}"""
import folium
from folium import plugins
import pandas as pd
%matplotlib inline
#create interactive Leaflet maps object
map = folium.Map(location=[48, -25], tiles='Mapbox Bright', zoom_start=4)

#add vector  (in GeoJson format) data to the map 
map.add_child(folium.GeoJson(open('./data/titanic_test.json',encoding = "utf-8-sig").read()))

#add markers
folium.Marker([50.892239,-1.3981697], popup='Southampton<br><i>10 April 1912</i>').add_to(map)
folium.Marker([49.646042,-1.618031], popup='Cherbuorg<br><i>10 April 1912</i>').add_to(map)
folium.Marker([51.853955,-8.2997997], popup='Cobh<br><i>11 April 1912</i>').add_to(map)
folium.Marker(location=[41.726931,-49.948253], popup='Titanic Site<br><i>15 April 1912</i>'\
              ,icon=folium.Icon(color='red')).add_to(map)
map

Like I said, in most of the cases, functionality presented above is what I need, but just to have better feeling what we can do with Folium…
Air pollution, this is quite popular topic (probably because I live in one of the most polluted cities in Europe). Let’s see if that is true. I have downloaded air quality annual statistics from European Environment Agency (EEA) home page and I will display mean PM10 Benzo(a)pyrene 2015 volume in 2015 measured in multiple European cities.

## Load csv "The annual aggregated air quality values" to pandas dataframe
air_q_data =pd.read_csv('./data/data_air_q.csv', sep=',', header=0)
air_q_data.head()
CountryOrTerritory ReportingYear UpdateTime StationLocalId SamplingPointLocalId SamplingPoint_Latitude SamplingPoint_Longitude Pollutant AggregationType Namespace Unit BeginPosition EndPosition Validity Verification DataCoverage DataCapture TimeCoverage AQValue
0 Spain 2015 0001-01-01T00:00:00Z STA_ES1889A SP_37274011_1_38 40.960833 -5.639722 Sulphur dioxide (air) 1 year 50 percentile ES.BDCA.AQD ug.m-3 2015-01-01 2016-01-01 Not valid Verified 63.013699 100.0 63.013699 3.0
1 Spain 2015 0001-01-01T00:00:00Z STA_ES1424A SP_33004049_1_38 43.558850 -5.927480 Sulphur dioxide (air) 1 year 50 percentile ES.BDCA.AQD ug.m-3 2015-01-01 2016-01-01 Valid Verified 99.041096 100.0 99.041096 3.0
2 Spain 2015 0001-01-01T00:00:00Z STA_ES1193A SP_28079024_1_38 40.420000 -3.749167 Sulphur dioxide (air) 1 year 50 percentile ES.BDCA.AQD ug.m-3 2015-01-01 2016-01-01 Valid Verified 99.121005 100.0 99.121005 2.0
3 Spain 2015 0001-01-01T00:00:00Z STA_ES1696A SP_20069004_1_38 43.302778 -1.984444 Sulphur dioxide (air) 1 year 99.73 %ile of hourly values in a given … ES.BDCA.AQD ug.m-3 2015-01-01 2016-01-01 Valid Verified 95.662100 100.0 95.662100 19.0
4 Spain 2015 0001-01-01T00:00:00Z STA_ES1270A SP_33024023_1_38 43.535185 -5.658337 Sulphur dioxide (air) 1 year 99.73 %ile of hourly values in a given … ES.BDCA.AQD ug.m-3 2015-01-01 2016-01-01 Valid Verified 99.360731 100.0 99.360731 35.0
# Select only PM10 Benzo(a)pyrene 2015 value data for display
air_q_data_print =air_q_data[(air_q_data.ReportingYear==2015)&
                             (air_q_data.Pollutant=='Benzo(a)pyrene in PM10 (aerosol)')&\
                             (air_q_data.AggregationType=='Annual mean / 1 calendar year')]
# get station location, PM10 value and convert it to list
y=air_q_data_print['SamplingPoint_Latitude'].tolist()
x=air_q_data_print['SamplingPoint_Longitude'].tolist()
v=air_q_data_print['AQValue'].tolist()
t =list(zip(y,x,v))
#create interactive Leaflet maps object
map = folium.Map(location=[48, 8], tiles='Stamen Terrain', zoom_start=4)
#add HeatMap to map object
map.add_child(plugins.HeatMap(t, radius = 12))

Hmm, indeed it doesn’t look very optimistic. This large yellow area in Poland, this is where I live.

Python code and data set for this post might be downloaded from my github.