How to visualize spatial point-based data and its density with python Basemap?


Hi folks! This is my very first post on Medium since I’ve been realized that blogspot isn’t so-called popular anymore. In the past few months, I’ve got several messages from my college peers who have read my thesis and felt kind of amazed by my satellite data visualization (unbelievable). So I tried to just give them some raw scripts and succinctly deciphered how’re the scripts work. It was a struggle for me to recall those meteorological terms and such. Right after that, I felt why don’t I try to recollect and enshrine my college meteorological-related works on Medium? Because I think, it’ll be like an atonement for myself to do so since I’ve “abandoned” my meteorology diploma and let the knowledge evaporated from my brain offhand for my dream job.

Let’s start the study!

  1. What data is used?

Mine was output from GTG (Grab ’Em Tag ’Em Graph ’Em) data. It contains a specific location of cloud centroid which is detected by several weather satellites as black body temperature (source: NASA). The location is specified by its coordinates (latitude and longitude).

Fig 1. The Data

Actually, it isn’t an ordinary detected cloud, it was MCC (Mesoscale Convective Complex), which is a cloud organization that has an area of more than 50,000 km2, eccentricity of more than 0.7, and a life span of more than 6 hours (Maddox, 1980). The GTG algorithm yielded the MCC characteristics such as start time, end time, duration, average area, maximum area, a time when the cloud reaches its maximum area, eccentricity, latitude, and longitude.

As long we aim to focus on plotting spatial-based data, let’s just see to the latitude and longitude columns. The columns aren’t sorted by anything or dependent on any other columns, they were merely a simple coordinate.

2. Import library and read data

I use xlrd library instead of pandas because back to the time I first created the script, I wasn’t acquainted with pandas yet. I suggest using pandas instead of xlrd because pandas has way more powerful features and the data will be more legible because pandas works with daraframe.

Installing Basemap library is a bit tricky. Sometimes it couldn’t simply as type pip install -package name- . It was intricate so I’m not elaborate it here, I assumed you have successfully installed all the required libraries as shown above.

My excel file name is dataMCC.xlsx and I only use the first sheet, so it’s indexed by 0. Lat and Lon array variables stand for latitude and longitude which located in the 11th and 12th columns of the sheet.

3. Create 2D array of density

Fig. 2. Point to Density Illustration

As the aforementioned statement in point 1, the Lat and Lon variables aren’t sorted by anything or dependent on any other columns. So if we plotted it to the map directly, we’d still get a rightly plotted point. But how if we want to get the density of it?

In this case, density is the number of points (MCC’s centroid) per region of desired resolution (Lat x Lon). First, calculate the desired resolution.

This part is strictly dependent on the region of the map that we want to show. My case study is in the Maritime Continent which located between 90E–150E and 15S-15N. I want to map my data in 2.5 x 2.5-degree resolution, so nx = 24 (150–90/24 = 2.5), and ny = 12 ((15-(-15)/12)=2.5). (Note: if you didn’t know the region that you want to show on the map, you could define it by each maximum and minimum value of latitude and longitude).

I use numpy.histogram2d for density calculation. It simply calculated the number of samples in each bin (read: numpy.histogram2d). So now we have a density matrix and make sure it has the same dimension as Lat x Lon matrix.

4. Declare Basemap parameters

It must be the simplest one, I use Mercator as my map projection and I declare max and min of latitude and longitude. For the best map resolution, you can set ‘f’ (full resolution) on the resolution argument. See Basemap tutorial here.

5. Draw the map!

Standard Basemap map settings are required before the map is drawn. Parallels and meridians are variables that use to determine the scale of the grid line to be drawn.

I use imshow because it has many interpolation methods as an argument. Imshow displays data on a 2D regular raster image. Since I wanted to contour fill the density with a color, I had to make the 2D matrix of density smooth without diminishing any data so I picked ‘spline36’ as the best interpolation method to interpret the density (read: Imshow documentation).

Vmin and Vmax are the minimum and maximum value of density matrix, it is important for ranging the color gradation within the map and the color bar. The MCC’s centroids (which is what the spatial point-based means) displayed as red dots.

And here we go, the final result:

Fig 3. Final Visualization Result

I know for some people, it might be a piece of cake. However, in Meteorological Science, visualization technique is an important key for interpreting meteorological phenomena and elaborating their impact to people. It can’t be erroneously displayed. Meteorologists should understand the data from the get-go, where’s the data stem from? what’s in the data? what are we gonna show?

Hope it useful!



Atmospheric Data Science Enthusiast | Data Analyst in Fintech.