Extract seasonality patterns from climate data with Fast Fourier Transform (FFT)

Tio Faizin
7 min readSep 10, 2020

Meteorology students hardly experience smooth and expeditious data analysis. When it comes to results, they oftentimes plunge to nebulous insights and vague conclusions. Despite how clearly the method used, meteorology students shouldn’t take the idea of being creative in handling the data for granted. The quality of results totally depends on how creative they scramble the data and extract its insights. Hence the better result quality comes the more beneficial explanation. For the practical instance, when harnessing the time series meteorological dataset on analysis, many students trivialize the existence of another possible domain on the dataset. Many students don’t treat the time series data as a signal or wave that related to many parameters such as frequency and period. In this post, I’m going to show an uncomplicated example on how we use other possible domain of time series data -which is frequency- on analyzing meteorology phenomenon encompassing its practical step-by-step methods using python programming language.

Prepare the data

First things first, prepare the data that going to be analyzed. For example, I use 37-years daily precipitation from CHIRPS which is rainfall estimates from rain gauge and satellite observations. The data has been cropped at the specific BMKG Station in Bandung, named Cemara Weather Station. Since CHIRPS data more complete than BMKG data, I personally prefer to use CHIRPS rather than use BMKG data despite the considerable discrepancy in the data when compared.

Fig 1. Precipitation 1981–2018 (BMKG Cemara, Bandung)

Fig 1 shows the daily precipitation (mm) from 1st January of 1981 until 31st December of 2018. We might already have known that Bandung or any other region located on Java island has two seasons (dry and rainy). The peak of rainy season in Bandung normally happens in December-January-February (DJF) and the trough of rainy season -which also called dry season- in Bandung normally happens in June-July-August. Much of research proved that seasonality weather in Bandung (and mostly western Indonesia) is affected by Asia-Australia monsoon. As we know the trend of rain seasonality, we suppose to be able to depict it as a wave or signal since it has the peak and trough. For the better overview, instead of figure the data as a panel fraught of raw points connected to each other, let’s turn it to monthly precipitation.

Fig 2. Precipitation — Monthly Average (Stasiun Cemara, Bandung)

Now we have a better overview of rain seasonality in Bandung. This wavy pattern transpires one time a year and some years may have shifted of peak and trough of the rainy season. Notwithstanding, the pattern remains.

How if there were any else possibly periodically pattern occur in a year? Semiannual pattern or quarter-annually pattern? It could be and it should have been known by meteorology students as they recognize that rain pattern in Indonesia is very complex and affected by up to global-scale events and influenced by various topographic features.

Use FFT!

Fast Fourier Transform (FFT) is one of the most useful tools and widely used in signal processing. FFT has contributed to many problems solving observations in astronomy, physics, and chemistry. In meteorology, FFT has been utilized for many kinds of research and most of it is related to climate data analysis. Climate data tend to have a long period of observation which reflects scientist’s definition that climate normal as an average over a recent 30-year period.

FFT is a very complicated mathematical equation, and to be honest, I’ve never fully understood how’s the FFT equation works. To see more about FFT and how it works, check this out (A gentle introduction to the FFT). In this article, I want to introduce how to use scipy.fft library in python to decompose seasonality patterns in 37-year precipitation data and get yourself (and myself) to gradually adept with python data analysis library.

import matplotlib
import matplotlib.pyplot as plt
from pylab import *
from math import *
import warnings
warnings.simplefilter(‘ignore’)
import numpy as np
from numpy.fft import fftfreq
from scipy.fftpack import *
from scipy.signal import butter, filtfilt , freqz
import pandas as pd

Import all the libraries needed. Scipy is the main library use in this analysis. It contains many of FFT formulas and filtering methods. Then, read the data.

prec = pd.read_csv(‘prec_chirps.csv’)

Use pandas library to read csv to see the data as data frame like this:

Fig 3. Data frame

The data has 13,879 rows represents daily precipitation (mm) from 1981–2018 and has no null values. The annual seasonality in Fig. 1 can be seen by a naked eye. However, it’s not enough to present the existence of seasonality only by subjective plain sight towards a graph, thus we need more palpable images that could indicate the seasonality by exact number.

dt = 1
n = prec.shape[0]
F = fft(prec[‘prec’])
w = fftfreq(n, dt)
t=np.linspace(1, n, n)
T = n/t[0:6939]
indices = where(w > 0)
w_pos = abs(w[indices])
F_pos = abs(F[indices])

In this step, we start to use FFT to transform the precipitation data which only has time domain to frequency domain. Declare dt = 1 as long as we want to analyze the data on daily basis over a 37-year period of time. The FFT result represents by F.

Bear in mind that the time series precipitation data is a combination of many frequency waves which has each wave parameters and amplify one another. By transforming the data to frequency domain, we could get each set of waves with certain frequencies that build-up the data.

Next, assign data length to n and calculate sample frequency with fftfreq function in which requires n and dt as arguments. It returns float array w contains the frequency bin centers in cycles per unit of the sample spacing (with zero at the start). Don’t forget to set synthetic data variable which represents the time array with an increasing value from 1 until n (data length). Then, set a new variable that portrays the number of periodicity types of the signal whereby the length is half of the original data. After that, select only indices for elements that correspond to positive frequencies. To know the details about why we do this step, check this thread: Why is FFT “mirrored”?

Now, we’re all set to figure the desirable of time-domain transformation to the frequency domain.

fig1 = plt.figure()
ax = fig1.add_axes([0, 0, 2, 0.8])
axF = fig1.add_axes([0, 1.2, 2, 0.8])
axF.plot(w_pos, abs(F_pos))
axF.set_xlabel(‘Frequency’, fontsize = 13)
axF.set_ylabel(‘magnitude’, fontsize = 13)
axF.set_title(‘Periodogram (FFT Result)’, fontsize = 15)
axF.tick_params(labelsize = 13)
ax.plot(T, abs(F_pos))
ax.set_xlabel(‘Period’, fontsize = 13)
ax.set_ylabel(‘magnitude’, fontsize = 13)
ax.set_title(‘Periodogram (FFT Result)’, fontsize = 15)
ax.tick_params(labelsize = 13)
Fig 4. Periodogram (FFT Result)

The graph called periodogram. The difference between the two graphs above is on the x-axis. The upper graph shows periodogram with frequency as an x-axis, on the contrary, the lower graph uses 1/frequency (period) as x-axis. I found that the lower graph is way more intuitive and comprehensible since the period’s unit is the same as the data time unit which is a day. The y-axis is the amplitude of each periodicity/frequency that build-up the data. From the lower graph, it can be seen that from the 37-years length of data, the highest contributions are ranging between lower period sub-signal for exact < 1000 days of a period. Let’s point out that range.

Fig 5. Highlighted Periodogram

By limiting the x-axis range, we get the clear-cut of the periodicity ranges which have a significant amplitude in forming the original data. Without being examined, 365-days periodicity has the highest amplitude as I mentioned earlier. It clearly represents the most common and well-known of Bandung rain seasonality (annual seasonality). The second highest amplitude is on 183-days (~6-month, semi-annual seasonality) and the third is on 122-days (~4-month, quarter-annual seasonality). It turns out that Bandung rain seasonality is not only transpired annually, but it also has semi-annual and quarter-annual patterns, nevertheless, the amplitude is not quite high which means that the patterns are less frequent to exist over the time scope of the data.

Meteorologically, these patterns could happen inflicted by various events such as local disturbances until global variabilities like El Nino-La Nina or Madden-Julian Oscillation (MJO) effects. We should need more researches to be able to answer the cause of these available patterns of Bandung rainfall seasonality. Python programming language with its user-friendly scientific package has been bringing us to advance the data analysis regarding many sectors. This simple use case in meteorology maybe not as complex as other use cases, but it worth knowing and worth sharing. After using FFT and knowing the hidden patterns of the data, there are so many analysis tools which also practical to meteorological science use cases, like filtering (low-pass, band-pass, high-pass filtering) and many more. Hope this article is pertinent to your study field and could help you to get more profound data analysis.

--

--

Tio Faizin

Atmospheric Data Science Enthusiast | Data Analyst in Fintech.