Extracting Seasonality and Trend from Data: Decomposition Using R

December 1, 2015

Time series decomposition splits a time series into three components: seasonality, trend and random fluctuation. To show how this works, we will study the decompose( ) and stl( ) functions in the R language.

Understanding Decomposition

Decompose One Time Series into Multiple Series

Time series decomposition is a mathematical procedure which transforms a time series into multiple different time series. The original time series is often split into 3 component series:

  • Seasonal: Patterns that repeat with a fixed period of time. For example, a website might receive more visits during weekends; this would produce data with a seasonality of 7 days.
  • Trend: The underlying trend of the metrics. A website increasing in popularity should show a general trend that goes up.
  • Random: Also called “noise”, “irregular” or “remainder”, this is the residual of the original time series after the seasonal and trend series are removed.

Additive or Multiplicative Decomposition?

To achieve successful decomposition, it is important to choose between the additive and multiplicative models, which requires analyzing the series. For example, does the magnitude of the seasonality increase when the time series increases?

Australian beer production: the seasonal variation looks constant; it doesn’t change when the time series value increases. We should use the additive model.

Airline passenger numbers: as the time series increases in magnitude, the seasonal variation increases as well. Here we should use the multiplicative model.

Additive:
Time series = Seasonal + Trend + Random

Multiplicative:
Time series = Trend * Seasonal * Random

The decomposition formula varies a little based on the model.

Step-by-Step: Time Series Decomposition

We’ll study the decompose( ) function in R. As a decomposition function, it takes a time series as a parameter and decomposes it into seasonal, trend and random time series. We’ll reproduce step-by-step the decompose( ) function in R to understand how it works. Since there are variations between the two models, we’ll use two examples: Australian beer production (additive) and airline passenger numbers (multiplicative).

Step 1: Import the Data

Additive

As mentioned previously, a good example of additive time series is beer production. As the metric values increase, the seasonality stays relatively constant.

Multiplicative

Monthly airline passenger figures are a good example of a multiplicative time series. The more passengers there are, the more seasonality is observed.
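A minimal sketch of this step, under some assumptions. AirPassengers ships with base R, so it can be used directly. The beer data is not part of base R (as far as we know it is distributed in add-on packages, for example as ausbeer in fpp), so this sketch substitutes a synthetic quarterly series. The name beer_like and every number used to build it are hypothetical stand-ins, not real production figures.

```r
# Multiplicative example: monthly airline passengers, bundled with base R.
air <- AirPassengers

# Additive example: SYNTHETIC stand-in for quarterly beer production.
# All values are made up for illustration: linear trend + constant
# seasonal offsets + noise, 16 years of quarterly data.
set.seed(42)                                  # reproducible noise
quarter_effect <- c(-20, -35, 0, 55)          # hypothetical seasonal offsets
beer_like <- ts(250 + 1.5 * (1:64) + rep(quarter_effect, 16) + rnorm(64, sd = 6),
                start = c(2000, 1), frequency = 4)

# Compare the two shapes: constant versus growing seasonal swing.
plot(beer_like, main = "Additive: seasonal swing stays constant")
plot(air, main = "Multiplicative: seasonal swing grows with the level")
```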

Step 2: Detect the Trend

To detect the underlying trend, we smooth the time series using the “centred moving average”. To perform the decomposition, it is vital to use a moving window of the exact size of the seasonality. Therefore, to decompose a time series we need to know the seasonality period: weekly, monthly, etc. If you don’t know this figure, you can detect the seasonality using a Fourier transform.

Additive

Australian beer production clearly follows annual seasonality. As it is recorded quarterly, there are 4 data points recorded per year, and we use a moving average window of 4.

Multiplicative

The process here is the same as for the additive model. Airline passenger number seasonality also looks annual. However, it is recorded monthly, so we choose a moving average window of 12.
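One way to sketch the centred moving average uses stats::filter( ) from base R. The helper name centred_ma is our own, and beer_like is a synthetic stand-in for the beer data (hypothetical values). For an even window such as 4 or 12, the two end points of the window get half weight so that the average stays centred on an observation.

```r
# Centred moving average over one full seasonal cycle.
centred_ma <- function(x, period) {
  w <- if (period %% 2 == 0) {
    c(0.5, rep(1, period - 1), 0.5) / period   # even period: half-weight ends
  } else {
    rep(1 / period, period)                    # odd period: plain average
  }
  stats::filter(x, filter = w, sides = 2)      # sides = 2 centres the window
}

# SYNTHETIC quarterly stand-in for the beer data (made-up numbers).
set.seed(42)
beer_like <- ts(250 + 1.5 * (1:64) + rep(c(-20, -35, 0, 55), 16) + rnorm(64, sd = 6),
                start = c(2000, 1), frequency = 4)

trend_beer <- centred_ma(beer_like, period = 4)       # quarterly data
trend_air  <- centred_ma(AirPassengers, period = 12)  # monthly data

# Overlay the trend on each series.
plot(beer_like);     lines(trend_beer, col = "red")
plot(AirPassengers); lines(trend_air,  col = "red")
```

The trend has no value for the first and last half-window of the series, which is why NA values appear at both ends.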

Step 3: Detrend the Time Series

Removing the previously calculated trend from the time series results in a new time series that clearly exposes seasonality.

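Additive

Subtract the trend from the time series: Detrended = Time series - Trend.

Multiplicative

Divide the time series by the trend: Detrended = Time series / Trend.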
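The detrending step can be sketched as follows. This is an illustration under assumptions: centred_ma is our own helper (written for even periods only), and beer_like is a synthetic stand-in for the beer data.

```r
# Centred moving average for an even seasonal period (4, 12, ...).
centred_ma <- function(x, period) {
  stats::filter(x, c(0.5, rep(1, period - 1), 0.5) / period, sides = 2)
}

# SYNTHETIC quarterly stand-in for the beer data (made-up numbers).
set.seed(42)
beer_like <- ts(250 + 1.5 * (1:64) + rep(c(-20, -35, 0, 55), 16) + rnorm(64, sd = 6),
                start = c(2000, 1), frequency = 4)

# Additive model: subtract the trend.
detrend_beer <- beer_like - centred_ma(beer_like, 4)

# Multiplicative model: divide by the trend.
detrend_air <- AirPassengers / centred_ma(AirPassengers, 12)

plot(detrend_beer)   # oscillates around 0
plot(detrend_air)    # oscillates around 1
```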

Step 4: Average the Seasonality

From the detrended time series, it’s easy to compute the average seasonality. For each position in the cycle, we average the detrended values over all the observed cycles. Technically speaking, we feed the time series into a matrix, then transpose the matrix so each column contains elements of the same period (same day, same month, same quarter, etc.). Finally, we compute the mean of each column.

Additive

Quarterly seasonality: we use a matrix of 4 rows. The average seasonality is repeated 16 times to create a seasonal series we can compare with the original later.

Multiplicative

Monthly seasonality: we use a matrix of 12 rows. The average seasonality is repeated 12 times to create a seasonal series we can compare with the original later.
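The matrix approach described above can be sketched like this. It assumes the series starts at the first position of a cycle and covers a whole number of cycles, which holds for AirPassengers and for the synthetic beer_like series (a hypothetical stand-in for the beer data).

```r
# Centred moving average for an even seasonal period (our own helper).
centred_ma <- function(x, period) {
  stats::filter(x, c(0.5, rep(1, period - 1), 0.5) / period, sides = 2)
}

# SYNTHETIC quarterly stand-in for the beer data (made-up numbers).
set.seed(42)
beer_like <- ts(250 + 1.5 * (1:64) + rep(c(-20, -35, 0, 55), 16) + rnorm(64, sd = 6),
                start = c(2000, 1), frequency = 4)

detrend_beer <- beer_like - centred_ma(beer_like, 4)
detrend_air  <- AirPassengers / centred_ma(AirPassengers, 12)

# matrix() fills column by column, so each column is one cycle;
# transposing puts each season (quarter or month) in its own column.
m_beer <- t(matrix(as.numeric(detrend_beer), nrow = 4))
m_air  <- t(matrix(as.numeric(detrend_air),  nrow = 12))

# Column means give one seasonal value per quarter or month.
# na.rm = TRUE skips the ends where the trend is undefined.
seasonal_beer <- colMeans(m_beer, na.rm = TRUE)
seasonal_air  <- colMeans(m_air,  na.rm = TRUE)

# Repeat the pattern once per cycle to get a full-length seasonal series.
plot(as.ts(rep(seasonal_beer, 16)))
plot(as.ts(rep(seasonal_air, 12)))
```

Note that decompose( ) goes one step further and normalises these averages (it subtracts their mean in the additive case and divides by their mean in the multiplicative case), so its seasonal values can differ slightly from the raw averages computed here.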

Step 5: Examining Remaining Random Noise

The previous steps have already extracted most of the structure from the original time series, leaving behind only “random” noise.

Additive

The additive formula is “Time series = Seasonal + Trend + Random”, which means “Random = Time series - Seasonal - Trend”

Multiplicative

The multiplicative formula is “Time series = Seasonal * Trend * Random”, which means “Random = Time series / (Trend * Seasonal)”
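Applying the two formulas above might look like the sketch below. The helpers centred_ma and seasonal_series are our own names, and beer_like is a synthetic stand-in for the beer data; none of them come from a package.

```r
# Centred moving average for an even seasonal period.
centred_ma <- function(x, period) {
  stats::filter(x, c(0.5, rep(1, period - 1), 0.5) / period, sides = 2)
}

# Average each season over all cycles, then repeat it to full length.
# Assumes the series starts at position 1 of a cycle and covers whole cycles.
seasonal_series <- function(detrended, period) {
  fig <- colMeans(t(matrix(as.numeric(detrended), nrow = period)), na.rm = TRUE)
  ts(rep(fig, length(detrended) / period),
     start = start(detrended), frequency = period)
}

# SYNTHETIC quarterly stand-in for the beer data (made-up numbers).
set.seed(42)
beer_like <- ts(250 + 1.5 * (1:64) + rep(c(-20, -35, 0, 55), 16) + rnorm(64, sd = 6),
                start = c(2000, 1), frequency = 4)

# Additive: Random = Time series - Seasonal - Trend
trend_beer    <- centred_ma(beer_like, 4)
seasonal_beer <- seasonal_series(beer_like - trend_beer, 4)
random_beer   <- beer_like - seasonal_beer - trend_beer

# Multiplicative: Random = Time series / (Trend * Seasonal)
trend_air    <- centred_ma(AirPassengers, 12)
seasonal_air <- seasonal_series(AirPassengers / trend_air, 12)
random_air   <- AirPassengers / (trend_air * seasonal_air)

plot(random_beer)   # centred on 0
plot(random_air)    # centred on 1
```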

Step 6: Reconstruct the Original Signal

The decomposed time series can logically be recomposed using the model formula to reproduce the original signal. Some data points will be missing at the beginning and the end of the reconstructed time series, because the centred moving average needs half a window of data on each side before it can produce a value.

Additive

The additive formula is “Time series = Seasonal + Trend + Random”, so we add the three components back together.

Multiplicative

The multiplicative formula is “Time series = Seasonal * Trend * Random”, so we multiply the three components back together.
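As a rough check, the sketch below recomposes both series and compares them with the originals wherever the trend is defined. The helper names and the synthetic beer_like series are our own assumptions, not part of any package.

```r
# Centred moving average for an even seasonal period.
centred_ma <- function(x, period) {
  stats::filter(x, c(0.5, rep(1, period - 1), 0.5) / period, sides = 2)
}

# Average each season over all cycles, then repeat it to full length.
seasonal_series <- function(detrended, period) {
  fig <- colMeans(t(matrix(as.numeric(detrended), nrow = period)), na.rm = TRUE)
  ts(rep(fig, length(detrended) / period),
     start = start(detrended), frequency = period)
}

# SYNTHETIC quarterly stand-in for the beer data (made-up numbers).
set.seed(42)
beer_like <- ts(250 + 1.5 * (1:64) + rep(c(-20, -35, 0, 55), 16) + rnorm(64, sd = 6),
                start = c(2000, 1), frequency = 4)

# Additive decomposition, then recomposition by summing.
trend_beer      <- centred_ma(beer_like, 4)
seasonal_beer   <- seasonal_series(beer_like - trend_beer, 4)
random_beer     <- beer_like - seasonal_beer - trend_beer
recomposed_beer <- trend_beer + seasonal_beer + random_beer

# Multiplicative decomposition, then recomposition by multiplying.
trend_air      <- centred_ma(AirPassengers, 12)
seasonal_air   <- seasonal_series(AirPassengers / trend_air, 12)
random_air     <- AirPassengers / (trend_air * seasonal_air)
recomposed_air <- trend_air * seasonal_air * random_air

# The ends are NA because the trend is undefined there.
ok_beer <- !is.na(recomposed_beer)
ok_air  <- !is.na(recomposed_air)
all.equal(as.numeric(recomposed_beer[ok_beer]), as.numeric(beer_like[ok_beer]))
all.equal(as.numeric(recomposed_air[ok_air]), as.numeric(AirPassengers[ok_air]))
```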

decompose( ) and stl( ): Time Series Decomposition in R

To make life easier, R provides functions that perform the decomposition in a single line of code. Our step-by-step decomposition follows the same approach as decompose( ), so the results are very close. stl( ) smooths with LOESS instead of a moving average, so its components differ slightly.

Using the decompose( ) function:

Additive

The only requirement: the time series must be declared with quarterly seasonality (frequency = 4)

Multiplicative

The only requirement: the time series must be declared with monthly seasonality (frequency = 12)
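A short sketch of the one-line version. decompose( ) is part of the stats package that ships with R; beer_like is again a synthetic stand-in for the beer data, built with frequency = 4 so that decompose( ) knows the seasonal period.

```r
# SYNTHETIC quarterly stand-in for the beer data (made-up numbers).
set.seed(42)
beer_like <- ts(250 + 1.5 * (1:64) + rep(c(-20, -35, 0, 55), 16) + rnorm(64, sd = 6),
                start = c(2000, 1), frequency = 4)

# Additive decomposition of the quarterly series.
dec_beer <- decompose(beer_like, type = "additive")

# Multiplicative decomposition of the monthly series (frequency = 12 is
# already stored in the AirPassengers ts object).
dec_air <- decompose(AirPassengers, type = "multiplicative")

# Each result holds the component series.
head(dec_air$seasonal)   # repeated seasonal pattern
head(dec_air$trend, 8)   # centred moving average (NA at the ends)
head(dec_air$random, 8)  # what is left over

# One plot with observed, trend, seasonal and random panels.
plot(dec_beer)
plot(dec_air)
```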

Now using the stl( ) function:
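A hedged sketch of stl( ) from the stats package. Two points are worth stating as assumptions of this example: beer_like is a synthetic stand-in for the beer data, and because stl( ) only fits an additive model, the multiplicative airline series is handled here by decomposing its logarithm and exponentiating the components. That log transform is a common workaround, not something the step-by-step method above requires.

```r
# SYNTHETIC quarterly stand-in for the beer data (made-up numbers).
set.seed(42)
beer_like <- ts(250 + 1.5 * (1:64) + rep(c(-20, -35, 0, 55), 16) + rnorm(64, sd = 6),
                start = c(2000, 1), frequency = 4)

# s.window = "periodic" forces the same seasonal pattern in every cycle.
stl_beer <- stl(beer_like, s.window = "periodic")
plot(stl_beer)

# Components are stored as columns: seasonal, trend, remainder.
head(stl_beer$time.series)

# Multiplicative series: decompose log(x) additively, then go back with exp(),
# so that seasonal * trend * remainder reproduces the original values.
stl_air_log    <- stl(log(AirPassengers), s.window = "periodic")
components_air <- exp(stl_air_log$time.series)
head(components_air)
```

Unlike the moving average approach, stl( ) returns a trend value for every observation, including the first and last points of the series.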

Conclusion

Decomposition is often used to remove the seasonal effect from a time series. It provides a cleaner way to understand trends. For instance, lower ice cream sales during winter don’t necessarily mean a company is performing poorly. To know whether or not this is the case, we need to remove the seasonality from the time series. Here at Anomaly.io we detect anomalies, and we use seasonally adjusted time series to do so. We also use the random (also called remainder) time series from the decomposed time series to detect anomalies and outliers.

  • PaoloRocca

    Nice post.. very well explained!! Thanks

    • http://www.tombush.co.uk/ Tom Bush

      Thanks for this, really helpful. Just curious though — is there a reason you didn’t use stl() on the AirPassengers dataset?

  • Jenice Tom

    Great substance and content, but please proofread.

  • Aleksandras Urbonas

    Can I say ‘wow, this looks professional!’?
    Well thought of, and the layout is amazing.
    I wonder what your topic of interest is now that you have mastered time series :)

  • Jithin Sam Varghese

    Thank you! This was very helpful.
