Detecting Anomalies with Moving Median Decomposition

January 12, 2016

anomaly-decomposition-median

Time series decomposition splits a time series into seasonal, trend and random residual time series. The trend and the random time series can both be used to detect anomalies. But detecting anomalies in an already anomalous time series isn’t easy.

TL;DR

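When working on an anomalous time series, decompose it with a moving median rather than a moving average. The median is robust to outliers, so anomalies stay in the residual, where they can be detected, instead of being absorbed into the trend.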

The Problem with Moving Averages

In the blog entry on time series decomposition in R, we learned that the algorithm uses a moving average to extract the trends of time series. This is perfectly fine in time series without anomalies, but in the presence of outliers, the moving average is seriously affected, because the trend embeds the anomalies.
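To see the effect on a small scale, here is a minimal sketch on a made-up series (the values are hypothetical, not taken from the article's data). It compares a centered 3-point moving average with a 3-point running median, using `filter` and `runmed` from the stats package.

```r
# Toy series: flat at 10 with a single spike (hypothetical values)
x <- c(10, 10, 10, 10, 100, 10, 10, 10, 10)

# Centered moving average over 3 points (NA at both ends)
moving_avg <- as.numeric(stats::filter(x, rep(1 / 3, 3), sides = 2))

# Running median over 3 points
moving_med <- as.numeric(stats::runmed(x, k = 3))

print(moving_avg)  # the spike leaks into its neighbours
print(moving_med)  # the spike is ignored
```

The moving average spreads the spike over the surrounding points, while the running median stays at the flat level everywhere.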

In this demonstration, we will first detect anomalies using decomposition with a moving average. We’ll see that this doesn’t work well, and so will try detecting anomalies using decomposition with a moving median to get better results.

About the data: webTraffic.csv reports the number of page views per day over a period of 103 weeks (almost 2 years). To make the data more interesting, we added some extra anomalies to it. The time series shows a clear seasonality of 7 days: there is less traffic on weekends.

To decompose a seasonal time series, the seasonality period is needed. In our example, we know the seasonality to be 7 days. If the period is unknown, it can be estimated from the data. Last but not least, we need to know whether the time series is additive or multiplicative. Our web traffic is multiplicative: the size of the weekly swing scales with the level of the trend.
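If the period is not known in advance, one common heuristic is to look for the lag with the highest autocorrelation. The sketch below is an assumption on our part, not necessarily the method used elsewhere on this blog, and it runs on a synthetic weekly series rather than on webTraffic.csv.

```r
# Synthetic weekly pattern without trend (hypothetical values)
set.seed(1)
weekly <- c(1, 1, 1, 1, 1, 0.5, 0.5)   # less traffic on weekends
views  <- 1000 * rep(weekly, times = 20) * rnorm(140, mean = 1, sd = 0.02)

# Autocorrelation for lags 0..30; element 1 of ac$acf is lag 0
ac   <- acf(views, lag.max = 30, plot = FALSE)
lags <- 2:30                           # skip lag 0 and lag 1

# The lag with the strongest autocorrelation is our period estimate
period <- lags[which.max(ac$acf[lags + 1])]
print(period)
```

This works best on a series with little or no trend; a strong trend keeps the autocorrelation high at every lag and can hide the seasonal peak.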

To sum up our web traffic:

  • Seasonality of 7 days (over 103 weeks)
  • Multiplicative time series
  • Download here: webTraffic.csv

webTraffic-anomalies

Moving Average Decomposition (Bad Results)

Before going any further, make sure to import the data, and make sure you understand how time series decomposition works. In R, the stats package provides the handy decompose function.

1 – Decomposition

Because the time series contains anomalies, the trend extracted by the decomposition is badly distorted: the moving average absorbs the anomalies into the trend.
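A minimal sketch of this step, assuming a synthetic stand-in for the web traffic (the series, the weekly shape, and the position of the injected anomaly are all hypothetical). It calls `decompose` with `type = "multiplicative"` and compares the estimated trend with the known level at the anomaly.

```r
# Synthetic stand-in for the web traffic (all values are assumptions)
set.seed(42)
n_weeks <- 30
n       <- 7 * n_weeks
weekly  <- c(1.1, 1.1, 1.1, 1.1, 1.1, 0.75, 0.75)  # less traffic on weekends
level   <- seq(1000, 1300, length.out = n)          # slowly rising true level
views   <- level * rep(weekly, n_weeks) * rnorm(n, mean = 1, sd = 0.03)
views[52] <- views[52] * 3                          # one injected anomaly

# Classical decomposition: the trend is a 7-day moving average
traffic <- ts(views, frequency = 7)
dec     <- decompose(traffic, type = "multiplicative")

# Compare the known level with the estimated trend at the anomaly
print(round(c(true_level = level[52], estimated_trend = dec$trend[52])))
```

The estimated trend around day 52 is pulled well above the true level, because one seventh of the anomaly is averaged into every window that contains it.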

moving-average-decomposition

2 – Using Normal Distribution to Find Minimum and Maximum

We can model the random noise with a normal distribution to detect anomalies: values more than 4 standard deviations away from the mean can be considered outliers.

moving-average-limits

3 – Plotting Anomalies

Let’s find the values more than 4 standard deviations away from the mean and plot them as anomalies. As expected, the calculation fails to find many anomalies, because the decomposition algorithm uses a moving average.
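The two steps above (limits, then flagging) can be sketched as follows. The data is a synthetic stand-in with three large and two mild injected anomalies; their positions and sizes are assumptions chosen for illustration.

```r
# Synthetic stand-in for the web traffic (all values are assumptions)
set.seed(42)
n_weeks <- 30
n       <- 7 * n_weeks
weekly  <- c(1.1, 1.1, 1.1, 1.1, 1.1, 0.75, 0.75)
level   <- seq(1000, 1300, length.out = n)
views   <- level * rep(weekly, n_weeks) * rnorm(n, mean = 1, sd = 0.03)
big     <- c(52, 120, 172)             # large injected anomalies (x3)
mild    <- c(89, 149)                  # milder injected anomalies (x1.5)
views[big]  <- views[big] * 3
views[mild] <- views[mild] * 1.5

# Moving-average decomposition and its random component
dec    <- decompose(ts(views, frequency = 7), type = "multiplicative")
random <- as.numeric(dec$random)

# Limits at 4 standard deviations around the mean of the random component
lower <- mean(random, na.rm = TRUE) - 4 * sd(random, na.rm = TRUE)
upper <- mean(random, na.rm = TRUE) + 4 * sd(random, na.rm = TRUE)

# Indices outside the limits (which() drops the NA edges)
flagged <- which(random < lower | random > upper)
print(flagged)
```

On this toy series the large spikes inflate the standard deviation so much that the milder ones stay inside the limits, which is the failure mode described above.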

moving-average-anomalies

Moving Median Decomposition (Good Results)

In our second approach, we will do the decomposition using a moving median instead of a moving average. The moving median removes the anomalies without altering the time series too much. This requires understanding why the moving median is robust to anomalies and how time series decomposition works. As before, make sure to import the data.

1 – Applying the Moving Median

The trend extracted with the moving median (see the graphic) is a bit “raw” but expresses the underlying trend better.
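Here is a minimal sketch of a manual multiplicative decomposition built on `runmed`. It runs on a synthetic stand-in for the web traffic, and masking the first and last 3 points of the trend is our own assumption, made to mirror the missing edges of a 7-day moving average.

```r
# Synthetic stand-in for the web traffic (all values are assumptions)
set.seed(42)
n_weeks <- 30
n       <- 7 * n_weeks
weekly  <- c(1.1, 1.1, 1.1, 1.1, 1.1, 0.75, 0.75)
level   <- seq(1000, 1300, length.out = n)
views   <- level * rep(weekly, n_weeks) * rnorm(n, mean = 1, sd = 0.03)
views[c(52, 120, 172)] <- views[c(52, 120, 172)] * 3   # injected anomalies

# Trend: running median over one full week
trend_med <- as.numeric(runmed(views, k = 7))
trend_med[c(1:3, (n - 2):n)] <- NA     # assumption: ignore the edges

# Seasonality: average the detrended values per day of the week
detrend  <- views / trend_med
m        <- t(matrix(data = detrend, nrow = 7))   # one row per week
seasonal <- colMeans(m, na.rm = TRUE)

# Random component: what is left after removing trend and seasonality
random <- views / (trend_med * rep(seasonal, n_weeks))
print(round(seasonal, 2))
```

Since a 7-day window always holds five weekdays and two weekend days, the running median tracks the weekday level rather than the weekly mean. The seasonal factors absorb that offset, so the random component still centers around 1.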

moving-median-trends

2 – Using the Normal Distribution to Find the Minimum and Maximum

As the decomposition formula expresses, removing the trend and seasonality from the original time series leaves random noise. Since this noise should be normally distributed, we can use the normal distribution to detect anomalies: as before, values more than 4 standard deviations away from the mean can be considered outliers. However, the random time series still contains the anomalies, so the standard deviation must be estimated without taking them into account. Once again, we use the moving median to exclude the outliers.

moving-median-limits

3 – Plotting Anomalies

As before, values more than 4 standard deviations away from the mean are plotted as anomalous. This works much better; we detect almost every anomaly! Once again, this is because the moving median is robust to anomalies.
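Putting the limits and the flagging together, a minimal sketch could look like this. It reuses a synthetic stand-in for the web traffic (anomaly positions and sizes are assumptions) and estimates the standard deviation from a median-smoothed copy of the random component, so the anomalies do not inflate it.

```r
# Synthetic stand-in for the web traffic (all values are assumptions)
set.seed(42)
n_weeks <- 30
n       <- 7 * n_weeks
weekly  <- c(1.1, 1.1, 1.1, 1.1, 1.1, 0.75, 0.75)
level   <- seq(1000, 1300, length.out = n)
views   <- level * rep(weekly, n_weeks) * rnorm(n, mean = 1, sd = 0.03)
big     <- c(52, 120, 172)             # large injected anomalies (x3)
mild    <- c(89, 149)                  # milder injected anomalies (x1.5)
views[big]  <- views[big] * 3
views[mild] <- views[mild] * 1.5

# Moving-median decomposition (edges masked, an assumption of this sketch)
trend_med <- as.numeric(runmed(views, k = 7))
trend_med[c(1:3, (n - 2):n)] <- NA
detrend   <- views / trend_med
seasonal  <- colMeans(t(matrix(data = detrend, nrow = 7)), na.rm = TRUE)
random    <- views / (trend_med * rep(seasonal, n_weeks))

# Robust limits: smooth the residual first so isolated spikes drop out
rm_random <- as.numeric(runmed(random[!is.na(random)], 3))
lower <- mean(rm_random) - 4 * sd(rm_random)
upper <- mean(rm_random) + 4 * sd(rm_random)

# Flag the original (unsmoothed) residual against the robust limits
flagged <- which(random < lower | random > upper)
print(flagged)
```

Both the large and the mild injected anomalies fall outside these limits. Because the limits come from a smoothed residual, they can be tighter than the raw noise and may also flag a few ordinary points; a larger multiplier than 4 is one way to trade that off.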

moving-median-anomalies

Here is an alternative plot displaying the same results: values outside the blue area can be considered anomalous.

moving-median-limits2

  • Aleksandras Urbonas

    Well done, I like it.

    Could you fix the code: anomalyL[!is.na(anomalyL$value)]? Columns are being selected because a comma is missing right before the closing bracket: anomalyL[!is.na(anomalyL$value), ]

    Have a lovely day,

  • Sana Munawar

    m = t(matrix(data = detrend, nrow = 7))
    rm_random = runmed(random[!is.na(random)], 3)

    I am not sure what the numbers 7 and 3 mean in the above lines. My data has a frequency of 23 observations per year, and I have 395 observations in total. Sorry for my ignorance, but I don't understand what values I should use instead.
