Detecting Anomalies with Moving Median Decomposition

January 12, 2016

anomaly-decomposition-median

Time series decomposition splits a time series into seasonal, trend, and random residual time series. The trend and the random time series can both be used to detect anomalies. But detecting anomalies in an already anomalous time series isn’t easy.


TL;DR


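When working on an anomalous time series:

  • Decomposition with a moving average gives poor results, because the anomalies are averaged into the trend.
  • Decomposition with a moving median is robust to anomalies and detects almost all of them.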

The Problem with Moving Averages

In the blog entry on time series decomposition in R, we learned that the algorithm uses a moving average to extract the trend of a time series. This is perfectly fine for a time series without anomalies, but in the presence of outliers the moving average is seriously affected, because the trend embeds the anomalies.

In this demonstration, we will first detect anomalies using decomposition with a moving average. We’ll see that this doesn’t work well, and so will try detecting anomalies using decomposition with a moving median to get better results.

About the data: webTraffic.csv reports the number of page views per day over a period of 103 weeks (almost 2 years). To make the data more interesting, we added some (extra) anomalies to it. Looking at the time series, we clearly see a seasonality of 7 days; there is less traffic on weekends.

To decompose a seasonal time series, the seasonality time period is needed. In our example, we know the seasonality to be 7 days. If unknown, it is possible to determine the seasonality of a time series. Last but not least, we need to know if the time series is additive or multiplicative. Our web traffic is multiplicative.

To sum up our web traffic:

  • Seasonality of 7 days (over 103 weeks)
  • Multiplicative time series
  • Download here: webTraffic.csv

webTraffic-anomalies

Moving Average Decomposition (Bad Results)

Before going any further, make sure to import the data. Also, I recommend being sure that you understand how time series decomposition works. The stats package provides the handy decompose function in R.

1 – Decomposition

Because the time series contains anomalies, the decomposed trend comes out badly distorted: the anomalies are averaged into the trend.
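To make the distortion concrete, here is a minimal sketch of this step. It does not use webTraffic.csv: the series `views`, its weekly pattern, and the injected anomalies are all synthetic assumptions chosen only to mimic the shape described above (103 weeks, 7-day seasonality, multiplicative).

```r
# Synthetic stand-in for webTraffic.csv (an assumption, not the article's data).
set.seed(42)
n_weeks <- 103
n       <- 7 * n_weeks
weekly  <- c(1.10, 1.15, 1.10, 1.05, 1.00, 0.80, 0.80)  # Mon..Sun, less on weekends
level   <- seq(1000, 2000, length.out = n)               # slowly rising traffic
views   <- level * rep(weekly, times = n_weeks) * rnorm(n, mean = 1, sd = 0.01)

# Inject isolated anomalies: three large spikes plus five moderate deviations.
anomaly_idx    <- c(60, 150, 240, 330, 420, 510, 600, 690)
anomaly_factor <- c(4, 1.4, 0.6, 4, 1.4, 0.6, 4, 1.4)
views[anomaly_idx] <- views[anomaly_idx] * anomaly_factor

# Classical decomposition: the trend is a centered 7-day moving average.
series <- ts(views, frequency = 7)
dec    <- decompose(series, type = "multiplicative")

# Ratio of the estimated trend to the true level, at a spike and on a quiet day.
spike_ratio <- as.numeric(dec$trend)[330] / level[330]
quiet_ratio <- as.numeric(dec$trend)[100] / level[100]
print(round(c(spike = spike_ratio, quiet = quiet_ratio), 2))
```

On a quiet day the ratio stays close to 1, while around the spike the moving average is pulled well above the true level. Calling `plot(dec)` shows the same effect as bumps in the trend panel.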

moving-average-decomposition

2 – Using Normal Distribution to Find Minimum and Maximum

We can apply the normal distribution to the random noise to detect anomalies. Values more than 4 standard deviations below or above the mean can be considered outliers.

moving-average-limits

3 – Plotting Anomalies

Let’s find the values more than 4 standard deviations above or below the mean and plot them as anomalies. As expected, the calculation misses many anomalies, because the decomposition algorithm uses a moving average.
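Steps 2 and 3 can be sketched together as follows. As an assumption, the code runs on a synthetic series rather than webTraffic.csv; the names `views`, `anomaly_idx`, and `found_avg` are illustrative only.

```r
# Synthetic stand-in for webTraffic.csv (an assumption, not the article's data).
set.seed(42)
n_weeks <- 103
n       <- 7 * n_weeks
weekly  <- c(1.10, 1.15, 1.10, 1.05, 1.00, 0.80, 0.80)  # Mon..Sun, less on weekends
level   <- seq(1000, 2000, length.out = n)
views   <- level * rep(weekly, times = n_weeks) * rnorm(n, mean = 1, sd = 0.01)

# Inject isolated anomalies: three large spikes plus five moderate deviations.
anomaly_idx    <- c(60, 150, 240, 330, 420, 510, 600, 690)
anomaly_factor <- c(4, 1.4, 0.6, 4, 1.4, 0.6, 4, 1.4)
views[anomaly_idx] <- views[anomaly_idx] * anomaly_factor

# Moving-average decomposition; the random part is NA for the first and last 3 days.
dec    <- decompose(ts(views, frequency = 7), type = "multiplicative")
random <- as.numeric(dec$random)

# Limits at 4 standard deviations around the mean of the random component.
lower <- mean(random, na.rm = TRUE) - 4 * sd(random, na.rm = TRUE)
upper <- mean(random, na.rm = TRUE) + 4 * sd(random, na.rm = TRUE)

# which() skips the NA positions, so only real exceedances are returned.
found_avg <- which(random < lower | random > upper)
print(found_avg)                        # days flagged as anomalous
print(setdiff(anomaly_idx, found_avg))  # injected anomalies that were missed
```

In this synthetic setup the large spikes inflate the standard deviation of the random component, so the limits become wide enough that the moderate deviations slip through. That is the same failure mode the plots illustrate.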

moving-average-anomalies

Moving Median Decomposition (Good Results)

In our second approach, we will do the decomposition using a moving median instead of a moving average. This requires understanding how the moving median is robust to anomalies and how time series decomposition works. Also, again make sure to import the data. The moving median removes the anomalies without altering the time series too much.

1 – Applying the Moving Median

The decomposed trend with moving median (see the graphic below) is a bit “raw” but expresses the trend better.
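One way this step could look in R is sketched below, using `runmed` from the stats package. The data are a synthetic assumption, not webTraffic.csv, and taking the per-weekday median for the seasonal factors is my own robust choice rather than something the article confirms.

```r
# Synthetic stand-in for webTraffic.csv (an assumption, not the article's data).
set.seed(42)
n_weeks <- 103
n       <- 7 * n_weeks
weekly  <- c(1.10, 1.15, 1.10, 1.05, 1.00, 0.80, 0.80)  # Mon..Sun, less on weekends
level   <- seq(1000, 2000, length.out = n)
views   <- level * rep(weekly, times = n_weeks) * rnorm(n, mean = 1, sd = 0.01)

# Inject isolated anomalies: three large spikes plus five moderate deviations.
anomaly_idx    <- c(60, 150, 240, 330, 420, 510, 600, 690)
anomaly_factor <- c(4, 1.4, 0.6, 4, 1.4, 0.6, 4, 1.4)
views[anomaly_idx] <- views[anomaly_idx] * anomaly_factor

# Trend: running median over one full season (7 days; the window must be odd).
trend_med <- as.vector(runmed(views, k = 7))

# Detrend (multiplicative model), then reshape to one row per week and
# one column per weekday, so each column yields one seasonal factor.
detrended <- views / trend_med
by_week   <- t(matrix(detrended, nrow = 7))
seasonal  <- apply(by_week, 2, median)

# Random component: what is left after removing trend and seasonality.
random <- views / (trend_med * rep(seasonal, times = n_weeks))

print(round(trend_med[330] / level[330], 2))  # trend vs. true level at a spike
print(round(seasonal, 2))
```

The median of a 7-day window is the middle weekday's value, so this trend sits at a roughly constant offset from the true level. The seasonal factors absorb that offset, which is why only the product of trend and seasonality matters for the random component.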

moving-median-trends

2 – Using the Normal Distribution to Find the Minimum and Maximum

As the decomposition formula expresses, removing the trend and seasonality from the original time series leaves random noise. As it should be normally distributed, we can apply the normal distribution to detect anomalies. As we saw previously, values more than 4 standard deviations below or above the mean can be considered outliers. As the random time series still contains the anomalies, we need to estimate the standard deviation without taking the anomalies into account. Once again, we will use the moving median to exclude the outliers.


moving-median-limits

3 – Plotting Anomalies

As before, values more than 4 standard deviations above or below the mean are plotted as anomalous. This works much better; we detect almost every anomaly! Once again, this is because the moving median is robust to anomalies.
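The robust limits and the detection step can be sketched as follows. This is an illustration under assumptions: the series is synthetic rather than webTraffic.csv, and the window of 3 for the second running median simply echoes the `runmed(..., 3)` call quoted in the comments below.

```r
# Synthetic stand-in for webTraffic.csv (an assumption, not the article's data).
set.seed(42)
n_weeks <- 103
n       <- 7 * n_weeks
weekly  <- c(1.10, 1.15, 1.10, 1.05, 1.00, 0.80, 0.80)  # Mon..Sun, less on weekends
level   <- seq(1000, 2000, length.out = n)
views   <- level * rep(weekly, times = n_weeks) * rnorm(n, mean = 1, sd = 0.01)

# Inject isolated anomalies: three large spikes plus five moderate deviations.
anomaly_idx    <- c(60, 150, 240, 330, 420, 510, 600, 690)
anomaly_factor <- c(4, 1.4, 0.6, 4, 1.4, 0.6, 4, 1.4)
views[anomaly_idx] <- views[anomaly_idx] * anomaly_factor

# Moving-median decomposition: 7 is the season length (days per week).
trend_med <- as.vector(runmed(views, k = 7))
seasonal  <- apply(t(matrix(views / trend_med, nrow = 7)), 2, median)
random    <- views / (trend_med * rep(seasonal, times = n_weeks))

# Smooth the random component with a short running median (window 3) so that
# isolated outliers do not inflate the estimated standard deviation.
rm_random <- as.vector(runmed(random, k = 3))
lower <- mean(rm_random) - 4 * sd(rm_random)
upper <- mean(rm_random) + 4 * sd(rm_random)

# Compare the unsmoothed random component against the robust limits.
found_med <- which(random < lower | random > upper)
print(found_med)                        # days flagged as anomalous
print(setdiff(anomaly_idx, found_med))  # injected anomalies that were missed
```

Because the limits are estimated from the smoothed series, they stay narrow and every injected anomaly should fall outside them. A few days next to an anomaly may be flagged as well, since the running-median trend can shift slightly when an outlier enters its window.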

moving-median-anomalies

Here is an alternative plot displaying the same results: values outside the blue area can be considered anomalous.

moving-median-limits2

  • Aleksandras Urbonas

    Well done, I like it.

    Could you fix the code: anomalyL[!is.na(anomalyL$value)] selects columns instead of rows, because a comma is missing right before the closing bracket. It should be: anomalyL[!is.na(anomalyL$value) , ]

    Have a lovely day,

  • Sana Munawar

    m = t(matrix(data = detrend, nrow = 7))
    rm_random = runmed(random[!is.na(random)], 3)

    I am not sure what the numbers 7 and 3 mean in the lines above. My data has a frequency of 23 observations per year, and I have 395 observations in total. Sorry for my ignorance, but I don't understand which values I should use instead.
