Anomaly Detection with Twitter in R

April 21, 2015

anomaly detection at Twitter

Twitter has released an open-source anomaly detection package for R. Its goal is to detect anomalies in time series that show seasonality and an underlying trend.
Find the Anomaly Source Code on GitHub
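To give an idea of what using the package looks like, here is a minimal sketch based on the example in the project's README. The wrapper function `run_example` is my own hypothetical helper, and the sketch assumes the package is installed from GitHub under the name `AnomalyDetection`; it skips quietly otherwise.

```r
# Install once from GitHub (needs the devtools package and network access):
# install.packages("devtools")
# devtools::install_github("twitter/AnomalyDetection")

# Hypothetical wrapper so the script still runs when the package is missing.
run_example <- function() {
  if (!requireNamespace("AnomalyDetection", quietly = TRUE)) {
    message("AnomalyDetection is not installed; see the comments above.")
    return(NULL)
  }
  # raw_data is the example data set that ships with the package.
  data("raw_data", package = "AnomalyDetection", envir = environment())
  AnomalyDetection::AnomalyDetectionTs(
    raw_data,
    max_anoms = 0.02,    # flag at most 2% of the points
    direction = "both",  # look for unusually high and unusually low values
    plot = FALSE
  )
}

res <- run_example()
if (!is.null(res)) print(head(res$anoms))  # flagged timestamps and values
```

Setting `plot = TRUE` instead should also return a plot in `res$plot`, as in the README example.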

Does it Really Detect Anomalies?

Yes! It actually works very well, as long as you use it for what it was created for. It was designed to detect global and local anomalies.

  • Global anomalies: These are the kind we are the most familiar with: anomalies that go out of the usual range. While not always the best way, using the 95th percentile as a threshold can detect this kind of anomaly.

local anomaly detected

  • Local anomalies: Very often we can see an underlying pattern in our data. It usually looks like a “wave”: low activity in the morning, high during the day, low again at night. Local anomalies occur within this context. For example, high activity at night indicates an anomaly, even if the value would be normal during the day.

global anomaly detected
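The difference between the two kinds can be illustrated with a small base R sketch. This is not how the Twitter package works internally; it is a simplified stand-in on synthetic data, where the series, the injected anomalies, and the threshold of 6 robust standard deviations are all assumptions made for the demo.

```r
# Synthetic hourly metric over 14 days: a daily "wave" plus a little noise.
set.seed(42)
hours <- 0:(14 * 24 - 1)
hour_of_day <- hours %% 24
wave <- 100 + 50 * sin(2 * pi * (hour_of_day - 6) / 24)  # lowest at midnight, highest at noon
x <- wave + rnorm(length(hours), sd = 2)

# Inject two anomalies (positions are arbitrary choices for this demo).
global_idx <- 5 * 24 + 12 + 1  # noon on day 6: far above the usual range
local_idx  <- 9 * 24 + 3 + 1   # 3 a.m. on day 10: daytime-level activity at night
x[global_idx] <- 250
x[local_idx]  <- 120

# Global check: flag everything above the 95th percentile of all values.
q95 <- quantile(x, 0.95)
global_flags <- which(x > q95)

# Local check: compare each point with the median for its hour of day,
# then flag residuals more than 6 robust standard deviations from zero.
baseline <- ave(x, hour_of_day, FUN = median)
resid <- x - baseline
local_flags <- which(abs(resid) > 6 * mad(resid))

global_idx %in% global_flags  # TRUE: the spike is above the threshold
local_idx %in% global_flags   # FALSE: 120 is an ordinary daytime value
local_idx %in% local_flags    # TRUE: 120 is far from the usual 3 a.m. level
```

Note that the percentile threshold flags roughly 5% of the points by construction, including perfectly normal midday peaks, which is one reason it is "not always the best way".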

What anomalies can be detected?

The software aims to detect global and local anomalies (see above). It is also designed to account for “underlying trends” such as organic growth in the metrics. Twitter calls this algorithm Seasonal Hybrid ESD (S-H-ESD).
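Here is a rough base R sketch of the idea behind S-H-ESD, as I understand Twitter's description: remove the seasonal component with an STL decomposition, subtract the median, then run a generalized ESD test on what is left, using the median and MAD instead of the mean and standard deviation. It is not the package's code. The helper `gesd_robust`, the synthetic series, and the parameter values are assumptions for illustration.

```r
# Generalized ESD on residuals, using median and MAD instead of mean and sd.
# Returns the indices flagged as anomalies. (Hypothetical helper, not package code.)
gesd_robust <- function(resid, max_anoms = 0.02, alpha = 0.05) {
  n <- length(resid)
  k <- max(1L, floor(max_anoms * n))  # upper bound on the number of anomalies
  idx <- seq_len(n)                   # indices still in play
  removed <- integer(k)
  n_found <- 0L
  for (i in seq_len(k)) {
    scale <- mad(resid[idx])
    if (scale == 0) break
    dev <- abs(resid[idx] - median(resid[idx])) / scale
    j <- which.max(dev)
    removed[i] <- idx[j]
    # Critical value for the i-th test statistic.
    p <- 1 - alpha / (2 * (n - i + 1))
    t_crit <- qt(p, df = n - i - 1)
    lambda <- (n - i) * t_crit / sqrt((n - i - 1 + t_crit^2) * (n - i + 1))
    if (dev[j] > lambda) n_found <- i  # keep the largest i that passes
    idx <- idx[-j]                     # drop the most extreme point and repeat
  }
  sort(removed[seq_len(n_found)])
}

# Synthetic hourly series over 14 days with a daily wave and two injected anomalies.
set.seed(1)
hours <- 0:(14 * 24 - 1)
x <- 100 + 50 * sin(2 * pi * (hours %% 24 - 6) / 24) + rnorm(length(hours), sd = 2)
spike_idx <- 5 * 24 + 12 + 1  # noon spike, far above the usual range
night_idx <- 9 * 24 + 3 + 1   # daytime-level value at 3 a.m.
x[spike_idx] <- 250
x[night_idx] <- 120

# Remove the daily seasonality, then subtract the median rather than a trend.
fit <- stl(ts(x, frequency = 24), s.window = "periodic", robust = TRUE)
seasonal <- as.numeric(fit$time.series[, "seasonal"])
resid <- x - seasonal - median(x)

anoms <- gesd_robust(resid)
print(anoms)  # indices flagged as anomalies; should include both injected points
```

Because the seasonal wave is removed first, the same test catches both the out-of-range spike and the night-time value that sits inside the usual range.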

I was very impressed by the Twitter anomaly detection system. It handled many different anomaly cases. Of course it didn’t detect everything: only what it was built for.

[Anomaly detected] Growth too early in seasonal metrics

1-bumpToEarly-anomaly

[Anomaly detected] Some unusual noise

2-moreNoise-anomaly

[Anomaly detected] More noise than usual

3-moreNoise-anomaly

[Anomaly detected] Breakdown

4-plateau-anomaly

[Anomaly detected] Sudden growth

5-growSuddenly-anomaly

[Anomaly detected] Sudden growth

6-floor-anomaly

[Anomaly detected] Peak

7-speark-anomaly

[Anomaly detected] Unusually high activity

8-bumpInDoublePick-anomaly

[Anomaly not detected] Linear growth

9-justGrow-no-anomaly

[Anomaly not detected] Linear seasonal growth

10-linearGrow-no-anomaly

What Can’t be Detected?

Twitter's anomaly detection is impressive, but it is built to detect certain kinds of anomalies, not all of them!

[Anomaly not detected] Flat signal

2-flat-not-detected

[Anomaly not detected] No noise

1-removeNoise-not-detected

[Anomaly not detected] Exponential growth

3-exponentialGrow-not-detected

[Anomaly not detected] Negative seasonal anomaly

4-linearGrowWithError-not-detected

[Anomaly not detected] Negative seasonal anomaly

5-justGrowWithError-not-detected

Conclusion

Twitter made a big breakthrough in anomaly detection. Its model can detect a wide variety of anomalies.

There are only two drawbacks:
  • To my eyes, it only failed to detect one kind of anomaly: “negative seasonal anomalies” (last graph above)
  • R is awesome, but not suitable for anomaly detection in real time

Overall, however, it is incredible software. Congratulations Twitter, outstanding job!
