Wikipedia daily pageviews are available online, so we can use this data to spot anomalies. Let’s see if there are any strange pageview patterns for Marie Curie.
Requirements
For this project, we will use the R programming language with two packages:
- Wikipediatrend R package
  Very useful package for downloading Wikipedia page view history
- Twitter anomaly detection
  To spot anomalies within metrics
Open your R console and install both packages.
    install.packages("devtools")
    devtools::install_github("twitter/AnomalyDetection")
    devtools::install_github("petermeissner/wikipediatrend")
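GitHub installs can fail quietly, so it may be worth confirming that both packages can be loaded before going further. The snippet below is a small sketch using base R's `requireNamespace()`; the variable names `pkgs`, `installed`, and `missing_pkgs` are ours and are not part of either package.

```r
# Packages this walkthrough relies on
pkgs <- c("AnomalyDetection", "wikipediatrend")

# requireNamespace() returns TRUE when a package can be loaded, FALSE otherwise
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
print(installed)

# Report anything that still needs to be installed
missing_pkgs <- names(installed)[!installed]
if (length(missing_pkgs) > 0) {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
}
```

If a package is listed as missing, rerun the matching install line above and read its output for build errors.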
Detect Wikipedia Anomaly
Twitter's AnomalyDetection package is very impressive! It can detect many kinds of anomalies.
Wikipediatrend will be helpful for downloading Wikipedia page view history so we can run our detection on it.
    # import libraries
    library(AnomalyDetection)
    library(wikipediatrend)

    # download Marie Curie page view history
    marie <- wp_trend("Marie_Curie", from = "2008-01-01")

    # format data: one timestamp column and one count column
    marieReady <- data.frame(time = as.POSIXct(marie$date), count = marie$count)

    # search for anomalies in both directions, flagging at most 1% of the points
    res <- AnomalyDetectionTs(marieReady, max_anoms = 0.01, direction = 'both', plot = TRUE)

    # display the plot with the detected anomalies
    res$plot
This page usually gets around 4,000 pageviews per day, but in November 2011 it received 1.1 million pageviews. Anomaly detected!
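To get a feel for why a jump from a few thousand views to over a million stands out so clearly, here is a much simpler check written in base R. It is not the procedure AnomalyDetection uses; it only measures how far each day sits from the median, in units of the median absolute deviation. The data is synthetic (not the real Wikipedia counts), and the function `robust_z`, the spike position, and the threshold of 5 are all assumptions made for illustration.

```r
# Synthetic daily counts (NOT the real Wikipedia data):
# roughly 4,000 views per day with ordinary day-to-day noise
set.seed(42)
views <- round(rnorm(365, mean = 4000, sd = 300))

# Hypothetical one-day spike of the same order of magnitude as in the article
views[200] <- 1100000

# Robust z-score: distance from the median, measured in MADs.
# Median and MAD are barely affected by a single extreme value.
robust_z <- function(x) {
  (x - median(x)) / mad(x)
}

z <- robust_z(views)

# Flag days far from typical traffic; the cutoff of 5 is an arbitrary choice
anomalous_days <- which(abs(z) > 5)
print(anomalous_days)
print(z[anomalous_days])
```

A check like this ignores weekly or yearly seasonality, which is one reason to prefer a dedicated package on real pageview data.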
Why the Anomaly?
Reading the Marie Curie Wikipedia page, we find that she was born in November. Google published a Google Doodle for her birthday, which led to an unusual traffic increase on Marie Curie's page.