Our weather follows typical patterns over the course of a year: in most places, it is cold in the winter and hot in the summer. Based on historical records we can detect unusual weather.
Requirements
As is often the case for statistical tasks, the R programming language needs to be installed, along with two packages:
- weatherData package to download weather data
- AnomalyDetection package (from Twitter) to spot anomalies in metrics
Open your R console to install both packages.
```r
install.packages("devtools")
devtools::install_github("twitter/AnomalyDetection")
devtools::install_github("Ram-N/weatherData")
```
Download Weather Data
Let’s download the weather data recorded at LAX airport and BUR airport from 2001 to 2014. These two airports are located 20 km (12 miles) from each other in Los Angeles, California.
Here we download and clean the weather records of both airports; the cleaned data can then be saved as CSV files.
Sometimes the mean temperature of a day is missing. In that case we “cheat”: we use the average of that day's lowest and highest temperatures, or, if those are missing too, the value of the previous day. This is a debatable solution but good enough for this test.
```r
library(weatherData)

# Download the daily records of an airport for every year
# from `from` to `to` (inclusive) and stack them into one data frame.
download = function(airport, from, to){
  all = NULL
  for(year in from:to){
    yearData = getWeatherForYear(airport, year)
    all = rbind(all, yearData)
  }
  return(all)
}

# Best available estimate of the mean temperature for one daily record.
fix = function(entry){
  if(!is.na(entry$Mean_TemperatureC)){
    return(entry$Mean_TemperatureC)
  }else if(!is.na(entry$Max_TemperatureC) && !is.na(entry$Min_TemperatureC)){
    return((entry$Max_TemperatureC + entry$Min_TemperatureC)/2)
  }else if(!is.na(entry$Max_TemperatureC)){
    return(entry$Max_TemperatureC)
  }
  # May still be NA when no temperature was recorded that day
  return(entry$Min_TemperatureC)
}

# Fill in the missing mean temperatures, day by day.
clean = function(daily){
  dailyFix = daily[1,]
  dailyFix$Mean_TemperatureC = fix(dailyFix)
  for(i in 2:nrow(daily)){
    current = daily[i,]
    current$Mean_TemperatureC = fix(current)
    if(is.na(current$Mean_TemperatureC)){
      # Fall back to the previous day's (already fixed) value
      current$Mean_TemperatureC = dailyFix$Mean_TemperatureC[i-1]
    }
    dailyFix = rbind(dailyFix, current)
  }
  return(dailyFix)
}

# download
laxDirty = download("LAX", 2001, 2014)
burDirty = download("BUR", 2001, 2014)

# clean
lax = clean(laxDirty)
bur = clean(burDirty)
```
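The same fallback logic can also be written without looping over rows one `rbind` at a time. The following is only a sketch on a small made-up data frame: it assumes the column names used above (`Mean_TemperatureC`, `Max_TemperatureC`, `Min_TemperatureC`), and the helper `fill_mean_temperature` is a hypothetical name, not part of weatherData.

```r
# Toy daily records; the values are made up for illustration.
daily <- data.frame(
  Mean_TemperatureC = c(15, NA, NA, NA),
  Max_TemperatureC  = c(20, 22, NA, NA),
  Min_TemperatureC  = c(10, 12, 11, NA)
)

# Hypothetical helper: fill missing daily means step by step.
fill_mean_temperature <- function(daily) {
  filled <- daily$Mean_TemperatureC
  # 1. Average of the day's max and min. With na.rm = TRUE, rowMeans
  #    falls back to whichever of the two values is present.
  mid_range <- rowMeans(daily[, c("Max_TemperatureC", "Min_TemperatureC")],
                        na.rm = TRUE)
  missing <- is.na(filled)
  filled[missing] <- mid_range[missing]
  # 2. Still missing (rowMeans yields NaN when both are NA):
  #    carry the previous day's value forward.
  for (i in seq_along(filled)[-1]) {
    if (is.na(filled[i])) filled[i] <- filled[i - 1]
  }
  filled
}

filled <- fill_mean_temperature(daily)
print(filled)
```

The second day is filled with the average of its max and min, the third with the only temperature available, and the fourth with the value of the day before.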
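Merge Weather Data
Merging the two histories means we only keep the days for which both airports have a record.
Then we once again clean the data:
- Replacing the remaining “NA” values with the mean temperature
- Applying a 7-day moving average for smoothing
```r
library(forecast) # provides ma(); install with install.packages("forecast")

lax$DateString = as.character(lax$Date)
bur$DateString = as.character(bur$Date)
# Inner join: keeps only the dates present in both data frames
merged = merge(x = lax, y = bur, by = "DateString")

cleanAndSmooth = function(meanTemperature, movingAverage) {
  # Drop the first 6 records
  values = meanTemperature[7:length(meanTemperature)]
  meanFix = mean(values, na.rm = TRUE)
  values[is.na(values)] = meanFix
  # Centered moving average; the NA values at both edges are filled with the mean
  smoothed = ma(values, order = movingAverage, centre = TRUE)
  smoothed[is.na(smoothed)] = meanFix
  # Return a plain numeric vector, the input type AnomalyDetectionVec expects
  return(as.numeric(smoothed))
}

maLax = cleanAndSmooth(merged$Mean_TemperatureC.x, 7)
maBur = cleanAndSmooth(merged$Mean_TemperatureC.y, 7)
```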
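To see what these two steps do, here is a small sketch using only base R and made-up records. It assumes a `DateString` key and a `Mean_TemperatureC` column as above, and it uses `stats::filter` as a stand-in for a centered moving average of odd order.

```r
# Toy records for two stations; dates and temperatures are made up.
a <- data.frame(
  DateString = c("2014-01-01", "2014-01-02", "2014-01-03", "2014-01-04"),
  Mean_TemperatureC = c(14, 15, 18, 15)
)
b <- data.frame(
  DateString = c("2014-01-02", "2014-01-03", "2014-01-04", "2014-01-05"),
  Mean_TemperatureC = c(12, 13, 14, 15)
)

# merge() performs an inner join by default: only the dates present
# in both data frames survive. Shared columns get .x / .y suffixes.
both <- merge(x = a, y = b, by = "DateString")
print(as.character(both$DateString))

# Centered moving average of order 3: each point becomes the mean of
# itself and its two neighbours; the two edge points have no full window.
smooth3 <- as.numeric(stats::filter(both$Mean_TemperatureC.x, rep(1 / 3, 3), sides = 2))
print(smooth3)
```

Only the three dates shared by both stations remain, and the smoothed series has `NA` at both ends, which is why the edges need to be filled after smoothing.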
Unusual Weather at Los Angeles Airport
```r
library(AnomalyDetection)

res = AnomalyDetectionVec(maLax, max_anoms=0.01, period=365, plot=TRUE, title = "Anomaly Los Angeles")
res$plot
```
Each slice of the graph shows a year.
As expected, the weather follows a yearly trend. It is usually cold in winter and hot in summer. But two anomalies didn’t follow this pattern:
- In early winter 2008 the temperature was around 20°C instead of the 15°C average
- In Q2 2014 the temperature reached 25°C instead of the typical 20°C
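The idea behind this kind of detection can be illustrated with a much simpler procedure: remove the yearly pattern, then look for days that deviate strongly from it. The sketch below is not the algorithm used by the AnomalyDetection package (which relies on a Seasonal Hybrid ESD test); it is a simplified stand-in on synthetic data, and the threshold of 4 is an arbitrary assumption.

```r
set.seed(42)
period <- 365
years <- 4
day <- rep(seq_len(period), times = years)

# Synthetic temperatures: a yearly sine cycle plus small noise (not real data).
temperature <- 17 + 4 * sin(2 * pi * (day - 110) / period) +
  rnorm(period * years, sd = 0.5)

# Inject a 5 degree warm spell into days 20-26 of the third year.
spell <- 2 * period + 20:26
temperature[spell] <- temperature[spell] + 5

# Seasonal profile: the median temperature for each day of the year.
seasonal <- ave(temperature, day, FUN = median)
residual <- temperature - seasonal

# Robust z-score based on the median and the MAD; flag large deviations.
score <- (residual - median(residual)) / mad(residual)
anomalies <- which(abs(score) > 4)
print(anomalies)
```

The injected warm spell stands out because it is compared with the same days in other years, not with the overall average. A 20°C day is unremarkable in summer but unusual in early winter.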
Normal Weather at BUR Airport
```r
res = AnomalyDetectionVec(maBur, max_anoms=0.1, period=365, plot=TRUE, title = "Anomaly Bob Hope Airport (~41km from LA airport)")
res$plot
plot(as.ts(maBur))
```
This shows the weather records at Burbank Bob Hope Airport.
No anomalies were detected at this airport. Since there was nothing to highlight, the Twitter package didn’t produce a graph either, so we draw the series ourselves using the “plot” function.
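This fallback can be made explicit in code. The sketch below assumes the detection result is a list with an `anoms` data frame that is empty when nothing was found. The helper `has_no_anomalies` is hypothetical, and `empty_res` is a hand-built stand-in, not actual package output.

```r
# Hypothetical helper: TRUE when a detection result contains no anomalies.
has_no_anomalies <- function(res) {
  is.null(res$anoms) || NROW(res$anoms) == 0
}

# Stand-in for a result without anomalies (assumed shape: `anoms` and `plot`).
empty_res <- list(anoms = data.frame(), plot = NULL)

# Made-up series used only to demonstrate the fallback plot.
series <- c(15, 16, 17, 16, 15)

if (has_no_anomalies(empty_res)) {
  # Nothing to highlight: draw the raw series instead.
  plot(as.ts(series), ylab = "Mean temperature (C)")
}
```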
Correlation Anomaly
```r
diff = maLax-maBur
res = AnomalyDetectionVec(diff, max_anoms=0.01, period=365, direction='both', plot=TRUE, only_last = FALSE, title = "Anomaly difference")
res$plot
```
This is the temperature difference between the Los Angeles and Burbank airports.
Beginning in 2013, a difference of 5°C was recorded between the two airports. A 5°C difference is typical in the summer but not in the winter!
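A difference series can deviate in either direction, which is why `direction='both'` is used here. The toy sketch below shows the effect of one-sided versus two-sided flagging with a simple median/MAD score. This is not the package's test, and both the data and the threshold of 4 are made up.

```r
# Made-up temperature differences with one large positive
# and one large negative deviation.
diff_series <- c(0.2, -0.1, 0.1, 0.0, 5.0, -0.2, 0.1, -5.2, 0.0, 0.1)

# Robust score: distance from the median in units of the MAD.
score <- (diff_series - median(diff_series)) / mad(diff_series)

# One-sided check: only unusually high values are flagged.
positive_only <- which(score > 4)

# Two-sided check: unusually high and unusually low values are flagged.
both_directions <- which(abs(score) > 4)

print(positive_only)   # 5
print(both_directions) # 5 8
```

With a one-sided check, days on which Burbank was much warmer than Los Angeles would go unnoticed.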