# Understanding Cross-Correlation, Auto-Correlation, Normalization and Time Shift

March 8, 2016

To detect correlation between time series, we often use auto-correlation, cross-correlation or normalized cross-correlation. Let’s study these techniques to understand them better.

## Definition:

• Cross-correlation compares two different time series to detect whether they are correlated. It is suited to metrics that share the same maximum and minimum values. For example: “Are two audio signals in phase?”
• Normalized cross-correlation also compares two time series, but it produces a normalized score. Unlike simple cross-correlation, it can compare metrics with different value ranges. For example: “Is there a correlation between the number of customers in the shop and the number of sales per day?”
• Auto-correlation compares a time series with itself at a different time. It is used to detect repeating patterns or seasonality. For example: “Is there weekly seasonality in a website’s server traffic?” or “Does the current week’s data correlate highly with the previous week’s?”
• Normalized auto-correlation is the auto-correlation counterpart of normalized cross-correlation: it compares one metric with itself at a different time, using the normalized score.
• Time shift can be applied to all of the above algorithms. The idea is to compare a metric with another one at various “shifts in time”. Applying a time shift to the normalized cross-correlation function results in a “normalized cross-correlation with a time shift of X”. This can answer questions such as: “When many customers come into my shop, do my sales increase 20 minutes later?”

## Cross-Correlation

To detect a level of correlation between two signals, we use cross-correlation. It is calculated by multiplying the two time series point by point and summing the products.

In the following example, graphs A and B are cross-correlated, but graph C is not correlated with either of them. Using the cross-correlation formula below, we can calculate the level of correlation between the series.

$$corr(x, y) = \sum_{n=0}^{N-1} x[n]*y[n]$$

where N is the number of data points in each series.

\begin{align} corr(a, b) & = 1*2+2*3+-2*-2+4*3+2*2+3*4+1*1+0*-1 \\ & = 41 \end{align}

\begin{align} corr(a, c) & =1*-2+2*0+-2*4+4*0+2*1+3*1+1*0+0*-2 \\ & =-5 \end{align}

Graphs A and B correlate, with a high value of 41.
Graphs A and C don’t correlate, having a low value of -5.
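The arithmetic above can be reproduced in a few lines of Python. This is a minimal sketch. The values for series A, B and C are read off the worked sums, since the original graphs are not reproduced here, and `corr` is a hypothetical helper name.

```python
def corr(x, y):
    """Raw cross-correlation: the sum of element-wise products of two equal-length series."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    return sum(xi * yi for xi, yi in zip(x, y))

# Series values taken from the worked sums above.
a = [1, 2, -2, 4, 2, 3, 1, 0]
b = [2, 3, -2, 3, 2, 4, 1, -1]
c = [-2, 0, 4, 0, 1, 1, 0, -2]

print(corr(a, b))  # 41
print(corr(a, c))  # -5
```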

## Normalized Cross-Correlation

There are three problems with cross-correlation:

1. The score is difficult to interpret.
2. Both metrics must have the same amplitude. If Graph B has the same shape as Graph A but values two times smaller, the correlation will not be detected: corr(a, a/2) = 19.5, only half of corr(a, a) = 39, even though the shape is identical.
3. Because the formula multiplies values, a point where either series is zero contributes nothing to the score, whatever the other value is: 0*0 = 0 and 0*200 = 0.
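Problems 2 and 3 are easy to observe numerically. The sketch below uses the same raw formula as before, with `corr` as a hypothetical helper. The pair `[0, 0]` / `[0, 200]` is an illustrative input chosen for this example and does not come from the original.

```python
def corr(x, y):
    """Raw cross-correlation: the sum of element-wise products."""
    return sum(xi * yi for xi, yi in zip(x, y))

a = [1, 2, -2, 4, 2, 3, 1, 0]
half_a = [v / 2 for v in a]  # same shape as A, half the amplitude

print(corr(a, a))       # 39   (A compared with itself)
print(corr(a, half_a))  # 19.5 (same shape, but the score is halved)

# A zero in either series contributes nothing, whatever the other value is.
print(corr([0, 0], [0, 200]))  # 0
```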

To solve these problems we use normalized cross-correlation:

$$norm\_corr(x,y)=\dfrac{\sum_{n=0}^{N-1} x[n]*y[n]}{\sqrt{\sum_{n=0}^{N-1} x[n]^2 * \sum_{n=0}^{N-1} y[n]^2}}$$

Using this formula let’s compute the normalized cross-correlation of AB and AC.

\begin{align} norm\_corr(a,b) &= \dfrac{1*2+2*3+-2*-2+4*3+2*2+3*4+1*1+0*-1}{\sqrt{(1+4+4+16+4+9+1+0)*(4+9+4+9+4+16+1+1)}} \\ & = \dfrac{41}{\sqrt{(39)*(48)}} \\ & \approx 0.948 \end{align}

\begin{align} norm\_corr(a,c) & =\dfrac{1*-2+2*0+-2*4+4*0+2*1+3*1+1*0+0*-2}{\sqrt{(1+4+4+16+4+9+1+0)*(4+0+16+0+1+1+0+4)}} \\ & =\dfrac{-5}{\sqrt{(39)*(26)}} \\ & \approx -0.157 \end{align}

Graphs A and B correlate, with a high value of 0.948.
Graphs A and C don’t correlate, with a low value of -0.157.
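Here is the normalized formula as a minimal Python sketch. `norm_corr` is a hypothetical helper name, and the series values are again read off the worked sums. Note that this follows the formula above exactly and does not subtract the mean first.

```python
import math

def norm_corr(x, y):
    """Normalized cross-correlation: the dot product divided by
    sqrt(sum of x squared * sum of y squared)."""
    num = sum(xi * yi for xi, yi in zip(x, y))
    den = math.sqrt(sum(xi * xi for xi in x) * sum(yi * yi for yi in y))
    return num / den

a = [1, 2, -2, 4, 2, 3, 1, 0]
b = [2, 3, -2, 3, 2, 4, 1, -1]
c = [-2, 0, 4, 0, 1, 1, 0, -2]

print(f"{norm_corr(a, b):.4f}")  # 0.9476, i.e. 41 / sqrt(39 * 48)
print(f"{norm_corr(a, c):.4f}")  # -0.1570, i.e. -5 / sqrt(39 * 26)
```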

• Normalized cross-correlation scoring is easy to understand:
– The higher the value, the stronger the correlation.
– The maximum value is 1, reached when two signals are exactly the same:
norm_corr(a, a) = 1
– The minimum value is -1, reached when two signals are exactly opposite:
norm_corr(a, -a) = -1
• Normalized cross-correlation can detect the correlation of two signals with different amplitudes: norm_corr(a, a/2) = 1.
Notice that signal A correlates perfectly with the same signal at half the amplitude!
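The three properties listed above can be checked directly. This sketch reuses a hypothetical `norm_corr` helper that implements the normalized formula, applied to series A.

```python
import math

def norm_corr(x, y):
    """Normalized cross-correlation, as defined in the formula above."""
    num = sum(xi * yi for xi, yi in zip(x, y))
    den = math.sqrt(sum(xi * xi for xi in x) * sum(yi * yi for yi in y))
    return num / den

a = [1, 2, -2, 4, 2, 3, 1, 0]
neg_a = [-v for v in a]        # exactly opposite signal
half_a = [v / 2 for v in a]    # same shape, half the amplitude

for label, other in [("a", a), ("-a", neg_a), ("a/2", half_a)]:
    print(label, round(norm_corr(a, other), 6))
# a 1.0
# -a -1.0
# a/2 1.0
```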

## Auto-Correlation

Auto-correlation is very useful in many applications; a common one is detecting repeatable patterns due to seasonality.

The following graph clearly shows a pattern that repeats every 8 data points. Indeed, the R code that generates it produces a repeating sequence of the numbers 1 through 8 with some random noise in the mix.

Let’s compute the auto-correlation between the signal and itself at a time shift of 4 and at a time shift of 8. We can use the same formula as for cross-correlation (see above): each value x[n] is multiplied by the value k points later, x[n+k], and the products are summed. The following graphs clearly show a high auto-correlation at time shift 8, but not at time shift 4.

For a time shift of 8, the auto-correlation is higher than for a time shift of 4. We have detected seasonality with a period of 8.
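The article’s R code is not reproduced here, so the following is only a Python sketch under stated assumptions. The test signal (1 through 8 repeated 10 times, plus uniform noise in ±0.5, seed 42) is hypothetical. The hypothetical `autocorr` helper sums x[n] * x[n + k] over the points where the original and shifted copies overlap.

```python
import random

def autocorr(x, k):
    """Raw auto-correlation at time shift k: the sum of x[n] * x[n + k]
    over the overlapping points (n = 0 .. len(x) - k - 1)."""
    return sum(x[n] * x[n + k] for n in range(len(x) - k))

# Hypothetical test signal: 1..8 repeated 10 times with small uniform noise.
rng = random.Random(42)
signal = [v + rng.uniform(-0.5, 0.5) for _ in range(10) for v in range(1, 9)]

for shift in (4, 8):
    print(shift, round(autocorr(signal, shift), 1))
```

The number of overlapping points shrinks as the shift grows, so raw sums at different shifts are not strictly comparable. This is one more reason to prefer the normalized version.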

## Normalized Auto-Correlation

We discussed earlier the advantages of normalized cross-correlation. In the same way, we can compute the normalized auto-correlation with time shifts of 4 and 8:

Normalized auto-correlation makes it very obvious that the signal repeats in a similar manner every 8 data points.
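One way to normalize auto-correlation is to apply the `norm_corr` formula to the two overlapping slices x[0:N-k] and x[k:N]. The sketch below does this on the same hypothetical noisy 1..8 signal. The optional mean subtraction (`demean=True`) is an addition that is not part of the formula above. Statistical tools such as R’s `acf()` subtract the mean by default, so their values will not match the plain formula exactly.

```python
import math
import random

def norm_autocorr(x, k, demean=False):
    """Normalized auto-correlation at shift k: the norm_corr formula applied to
    the overlapping slices x[0:N-k] and x[k:N]. If demean is True, the
    series mean is subtracted first (an optional variant)."""
    if demean:
        m = sum(x) / len(x)
        x = [v - m for v in x]
    head, tail = x[:len(x) - k], x[k:]
    num = sum(p * q for p, q in zip(head, tail))
    den = math.sqrt(sum(p * p for p in head) * sum(q * q for q in tail))
    return num / den

# Hypothetical test signal: 1..8 repeated 10 times with small uniform noise.
rng = random.Random(42)
signal = [v + rng.uniform(-0.5, 0.5) for _ in range(10) for v in range(1, 9)]

for shift in (4, 8):
    print(shift,
          round(norm_autocorr(signal, shift), 3),
          round(norm_autocorr(signal, shift, demean=True), 3))
```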

## Correlation with Time Shift

All correlation techniques can be modified by applying a time shift. For example, it is very common to perform a normalized cross-correlation with time shift to detect if a signal “lags” or “leads” another.

To apply a time shift, we correlate the original signal with the other signal moved x elements to the right or left, just as we did for auto-correlation.

To detect if two metrics are correlated with a time shift we need to compute all the possible time shifts. Fortunately, the R language can compute all the correlations with time shift very quickly.
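The same idea can be sketched in Python without R. This sketch makes several assumptions. The hypothetical `norm_corr_shift` compares x[n] with y[n + k] over the overlapping points, so a positive k means y lags x. The lagged series is simply A delayed by one step. Other tools use their own sign conventions and normalizations, so their numbers can differ from this sketch.

```python
import math

def norm_corr_shift(x, y, k):
    """Normalized cross-correlation of x[n] against y[n + k] over the
    overlapping points of two equal-length series. A positive k means
    y is compared k steps later than x (y lags x)."""
    n = len(x)
    if k >= 0:
        xs, ys = x[:n - k], y[k:]
    else:
        xs, ys = x[-k:], y[:n + k]
    num = sum(p * q for p, q in zip(xs, ys))
    den = math.sqrt(sum(p * p for p in xs) * sum(q * q for q in ys))
    return num / den

a = [1, 2, -2, 4, 2, 3, 1, 0]
lagged = [0] + a[:-1]  # hypothetical: A delayed by one step

scores = {k: norm_corr_shift(a, lagged, k) for k in range(-4, 5)}
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))  # 1 1.0
```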

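##### Normalized Cross-Correlation with Time Shift

| Time shift | -4 | -3 | -2 | -1 | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|---|---|---|---|
| Correlation | 0.862 | 0.021 | -0.547 | -0.423 | 0 | 0.867 | 0.127 | -0.466 | -0.393 |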

As we expected from the graph above, the metrics highly correlate with a time shift of 1.

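##### Normalized Auto-Correlation with Time Shift

| Time shift | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Correlation | 1 | 0.335 | -0.122 | -0.304 | -0.369 | -0.374 | -0.226 | 0.187 | 0.789 | 0.306 | -0.12 |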

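The output above follows the 8-point cycle of the signal. After the trivial value of 1 at shift 0, the auto-correlation dips and then peaks again at 0.789 for a time shift of 8. As expected, the auto-correlation detects a high correlation when the series is compared to itself at a time shift of 8.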
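Scanning all shifts and picking the strongest non-zero one can be sketched as follows. The signal is again a hypothetical noisy repetition of 1..8 (seed 7). The hypothetical `norm_autocorr` subtracts the mean before applying the normalized formula to the overlapping slices. This sketch is not expected to reproduce the table’s exact numbers, only the peak at shift 8.

```python
import math
import random

def norm_autocorr(x, k):
    """Mean-subtracted normalized auto-correlation at shift k,
    computed over the overlapping slices x[0:N-k] and x[k:N]."""
    m = sum(x) / len(x)
    c = [v - m for v in x]
    head, tail = c[:len(c) - k], c[k:]
    num = sum(p * q for p, q in zip(head, tail))
    den = math.sqrt(sum(p * p for p in head) * sum(q * q for q in tail))
    return num / den

# Hypothetical test signal: 1..8 repeated 10 times with small uniform noise.
rng = random.Random(7)
signal = [v + rng.uniform(-0.5, 0.5) for _ in range(10) for v in range(1, 9)]

scores = [norm_autocorr(signal, k) for k in range(11)]
print([round(s, 3) for s in scores])

# Shift 0 always scores 1, so look for the strongest repeat among shifts 1..10.
peak = max(range(1, 11), key=lambda k: scores[k])
print("strongest repeat at shift", peak)
```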

## Conclusion

Here at anomaly.io, we commonly use both cross-correlation and auto-correlation, which are building blocks for detecting unusual patterns in your data. Once auto-correlation has revealed the seasonality of a metric, we can apply a range of anomaly detection algorithms, such as seasonal decomposition of time series or seasonal adjustment of a time series. When a cross-correlation is found, we can detect anomalies when the correlation between the series breaks.

Be sure to also read “Detecting Correlation Among Time Series”.

