Understanding Cross-Correlation, Auto-Correlation, Normalization and Time Shift

March 8, 2016

To detect the correlation of time series we often use auto-correlation, cross-correlation or normalized cross-correlation. Let’s study these techniques to understand them better.

Definition:

  • Cross-correlation is the comparison of two different time series to detect if there is a correlation between metrics with the same maximum and minimum values. For example: “Are two audio signals in phase?”
  • Normalized cross-correlation is also the comparison of two time series, but using a different scoring result. Instead of simple cross-correlation, it can compare metrics with different value ranges. For example: “Is there a correlation between the number of customers in the shop and the number of sales per day?”
  • Auto-correlation is the comparison of a time series with itself at a different time. It aims, for example, to detect repeating patterns or seasonality. For example: “Is there weekly seasonality on a server website?” “Does the current week’s data highly correlate with that of the previous week?”
  • Normalized auto-correlation is the same as normalized cross-correlation, but for auto-correlation, thus comparing one metric with itself at a different time.
  • Time Shift can be applied to all of the above algorithms. The idea is to compare a metric to another one with various “shifts in time”. Applying a time shift to the normalized cross-correlation function will result in a “normalized cross-correlation with a time shift of X”. This can be used to answer questions such as: “When many customers come in my shop, do my sales increase 20 minutes later?”

Cross-Correlation

To detect a level of correlation between two signals we use cross-correlation. It is calculated simply by multiplying two time series together, element by element, and summing the products.

In the following example, graphs A and B are cross-correlated but graph C is not correlated to either.

[Figure: graphs A, B and C]

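Using the cross-correlation formula below we can calculate the level of correlation between the series, here a = (1, 2, -2, 4, 2, 3, 1, 0), b = (2, 3, -2, 3, 2, 4, 1, -1) and c = (-2, 0, 4, 0, 1, 1, 0, -2).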

$$corr(x, y) = \sum_{n=0}^{N-1} x[n]*y[n]$$

where N is the number of data points in each series.

$$\begin{align}
corr(a, b) & = 1*2+2*3+(-2)*(-2)+4*3+2*2+3*4+1*1+0*(-1) \\
& = 41
\end{align}$$

$$\begin{align}
corr(a, c) & = 1*(-2)+2*0+(-2)*4+4*0+2*1+3*1+1*0+0*(-2) \\
& =-5
\end{align}$$

Graphs A and B correlate, with a high value of 41.
Graphs A and C don’t correlate, having a low value of -5.
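
The two sums above are easy to check with a few lines of code. The following is a minimal Python sketch (Python is our choice for illustration here, not the language used for the article's graphs). The function name `corr` simply mirrors the formula, and the series values are read off the worked computation above.

```python
def corr(x, y):
    """Raw cross-correlation at zero shift: sum of element-wise products."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    return sum(xi * yi for xi, yi in zip(x, y))


# Values taken from the worked example in the text.
a = [1, 2, -2, 4, 2, 3, 1, 0]
b = [2, 3, -2, 3, 2, 4, 1, -1]
c = [-2, 0, 4, 0, 1, 1, 0, -2]

print(corr(a, b))  # 41
print(corr(a, c))  # -5
```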

Normalized Cross-Correlation

There are three problems with cross-correlation:

  1. It is difficult to understand the scoring value.
  2. Both metrics must have the same amplitude. If Graph B has the same shape as Graph A but values two times smaller, the correlation will not be detected, because the score is halved:
    corr(a, a) = 39, but corr(a, a/2) = 19.5
  3. Due to the formula, a zero value will not be taken into account, since 0*0=0 and 0*200=0.

To solve these problems we use normalized cross-correlation:

$$norm\_corr(x,y)=\dfrac{\sum_{n=0}^{N-1} x[n]*y[n]}{\sqrt{\sum_{n=0}^{N-1} x[n]^2 * \sum_{n=0}^{N-1} y[n]^2}}$$

Using this formula let’s compute the normalized cross-correlation of AB and AC.

$$\begin{align}
norm\_corr(a,b) &= \dfrac{1*2+2*3+(-2)*(-2)+4*3+2*2+3*4+1*1+0*(-1)}{\sqrt{(1+4+4+16+4+9+1+0)*(4+9+4+9+4+16+1+1)}} \\
& = \dfrac{41}{\sqrt{(39)*(48)}} \\
& = 0.947
\end{align}$$

$$\begin{align}
norm\_corr(a,c) & =\dfrac{1*(-2)+2*0+(-2)*4+4*0+2*1+3*1+1*0+0*(-2)}{\sqrt{(1+4+4+16+4+9+1+0)*(4+0+16+0+1+1+0+4)}} \\
& =\dfrac{-5}{\sqrt{(39)*(26)}} \\
& =-0.157
\end{align}$$

Graphs A and B correlate, with a high value of 0.947.
Graphs A and C don’t correlate, showing a low value of -0.157.

  • Normalized cross-correlation scoring is easy to understand:
    – The higher the value, the higher the correlation is.
    – The maximum value is 1 when two signals are exactly the same:
    norm_corr(a,a)=1
    – The minimum value is -1 when two signals are exactly opposite:
    norm_corr(a, -a) = -1
  • Normalized cross-correlation can detect the correlation of two signals with different amplitudes: norm_corr(a, a/2) = 1.
    Notice we have perfect correlation between signal A and the same signal with half the amplitude!
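
The properties listed above can be verified numerically. Below is a minimal Python sketch of the normalized formula. The function name `norm_corr` follows the notation used in this article, while the helper names `half_a` and `neg_a` are ours and purely illustrative.

```python
import math


def norm_corr(x, y):
    """Normalized cross-correlation at zero shift, following the formula above."""
    if len(x) != len(y):
        raise ValueError("series must have the same length")
    num = sum(xi * yi for xi, yi in zip(x, y))
    den = math.sqrt(sum(xi * xi for xi in x) * sum(yi * yi for yi in y))
    if den == 0:
        raise ValueError("normalization is undefined for an all-zero series")
    return num / den


# Values taken from the worked example in the text.
a = [1, 2, -2, 4, 2, 3, 1, 0]
b = [2, 3, -2, 3, 2, 4, 1, -1]
c = [-2, 0, 4, 0, 1, 1, 0, -2]
half_a = [v / 2 for v in a]  # same shape as a, half the amplitude
neg_a = [-v for v in a]      # exact opposite of a

print(norm_corr(a, b))       # about 0.9476 (shown as 0.947 in the text)
print(norm_corr(a, c))       # about -0.157
print(norm_corr(a, half_a))  # 1, up to floating-point rounding
print(norm_corr(a, neg_a))   # -1, up to floating-point rounding
```

Because the denominator rescales by the energy of each series, multiplying either input by a positive constant leaves the score unchanged.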

Auto-Correlation

Auto-correlation is very useful in many applications; a common one is detecting repeatable patterns due to seasonality.

The following graph clearly shows repeating patterns every 8 data points. Indeed, looking at the R code, it’s a repeatable sequence of the numbers 1 through 8 with some random noise in the mix.

[Figure: signal with a pattern repeating every 8 data points]

Let’s compute the auto-correlation between the signal and itself at a time shift of 4 and time shift of 8. The following graphs clearly show a high auto-correlation at time shift 8, but not at time shift 4.

[Figures: the signal compared with itself at a time shift of 4 (no auto-correlation) and at a time shift of 8 (auto-correlation)]

To compute, we can use the same formula as cross-correlation (see above).

For a time shift of 8 the auto-correlation is higher than for a time shift of 4. We have detected seasonality with a period of 8.
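
To make this concrete, here is a small Python sketch of the same comparison. The data is a hypothetical stand-in for the series described above (the numbers 1 through 8 repeated, plus a little random noise); the noise range, the seed and the function name `autocorr` are our assumptions, not the values behind the article's graphs.

```python
import random


def autocorr(x, shift):
    """Raw auto-correlation: sum of products of x with itself moved by `shift`."""
    if not 0 <= shift < len(x):
        raise ValueError("shift must be between 0 and len(x) - 1")
    return sum(x[n] * x[n + shift] for n in range(len(x) - shift))


random.seed(42)  # fixed seed so the sketch is reproducible
# Hypothetical signal: 1..8 repeated ten times, with small uniform noise added.
signal = [(n % 8) + 1 + random.uniform(-0.5, 0.5) for n in range(80)]

shift_4 = autocorr(signal, 4)
shift_8 = autocorr(signal, 8)
print(shift_4, shift_8)
print(shift_8 > shift_4)  # True: the pattern lines up with itself at shift 8
```

Note that a larger shift leaves fewer overlapping points to sum over, so raw scores at different shifts are not directly comparable in general; this is one more reason to prefer the normalized version.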

Normalized Auto-Correlation

We discussed earlier the advantages of normalized cross-correlation. In the same way, we can compute the normalized auto-correlation with time shifts of 4 and 8:

Normalized auto-correlation makes it very obvious that the signal repeats in a similar manner every 8 data points.
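
As a rough illustration, the Python sketch below applies the normalized formula to the overlapping parts of a series and its shifted copy. Two things here are our assumptions rather than part of the formula given earlier: the hypothetical test signal (1 through 8 repeated, plus noise), and the optional `center` step that subtracts the mean first, a common choice that lets a mismatched shift produce a negative score.

```python
import math
import random


def norm_autocorr(x, shift, center=True):
    """Normalized correlation between x and itself moved by `shift`."""
    if not 0 <= shift < len(x):
        raise ValueError("shift must be between 0 and len(x) - 1")
    if center:
        # Assumption of this sketch: remove the mean before correlating.
        mean = sum(x) / len(x)
        x = [v - mean for v in x]
    head = x[:len(x) - shift]  # x[n]
    tail = x[shift:]           # x[n + shift]
    num = sum(h * t for h, t in zip(head, tail))
    den = math.sqrt(sum(h * h for h in head) * sum(t * t for t in tail))
    if den == 0:
        raise ValueError("normalization is undefined for an all-zero series")
    return num / den


random.seed(42)  # fixed seed so the sketch is reproducible
# Hypothetical signal: 1..8 repeated ten times, with small uniform noise added.
signal = [(n % 8) + 1 + random.uniform(-0.5, 0.5) for n in range(80)]

print(norm_autocorr(signal, 4))  # negative: halves of the pattern oppose each other
print(norm_autocorr(signal, 8))  # close to 1: the pattern repeats
```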

Correlation with Time Shift

All correlation techniques can be modified by applying a time shift. For example, it is very common to perform a normalized cross-correlation with time shift to detect if a signal “lags” or “leads” another.

To process a time shift, we correlate the original signal with another one moved by x elements to the right or left, just as we did for auto-correlation.

To detect if two metrics are correlated with a time shift we need to compute all the possible time shifts. Fortunately, the R language can compute all the correlations with time shift very quickly.
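
The idea of scanning every shift can be sketched in a few lines. The example below is in Python rather than R, only to keep it self-contained. The sign convention (a positive lag means the second series follows the first), the function name `shifted_norm_corr` and the synthetic data are all assumptions of this sketch; other tools may define the lag with the opposite sign.

```python
import math
import random


def norm_corr(x, y):
    """Normalized cross-correlation at zero shift."""
    num = sum(xi * yi for xi, yi in zip(x, y))
    den = math.sqrt(sum(xi * xi for xi in x) * sum(yi * yi for yi in y))
    if den == 0:
        raise ValueError("normalization is undefined for an all-zero series")
    return num / den


def shifted_norm_corr(x, y, lag):
    """Correlate x[n] with y[n + lag] over the overlapping part of both series."""
    n = len(x)
    if len(y) != n or abs(lag) >= n:
        raise ValueError("series must have equal length and |lag| < length")
    if lag >= 0:
        xs, ys = x[:n - lag], y[lag:]
    else:
        xs, ys = x[-lag:], y[:n + lag]
    return norm_corr(xs, ys)


random.seed(0)  # fixed seed so the sketch is reproducible
x = [random.gauss(0, 1) for _ in range(60)]
y = [0.0, 0.0] + x[:-2]  # hypothetical second metric: repeats x two steps later

scores = {lag: shifted_norm_corr(x, y, lag) for lag in range(-5, 6)}
best_lag = max(scores, key=scores.get)
print(best_lag)  # 2: y lags x by two data points
```

In practice the range of lags to scan is a judgment call: large lags leave few overlapping points, which makes their scores noisy.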

Normalized Cross-Correlation with Time Shift

[Figures: the two series, and their normalized cross-correlation plotted against lag]

Lag:           -4      -3      -2      -1       0       1       2       3       4
Correlation:  0.862   0.021  -0.547  -0.423   0.000   0.867   0.127  -0.466  -0.393

As we expected from the graph above, the metrics highly correlate with a time shift of 1.

Normalized Auto-Correlation with Time Shift

[Figures: the series, and its normalized auto-correlation plotted against time shift]

Lag:            0       1       2       3       4       5       6       7       8       9      10
Correlation:  1.000   0.335  -0.122  -0.304  -0.369  -0.374  -0.226   0.187   0.789   0.306  -0.120

The signal above repeats every 8 data points. As expected, the auto-correlation detects a high correlation when the series is compared to itself at a time shift of 8.
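
A table of this kind can be produced with a short loop over the lags. The Python sketch below subtracts the mean and divides every lag by the lag-0 sum, which is, to our understanding, how R's acf() scales its output by default. The test signal is again a hypothetical stand-in (1 through 8 repeated, plus noise), so the printed numbers will not match the table above exactly.

```python
import random


def acf(x, max_lag):
    """Mean-centered auto-correlation for lags 0..max_lag, scaled so lag 0 is 1."""
    n = len(x)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must be between 0 and len(x) - 1")
    mean = sum(x) / n
    d = [v - mean for v in x]
    total = sum(v * v for v in d)  # lag-0 sum, used to scale every lag
    if total == 0:
        raise ValueError("auto-correlation is undefined for a constant series")
    return [
        sum(d[t] * d[t + k] for t in range(n - k)) / total
        for k in range(max_lag + 1)
    ]


random.seed(1)  # fixed seed so the sketch is reproducible
# Hypothetical signal: 1..8 repeated ten times, with small uniform noise added.
signal = [(n % 8) + 1 + random.uniform(-0.5, 0.5) for n in range(80)]

values = acf(signal, 10)
for lag, value in enumerate(values):
    print(lag, round(value, 3))

# The strongest non-zero lag is a simple estimate of the period.
period = max(range(1, len(values)), key=lambda k: values[k])
print("strongest repeat at lag", period)
```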

Conclusion

Here at anomaly.io, we commonly use both cross-correlation and auto-correlation, which are building blocks for detecting unusual patterns in your data. Because auto-correlation can detect the seasonality of a metric, it lets us apply a range of anomaly detection algorithms, such as seasonal decomposition of a time series or seasonal adjustment. When two series are found to be cross-correlated, we can detect anomalies when that correlation breaks.

Be sure to also read “Detecting Correlation Among Time Series”.
