Induced Correlation In Stock-Market Data

By Rick Martinelli

Copyright, Haiku Laboratories Dec, 2011



Daily stock-market data is recorded for four prices: the open, the close, and the high and low prices of the day.  A glance at a stock chart shows that the four individual price series are very similar, that is, highly correlated.  More importantly, in many cases their price changes, or increments, are also highly correlated, both with each other and with their own and the others’ past behavior.  For example, Figure 1 shows the open and close increments for one quarter of SP500 data (63 days) ending 09 Sep 2011.  The open series (blue) appears to be a good approximation of the close series displaced one day to the right.  This correlation between the two series is quantified by their lag-1 cross-correlation coefficient, the CC1; the CC1 for this data is 0.98.  This high correlation arises because the gap series, the difference between yesterday’s close and today’s open, is often very small compared with the average price of the stock.  Figure 2 shows the gap series for the data in Figure 1; note the difference in the vertical scales.
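The link between a small gap and a high CC1 can be made explicit with one line of algebra.  The symbols Open, Close and Gap below are introduced here for illustration only, and the sign convention for the gap (today’s open minus yesterday’s close) is an assumption:

$$\mathrm{Gap}_k = \mathrm{Open}_k - \mathrm{Close}_{k-1} \quad\Longrightarrow\quad \mathrm{Open}_k - \mathrm{Open}_{k-1} = (\mathrm{Close}_{k-1} - \mathrm{Close}_{k-2}) + (\mathrm{Gap}_k - \mathrm{Gap}_{k-1}).$$

Today’s open increment is therefore yesterday’s close increment plus the change in the gap, so when gap changes are small relative to close-to-close changes, the open increments follow the close increments with a one-day delay.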


Figure 1. Open price changes (blue) and close price changes for the SP500, one quarter (63 days) ending 09 Sep 2011.

Figure 2. The gap series for the data in Figure 1


In addition to the CC1, each individual increment series has associated with it a lag-1 auto-correlation coefficient, the AC1, which measures the similarity between the series and a version of itself displaced by one day, as in Figure 1.  It has been shown that large AC1 values for increments can be used as indicators of trends in the corresponding price series [1].  To the extent that trending data is better suited for prediction, large AC1 values may also serve as buy/sell indicators.


In the current study we show how large cross-correlations between a stock’s increment series can be exploited to yield an averaged series having larger auto-correlation than any of its components; this is the ‘induced’ part in the title.  We use as examples the SP500 data, part of which is shown in Figure 1, and for comparison, four Brownian motions.  A Brownian motion is a series whose increments form a white noise; a white noise is an uncorrelated series with fixed variance.  In the SP500 case, the OC-average and the 4-average both show an increased AC1; in the Brownian case, the AC1 of each average decreases.  Lastly, we scan about 5000 stock charts for the AC1’s of their opens, closes, and their OC- and 4-averages, and display the results as histograms.



Suppose $X = \{X_k\}$ is a time series of length N representing a stock’s daily price (the price series), and denote its increment series by $Z = \{Z_k\}$, where $Z_k = X_k - X_{k-1}$.  The lag-1 auto-covariance coefficient of the increments is defined by


$$C(Z) = \langle Z_k Z_{k-1} \rangle - \langle Z_k \rangle^2 ,$$


where the notation $\langle \cdot \rangle$ means the average over all possible values of k.  The lag-1 auto-correlation coefficient (AC1) of Z is a normalized version of C(Z) defined by


$$g(Z) = C(Z) / S(Z)^2 ,$$


where $S(Z)^2 = \langle Z_k^2 \rangle - \langle Z_k \rangle^2$ is the variance of Z.  The only difference between C(Z) and g(Z) is a scale factor; as such, the AC1 takes values between -1 and 1 only, and inherits most of the properties of the auto-covariance.
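The definitions above translate almost directly into code.  The following is a minimal sketch, not the author’s implementation: the function names and the sample prices are hypothetical, and the lagged average is assumed to run over the N-1 available pairs.

```python
def increments(prices):
    """Return the increment series Z_k = X_k - X_{k-1}."""
    return [prices[k] - prices[k - 1] for k in range(1, len(prices))]


def mean(values):
    """Plain arithmetic average, the <.> of the text."""
    return sum(values) / len(values)


def lag1_autocovariance(z):
    """C(Z) = <Z_k Z_{k-1}> - <Z_k>^2 (lagged average over available pairs)."""
    lagged = mean([z[k] * z[k - 1] for k in range(1, len(z))])
    return lagged - mean(z) ** 2


def variance(z):
    """S(Z)^2 = <Z_k^2> - <Z_k>^2."""
    return mean([v * v for v in z]) - mean(z) ** 2


def ac1(z):
    """g(Z) = C(Z) / S(Z)^2, the lag-1 auto-correlation coefficient."""
    return lag1_autocovariance(z) / variance(z)


# Hypothetical closing prices, for illustration only.
closes = [100.0, 101.0, 103.0, 102.0, 104.0, 107.0, 106.0, 108.0]
print(ac1(increments(closes)))
```

With this finite-sample convention the estimate can fall marginally outside the interval from -1 to 1 for very short series; for series of a few hundred points the effect is negligible.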


If W = Wk is a second increment series, the lag-1 cross-covariance coefficient between Z and W is defined by


$$\hat{C}(Z,W) = \langle Z_k W_{k-1} \rangle - \langle Z_k \rangle \langle W_k \rangle .$$


Note that Ĉ(Z,Z) = C(Z), and that Ĉ(Z,W) is in general not equal to Ĉ(W,Z).  Lastly, the lag-1 cross-correlation coefficient (CC1) of Z and W is defined by


$$g(Z,W) = \frac{\hat{C}(Z,W)}{S(Z)\,S(W)} .$$


CC1’s also take values between -1 and 1 only.
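The asymmetry of the lag-1 cross-correlation can be illustrated with synthetic data.  In this sketch (an assumption-laden illustration, not the SP500 data) the open increments are built as yesterday’s close increments plus a small noise term standing in for the gap change; all names, the seed and the noise levels are hypothetical.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def std(z):
    """S(Z), the square root of <Z_k^2> - <Z_k>^2."""
    return math.sqrt(mean([v * v for v in z]) - mean(z) ** 2)


def lag1_cross_covariance(z, w):
    """C^(Z,W) = <Z_k W_{k-1}> - <Z_k><W_k>: today's Z against yesterday's W."""
    lagged = mean([z[k] * w[k - 1] for k in range(1, len(z))])
    return lagged - mean(z) * mean(w)


def cc1(z, w):
    """g(Z,W) = C^(Z,W) / (S(Z) S(W))."""
    return lag1_cross_covariance(z, w) / (std(z) * std(w))


random.seed(0)
n = 500
# Synthetic close increments: white noise.
close_inc = [random.gauss(0.0, 1.0) for _ in range(n)]
# Synthetic open increments: yesterday's close increment plus a small gap change.
open_inc = [random.gauss(0.0, 0.1)] + [
    close_inc[k - 1] + random.gauss(0.0, 0.1) for k in range(1, n)
]

print(cc1(open_inc, close_inc))  # today's open vs yesterday's close: large
print(cc1(close_inc, open_inc))  # today's close vs yesterday's open: near zero
```

The two printed values differ sharply, which is the numerical counterpart of the remark that Ĉ(Z,W) is in general not equal to Ĉ(W,Z).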


Now consider the effect of averaging two or more increment series.  For ease of calculation we assume Z and W each have mean zero and define their average by $Y_k = (Z_k + W_k)/2$.  Then,


$$C(Y) = \frac{C(Z) + C(W) + \hat{C}(Z,W) + \hat{C}(W,Z)}{4} .$$


If Z and W are uncorrelated this reduces to C(Y) = (C(Z) + C(W)) / 4, so the effect of averaging in this case is to decrease the auto-covariance.  But if either of the cross-covariances Ĉ(Z, W) or Ĉ(W, Z) is relatively large and positive, then the auto-covariance of the average can exceed that of the individual series.  In a similar way, the auto-covariance of the average of three or more series is again the average of the individual auto-covariances and all the possible cross-covariances: a total of 9 components for a 3-average, and 16 for the 4-average.
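The decomposition of C(Y) can be checked numerically.  The sketch below uses synthetic series (hypothetical seed, length and noise level) in which Z is a noisy one-day-delayed copy of W, so that one cross-covariance is large; it compares the auto-covariance of the average with the right-hand side of the formula.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cross_cov(z, w):
    """Lag-1 cross-covariance <Z_k W_{k-1}> - <Z_k><W_k>."""
    lagged = mean([z[k] * w[k - 1] for k in range(1, len(z))])
    return lagged - mean(z) * mean(w)


def auto_cov(z):
    """Lag-1 auto-covariance, the special case C(Z) = C^(Z,Z)."""
    return cross_cov(z, z)


random.seed(1)
n = 252
w = [random.gauss(0.0, 1.0) for _ in range(n)]
# Z follows W with a one-day delay, plus a small disturbance (synthetic data).
z = [0.0] + [w[k - 1] + random.gauss(0.0, 0.2) for k in range(1, n)]
y = [(a + b) / 2 for a, b in zip(z, w)]

lhs = auto_cov(y)
rhs = (auto_cov(z) + auto_cov(w) + cross_cov(z, w) + cross_cov(w, z)) / 4

print(lhs, rhs)                  # the two sides agree up to rounding
print(auto_cov(z), auto_cov(w))  # components, for comparison with lhs
```

In this construction neither component has much auto-covariance of its own, and nearly all of C(Y) comes from the single large cross term Ĉ(Z, W).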


In what follows, Z and W represent any of the four daily increment series for a particular stock; the average of open, high, low and close is called the 4-average, and the average of open and close is called the OC-average.  The 16 CC1’s for each stock may be arranged in a lag-1 cross-correlation matrix (see [2] Ch 8) as shown below.
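A minimal sketch of how such a matrix might be assembled follows.  The layout assumed here places yesterday’s series along the rows and today’s series along the columns, in the order open, high, low, close; the data are synthetic and every name is hypothetical.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def std(z):
    return math.sqrt(mean([v * v for v in z]) - mean(z) ** 2)


def cc1(today, yesterday):
    """Lag-1 cross-correlation of today's series with yesterday's series."""
    lagged = mean([today[k] * yesterday[k - 1] for k in range(1, len(today))])
    cov = lagged - mean(today) * mean(yesterday)
    return cov / (std(today) * std(yesterday))


def cc1_matrix(series):
    """Entry [i][j]: CC1 between yesterday's series i and today's series j.

    The diagonal holds the AC1 of each individual series.
    """
    return [[cc1(today, yesterday) for today in series] for yesterday in series]


random.seed(3)
n = 252
close_inc = [random.gauss(0.0, 1.0) for _ in range(n)]
# Synthetic stand-ins: the open follows yesterday's close; high and low are
# taken as noisy versions of the close purely for illustration.
open_inc = [0.0] + [close_inc[k - 1] + random.gauss(0.0, 0.1) for k in range(1, n)]
high_inc = [c + random.gauss(0.0, 0.5) for c in close_inc]
low_inc = [c + random.gauss(0.0, 0.5) for c in close_inc]

matrix = cc1_matrix([open_inc, high_inc, low_inc, close_inc])
for label, row in zip(("open", "high", "low", "close"), matrix):
    print(label.rjust(5), " ".join(f"{value:+.3f}" for value in row))
```

With this layout, the entry in the last row and first column is the CC1 between yesterday’s close and today’s open.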


We consider the lag-1 cross-correlation matrix for one year (252 data points) of SP500 increments, ending 09/09/11.  Numeric values are shown in Table 1 and displayed in Figure 3a.  The main diagonal entries in Table 1 are the AC1’s of the individual increment series for open, high, low and close.  The first column contains the correlations between yesterday’s prices and today’s open, while the first row contains the correlations between today’s prices and yesterday’s open, etc.  The tallest green column in Figure 3a (0.976) represents the CC1 between yesterday’s close and today’s open.  Figure 3b shows the individual AC1’s from the diagonal, plus those of the 4-average and OC-average; note the dramatic increase in the averages over their components.




























Table 1. AC1’s (diagonal) and CC1’s for one year of SP500 ending 01/28/11.


Figure 3a. Graphical view of Table 1 information, with the first row in blue, the last row in green, etc.


Figure 3b. Individual AC1’s for SP500 increments, their OC-average and their 4-average.


A classical model for stock data is a Brownian motion, that is, a series whose increments form a white noise.  For our purposes, a white noise is a sequence of independent (so, uncorrelated) random numbers drawn from a normal probability distribution having mean zero and variance one, so-called N(0,1) numbers.  As shown above, the effect of averaging two white noise series should be to decrease the AC1.  Table 2 shows the CC1’s for a simulated stock using four Brownian motions, and Figure 4 shows the individual AC1’s along with their OC-average and their 4-average. Figure 4 shows that averaging produces a ‘whiter’ white noise than any of the series comprising the average, in terms of auto-correlation.  Figures 3 and 4 show that SP500 clearly does not behave like a Brownian motion due to its large cross-correlations. 
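The Brownian comparison can be reproduced in outline as follows.  This is a sketch under stated assumptions, not the simulation behind Table 2 and Figure 4: the seed, the series length of 252 and the labels attached to the four noises are all hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def ac1(z):
    """Lag-1 auto-correlation: (<Z_k Z_{k-1}> - <Z_k>^2) / (<Z_k^2> - <Z_k>^2)."""
    lagged = mean([z[k] * z[k - 1] for k in range(1, len(z))])
    var = mean([v * v for v in z]) - mean(z) ** 2
    return (lagged - mean(z) ** 2) / var


random.seed(2011)
n = 252
# Four independent N(0,1) white noises, standing in for the increments of
# four Brownian motions labelled like the four daily prices.
noises = {
    name: [random.gauss(0.0, 1.0) for _ in range(n)]
    for name in ("open", "high", "low", "close")
}

oc_average = [(o + c) / 2 for o, c in zip(noises["open"], noises["close"])]
four_average = [sum(values) / 4 for values in zip(*noises.values())]

results = {name: ac1(z) for name, z in noises.items()}
results["OC-average"] = ac1(oc_average)
results["4-average"] = ac1(four_average)

for name, value in results.items():
    print(f"{name:>10}: {value:+.3f}")
```

All six values should be small, since none of the series has any built-in lag-1 dependence.  A single simulated draw need not show every average below every component, so repeating the run with different seeds gives a fairer picture.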





























Table 2. AC1’s (diagonal) and CC1’s for the four Brownian motions.




Figure 4. Individual AC1’s for the four Brownian motions, their OC-average and their 4-average.



A database of 4849 stocks from the NYSE(1880), AMEX(321) and NASDAQ(2556), ending 11 May 2011 (1 year, 252 data points), was compiled for this study.  (All data was obtained from Yahoo Finance; the database was pre-screened to eliminate stocks having zero volume on the last day.)  The entire database was scanned for the AC1.  Figure 5 shows the results as histograms of the (a) opens, (b) closes, (c) OC-averages and (d) 4-averages, having means -0.072, -0.025, 0.220 and 0.204, respectively.  The effect of averaging is seen in these histograms as a skewing of the AC1 to the right, implying most stocks in this database enjoy the cross-correlation property seen in the SP500 data.
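A scan of this kind might be organized as in the sketch below.  The database layout (a dictionary mapping a ticker to four price lists), the tickers, the prices and the bin settings are all hypothetical; the actual study’s data handling is not described in the text.

```python
def mean(values):
    return sum(values) / len(values)


def increments(prices):
    return [prices[k] - prices[k - 1] for k in range(1, len(prices))]


def ac1(z):
    """Lag-1 auto-correlation coefficient of an increment series."""
    lagged = mean([z[k] * z[k - 1] for k in range(1, len(z))])
    var = mean([v * v for v in z]) - mean(z) ** 2
    return (lagged - mean(z) ** 2) / var


def scan(database):
    """Collect the AC1 of opens, closes, OC-average and 4-average per stock."""
    out = {"open": [], "close": [], "oc": [], "four": []}
    for prices in database.values():
        inc = {key: increments(prices[key]) for key in ("open", "high", "low", "close")}
        oc = [(o + c) / 2 for o, c in zip(inc["open"], inc["close"])]
        four = [sum(v) / 4 for v in zip(inc["open"], inc["high"], inc["low"], inc["close"])]
        out["open"].append(ac1(inc["open"]))
        out["close"].append(ac1(inc["close"]))
        out["oc"].append(ac1(oc))
        out["four"].append(ac1(four))
    return out


def histogram(values, lo=-1.0, hi=1.0, bins=20):
    """Count values in equal-width bins on [lo, hi]; outliers go to the end bins."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        index = min(bins - 1, max(0, int((v - lo) / width)))
        counts[index] += 1
    return counts


# Two made-up tickers with made-up prices, for illustration only.
database = {
    "AAA": {"open": [10.0, 10.5, 10.2, 10.8, 11.0, 10.7],
            "high": [10.6, 10.7, 10.9, 11.1, 11.2, 11.0],
            "low": [9.9, 10.1, 10.0, 10.6, 10.5, 10.4],
            "close": [10.5, 10.2, 10.8, 11.0, 10.7, 10.9]},
    "BBB": {"open": [50.0, 49.0, 49.5, 51.0, 50.5, 52.0],
            "high": [50.5, 49.8, 51.2, 51.5, 52.3, 52.5],
            "low": [48.8, 48.5, 49.3, 50.2, 50.1, 51.0],
            "close": [49.0, 49.5, 51.0, 50.5, 52.0, 51.5]},
}

result = scan(database)
for key, values in result.items():
    print(key, round(mean(values), 3), histogram(values, bins=4))
```

Averaging the increment series is equivalent to taking increments of the averaged prices, so either order of operations gives the same OC- and 4-average series.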



Figure 5.  Histograms of the AC1’s of 4849 stocks, (a) the opens, (b) the closes, (c) the OC-averages and (d) the 4-averages.  Means are -0.025, 0.072, 0.220 and 0.204, respectively.




[1] R. Martinelli, “Trend-Spotting in the Stock Market”, Technical Analysis of Stocks and Commodities, v29:13, 8-11, 2011

[2] R. S. Tsay, Analysis of Financial Time Series, Wiley-Interscience, New Jersey, 2005