Induced Correlation In Stock-Market Data
By Rick Martinelli
Copyright, Haiku Laboratories Dec, 2011
INTRODUCTION
Daily stock-market data is recorded for four prices: the open, the close, and the high and low prices of the day. A glance at a stock chart shows that the four individual price series are very similar, that is, highly correlated. More importantly, in many cases their price changes, or increments, are also highly correlated, both with each other and with their own and the others’ past behavior. For example, Figure 1 shows the open and close increments for one quarter of SP500 data (63 days) ending 09 Sep 2011. The open series (blue) appears to be a good approximation of the close series displaced one day to the right. This correlation between the two series is quantified by their lag-1 cross-correlation coefficient, the CC1; the CC1 for this data is 0.98. The high correlation arises because the gap series, the difference between yesterday’s close and today’s open, is often very small compared with the average price of the stock. Figure 2 shows the gap series for the data in Figure 1; note the difference in the vertical scales.
Figure 1. Open price changes (blue) and close price changes for the SP500, one quarter (63 days) ending 09 Sep 2011.
Figure 2. The gap series for the data in Figure 1
In addition to the CC1, each individual increment series has an associated lag-1 autocorrelation coefficient, the AC1, which measures the similarity between the series and a version of itself displaced by one day, as in Figure 1. It has been shown that large AC1 values for increments can be used as indicators of trends in the corresponding price series [1]. To the extent that trending data is better suited for prediction, they may also serve as buy/sell indicators.
In the current study we show how large cross-correlations between a stock’s increment series can be exploited to yield an averaged series having larger autocorrelation than any of its components; this is the ‘induced’ part in the title. We use as examples the SP500 data, part of which is shown in Figure 1, and for comparison, four Brownian motions. A Brownian motion is a series whose increments form a white noise; a white noise is an uncorrelated series with fixed variance. In the SP500 case, the open-close and 4-averages both show an increased AC1; in the Brownian case, the AC1 of each average decreases. Lastly, we scan about 5000 stock charts for the AC1’s of their opens, closes, and their OC- and 4-averages, and display the results as histograms.
COVARIANCE AND CORRELATION
Suppose X = X_{k} is a time series of length N representing a stock’s daily price, the price series, and denote its increment series by Z = Z_{k} = X_{k} – X_{k-1}. The lag-1 autocovariance coefficient of the increments is defined by
C(Z) = <Z_{k}Z_{k-1}> – <Z_{k}>^{2} ,
where the notation <…> means average over all possible values of k. The lag-1 autocorrelation coefficient (AC1) of Z is a normalized version of C(Z) defined by
g(Z) = C(Z)/S(Z)^{2},
where S(Z)^{2} = <Z_{k}^{2}> – <Z_{k}>^{2} is the variance of Z. The only difference between C(Z) and g(Z) is a scale factor; as such, the AC1 takes values between -1 and 1 only, and inherits most of the properties of the autocovariance.
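As an illustration, the definitions of C(Z), S(Z)^{2} and g(Z) can be sketched in a few lines of Python. This is a minimal sketch, not the code used for the study: the function names and the short price list are our own assumptions, and the lagged product is averaged over the N-1 available pairs.

```python
def mean(values):
    """Arithmetic mean, the <...> average in the text."""
    return sum(values) / len(values)


def lag1_autocovariance(z):
    """C(Z) = <Z_k Z_{k-1}> - <Z_k>^2, product averaged over k = 1..N-1."""
    products = [z[k] * z[k - 1] for k in range(1, len(z))]
    return mean(products) - mean(z) ** 2


def variance(z):
    """S(Z)^2 = <Z_k^2> - <Z_k>^2."""
    return mean([v * v for v in z]) - mean(z) ** 2


def ac1(z):
    """Lag-1 autocorrelation g(Z) = C(Z) / S(Z)^2."""
    return lag1_autocovariance(z) / variance(z)


# Made-up prices for illustration only (not market data).
prices = [100.0, 101.0, 103.0, 104.0, 103.0, 101.0, 100.0, 101.0, 103.0]
increments = [prices[k] - prices[k - 1] for k in range(1, len(prices))]

print("AC1 of trending increments:", ac1(increments))
print("AC1 of alternating increments:", ac1([1.0, -1.0] * 4))
```

The first series moves in runs of same-sign increments and gives a positive AC1; the alternating series gives an AC1 at the negative end of the range.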
If W = W_{k} is a second increment series, the lag-1 cross-covariance coefficient between Z and W is defined by
Ĉ(Z,W) = <Z_{k}W_{k-1}> – <Z_{k}><W_{k}> .
Note that Ĉ(Z,Z) = C(Z), and that Ĉ(Z,W) is in general not equal to Ĉ(W,Z). Lastly, the lag-1 cross-correlation coefficient (CC1) of Z and W is defined by
g(Z,W) = Ĉ(Z,W)/(S(Z)S(W)) .
CC1’s also take values between -1 and 1 only.
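The asymmetry of the cross-covariance is easy to see numerically. The sketch below is an assumption-laden illustration rather than the study's code: the function names are ours, and the data is a simulated white noise W together with a copy Z that lags it by one day, so that today's Z equals yesterday's W.

```python
import random


def mean(values):
    return sum(values) / len(values)


def std(z):
    """S(Z): square root of <Z_k^2> - <Z_k>^2."""
    return (mean([v * v for v in z]) - mean(z) ** 2) ** 0.5


def lag1_crosscovariance(z, w):
    """<Z_k W_{k-1}> - <Z_k><W_k>: today's Z against yesterday's W."""
    products = [z[k] * w[k - 1] for k in range(1, len(z))]
    return mean(products) - mean(z) * mean(w)


def cc1(z, w):
    """Lag-1 cross-correlation g(Z,W)."""
    return lag1_crosscovariance(z, w) / (std(z) * std(w))


# Simulated N(0,1) noise (an assumption, not market data).
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(501)]
w = noise[1:]   # W_k
z = noise[:-1]  # Z_k = W_{k-1}, i.e. W displaced one day to the right

cc1_zw = cc1(z, w)  # today's Z against yesterday's W: the same numbers
cc1_wz = cc1(w, z)  # today's W against yesterday's Z: two days apart
print("g(Z,W) =", cc1_zw)
print("g(W,Z) =", cc1_wz)
```

Here g(Z,W) comes out close to 1 while g(W,Z) stays near zero, so the order of the arguments matters.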
Now consider the effect of averaging two or more increment series. For ease of calculation we assume Z and W each have mean zero and define their average by Y_{k} = (Z_{k} + W_{k}) / 2. Then,
C(Y) = (C(Z) + C(W) + Ĉ(Z, W) + Ĉ(W, Z)) / 4.
If Z and W are uncorrelated, the cross-covariances vanish and this reduces to C(Y) = (C(Z) + C(W)) / 4, so the effect of averaging in this case is to decrease the autocovariance. But if either of the cross-covariances Ĉ(Z, W) or Ĉ(W, Z) is relatively large and positive, then the autocovariance of the average can exceed that of the individual series. In a similar way, the autocovariance of the average of n series is the sum of the n individual autocovariances and all the possible cross-covariances, divided by n^{2}: a total of 9 components for a 3-average, and 16 for the 4-average.
In what follows, Z and W represent any of the four daily increment series for a particular stock; the average of open, high, low and close is called the 4-average, and the average of open and close is called the OC-average. The 16 CC1’s for each stock may be arranged in a lag-1 cross-correlation matrix (see [2] Ch 8) as shown below.
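The averaging identity for C(Y) can be checked numerically. The following is a minimal sketch under our own assumptions: W is a simulated white noise, Z is W displaced by one day (a stand-in for the open/close relationship described in the introduction), and all function names are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cross_cov(z, w):
    """Lag-1 cross-covariance <Z_k W_{k-1}> - <Z_k><W_k>."""
    products = [z[k] * w[k - 1] for k in range(1, len(z))]
    return mean(products) - mean(z) * mean(w)


def auto_cov(z):
    """Lag-1 autocovariance C(Z), the cross-covariance of Z with itself."""
    return cross_cov(z, z)


def ac1(z):
    """Lag-1 autocorrelation C(Z) / S(Z)^2."""
    return auto_cov(z) / (mean([v * v for v in z]) - mean(z) ** 2)


# Simulated data (an assumption): W is white noise, Z_k = W_{k-1}.
rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(2001)]
w = noise[1:]
z = noise[:-1]
y = [(a + b) / 2 for a, b in zip(z, w)]  # Y_k = (Z_k + W_k) / 2

lhs = auto_cov(y)
rhs = (auto_cov(z) + auto_cov(w) + cross_cov(z, w) + cross_cov(w, z)) / 4

print("C(Y) directly:      ", lhs)
print("C(Y) from identity: ", rhs)
print("AC1 of Z, W, Y:     ", ac1(z), ac1(w), ac1(y))
```

In this constructed case both components are nearly uncorrelated with their own past, yet the single large cross-covariance gives the average a clearly positive AC1.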
SP500 EXAMPLE
We consider the lag-1 cross-correlation matrix for one year (252 data points) of SP500 increments, ending 09/09/11. Numeric values are shown in Table 1 and displayed in Figure 3a. The main diagonal entries in Table 1 are AC1’s of the individual increment series for open, high, low and close. The first column contains the correlations between yesterday’s prices and today’s open, while the first row contains the correlations between today’s prices and yesterday’s open, etc. The tallest green column in Figure 3a (0.976) represents the CC1 between yesterday’s close and today’s open. Figure 3b shows the individual AC1’s from the diagonal, plus those of the 4-average and OC-average; note the dramatic increase in the averages over their components.

         Open    High    Low     Close
Open     0.146   0.060   0.052   0.212
High     0.493   0.184   0.305   0.084
Low      0.581   0.423   0.209   0.058
Close    0.976   0.492   0.460   0.158
Table 1. AC1’s (diagonal) and CC1’s for one year of SP500 ending 01/28/11.
Figure 3a. Graphical view of Table 1 information, with the first row in blue, the last row in green, etc.
Figure 3b. Individual AC1’s for SP500 increments, their OC-average and their 4-average.
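The row and column convention of Table 1 (rows index yesterday’s series, columns index today’s) can be sketched as follows. This is an illustration under stated assumptions, not the code behind Table 1: the prices are synthetic (a random-walk close with a small overnight gap and made-up high/low offsets), and the function names are ours.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cc1(today, yesterday):
    """Lag-1 cross-correlation g(Z,W) with Z = today, W = yesterday."""
    products = [today[k] * yesterday[k - 1] for k in range(1, len(today))]
    cov = mean(products) - mean(today) * mean(yesterday)
    var_t = mean([v * v for v in today]) - mean(today) ** 2
    var_y = mean([v * v for v in yesterday]) - mean(yesterday) ** 2
    return cov / (var_t * var_y) ** 0.5


def lag1_matrix(increments):
    """matrix[row][col] = CC1 of yesterday's `row` with today's `col`."""
    names = list(increments)
    return {row: {col: cc1(increments[col], increments[row]) for col in names}
            for row in names}


# Synthetic daily prices (an assumption, not SP500 data).
rng = random.Random(2)
prices = {"Open": [], "High": [], "Low": [], "Close": []}
close = 100.0
for _ in range(253):
    open_ = close + rng.gauss(0.0, 0.05)   # small overnight gap
    close = open_ + rng.gauss(0.0, 1.0)    # intraday move
    prices["Open"].append(open_)
    prices["Close"].append(close)
    prices["High"].append(max(open_, close) + abs(rng.gauss(0.0, 0.3)))
    prices["Low"].append(min(open_, close) - abs(rng.gauss(0.0, 0.3)))

increments = {name: [p[k] - p[k - 1] for k in range(1, len(p))]
              for name, p in prices.items()}
matrix = lag1_matrix(increments)

for row, entries in matrix.items():
    print(row.ljust(6), "  ".join(f"{v:6.3f}" for v in entries.values()))
```

With a small gap, the entry in row Close, column Open dominates the matrix, which is the same pattern as the 0.976 entry in Table 1; the other entries depend on the made-up high/low model and should not be compared with the SP500 values.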
BROWNIAN MOTION EXAMPLE
A classical model for stock data is a Brownian motion, that is, a series whose increments form a white noise. For our purposes, a white noise is a sequence of independent (so, uncorrelated) random numbers drawn from a normal probability distribution having mean zero and variance one, so-called N(0,1) numbers. As shown above, the effect of averaging two white noise series should be to decrease the AC1. Table 2 shows the CC1’s for a simulated stock using four Brownian motions, and Figure 4 shows the individual AC1’s along with their OC-average and their 4-average. Figure 4 shows that averaging produces a ‘whiter’ white noise than any of the series comprising the average, in terms of autocorrelation. Figures 3 and 4 show that, because of its large cross-correlations, the SP500 clearly does not behave like a Brownian motion.

         Open    High    Low     Close
Open     0.102   0.006   0.006   0.032
High     0.041   0.028   0.040   0.030
Low      0.041   0.072   0.147   0.135
Close    0.018   0.057   0.065   0.121
Table 2. AC1’s (diagonal) and CC1’s for the four Brownian motions
Figure 4. Individual AC1’s for the four Brownian motions, their OC-average and their 4-average.
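An experiment of this kind can be sketched as below. This is a minimal sketch, not the simulation behind Table 2 and Figure 4: the seed, the labels and the function names are our assumptions, and the four white noises are generated directly as the increments of four independent Brownian motions.

```python
import random


def mean(values):
    return sum(values) / len(values)


def ac1(z):
    """Lag-1 autocorrelation: (<Z_k Z_{k-1}> - <Z_k>^2) / S(Z)^2."""
    products = [z[k] * z[k - 1] for k in range(1, len(z))]
    cov = mean(products) - mean(z) ** 2
    var = mean([v * v for v in z]) - mean(z) ** 2
    return cov / var


rng = random.Random(3)
n_days = 252
names = ["Open", "High", "Low", "Close"]  # labels only; the series are independent

# Increments of four independent Brownian motions: N(0,1) white noise.
increments = {name: [rng.gauss(0.0, 1.0) for _ in range(n_days)]
              for name in names}

oc_average = [(o + c) / 2
              for o, c in zip(increments["Open"], increments["Close"])]
four_average = [sum(day) / 4 for day in zip(*increments.values())]

results = {name: ac1(series) for name, series in increments.items()}
results["OC-average"] = ac1(oc_average)
results["4-average"] = ac1(four_average)

for name, value in results.items():
    print(name.ljust(10), round(value, 3))
```

Because the series are independent, nothing in the averaging induces autocorrelation: all six values scatter around zero, and in a single sample an average need not come out smaller than every one of its components.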
APPLICATION TO THE MARKETS
A database of 4849 stocks from the NYSE(1880), AMEX(321) and NASDAQ(2556), ending 11 May 2011 (1 year, 252 data points), was compiled for this study. (All data was obtained from Yahoo Finance; the database was prescreened to eliminate stocks having zero volume on the last day.) The entire database was scanned for the AC1. Figure 5 shows the results as histograms of the (a) opens, (b) closes, (c) OC-averages and (d) 4-averages, having means 0.072, 0.025, 0.220 and 0.204, respectively. The effect of averaging is seen in these histograms as a skewing of the AC1 to the right, implying most stocks in this database enjoy the cross-correlation property seen in the SP500 data.
Figure 5. Histograms of the AC1’s of 4849 stocks, (a) the opens, (b) the closes, (c) the OC-averages and (d) the 4-averages. Means are 0.025, 0.072, 0.220 and 0.204, respectively.
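A scan of this kind can be sketched on simulated data. Everything in the block is an assumption made for illustration: the 200 synthetic stocks, the gap model (open = previous close plus a small gap), the bin width and the function names. It does not use or reproduce the Yahoo Finance database, so the absolute numbers reflect the synthetic model only.

```python
import math
import random
from collections import Counter


def mean(values):
    return sum(values) / len(values)


def ac1(z):
    """Lag-1 autocorrelation: (<Z_k Z_{k-1}> - <Z_k>^2) / S(Z)^2."""
    products = [z[k] * z[k - 1] for k in range(1, len(z))]
    cov = mean(products) - mean(z) ** 2
    return cov / (mean([v * v for v in z]) - mean(z) ** 2)


def simulate_increments(rng, n_days=252, gap_sd=0.05):
    """Hypothetical stock; returns (open increments, close increments)."""
    opens, closes = [], []
    close = 100.0
    for _ in range(n_days + 1):
        open_ = close + rng.gauss(0.0, gap_sd)  # small overnight gap
        close = open_ + rng.gauss(0.0, 1.0)     # intraday move
        opens.append(open_)
        closes.append(close)
    d_open = [opens[k] - opens[k - 1] for k in range(1, len(opens))]
    d_close = [closes[k] - closes[k - 1] for k in range(1, len(closes))]
    return d_open, d_close


def histogram(values, width=0.1):
    """Count values per bin; each key is the lower edge of its bin."""
    return Counter(round(math.floor(v / width) * width, 10) for v in values)


rng = random.Random(4)
scan = {"Open": [], "Close": [], "OC-average": []}
for _ in range(200):
    d_open, d_close = simulate_increments(rng)
    oc = [(o + c) / 2 for o, c in zip(d_open, d_close)]
    scan["Open"].append(ac1(d_open))
    scan["Close"].append(ac1(d_close))
    scan["OC-average"].append(ac1(oc))

means = {name: mean(values) for name, values in scan.items()}
for name, values in scan.items():
    print(name, "mean AC1:", round(means[name], 3))
    for edge, count in sorted(histogram(values).items()):
        print(f"  {edge:5.1f} {'#' * count}")
```

In this synthetic universe the histogram of the OC-averages sits to the right of those of the opens and closes, the same qualitative shift as in Figure 5.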
REFERENCES
[1] R. Martinelli, “Trend-Spotting in the Stock Market”, Technical Analysis of Stocks and Commodities, v29:13, 811, 2011
[2] R. S. Tsay, Analysis of Financial Time Series, Wiley-Interscience, New Jersey, 2005