PATTERN RECOGNITION IN TIME-SERIES
By: Rick Martinelli, Haiku Laboratories, July 1995.
Note: This document was published in the January 1998 issue of Technical Analysis of Stocks & Commodities.
INTRODUCTION
Pattern recognition is a general term that has been used to describe
a variety of different, but related, phenomena. The ability of a camera
and computer to discern a particular image in a visually noisy environment
is a classic example from engineering. This article is concerned with patterns
that appear in market data charts and that often precede other patterns
of interest, such as a sustained upward trend in price. The motivation
for this work came from the needs of market traders having large portfolios
of stocks who must search each of their charts for patterns that are currently
"setting up". Several commercial financial software packages
offer a pattern recognition feature whereby a chart can be searched for
a prescribed pattern. Patterns are specified in such a way that a particular
candidate segment is judged as either "having" the prescribed
pattern or not. For example, the software might search for segments having
2 down days (that is, lower closing prices than previous days) followed
by 3 up days. This approach may work for simple patterns, but for long
and/or complex patterns, these methods are difficult to employ. In contrast,
the method described in this article allows a pattern to be specified as
another chart-segment, of any length, provided it's shorter than the chart
data being analyzed, and provides a statistically rigorous measure of the
degree to which this segment resembles any other segment of the same length.
This measure is called the correlation coefficient for time-series. Using
this method, data may be scanned for a particular pattern and the results
displayed as a chart of correlations versus time.
One pattern that is thought to precede an up-trend is the so-called "cup-and-handle" formation described by William O'Neil in [1], and also in [2]. O'Neil claims that this pattern often appears in a time segment that just precedes a sustained rise in daily closing prices for many stocks. In the example below, a 21-day cup-and-handle pattern has been "clipped" from a one-year segment of IBM stock data, where it was seen to precede an up-trend in price (see Figure 1). A second pattern is a computer-generated function that "looks like" a cup, with an added "handle" formation. The original IBM data is scanned with both patterns to produce correlation charts, and the two charts are seen to be nearly the same. This is expected since the correlation coefficient is a robust statistic, i.e., small variations in the test pattern will not appreciably affect the outcome. These results show that standard, pre-made test patterns may be used instead of actual chart-segments to scan charts for patterns.
THE CORRELATION COEFFICIENT
The correlation coefficient is a statistic that is used to measure "goodness-of-fit"
in many curve-fitting procedures, such as least-squares [3]. Here we use
it as an indicator of fit, or similarity, between a user-selected chart-pattern,
and all segments of another chart having the same length. (See Statistics
Background). Values of this indicator range between 1.0 and -1.0, where
a value of 1 indicates a perfect match, i.e., the two patterns are identical.
A value of -1 would indicate that an exact match had been found, but that
it is "upside-down". Values near zero mean there is no match
at all. In practice it has been found that values of 0.8 or more correspond
to patterns in the data that are easily discerned as "good matches"
by the human eye. The correlation coefficient is also a normalized statistic,
which means that the actual numerical values of either the chart data points
or pattern values have no effect on its value; only the "shapes"
of the pattern and chart segments affect it. All of these features make
the correlation coefficient a good choice as an indicator of pattern matching.
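The properties just described can be checked numerically. The following Python sketch is an illustration only and is not part of the Excel workflow used in this article; the helper name correlation and the 7-point "cup" values are assumptions made up for the example.

```python
import math


def correlation(x, y):
    """Correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical 7-point cup shape, for illustration only.
cup = [5.0, 3.0, 2.0, 1.5, 2.0, 3.0, 5.0]

same_shape = [10.0 * v + 40.0 for v in cup]  # rescaled and shifted copy
upside_down = [-v for v in cup]              # inverted copy

print(correlation(cup, cup))          # identical pattern: about 1
print(correlation(cup, same_shape))   # same shape, other price level: about 1
print(correlation(cup, upside_down))  # inverted shape: about -1
```

The rescaled copy scores the same as the original because only the shape enters the statistic, which is what allows one pattern to be used on stocks trading at different price levels.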
The example presented here uses the daily closing prices of IBM from
02/07/94 through 11/22/94 as the data time-series; a plot of the example
data is shown in Figure 1. All graphs and calculations have been done in
the Excel spreadsheet.
Figure 1. IBM daily closing prices for 264 trading days. The
segment enclosed in vertical lines is an example of a cup-and-handle pattern.
A pattern was selected from the data as indicated by the two vertical
lines on the plot. This pattern is intended to represent a 21-day cup with
handle. A more detailed view of this pattern, called IBMCUP, is shown in
Figure 4. The data in Figure 1 was scanned with IBMCUP, and the resulting
correlation chart is shown in Figure 2. Both Figures are plotted on the
same time-axis, and the peaks in Figure 2 occur at the center of the corresponding
data segment in Figure 1. There are five peaks in Figure 2 where correlation
exceeds 0.8. In each case, a cup formation may be visually discerned at
the same location in the corresponding data in Figure 1. The fourth of
these peaks attains a value of 1.0 indicating a perfect match. Of course,
this is exactly the segment that was used to make the IBMCUP pattern.
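Peaks of this kind can also be picked out automatically once a column of correlations is available. The Python sketch below is one possible way to do it; the function name peaks_above and the sample values are assumptions for illustration (they are not the IBM results), and only the 0.8 threshold comes from the text.

```python
def peaks_above(correlations, threshold=0.8):
    """Return (index, value) for each local maximum at or above threshold."""
    peaks = []
    for i in range(1, len(correlations) - 1):
        value = correlations[i]
        # A peak must clear the threshold and rise above its left neighbor
        # without being exceeded by its right neighbor.
        if (value >= threshold
                and value > correlations[i - 1]
                and value >= correlations[i + 1]):
            peaks.append((i, value))
    return peaks


# Made-up correlation values, for illustration only.
chart = [0.0, 0.2, 0.85, 0.6, -0.3, 0.5, 0.9, 1.0, 0.7, 0.1]
print(peaks_above(chart))  # [(2, 0.85), (7, 1.0)]
```

Each reported index marks the center of a candidate segment, which can then be inspected by eye on the price chart.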
Figure 2. Daily correlations between IBMCUP and the data.
Figure 3. Daily correlations between MCUP and the data.
Figure 5 shows a fourth-degree polynomial curve with an added handle, called MCUP, that is intended to be a smooth version of IBMCUP. The IBM data in Figure 1 was scanned for MCUP and the resulting correlation plot is shown in Figure 3. The close resemblance between Figure 3 and Figure 2 indicates that smooth patterns are just as effective in locating the cup-and-handle structure. Also notice that the different Y-values of the patterns have no effect on their resulting correlations.
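A smooth prototype of this kind is easy to generate. The exact polynomial used for MCUP is not given in this article, so the Python sketch below is only a guess at the general idea: it assumes a cup of the form t^4 over 17 days followed by a shallow 4-day handle. The function name, the handle shape and the handle depth are all hypothetical.

```python
def make_cup_with_handle(cup_days=17, handle_days=4, handle_depth=0.2):
    """Build a smooth cup-and-handle pattern (assumed shape, not MCUP itself)."""
    if cup_days < 3 or handle_days < 1:
        raise ValueError("need at least 3 cup days and 1 handle day")
    pattern = []
    # Cup: a fourth-degree polynomial with rims at 1.0 and bottom at 0.0.
    for i in range(cup_days):
        t = -1.0 + 2.0 * i / (cup_days - 1)  # t runs from -1 to 1
        pattern.append(t ** 4)
    # Handle: a shallow dip below the right rim that recovers to the rim.
    for j in range(handle_days):
        s = (j + 1) / handle_days            # s runs up to 1
        pattern.append(1.0 - handle_depth * 4.0 * s * (1.0 - s))
    return pattern


smooth_cup = make_cup_with_handle()
print(len(smooth_cup))  # 21
```

Because the correlation coefficient ignores scale and offset, the pattern can be left in these arbitrary units rather than converted to prices.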
Figure 4. IBMCUP, a 21-day cup-and-handle pattern from Figure 1.
Figure 5. MCUP, a smooth 21-day cup-and-handle pattern, consisting
of a polynomial curve with a 4-day handle.
The correlation coefficient statistic has been shown to be a useful
tool in the search for prescribed patterns in market charts. The approach
described here can be used by any trader with access to Excel, and access
to historical market data. This approach allows the use of a single pattern
to search an entire portfolio, rather than patterns scaled to each stock.
It also allows pre-made, smooth, "prototype" patterns to be used
in place of historical patterns taken from actual stocks. This means a
trader may keep a library of favorite patterns rather than prepare new
ones for each scan. A secondary advantage to using smooth patterns is their
generally higher correlation values due to the removal of the random fluctuations
present in historical patterns.
Lastly, some drawbacks of this approach should be pointed out. Although a spreadsheet program has been used here to analyze data and display results, automatic portfolio searches are difficult to implement with spreadsheets, and a separate computer program is probably needed. Also, although the correlation coefficient is fairly robust, it cannot compensate for different pattern lengths. Pattern lengths must be pre-specified by the trader to within a few time periods (usually days) or the correlation will decrease rapidly. Consequently, a collection of various lengths of the same pattern must often be used when scanning a chart. To avoid this, the new technique of wavelet analysis [4] may be used. This technique provides a 3-dimensional correlation chart of a data set, where the length of a pattern constitutes one of those dimensions. The chart may be scanned, either by a human or by a computer, to find the pattern length yielding the largest correlation.
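The simpler workaround mentioned above, scanning with several lengths of the same pattern, can be sketched in Python as follows. This is not wavelet analysis. It assumes that a pattern may be stretched or shrunk by linear interpolation, which is a choice made for this example; the function names and the test data are hypothetical.

```python
import math


def correlation(x, y):
    """Correlation coefficient; returns 0.0 for a flat (constant) segment."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def resample(pattern, new_length):
    """Stretch or shrink a pattern to new_length points (linear interpolation)."""
    if new_length < 2 or len(pattern) < 2:
        raise ValueError("need at least 2 points")
    out = []
    for i in range(new_length):
        pos = i * (len(pattern) - 1) / (new_length - 1)
        lo = int(pos)
        hi = min(lo + 1, len(pattern) - 1)
        frac = pos - lo
        out.append(pattern[lo] * (1.0 - frac) + pattern[hi] * frac)
    return out


def best_match(data, pattern, lengths):
    """Return (correlation, start_index, length) of the best-matching window."""
    best = None
    for length in lengths:
        if length > len(data):
            continue
        scaled = resample(pattern, length)
        for start in range(len(data) - length + 1):
            r = correlation(scaled, data[start:start + length])
            if best is None or r > best[0]:
                best = (r, start, length)
    return best


# Made-up 8-point pattern and a data series that hides a 13-point copy of it.
pattern = [5.0, 2.0, 1.0, 1.0, 2.0, 4.0, 3.5, 4.5]
hidden = [10.0 + 2.0 * v for v in resample(pattern, 13)]
data = [15.0, 16.0, 15.5, 17.0] + hidden + [18.0, 17.5, 18.5, 18.0]

r, start, length = best_match(data, pattern, lengths=(8, 10, 13, 16))
print(round(r, 3), start, length)
```

In this constructed example the search is expected to settle on the 13-point window beginning at index 4, where the stretched pattern was placed.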
REFERENCES
[1] O'Neil, W.J., How to Make Money in Stocks, McGraw-Hill, New York.
[2] Kuhn, Gregory J., "Back to Basics in Trading Stocks", Technical Analysis of Stocks & Commodities.
[3] Merrill, Arthur A., "Fitting a Trendline by Least Squares", Technical Analysis of Stocks & Commodities.
[4] Chan, Y.T., Wavelet Basics, Kluwer Academic Publishers,
STATISTICS BACKGROUND
Define a time-series as any set of numbers arranged in chronological order, with the same time interval between any neighboring pair of numbers. We write a time-series X of length N as
X_1, X_2, ... , X_N.
The series X has a mean E(X), given by the average of its values:
E(X) = ( X_1 + ... + X_N ) / N.
The mean is a measure of how far X is displaced from zero. Series X
also has a variance, V(X), given by:
V(X) = ( X_1^2 + ... + X_N^2 ) / N - (E(X))^2.
Mathematically, the variance of X is a measure of its size, after its
mean is removed. For a second time-series Y, also of length N, the covariance
between X and Y is similarly defined by:
COV(X,Y) = ( X_1 Y_1 + ... + X_N Y_N ) / N - E(X)E(Y).
The covariance provides a measure of the similarity between the two
series X and Y and, for series of a given variance, reaches its maximum value when Y and X are the same.
To remove the effect of the sizes of the two series, the covariance may
be normalized by dividing out their standard deviations, which are the
square-roots of their variances:
COV(X,Y) / SQRT[V(X)V(Y)]
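For readers who want to check these definitions by hand, the Python sketch below transcribes the formulas for E(X), V(X), COV(X,Y) and the correlation coefficient directly. The function names and the two 5-point series are assumptions made for the example.

```python
import math


def mean(x):
    """E(X): the average of the values."""
    return sum(x) / len(x)


def variance(x):
    """V(X): mean of the squares minus the square of the mean."""
    return sum(v * v for v in x) / len(x) - mean(x) ** 2


def covariance(x, y):
    """COV(X,Y): mean of the products minus the product of the means."""
    return sum(a * b for a, b in zip(x, y)) / len(x) - mean(x) * mean(y)


def correlation(x, y):
    """COV(X,Y) / SQRT[V(X)V(Y)]."""
    return covariance(x, y) / math.sqrt(variance(x) * variance(y))


# Made-up series: y has the same shape as x at twice the size.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]

print(mean(x))            # 3.0
print(variance(x))        # 2.0
print(covariance(x, y))   # 4.0
print(correlation(x, y))  # 1.0
```

These formulas subtract two numbers that can be nearly equal, so with large prices and small fluctuations they may lose precision in floating-point arithmetic; subtracting the mean from each value first is a common alternative.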
This is the correlation coefficient. Most spreadsheets have a built-in
correlation function. In Excel it is called CORREL(array1,array2) and may
be used to produce correlation charts as follows. Assume that a 21-day
pattern is located in cells A1:A21, and 100 points of data are located
in cells B1:B100. Then, in cell C11 calculate,
CORREL( A$1:A$21, B1:B21 ).
This is the correlation between the pattern and the first 21 days of data. Dragging this calculation down to cell C90 produces a column of correlations corresponding to the centers of successive 21-day "windows" on the data. To create a correlation chart, set cells C1:C10 and C91:C100 to zero and create a line chart for cells C1:C100.
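The same layout can be reproduced outside a spreadsheet. The Python sketch below mirrors the procedure described above: each correlation is stored at the center of its window, and the positions at either end that have no full window are left at zero. The function names and the 100-point test series are assumptions for illustration, and indices are 0-based, so index 10 corresponds to cell C11.

```python
import math


def correlation(x, y):
    """Correlation coefficient; returns 0.0 for a flat (constant) segment."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlation_chart(pattern, data):
    """Centered sliding-window correlations, zero-padded at both ends."""
    m, n = len(pattern), len(data)
    if m > n:
        raise ValueError("pattern must not be longer than the data")
    half = m // 2
    chart = [0.0] * n
    for start in range(n - m + 1):
        window = data[start:start + m]
        chart[start + half] = correlation(pattern, window)
    return chart


# Made-up 100-point series; the pattern is clipped from positions 30..50.
data = [(i * 37) % 17 + 0.1 * i for i in range(100)]
pattern = data[30:51]

chart = correlation_chart(pattern, data)
print(len(chart), round(chart[40], 6))
```

Since the pattern was clipped from the data itself, the chart reaches a value of 1 at the center of the clipped segment, just as the IBMCUP scan does in Figure 2.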