PATTERN RECOGNITION IN TIME-SERIES
By: Rick Martinelli, Haiku Laboratories, July 1995.

Note: This document was published in the January 1998 issue of Technical Analysis of Stocks & Commodities.

INTRODUCTION
THE CORRELATION COEFFICIENT
AN EXAMPLE
CONCLUSIONS
REFERENCES
STATISTICS BACKGROUND

INTRODUCTION

Pattern recognition is a general term that has been used to describe a variety of different, but related, phenomena. The ability of a camera and computer to discern a particular image in a visually noisy environment is a classic example from engineering. This article is concerned with patterns that appear in market data charts and that often precede other patterns of interest, such as a sustained upward trend in price. The motivation for this work came from the needs of market traders having large portfolios of stocks who must search each of their charts for patterns that are currently "setting up". Several commercial financial software packages offer a pattern recognition feature whereby a chart can be searched for a prescribed pattern. Patterns are specified in such a way that a particular candidate segment is judged as either "having" the prescribed pattern or not. For example, the software might search for segments having 2 down days (that is, lower closing prices than previous days) followed by 3 up days. This approach may work for simple patterns, but for long and/or complex patterns, these methods are difficult to employ. In contrast, the method described in this article allows a pattern to be specified as another chart-segment, of any length, provided it's shorter than the chart data being analyzed, and provides a statistically rigorous measure of the degree to which this segment resembles any other segment of the same length. This measure is called the correlation coefficient for time-series. Using this method, data may be scanned for a particular pattern and the results displayed as a chart of correlations versus time.

One pattern thought to precede an up-trend is the so-called "cup-and-handle" formation described by William O'Neil in [1], and also in [2]. O'Neil claims that this pattern often appears in a time segment just preceding a sustained rise in daily closing prices for many stocks. In the example below, a 21-day cup-and-handle pattern has been "clipped" from a one-year segment of IBM stock data, where it was seen to precede an up-trend in price (see Figure 1). A second pattern is a computer-generated function that "looks like" a cup, with an added "handle" formation. The original IBM data is scanned with both patterns to produce correlation charts, and the two charts are seen to be nearly the same. This is expected since the correlation coefficient is a robust statistic, i.e., small variations in the test pattern will not appreciably affect the outcome. These results show that standard, pre-made test patterns may be used instead of actual chart-segments to scan charts for patterns.

THE CORRELATION COEFFICIENT

The correlation coefficient is a statistic that is used to measure "goodness-of-fit" in many curve-fitting procedures, such as least-squares [3]. Here we use it as an indicator of fit, or similarity, between a user-selected chart-pattern and all segments of another chart having the same length (see Statistics Background). Values of this indicator range between 1.0 and -1.0, where a value of 1 indicates a perfect match, i.e., the two patterns have the same shape. A value of -1 would indicate that an exact match had been found, but that it is "upside-down". Values near zero mean there is no match at all. In practice it has been found that values of 0.8 or more correspond to patterns in the data that are easily discerned as "good matches" by the human eye. The correlation coefficient is also a normalized statistic, which means that the actual numerical values of either the chart data points or pattern values have no effect on its value; only the "shapes" of the pattern and chart segments affect it. All of these features make the correlation coefficient a good choice as an indicator of pattern matching.
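The properties just described can be checked numerically. The following is a minimal Python sketch and not part of the original Excel work; the helper name correlation and the sample values are hypothetical, chosen only for illustration.

```python
import math

def correlation(x, y):
    """Correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    return cov / math.sqrt(var_x * var_y)

# Hypothetical 7-point "cup" shape (illustrative values only).
pattern = [5.0, 3.0, 2.0, 1.5, 2.0, 3.0, 5.0]

# Same shape at a different price level and scale.
rescaled = [40.0 + 2.5 * p for p in pattern]

# Same shape turned upside-down.
inverted = [-p for p in pattern]

r_same = correlation(pattern, rescaled)  # expected to be very close to 1
r_flip = correlation(pattern, inverted)  # expected to be very close to -1
print(round(r_same, 6), round(r_flip, 6))
```

Shifting and rescaling the pattern leaves the correlation at 1 (up to rounding), while the inverted copy gives -1, which is the normalization property relied on throughout the article.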

AN EXAMPLE

The example presented here uses the daily closing prices of IBM from 02/07/94 through 11/22/94 as the data time-series; a plot of the example data is shown in Figure 1. All graphs and calculations have been done in Excel.

Figure 1. IBM daily closing prices for 264 trading days. The segment enclosed in vertical lines is an example of a cup-and-handle pattern.

A pattern was selected from the data as indicated by the two vertical lines on the plot. This pattern is intended to represent a 21-day cup with handle. A more detailed view of this pattern, called IBMCUP, is shown in Figure 4. The data in Figure 1 was scanned with IBMCUP, and the resulting correlation chart is shown in Figure 2. Both figures are plotted on the same time-axis, and the peaks in Figure 2 occur at the center of the corresponding data segment in Figure 1. There are five peaks in Figure 2 where the correlation exceeds 0.8. In each case, a cup formation may be visually discerned at the same location in the corresponding data in Figure 1. The fourth of these peaks attains a value of 1.0, indicating a perfect match. Of course, this is exactly the segment that was used to make the IBMCUP pattern.
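Picking out the peaks that exceed 0.8 can also be done programmatically rather than by eye. The sketch below is one simple way to do it in Python, under stated assumptions: the function name peaks_above, the definition of a peak as a local maximum, and the sample correlation values are all hypothetical and are not taken from Figure 2.

```python
def peaks_above(chart, threshold=0.8):
    """Indices of local maxima in chart whose value is at least threshold."""
    found = []
    for i in range(1, len(chart) - 1):
        is_high = chart[i] >= threshold
        is_local_max = chart[i] > chart[i - 1] and chart[i] >= chart[i + 1]
        if is_high and is_local_max:
            found.append(i)
    return found

# Hypothetical correlation values, one per day (not the IBM results).
chart = [0.0, 0.2, 0.85, 0.6, -0.3, 0.5, 0.7, 0.4, 0.9, 1.0, 0.95, 0.1]
peak_days = peaks_above(chart)
print(peak_days)
```

Each returned index marks the center of a data window that resembles the pattern, in the same way that the peaks in a correlation chart mark the centers of matching segments.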


Figure 2. Daily correlations between IBMCUP and the data.

Figure 3. Daily correlations between MCUP and the data.

Figure 5 shows a fourth-degree polynomial curve with an added handle, called MCUP, that is intended to be a smooth version of IBMCUP. The IBM data in Figure 1 was scanned for MCUP and the resulting correlation plot is shown in Figure 3. The close resemblance between Figure 3 and Figure 2 indicates that smooth patterns are just as effective in locating the cup-and-handle structure. Also notice that the different Y-values of the two patterns have no effect on their resulting correlations.

Figure 4. IBMCUP, a 21-day cup-and-handle pattern from Figure 1.

Figure 5. MCUP, a smooth 21-day cup-and-handle pattern, consisting of a polynomial curve with a 4-day handle.
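A smooth prototype of this kind is easy to generate. The article does not give the coefficients of MCUP, so the Python sketch below is only an illustration: the quartic t**4, the 17-day cup length, and the four handle values are assumptions, and the function name make_cup_with_handle is hypothetical.

```python
def make_cup_with_handle(cup_days=17, handle=(0.90, 0.80, 0.85, 0.95)):
    """Return a smooth quartic cup followed by a short handle.

    The quartic and the handle values are illustrative assumptions;
    they are not the actual MCUP values.
    """
    cup = []
    for i in range(cup_days):
        # Map the cup days onto t in [-1, 1]; t**4 gives a flat-bottomed
        # cup that starts and ends at 1.0 and touches 0.0 in the middle.
        t = -1.0 + 2.0 * i / (cup_days - 1)
        cup.append(t ** 4)
    return cup + list(handle)

mcup_like = make_cup_with_handle()
print(len(mcup_like))
```

Because the correlation coefficient is normalized, the vertical scale of such a pattern (here 0 to 1) does not need to be matched to the price level of the stock being scanned.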

CONCLUSIONS

The correlation coefficient has been shown to be a useful tool in the search for prescribed patterns in market charts. The approach described here can be used by any trader with access to Excel and to historical market data. This approach allows the use of a single pattern to search an entire portfolio, rather than patterns scaled to each stock. It also allows pre-made, smooth, "prototype" patterns to be used in place of historical patterns taken from actual stocks. This means a trader may keep a library of favorite patterns rather than prepare new ones for each scan. A secondary advantage to using smooth patterns is their generally higher correlation values due to the removal of the random fluctuations present in historical patterns.

Lastly, some drawbacks of this approach should be pointed out. Although a spreadsheet program has been used here to analyze data and display results, automatic portfolio searches are difficult to implement with spreadsheets, and a separate computer program is probably needed. Also, although the correlation coefficient is fairly robust, it cannot compensate for different pattern lengths. Pattern lengths must be pre-specified by the trader to within a few time periods (usually days) or the correlation will decrease rapidly. Consequently, a collection of various lengths of the same pattern must often be used when scanning a chart. To avoid this, the new technique of wavelet analysis [4] may be used. This technique provides a 3-dimensional correlation chart of a data set, where the length of a pattern constitutes one of those dimensions. The spectrum may be scanned, either by a human or by a computer, to find the pattern length yielding the largest correlation.
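The "collection of various lengths" workaround mentioned above can be sketched in a separate program. The Python example below is a brute-force illustration only, not wavelet analysis and not the author's implementation: it stretches one pattern to several lengths by linear interpolation (an assumed choice) and keeps the best-correlated window. All function names and data values are hypothetical.

```python
import math

def correlation(x, y):
    """Correlation coefficient; returns 0.0 for a constant series (a convention assumed here)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    if var_x * var_y <= 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def resample(pattern, new_length):
    """Stretch or shrink a pattern to new_length points by linear interpolation."""
    old_n = len(pattern)
    out = []
    for i in range(new_length):
        pos = i * (old_n - 1) / (new_length - 1)
        lo = int(math.floor(pos))
        hi = min(lo + 1, old_n - 1)
        frac = pos - lo
        out.append(pattern[lo] * (1.0 - frac) + pattern[hi] * frac)
    return out

def best_match(data, pattern, lengths):
    """Return (correlation, start index, length) of the best window over all lengths."""
    best = (-2.0, None, None)
    for length in lengths:
        scaled = resample(pattern, length)
        for start in range(len(data) - length + 1):
            r = correlation(scaled, data[start:start + length])
            if r > best[0]:
                best = (r, start, length)
    return best

# Hypothetical data: a 5-point pattern stretched to 9 points and embedded
# in a short series, starting at index 5.
base = [4.0, 2.0, 1.0, 2.0, 4.0]
data = [1.0, 1.5, 2.0, 2.5, 3.0] + resample(base, 9) + [4.5, 5.0, 5.5]

r, start, length = best_match(data, base, lengths=[5, 7, 9, 11])
print(round(r, 6), start, length)
```

The search recovers the embedded copy at its stretched length, at the cost of one full scan per candidate length.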


REFERENCES

[1]. O'Neil, W.J. [1988], How to Make Money in Stocks, McGraw Hill, New York.

[2]. Kuhn, Gregory J. [1994], "Back to Basics in Trading Stocks", Technical Analysis of Stocks & Commodities.

[3]. Merrill, Arthur A. [1994], "Fitting a Trendline by Least Squares", Technical Analysis of Stocks & Commodities.

[4]. Chan, Y.T. [1995], Wavelet Basics, Kluwer Academic Publishers, Boston.


STATISTICS BACKGROUND

Define a time-series as any set of numbers arranged in chronological order, with the same time interval between any neighboring pair of numbers. We write a time-series X of length N as

X_1, X_2, ..., X_N.

The series X has a mean E(X), given by the average of its values

E(X) = ( X_1 + ... + X_N ) / N.

The mean is a measure of how far X is displaced from zero. Series X also has a variance, V(X), given by:

V(X) = ( X_1^2 + ... + X_N^2 ) / N - (E(X))^2.

Mathematically, the variance of X is a measure of its size, after its mean is removed. For a second time-series Y, also of length N, the covariance between X and Y is similarly defined by:

COV(X,Y) = ( X_1 Y_1 + ... + X_N Y_N ) / N - E(X) E(Y).

The covariance provides a measure of the similarity between the two series X and Y and, for series of given variances, reaches its maximum value when Y and X have the same shape. To remove the effect of the sizes of the two series, the covariance may be normalized by dividing out their standard deviations, which are the square roots of their variances:

COV(X,Y) / SQRT[V(X)V(Y)]

This is the correlation coefficient. Most spreadsheets have a built-in correlation function. In Excel it is called CORREL(array1,array2) and may be used to produce correlation charts as follows. Assume that a 21-day pattern is located in cells A1:A21, and 100 points of data are located in cells B1:B100. Then, in cell C11 calculate,

CORREL( A$1:A$21, B1:B21 ).

This is the correlation between the pattern and the first 21 days of data. Dragging this calculation down to cell C90 produces a column of correlations corresponding to the centers of successive 21-day "windows" on the data. To create a correlation chart, set cells C1:C10 and C91:C100 to zero and create a line chart for cells C1:C100.
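For readers who prefer a separate program to a spreadsheet, the same procedure can be sketched in Python. This is a minimal illustration rather than the author's implementation: the functions mean, variance, covariance and correlation follow the formulas for E(X), V(X), COV(X,Y) and their ratio given in this section, while the names correlation_chart and the seeded random-walk data are hypothetical stand-ins for the Excel layout and for real price data.

```python
import math
import random

def mean(x):
    return sum(x) / len(x)

def variance(x):
    # Mean of the squares minus the square of the mean, as in V(X).
    return sum(a * a for a in x) / len(x) - mean(x) ** 2

def covariance(x, y):
    # Mean of the products minus the product of the means, as in COV(X,Y).
    return sum(a * b for a, b in zip(x, y)) / len(x) - mean(x) * mean(y)

def correlation(x, y):
    denom = variance(x) * variance(y)
    if denom <= 0.0:
        return 0.0  # constant series: correlation undefined, 0.0 assumed here
    return covariance(x, y) / math.sqrt(denom)

def correlation_chart(data, pattern):
    """Correlate pattern with every window of data.

    Each value is stored at the center of its window and the ends are
    left at zero, mirroring the C1:C100 column described above.
    """
    m = len(pattern)
    half = m // 2
    chart = [0.0] * len(data)
    for start in range(len(data) - m + 1):
        chart[start + half] = correlation(pattern, data[start:start + m])
    return chart

# Hypothetical data: 100 points of a seeded random walk.
rng = random.Random(0)
data = [50.0]
for _ in range(99):
    data.append(data[-1] + rng.uniform(-1.0, 1.0))

# Clip a 21-day pattern from the data itself, starting at index 40.
pattern = data[40:61]
chart = correlation_chart(data, pattern)
print(round(chart[50], 6))
```

With a 21-day pattern and 100 data points, the first correlation lands at index 10 and the last at index 89, which correspond to cells C11 and C90 in the spreadsheet layout. The clipped segment correlates perfectly with itself, so the chart reaches 1 at the center of that segment.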

