LEAST-SQUARES FORMULAS FOR NON-STATIONARY TIME-SERIES PREDICTION

by Rick Martinelli, Haiku Laboratories June 2008

Updated July 2011

Introduction

Linear Model

Quadratic Model

Cubic Model

Higher Degree Models

INTRODUCTION

The purpose of this memo is to derive some least-squares prediction formulas for time-series, for use with financial market data.  We are given a time-series {y(k) | k = 1,…,n}, where the y(k) represent market data values sampled at a fixed time-interval, such as daily stock data.  We seek a least-squares prediction (estimate) y*(n+1) of y(n+1) as a linear combination of the previous n data values, i.e.,

y*(n+1) = α(1)y(1) + α(2)y(2) + … + α(n)y(n),              (1a)

and the variance of the associated prediction error, or residual, estimate

e*(n+1) = y(n+1) − α(1)y(1) − α(2)y(2) − … − α(n)y(n).     (1b)

The usual least-squares formulas involve ordered pairs of data (x(k), y(k)).  For time-series data, however, x(k) = k and the least-squares formulas are somewhat simplified.  It is assumed that the data series y(k) is composed of a “smooth” trend-line plus noise, and that short segments of the trend-line can be well-modeled by a low-degree polynomial.  In what follows, explicit prediction formulas are derived for linear, quadratic and cubic polynomial models over short data segments.  Formulas for prediction error estimates are also derived.  In the last section, polynomial models of higher degree are considered and a general formula is derived for any degree p > 0 when the data length is n = p + 1.
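The whole memo rests on this recipe: fit a low-degree polynomial to the last n samples by least squares and evaluate it one step ahead.  A minimal sketch in Python, assuming numpy is available (the function name and sample window are ours, not from the memo):

```python
import numpy as np

def predict_next(y, degree):
    """Fit a degree-p least-squares polynomial to y(1..n) and
    extrapolate one step ahead, giving y*(n+1) as in (1a)."""
    n = len(y)
    k = np.arange(1, n + 1)             # time index k = 1, ..., n
    coeffs = np.polyfit(k, y, degree)   # least-squares polynomial fit
    return np.polyval(coeffs, n + 1)    # evaluate the fit at k = n + 1

# Linear model (p = 1) on a short window of trending data:
window = [10.0, 10.4, 10.9, 11.2]
print(predict_next(window, 1))
```

On exactly polynomial data of the matching degree the prediction is exact; on noisy data it extrapolates the fitted trend.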

LINEAR MODEL

For a fixed data length n, and k = 1,…,n, assume the simple linear model

y(k) = kA + B + e(k)           (2)

where A and B are regression coefficients and e(k) represents the model error, or residual.  The normal equations for this model are (all sums are from k = 1 to n)

Σ k y(k) = A Σ k² + B Σ k,

Σ y(k) = A Σ k + nB,

where A and B are functions of data length n.  These equations have solution

A = [n Σ k y(k) − Σ k Σ y(k)]/D(n),

B = [−Σ k Σ k y(k) + Σ k² Σ y(k)]/D(n),

where

D(n) = n Σ k² − (Σ k)².

Using the identities

Σ k = n(n+1)/2   and   Σ k² = n(n+1)(2n+1)/6,

we have D(n) = n²(n² − 1)/12, and the solution becomes

A = 6 Σ (2k − n − 1) y(k) / [n(n² − 1)],        (3a)

B = 2 Σ (2n + 1 − 3k) y(k) / [n(n − 1)].        (3b)

These are the general expressions for the regression coefficients A and B.  If y(k) = y0 for all k, i.e., all the data values are equal, then (3a) and (3b) reduce to A = 0 and B = y0, as required.

The regression coefficients are linear combinations of the data points y(k).  In the case n = 3, for example, equations (3) reduce to

A = (y(3) − y(1))/2

B = (4y(1) + y(2) − 2y(3))/3.

The prediction equation for model (2) may be written

y*(n+1) = (n+1)A + B,

and, upon substituting the general expressions for A and B given in (3), we have

y*(n+1) = Σ 2(3k − n − 2) y(k) / [n(n − 1)].        (4)

This is the desired form of the general predictor for the linear model, as in equation (1a).  When n = 3, for example, (4) reduces to

y*(4) =  [-2y(1) + y(2) + 4y(3)]/3

Coefficients for predictors on n points are summarized in Table 1 below for n = 2 to 7. The error variance V(n+1) of the prediction y*(n+1) may be estimated as the square of the residual

e*(n+1) = y(n+1) – y*(n+1).                   (5)

When n = 2, for example, the variance of the prediction y*(3) is given by

V(3) = [y(1) − 2y(2) + y(3)]².

This is the (square of the) deviation from linearity of the three successive points y(1), y(2), y(3).  In terms of the increments z(k) = y(k) − y(k−1), V(3) may be written (z(3) − z(2))².  In terms of the second differences w(k) = z(k) − z(k−1), this is V(3) = w(3)².  Thus, if y is linear in the sense that the y(k)’s all fall on a straight line, then V(3) = 0.

When n = 3 the variance of the prediction y*(4) is given by

V(4) = [(2y(1) − y(2) − 4y(3) + 3y(4))/3]² = [(−2z(2) − z(3) + 3z(4))/3]²

or V(4) = (2w(3) + 3w(4))²/9.  Again, V(4) = 0 when y is linear because the second differences w(k) are then zero. Coefficients in the error estimates of y*(n+1) are summarized in Table 2 below for n = 2 to 7.

 n   Predictor Equation Coefficients
 2   -1, 2
 3   -2/3, 1/3, 4/3
 4   -1/2, 0, 1/2, 1
 5   -2/5, -1/10, 1/5, 1/2, 4/5
 6   -1/3, -2/15, 1/15, 4/15, 7/15, 2/3
 7   -2/7, -1/7, 0, 1/7, 2/7, 3/7, 4/7

Table 1. Summary of the linear model predictor coefficients in the least-squares estimates for several small values of n, where coefficients are ordered from smallest to largest k.
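Because y*(n+1) is linear in the data, each coefficient can be recovered numerically by pushing a unit vector through the fit.  The sketch below (assuming numpy; the helper name is ours) reproduces the rows of Table 1:

```python
import numpy as np
from fractions import Fraction

def linear_predictor_coeffs(n):
    """Recover alpha_1..alpha_n with y*(n+1) = sum_k alpha_k y(k):
    feed each unit vector e_k through the degree-1 least-squares
    fit and evaluate the fitted line at k = n + 1."""
    k = np.arange(1, n + 1)
    out = []
    for j in range(n):
        e = np.zeros(n)
        e[j] = 1.0
        a, b = np.polyfit(k, e, 1)   # fitted slope and intercept
        out.append(Fraction(float(a * (n + 1) + b)).limit_denominator(1000))
    return out

for n in range(2, 8):
    print(n, linear_predictor_coeffs(n))   # matches Table 1 row by row
```

The `limit_denominator` call snaps the floating-point results back to the exact small fractions seen in the table.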

 n   Error Equation Coefficients
 2   1, -2, 1
 3   2/3, -1/3, -4/3, 1
 4   1/2, 0, -1/2, -1, 1
 5   2/5, 1/10, -1/5, -1/2, -4/5, 1
 6   1/3, 2/15, -1/15, -4/15, -7/15, -2/3, 1
 7   2/7, 1/7, 0, -1/7, -2/7, -3/7, -4/7, 1

Table 2. Summary of the linear model error coefficients for several small values of n, where coefficients are ordered from smallest to largest k.
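By (1b), the error coefficients in Table 2 are the predictor coefficients negated, with a final 1 attached to y(n+1); applied to exactly linear data they must give zero.  A quick numerical check (assuming numpy; the data values are arbitrary):

```python
import numpy as np

# Error coefficients for n = 3 (Table 2), applied to y(1), ..., y(4):
err = np.array([2/3, -1/3, -4/3, 1])

y_line  = np.array([5.0, 7.0, 9.0, 11.0])   # exactly linear data
y_curve = np.array([1.0, 4.0, 9.0, 16.0])   # quadratic data

print(err @ y_line)    # ~0: a line is predicted perfectly
print(err @ y_curve)   # nonzero: the deviation from linearity
```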

QUADRATIC MODEL

Where the linear model above used a polynomial of degree p = 1, the quadratic model uses a degree p = 2 polynomial,

y(k) = k²A + kB + C + e(k),

where A, B and C are regression coefficients and e(k) represents the model error, and the corresponding prediction equation is

y*(n+1) = (n+1)²A + (n+1)B + C.

The normal equations for this model are (all sums are from k = 1 to n)

Σ k²y(k) = A Σ k⁴ + B Σ k³ + C Σ k²,

Σ k y(k) = A Σ k³ + B Σ k² + C Σ k,

Σ y(k) = A Σ k² + B Σ k + nC.

Substituting the identities

Σ k³ = n²(n + 1)²/4   and   Σ k⁴ = n(n + 1)(2n + 1)(3n² + 3n − 1)/30

and the previous identities, every coefficient of the system becomes an explicit function of n.  Solving the resulting equations for A, B and C yields the general expressions for the coefficients; each reduces to a linear combination of the data values y(k).  For example, for n = 3 these coefficients simplify to

A = (y(1) − 2y(2) + y(3))/2

B = (-5y(1) + 8y(2) − 3y(3))/2

C = 3y(1) − 3y(2) + y(3)

and if n = 4 they are

A = (y(1) − y(2) − y(3) + y(4))/4

B = (-31y(1) + 23y(2) + 27y(3) − 19y(4))/20

C = (9y(1) − 3y(2) − 5y(3) + 3y(4))/4

In both cases note that A = B = 0 when all the data values are equal, and that C = y0, the common value.  Substituting the general expressions for A, B and C in the quadratic prediction equation gives

y*(n+1) = Σ α(k) y(k),   where
α(k) = 1/n + 3(2k − n − 1)/[n(n − 1)] + 5[3(2k − n − 1)² − (n² − 1)]/[2n(n − 1)(n − 2)].        (7)

This is the desired form of the general estimate for the quadratic model, as in equation (1a).  When n = 4, for example, the formula reduces to

y*(5) = [3y(1) − 5y(2) − 3y(3) + 9y(4)]/4

(Note that formula (7) fails for n = 1 and 2.)  Coefficients for predictors on n points are summarized in Table 3 below for n = 3 to 8.  The error variance V(n+1) of the predictor is again estimated from the residual e*(n+1) given in (1b).  When n = 4, for example, the variance of the prediction y*(5) is

V(5) = [–3y(1) + 5y(2) + 3y(3) – 9y(4) + 4y(5)]2/16

= [3z(2) – 2z(3) – 5z(4) + 4z(5)]2/16

In this case, the numerator measures the deviation of the successive points from a quadratic curve.

 n   Predictor Coefficients
 3   1, -3, 3
 4   3/4, -5/4, -3/4, 9/4
 5   3/5, -3/5, -4/5, 0, 9/5
 6   1/2, -3/10, -3/5, -2/5, 3/10, 3/2
 7   3/7, -1/7, -3/7, -3/7, -1/7, 3/7, 9/7
 8   3/8, -3/56, -17/56, -3/8, -15/56, 1/56, 27/56, 9/8

Table 3. Coefficients for quadratic least-squares predictors on n points, ordered from smallest to largest k.
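As a cross-check of Table 3, the tabulated n = 4 row should agree with fitting a quadratic directly and extrapolating one step (a sketch assuming numpy; the window values are arbitrary):

```python
import numpy as np

# Quadratic predictor for n = 4 (Table 3): y*(5) = [3y(1) - 5y(2) - 3y(3) + 9y(4)]/4
alpha = np.array([3.0, -5.0, -3.0, 9.0]) / 4

y = np.array([2.0, 3.5, 6.2, 10.1])                        # sample window
direct = np.polyval(np.polyfit(np.arange(1, 5), y, 2), 5)  # fit, then extrapolate

print(alpha @ y, direct)   # the two predictions agree
```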

 n   Error Coefficients
 3   -1, 3, -3, 1
 4   -3/4, 5/4, 3/4, -9/4, 1
 5   -3/5, 3/5, 4/5, 0, -9/5, 1
 6   -1/2, 3/10, 3/5, 2/5, -3/10, -3/2, 1
 7   -3/7, 1/7, 3/7, 3/7, 1/7, -3/7, 9/7, 1
 8   -3/8, 3/56, 17/56, 3/8, 15/56, -1/56, -27/56, -9/8, 1

Table 4. Summary of the quadratic model error coefficients for several small values of n, where coefficients are ordered from smallest to largest k.

CUBIC MODEL

The cubic model uses a degree p = 3 polynomial,

y(k) = k³A + k²B + kC + D + e(k),

where A, B, C and D are regression coefficients and e(k) represents the model error, and the corresponding prediction equation is

y*(n+1) = (n+1)³A + (n+1)²B + (n+1)C + D.

The normal equations for this model are (all sums are from k = 1 to n)

Σ k³y(k) = A Σ k⁶ + B Σ k⁵ + C Σ k⁴ + D Σ k³,

Σ k²y(k) = A Σ k⁵ + B Σ k⁴ + C Σ k³ + D Σ k²,

Σ k y(k) = A Σ k⁴ + B Σ k³ + C Σ k² + D Σ k,

Σ y(k) = A Σ k³ + B Σ k² + C Σ k + nD.

As in the previous cases, these equations may be solved, this time with the help of two more identities,

Σ k⁵ = n²(n + 1)²(2n² + 2n − 1)/12   and   Σ k⁶ = n(n + 1)(2n + 1)(3n⁴ + 6n³ − 3n + 1)/42.

The final predictor is

y*(n+1) = Σ α(k) y(k),   where, with u = 2k − n − 1,
α(k) = 1/n + 3u/[n(n − 1)] + 5[3u² − (n² − 1)]/[2n(n − 1)(n − 2)] + 7u[5u² − 3n² + 7]/[2n(n − 1)(n − 2)(n − 3)].

This is the predictor equation in the cubic case.  Substituting n > 3 and evaluating provides coefficients for the class of cubic least-squares predictors on n points.  (Note the formula fails for n = 1, 2, 3.)  When n = 4, for example, the formula reduces to

y*(5) = −y(1) + 4y(2) − 6y(3) + 4y(4).

Results are summarized in Table 5, where coefficients are ordered as in Table 1.

 n   Predictor Coefficients
 4   -1, 4, -6, 4
 5   -4/5, 11/5, -4/5, -14/5, 16/5
 6   -2/3, 4/3, 1/3, -4/3, -4/3, 8/3

Table 5. Coefficients for cubic least-squares predictors on n points.
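With n = p + 1 = 4 points the cubic least-squares fit interpolates, so the n = 4 cubic predictor −1, 4, −6, 4 reproduces any exact cubic trend without error.  A short check (assuming numpy; the trend polynomial is arbitrary):

```python
import numpy as np

# Cubic predictor on n = 4 points: y*(5) = -y(1) + 4y(2) - 6y(3) + 4y(4).
alpha = np.array([-1.0, 4.0, -6.0, 4.0])

k = np.arange(1, 5)
y = 2*k**3 - k**2 + 3*k - 5       # values of an exact cubic at k = 1..4
exact = 2*5**3 - 5**2 + 3*5 - 5   # the same cubic evaluated at k = 5
print(alpha @ y, exact)           # both are 235
```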

HIGHER DEGREE POLYNOMIAL MODELS

From the Tables we see that, for each model and each n, the set of predictor coefficients sums to 1, as it must.  Note also that, when the data length is n = p + 1, the formulas have binomial coefficients.  To see why, suppose the y(k) all lie on a straight line of slope m ≠ 0.  Then the first differences y(k+1) − y(k) are all equal, making the second differences zero. Thus

0 = [y(k+2) – y(k+1)] – [y(k+1) – y(k)]

which may be written

0 = y(k+2) – 2y(k+1) + y(k).

Note that the coefficients 1, −2, 1 are the binomial coefficients in the expansion of (a − b)².  The predictor is thus

y*(k+2) = 2y(k+1) – y(k)

Similarly, if the y(k) all lie on a parabola, then the third differences are all zero, written

y(k+3) – 3y(k+2) + 3y(k+1) – y(k) = 0,

with the binomial coefficients in (a − b)³ and predictor

y*(k+3) = 3y(k+2) – 3y(k+1) + y(k).

In general, any polynomial model of degree p > 0 on equally-spaced data points will have binomial coefficients in its least-squares prediction and error formulas when the number of points is n = p + 1.  The general prediction formula is

y*(n+1) = Σ (−1)^(j+1) C(p+1, j) y(n+1−j),   (sum over j = 1,…,p+1),

and the error estimate is

e*(n+1) = Σ (−1)^j C(p+1, j) y(n+1−j),   (sum over j = 0,…,p+1),

which is the (p+1)-th difference of y(1),…,y(n+1).  Here C(p+1, j) denotes the binomial coefficient, and both formulas agree with the first (n = p + 1) row of each table above.
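This difference-based rule is easy to state in code for any degree p, using only the last p + 1 values (a sketch using the standard library; the function name is ours):

```python
from math import comb

def binomial_predictor(y, p):
    """Predict the next value from the last p + 1 samples, assuming the
    (p+1)-th differences vanish (an exact degree-p polynomial trend):
    y* = sum over j = 1..p+1 of (-1)^(j+1) * C(p+1, j) * y[-j]."""
    assert len(y) >= p + 1
    return sum((-1) ** (j + 1) * comb(p + 1, j) * y[-j]
               for j in range(1, p + 2))

print(binomial_predictor([1, 3, 5], 1))   # line: 2*5 - 3 = 7
print(binomial_predictor([1, 4, 9], 2))   # squares: 3*9 - 3*4 + 1 = 16
```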