LEAST-SQUARES FORMULAS FOR NONSTATIONARY TIME-SERIES PREDICTION
by Rick Martinelli, Haiku Laboratories June 2008
Copyright ©, 2008
Updated July 2011
The purpose of this memo is to derive some least-squares prediction formulas for time-series, for use with financial market data. We are given a time-series {y(k) | k = 1,…,n}, where the y(k) represent market data values sampled at a fixed time interval, such as daily stock data. We seek a least-squares prediction (estimate), y*(n+1) of y(n+1) as a linear combination of the previous n data values, i.e.,
y*(n+1) = α_{1}y(1) + α_{2}y(2) + … + α_{n}y(n), (1a).
together with an estimate of the variance of the associated prediction error, or residual,
e*(n+1) = y(n+1) − α_{1}y(1) − α_{2}y(2) − … − α_{n}y(n). (1b).
The usual least-squares formulas involve ordered pairs of data (x(k), y(k)). For time-series data, however, x(k) = k and the least-squares formulas are somewhat simplified. The data series y(k) is assumed to be composed of a “smooth” trendline plus noise, with short segments of the trendline well-modeled by a low-degree polynomial. In what follows, explicit prediction formulas are derived for linear, quadratic and cubic polynomial models over short data segments. Formulas for prediction error estimates are also derived. In the last section, polynomial models of higher degree are considered and a general formula is derived for any degree p > 0, where the data length is n = p + 1.
For a fixed data length n, and k = 1,…,n, assume the simple linear model
y(k) = kA + B + e(k) (2)
where A and B are regression coefficients and e(k) represents the model error, or residual. The normal equations for this model are (all sums are from k = 1 to n)
∑ky(k) = A∑k^{2} + B∑k,
∑y(k) = A∑k + nB,
where A and B are functions of data length n. These equations have solution
A = [n∑ky(k) − ∑k∑y(k)]/D(n),
B = [∑k^{2}∑y(k) − ∑k∑ky(k)]/D(n),
where
D(n) = n∑k^{2} − (∑k)^{2}.
Using the identities
∑k = n(n+1)/2 and ∑k^{2} = n(n+1)(2n+1)/6,
we have D(n) = n^{2}(n^{2} − 1)/12, and the solution becomes
A = 6[2∑ky(k) − (n+1)∑y(k)]/[n(n^{2} − 1)], (3a)
B = 2[(2n+1)∑y(k) − 3∑ky(k)]/[n(n − 1)]. (3b)
These are the general expressions for the regression coefficients A and B. If y(k) = y0 for all k, i.e., all the data values are equal, then (3a) and (3b) reduce to A = 0 and B = y0, as required.
The regression coefficients are linear combinations of the data points y(k). In the case n = 3, for example, equations (3) reduce to
A = (y(3) − y(1))/2
B = (4y(1) + y(2) − 2y(3))/3.
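The coefficient formulas lend themselves to a direct numerical check. Below is a minimal Python sketch (the function name is illustrative, not from the memo) that evaluates A and B from sums of the data, per equations (3a) and (3b), and reproduces the n = 3 expressions above.

```python
def linear_coeffs(y):
    """Slope A and intercept B of the least-squares line through
    (k, y(k)) for k = 1..n, via equations (3a) and (3b)."""
    n = len(y)
    s0 = sum(y)                                  # sum of y(k)
    s1 = sum(k * v for k, v in enumerate(y, 1))  # sum of k*y(k)
    A = 6 * (2 * s1 - (n + 1) * s0) / (n * (n * n - 1))
    B = 2 * ((2 * n + 1) * s0 - 3 * s1) / (n * (n - 1))
    return A, B

# n = 3 check: A = (y(3) - y(1))/2 and B = (4y(1) + y(2) - 2y(3))/3
A, B = linear_coeffs([2.0, 3.5, 6.0])
```

For constant data the function returns A = 0 and B equal to the common value, as the text requires.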
The prediction equation for model (2) may be written
y*(n+1) = (n+1)A + B,
and, upon substituting the general expressions for A and B given in (3), we have
y*(n+1) = 2[3∑ky(k) − (n+2)∑y(k)]/[n(n − 1)]. (4)
This is the desired form of the general predictor for the linear model, as in equation (1a). When n = 3, for example, (4) reduces to
y*(4) = [−2y(1) + y(2) + 4y(3)]/3.
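Equation (4) can be coded directly. The following sketch (an illustration, with a hypothetical function name) computes the one-step linear prediction from the two data sums; on exactly linear data it continues the line.

```python
def linear_predict(y):
    """One-step prediction y*(n+1) from the linear model, equation (4)."""
    n = len(y)
    s0 = sum(y)                                  # sum of y(k)
    s1 = sum(k * v for k, v in enumerate(y, 1))  # sum of k*y(k)
    return 2 * (3 * s1 - (n + 2) * s0) / (n * (n - 1))
```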
Coefficients for predictors on n points are summarized in Table 1 below for n = 2 to 7. The error variance V(n+1) of the prediction y*(n+1) may be estimated as the square of the residual
e*(n+1) = y(n+1) − y*(n+1). (5)
When n = 2, for example, the variance of the prediction y*(3) is given by
V(3) = [y(1) – 2y(2) + y(3)]^{2}
This is the (square of the) deviation from linearity of the three successive points y(1), y(2), y(3). In terms of the increments z_{k} = y_{k} − y_{k−1}, V(3) may be written (z_{3} − z_{2})^{2}. In terms of the second differences w_{k} = z_{k} − z_{k−1}, this is V(3) = w_{3}^{2}. Thus, if y is linear in the sense that the y(k)’s all fall on a straight line, then V(3) = 0.
When n = 3 the variance of the prediction y*(4) is given by
V(4) = [(2y(1) − y(2) − 4y(3) + 3y(4))/3]^{2} = [(3z(4) − z(3) − 2z(2))/3]^{2}
or V(4) = (3w_{4} + 2w_{3})^{2}/9. Again, V(4) = 0 when y is linear because the second differences w_{k} are then zero. Coefficients in the error estimates of y*(n+1) are summarized in Table 2 below for n = 2 to 7.
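The same pattern holds for any n: predict from the first n points, then square the residual against the newly observed value. A small sketch (helper names are illustrative) using the linear predictor of equation (4):

```python
def linear_predict(y):
    """y*(n+1) from the linear model, equation (4)."""
    n = len(y)
    s0 = sum(y)
    s1 = sum(k * v for k, v in enumerate(y, 1))
    return 2 * (3 * s1 - (n + 2) * s0) / (n * (n - 1))

def variance_estimate(y_hist, y_next):
    """V(n+1) = [y(n+1) - y*(n+1)]^2, the squared residual of equation (5)."""
    e = y_next - linear_predict(y_hist)
    return e * e

# V vanishes on exactly linear data and grows with deviation from linearity.
```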
n    Predictor Equation Coefficients
2    −1, 2
3    −2/3, 1/3, 4/3
4    −1/2, 0, 1/2, 1
5    −2/5, −1/10, 1/5, 1/2, 4/5
6    −1/3, −2/15, 1/15, 4/15, 7/15, 2/3
7    −2/7, −1/7, 0, 1/7, 2/7, 3/7, 4/7
Table 1. Summary of the linear model predictor coefficients in the least-squares estimates for several small values of n, where coefficients are ordered from smallest to largest k.
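Each row of Table 1 can be regenerated exactly from equation (4): the coefficient of y(k) in y*(n+1) is 2(3k − (n+2))/(n(n−1)). A short sketch using exact rational arithmetic (the function name is illustrative):

```python
from fractions import Fraction

def linear_predictor_coeffs(n):
    """Coefficients of y(1),...,y(n) in the linear-model predictor y*(n+1)."""
    return [Fraction(2 * (3 * k - (n + 2)), n * (n - 1)) for k in range(1, n + 1)]

# Every row of Table 1 sums to 1, as noted in the last section of the memo.
```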
n    Error Equation Coefficients
2    1, −2, 1
3    2/3, −1/3, −4/3, 1
4    1/2, 0, −1/2, −1, 1
5    2/5, 1/10, −1/5, −1/2, −4/5, 1
6    1/3, 2/15, −1/15, −4/15, −7/15, −2/3, 1
7    2/7, 1/7, 0, −1/7, −2/7, −3/7, −4/7, 1
Table 2. Summary of the linear model error coefficients for several small values of n, where coefficients are ordered from smallest to largest k.
Where the linear model above used a polynomial of degree p = 1, the quadratic model uses a degree p = 2 polynomial,
y(k) = k^{2}A + kB + C + e(k),
where A, B and C are regression coefficients and e(k) represents the model error, and the corresponding prediction equation is
y*(n+1) = (n+1)^{2}A + (n+1)B + C.
The normal equations for this model, in matrix form, are

[ ∑k^{4}   ∑k^{3}   ∑k^{2} ] [A]   [∑k^{2}y(k)]
[ ∑k^{3}   ∑k^{2}   ∑k     ] [B] = [∑ky(k)]
[ ∑k^{2}   ∑k       n      ] [C]   [∑y(k)]
where all sums are from 1 to n. Substituting the identities
∑k^{3} = n^{2}(n + 1)^{2}/4 and ∑k^{4} = n(n + 1)(2n + 1)(3n^{2} + 3n − 1)/30
and the previous identities, the coefficient matrix becomes

[ n(n+1)(2n+1)(3n^{2}+3n−1)/30   n^{2}(n+1)^{2}/4   n(n+1)(2n+1)/6 ]
[ n^{2}(n+1)^{2}/4               n(n+1)(2n+1)/6     n(n+1)/2       ]
[ n(n+1)(2n+1)/6                 n(n+1)/2           n              ].
Solving the resulting matrix equation for A, B and C yields the general expressions for the coefficients
A = 30[6∑k^{2}y(k) − 6(n+1)∑ky(k) + (n+1)(n+2)∑y(k)]/[n(n^{2}−1)(n^{2}−4)],
B = 6[−30(n+1)∑k^{2}y(k) + 2(2n+1)(8n+11)∑ky(k) − 3(n+1)(n+2)(2n+1)∑y(k)]/[n(n^{2}−1)(n^{2}−4)],
C = 3(n+1)[10∑k^{2}y(k) − 6(2n+1)∑ky(k) + (3n^{2}+3n+2)∑y(k)]/[n(n^{2}−1)(n−2)].
Each of these expressions reduces to a linear combination of the data values. For example, for n = 3 these coefficients simplify to
A = (y(1) − 2y(2) + y(3))/2
B = (−5y(1) + 8y(2) − 3y(3))/2
C = 3y(1) − 3y(2) + y(3)
and if n = 4 they are
A = (y(1) − y(2) − y(3) + y(4))/4
B = (−31y(1) + 23y(2) + 27y(3) − 19y(4))/20
C = (9y(1) − 3y(2) − 5y(3) + 3y(4))/4
In both cases note that A = B = 0 when all the data values are equal, and that C = y_{0}, the common value. Substituting the general expressions for A, B and C in the quadratic prediction equation gives
y*(n+1) = 3(n+1)[10∑k^{2}y(k) − 2(4n+7)∑ky(k) + (n+2)(n+3)∑y(k)]/[n(n^{2}−1)(n−2)]. (7)
This is the desired form of the general estimate for the quadratic model, as in equation (1a). When n = 4, for example, the formula reduces to
y*(5) = [3y(1) − 5y(2) − 3y(3) + 9y(4)]/4.
(Note that formula (7) fails for n = 1 and 2.) Coefficients for predictors on n points are summarized in Table 3 below for n = 3 to 8. The error variance V(n+1) of the predictor is again estimated from the residual e*(n+1) given in (1b). When n = 4, for example, the variance of the prediction y*(5) is
V(5) = [–3y(1) + 5y(2) + 3y(3) – 9y(4) + 4y(5)]^{2}/16
= [3z(2) – 2z(3) – 5z(4) + 4z(5)]^{2}/16
In this case, the numerator measures the deviation of the successive points from a quadratic curve.
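As with the linear model, formula (7) is easy to check numerically. This sketch (an illustration, with a hypothetical function name) is exact on data lying on a parabola:

```python
def quadratic_predict(y):
    """y*(n+1) from the quadratic model, equation (7); requires n >= 3."""
    n = len(y)
    s0 = sum(y)                                      # sum of y(k)
    s1 = sum(k * v for k, v in enumerate(y, 1))      # sum of k*y(k)
    s2 = sum(k * k * v for k, v in enumerate(y, 1))  # sum of k^2*y(k)
    num = 3 * (n + 1) * (10 * s2 - 2 * (4 * n + 7) * s1 + (n + 2) * (n + 3) * s0)
    return num / (n * (n * n - 1) * (n - 2))
```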
n    Predictor Coefficients
3    1, −3, 3
4    3/4, −5/4, −3/4, 9/4
5    3/5, −3/5, −4/5, 0, 9/5
6    1/2, −3/10, −3/5, −2/5, 3/10, 3/2
7    3/7, −1/7, −3/7, −3/7, −1/7, 3/7, 9/7
8    3/8, −3/56, −17/56, −3/8, −15/56, 1/56, 27/56, 9/8
Table 3. Coefficients for quadratic least-squares predictors on n points, ordered from smallest to largest k.
n    Error Coefficients
3    −1, 3, −3, 1
4    −3/4, 5/4, 3/4, −9/4, 1
5    −3/5, 3/5, 4/5, 0, −9/5, 1
6    −1/2, 3/10, 3/5, 2/5, −3/10, −3/2, 1
7    −3/7, 1/7, 3/7, 3/7, 1/7, −3/7, −9/7, 1
8    −3/8, 3/56, 17/56, 3/8, 15/56, −1/56, −27/56, −9/8, 1
Table 4. Summary of the quadratic model error coefficients for several small values of n, where coefficients are ordered from smallest to largest k.
The cubic case uses a degree p = 3 polynomial,
y(k) = k^{3}A + k^{2}B + kC + D + e(k),
where A, B, C and D are regression coefficients to be estimated from the data and e(k) is the residual. The corresponding prediction equation is
y*(n+1) = (n+1)^{3}A + (n+1)^{2}B + (n+1)C + D.
The normal equations for this model are
∑k^{3}y(k) = A∑k^{6} + B∑k^{5} + C∑k^{4} + D∑k^{3},
∑k^{2}y(k) = A∑k^{5} + B∑k^{4} + C∑k^{3} + D∑k^{2},
∑ky(k) = A∑k^{4} + B∑k^{3} + C∑k^{2} + D∑k,
∑y(k) = A∑k^{3} + B∑k^{2} + C∑k + nD,
where all sums are from k = 1 to n. As in the previous cases, these equations may be solved, this time with the help of two more identities
∑k^{5} = n^{2}(n + 1)^{2}(2n^{2} + 2n − 1)/12
and
∑k^{6} = n(n + 1)(2n + 1)(3n^{4} + 6n^{3} − 3n + 1)/42.
The final predictor is
y*(n+1) = 4[35∑k^{3}y(k) − 15(3n+5)∑k^{2}y(k) + 5(3n^{2}+12n+14)∑ky(k) − (n+2)(n+3)(n+4)∑y(k)]/[n(n−1)(n−2)(n−3)].
This is the predictor equation in the cubic case. Substituting n > 3 and evaluating provides coefficients for the class of cubic least-squares predictors on n points. (Note the formula fails for n = 1, 2, 3.) When n = 4, for example, the formula reduces to
y*(5) = −y(1) + 4y(2) − 6y(3) + 4y(4).
Results are summarized in Table 5 below, where coefficients are ordered as in Table 1.
n    Predictor Coefficients
4    −1, 4, −6, 4
5    −4/5, 11/5, −4/5, −14/5, 16/5
6    −2/3, 4/3, 1/3, −4/3, −4/3, 8/3
Table 5. Coefficients for cubic least-squares predictors on n points.
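The cubic predictor can be checked the same way as the lower-degree models; the sketch below (an illustrative function name, not from the memo) is exact on data lying on a cubic curve.

```python
def cubic_predict(y):
    """y*(n+1) from the cubic-model predictor; requires n >= 4."""
    n = len(y)
    s0 = sum(y)
    s1 = sum(k * v for k, v in enumerate(y, 1))
    s2 = sum(k**2 * v for k, v in enumerate(y, 1))
    s3 = sum(k**3 * v for k, v in enumerate(y, 1))
    num = 4 * (35 * s3 - 15 * (3 * n + 5) * s2
               + 5 * (3 * n**2 + 12 * n + 14) * s1
               - (n + 2) * (n + 3) * (n + 4) * s0)
    return num / (n * (n - 1) * (n - 2) * (n - 3))
```

For n = 4 this reproduces the binomial-coefficient form −y(1) + 4y(2) − 6y(3) + 4y(4) given above.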
HIGHER DEGREE POLYNOMIAL MODELS
From the tables we see that, for each model and each n, the set of predictor coefficients sums to 1, as it must. Note also that, when the data length is n = p + 1, the formulas have binomial coefficients. To see why, suppose the y(k) all lie on a straight line. Then the first differences y(k+1) − y(k) are all equal, making the second differences zero. Thus
0 = [y(k+2) – y(k+1)] – [y(k+1) – y(k)]
which may be written
0 = y(k+2) – 2y(k+1) + y(k).
Note that the coefficients 1, −2, 1 are the binomial coefficients in (a − b)^{2}. The predictor is thus
y*(k+2) = 2y(k+1) – y(k)
Similarly, if the y(k) all lie on a parabola, then the third differences are all zero, written
y(k+3) – 3y(k+2) + 3y(k+1) – y(k) = 0,
with the binomial coefficients in (a – b)^{3} and predictor
y*(k+3) = 3y(k+2) – 3y(k+1) + y(k).
In general, any polynomial model of degree p > 0 on equally-spaced data points will have binomial coefficients in its least-squares prediction and error formulas when the number of points is n = p + 1. Writing C(n, j) for the binomial coefficient, the general prediction formula is
y*(n+1) = −∑ (−1)^{j} C(n, j) y(n+1−j), with the sum over j = 1,…,n,
and the error estimate is
e*(n+1) = ∑ (−1)^{j} C(n, j) y(n+1−j), with the sum over j = 0,…,n.
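The n = p + 1 predictor translates into a few lines of Python; this is a sketch of the finite-difference formula above (the function name is illustrative), using the standard-library binomial coefficient.

```python
from math import comb

def binomial_predict(y):
    """Degree-p predictor on n = p + 1 points: the value y*(n+1) that
    makes the (p+1)-th finite difference of the extended series zero."""
    n = len(y)  # y[0] holds y(1), ..., y[n-1] holds y(n)
    return -sum((-1) ** j * comb(n, j) * y[n - j] for j in range(1, n + 1))

# n = 2 continues a line, n = 3 a parabola, n = 4 a cubic, and so on.
```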