VAR priors and economic theory

VAR Priors: Success or lack of a decent macroeconomic theory?

Francisco F. R. Ramos

Faculty of Economics, University of Porto, 4200 Porto, Portugal.

Phone: +351-(0)2-5509720, Fax: +351-(0)2-5505050

E-mail: framos@fep.up.pt

Abstract:

The purpose of this paper is to demonstrate that the success of the Litterman prior in VAR forecasting is not due to the realism of the prior, but rather because the prior conveniently reduces forecast error variance in common cases of misspecification. Specifically, it is shown that the imposition of a random walk prior reduces forecast error variance in misspecifications involving (1) time-varying coefficients misspecified as constant coefficients, (2) serially correlated residuals misspecified as white noise, and (3) the inclusion of an irrelevant unit root process in the VAR.

Classification system for journal articles: C32, C53.

Key words: BVAR, Forecasting performance, Litterman prior, Misspecification, Random-walk prior, VAR.

1. Introduction

Perhaps the most successful application of vector autoregression (VAR) methods in macroeconomics has been in the area of economic forecasting. Since 1980, VAR forecasts produced by R. Litterman at the Federal Reserve Bank of Minneapolis have consistently outperformed forecasts generated from conventional Keynesian macroeconometric models. The forecasting superiority of VAR's is particularly striking at long forecast horizons, and it persists even after forecasts from conventional models have benefited from judgmental adjustment by their model builders. The enviable forecasting record compiled by Litterman strongly reflects the imposition of Bayesian prior restrictions on the VAR. Although the Bayesian prior Litterman has developed for VAR forecasting is not derived from economic theory, users of this methodology claim that the Litterman prior is successful because it is a realistic way of expressing uncertainty about the problem of predicting macroeconomic time series.

This paper critically examines the proposition that Bayesian priors for VAR forecasting are successful because they are realistic. We contend that their lack of realism, combined with the lack of a decent macroeconomic theory, is the reason for the success of the Litterman prior. Through a series of theoretical examples, we demonstrate that classical VAR forecasting under misspecification leads to high forecast error variance, and that this variance is reduced substantially by the imposition of the Litterman random walk prior. The second section of this paper briefly reviews the Litterman random walk prior. In the third section, we develop three examples, which we argue are typical of actual macroeconomic models, that show how the "unrealistic" random walk prior reduces forecast error variance. The paper closes with a brief summary and conclusions.

2. A quick review of the Litterman random walk prior

In a series of papers, Litterman (1979, 1986a, 1986b) has demonstrated that the imposition of a Bayesian random walk prior on VAR's substantially improves forecast accuracy. Through a series of forecasting comparisons, he shows that Bayesian VAR (BVAR) models are competitive with large-scale, add-factored commercial models such as DRI, Wharton Econometrics, and Chase Econometrics. Given Fair's (1979) demonstration that unrestricted VAR's can produce forecast errors four times greater than the errors from conventional models, Litterman's forecast record is very impressive.

Following Litterman, a VAR model of order p consists of a system of equations relating n time series $y_{1t}, \dots, y_{nt}$. Each equation in a VAR model can be viewed as a multiple regression equation in which $y_{it}$ is regressed on p lags of each of the variables in the system. Thus, the ith equation of a BVAR model can be written as

$$y_i = X\beta_i + u_i, \qquad i = 1, \dots, n. \tag{1}$$

In (1), $y_i$ is a vector of observations on the dependent variable; $X$ is the known data matrix, consisting of lagged values of $y_{1t}, \dots, y_{nt}$; $\beta_i$ is a vector of coefficient parameters; and $u_i$ is a vector of random disturbances. The error terms, $u_i$, are assumed to be normally distributed, so that

$$u_i \sim N(0, \sigma_i^2 I). \tag{2}$$

Note that if the error terms are assumed to be contemporaneously, but not serially, correlated, then the system of n equations in the VAR can be viewed as a system of Seemingly Unrelated Regression (SUR) equations; see Zellner (1962).

The BVAR approach restricts a VAR model by incorporating available information about the coefficients of the model into the estimation procedure. The prior information takes the form of stochastic constraints on the coefficient parameters. For each coefficient parameter, the prior information supplies a point estimate of the mean of the parameter and an estimate of the parameter's variation about its mean. The prior information used in the specification of the BVAR model is based on two general premises, rather than explicitly on economic theory. This represents a departure from the practice associated with structural econometric models, in which model restrictions are derived explicitly from economic theory. Under the first premise, Litterman (1986b) suggests that a realistic approximation of the movement over time of many macroeconomic variables is a random walk around an unknown deterministic component. He suggests that this prior should be imposed on all equations in the VAR. For the ith equation this distribution is centered around the specification

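$$y_{it} = \delta_i + y_{i,t-1} + u_{it}, \tag{3}$$

where $\delta_i$ denotes the unknown deterministic component (the drift).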

The second premise is that more information may be derived about a variable at time t from recent lags of the variables in the system than from more distant lags. An implication of the first premise is that prior point estimates for the coefficients are all taken to be zero, except for the first lag on the dependent variable in each equation in the system, which is given a prior mean of 1. Adherence to the second premise means that the variation about the prior means is assumed to diminish as the lag length increases, with the result that as the lag length increases, the prior distributions of the coefficients become tighter about a mean of 0. In addition, the prior has larger standard deviations on lag coefficients of the dependent variable than on other variables in the system. A non-informative prior is imposed on the deterministic component, since it is suggested that little is known about the distribution of this parameter.
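The two premises can be made concrete with a short sketch that builds the prior means and standard deviations for every lag coefficient of an n-variable VAR(p). The functional form used here (an overall tightness, a cross-variable weight below one, harmonic decay in the lag length, and a ratio of residual standard deviations to adjust for units) is the form commonly associated with the Litterman or "Minnesota" prior; it is an assumption, since the text does not state the formula, and the names `tightness`, `cross_weight`, `decay` and `resid_sd` are hypothetical. The deterministic term is omitted because it receives a non-informative prior.

```python
def minnesota_prior(n, p, tightness=0.2, cross_weight=0.5, decay=1.0, resid_sd=None):
    """Prior means and standard deviations for the lag coefficients of an
    n-variable VAR(p), keyed by (equation i, variable j, lag l) with l = 1..p.

    Assumed form (not taken from the paper):
      mean(i, j, l) = 1 if j == i and l == 1, else 0   (random walk, premise 1)
      sd(i, j, l)   = tightness * w(i, j) / l**decay * resid_sd[i] / resid_sd[j]
      w(i, i) = 1, w(i, j) = cross_weight for j != i   (own lags looser)
    The 1 / l**decay factor tightens the prior around 0 at longer lags (premise 2).
    """
    if resid_sd is None:
        resid_sd = [1.0] * n  # unit scaling when residual s.d.'s are not supplied
    mean, sd = {}, {}
    for i in range(n):
        for j in range(n):
            w = 1.0 if i == j else cross_weight
            for lag in range(1, p + 1):
                mean[(i, j, lag)] = 1.0 if (i == j and lag == 1) else 0.0
                sd[(i, j, lag)] = tightness * w / lag**decay * resid_sd[i] / resid_sd[j]
    return mean, sd


mean, sd = minnesota_prior(n=2, p=3)
print(mean[(0, 0, 1)], mean[(0, 1, 1)])             # 1.0 0.0
print(sd[(0, 0, 1)], sd[(0, 0, 2)], sd[(0, 1, 1)])  # 0.2 0.1 0.1
```

Only a handful of hyperparameters generate all n × n × p prior means and standard deviations, which is the mechanical construction described in the next paragraph.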

An important factor in the design of BVAR models is that the prior estimates of the variation of the coefficient parameters are constructed in a relatively mechanistic manner, indexed by two or more "hyperparameters". This permits a fairly general specification of prior information regarding the coefficient parameters in the model, based on relatively few prior parameters, and greatly facilitates the use of a model that incorporates information about the coefficients in a Bayesian fashion. A fully Bayesian approach to the estimation of the VAR model would require the specification of a prior distributional form (e.g., mean and variance, assuming normal priors, or appropriate non-informative priors) for each parameter in the model. Note that a VAR model including six lags on seven variables would have 294 lag coefficient parameters (7 × 7 × 6), seven constant terms, and 28 distinct elements in the covariance matrix of the error terms, for a total of 329 free parameters. A BVAR specification for the same model can be based on as few as two hyperparameters. The specification of the prior information included in the BVAR model is admittedly less general than that which can be included in a fully Bayesian approach, but the BVAR model appears to offer a practical compromise, in terms of ease of both specification and estimation, between an unrestricted VAR and a fully Bayesian model. The literature on the elicitation of prior information, in terms of distributional forms and the parameters of these distributions for use in a Bayesian framework, shows that specifying the parameters of prior distributions can be quite difficult, even for those with considerable expertise and familiarity with the models under consideration. The BVAR approach makes these models considerably more attractive by circumventing much of the need to specify individual prior parameters.
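The parameter count for an unrestricted VAR can be checked with a few lines of arithmetic; the function name is illustrative. For seven variables and six lags it gives 7 × 7 × 6 = 294 lag coefficients, 7 constants and 7 × 8 / 2 = 28 distinct covariance elements, or 329 free parameters in all.

```python
def var_parameter_count(n, p, constant=True):
    """Free parameters in an unrestricted n-variable VAR(p) with an
    unrestricted (symmetric) error covariance matrix."""
    lag_coefficients = n * n * p       # each of n equations has p lags of n variables
    constants = n if constant else 0   # one deterministic term per equation
    covariance = n * (n + 1) // 2      # distinct elements of a symmetric n x n matrix
    return lag_coefficients, constants, covariance, lag_coefficients + constants + covariance


print(var_parameter_count(7, 6))  # (294, 7, 28, 329)
```

The count grows with n² p, whereas the number of hyperparameters in the BVAR specification does not grow with n or p at all.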

With the imposition of this prior, external information is allowed to enter each equation at the margin, but it is important to recognize that the prior will never let the equations of the BVAR deviate too far from independent random walks with drift. Although it is claimed that the prior is a realistic one, it should be noted that it is not derived from any explicit economic theory.
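How the prior lets the data in "at the margin" while holding each equation near a random walk can be seen in the simplest possible case: one equation, one lag of the dependent variable, no intercept, a normal prior centred on 1, and a known error variance. These are simplifying assumptions, not Litterman's full estimator. Under them the posterior mean is a precision-weighted average of the OLS estimate and the prior mean, in the spirit of Theil-Goldberger mixed estimation of stochastic constraints. The function name and default values are hypothetical.

```python
def shrink_ar1_toward_random_walk(y, prior_mean=1.0, prior_sd=0.2, sigma=1.0):
    """OLS and posterior-mean estimates of beta in y_t = beta * y_{t-1} + u_t.

    Assumes u_t ~ N(0, sigma**2) with sigma known and a normal prior
    beta ~ N(prior_mean, prior_sd**2); no intercept, for simplicity.
    """
    x, z = y[:-1], y[1:]
    sxx = sum(v * v for v in x)
    sxz = sum(a * b for a, b in zip(x, z))
    ols = sxz / sxx
    data_precision = sxx / sigma**2       # information in the sample about beta
    prior_precision = 1.0 / prior_sd**2   # information in the prior about beta
    posterior = (data_precision * ols + prior_precision * prior_mean) / (
        data_precision + prior_precision
    )
    return ols, posterior


y = [1.0, 0.5, 0.25, 0.125]                              # a series that halves each period
print(shrink_ar1_toward_random_walk(y))                  # OLS 0.5, posterior about 0.975
print(shrink_ar1_toward_random_walk(y, prior_sd=100.0))  # loose prior: posterior about 0.5
```

Tightening `prior_sd` pulls the estimate toward 1 almost regardless of the data, which is the sense in which the prior keeps each equation close to a random walk.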

While Litterman has exploited this prior in generating unconditional (nonstructural) macroeconomic forecasts, the prior has also been used to draw structural inference from macroeconomic time series. Doan, Litterman and Sims (1984) use this prior in making conditional forecasts, while Sims (1982, 1986) suggests that the prior is useful in assessing macroeconomic policies. To justify the use of this prior in examining structural equations, these authors argue that the Litterman prior is a realistic one for analyzing macroeconomic time series and, in particular, that it may reveal empirical regularities that remain hidden to standard procedures.

3. Theoretical examples

In this section, we argue that the success of BVAR forecasting does not depend on the realism of the Litterman prior. Instead, we suggest that classical VAR forecasting under common situations of model misspecification leads to high forecast error variance, and that the judicious use of a random walk prior in these situations improves forecasting performance.

Specifically, we develop three theoretical examples which show how the "unrealistic" Litterman prior improves forecasting performance in misspecified VAR's. The three sources of misspecification we consider are (1) constant coefficient estimation when the true coefficients are time-varying, (2) estimation under the assumption of white noise errors when the true disturbances are serially correlated, and (3) inclusion of a spurious variable which contains a unit root. Given the lack of consensus in macroeconomic theory, these sources of misspecification are likely to be common in macroeconomic modeling. Moreover, all of them can significantly affect the consistency and (or) efficiency properties of OLS estimation in VAR's.

Lucas (1976) and Belsley and Kuh (1973) have suggested that the usual fixed-coefficient specification used in macroeconomic modeling will often be rejected in favor of time-varying parameter structures. They both point out that the improvement in macroeconomic forecasts obtained by adjusting equation intercepts, or "add-factoring", directly reflects coefficient variability. Rosenberg (1973) demonstrates that this type of misspecification has significant statistical implications. He shows that estimating fixed coefficients when the true model contains stochastic coefficients results in inefficient estimation. Specifically, the OLS error variance rises to five times the efficient variance, and OLS sampling theory underestimates the OLS error variance by a factor of at least 20.

The problem of serially correlated disturbance terms is recognized to be extremely common in macroeconomic models. Most VAR studies assume white noise errors, since autoregressions of sufficient order should produce serially uncorrelated residuals. Models with moving average errors, however, may in theory require an infinite number of autoregressive parameters to deliver white noise errors. Therefore, it is not clear that the lag lengths commonly employed in VAR's will completely eliminate serial correlation in the residuals. Moreover, OLS estimates are biased and inconsistent in models with lagged dependent variables and serially correlated errors [Griliches (1967)]. In addition, Griliches (1967) argues that it can be quite difficult to distinguish between distributed lag models, such as VAR's, and models with serially correlated errors. In the absence of prior identifying restrictions, these two cases may be observationally equivalent.

Inclusion of irrelevant variables in a VAR is likely to be common, since VAR's do not employ exclusion restrictions. The specific case considered in this paper is the inclusion of an irrelevant variable with a unit root. This is an interesting process to consider, since unit roots appear to be very common among aggregate time series [Nelson and Plosser (1982)], and nonstationary time series can lead to problems in estimation and hypothesis testing. In particular, standard estimation and testing procedures are probably not useful in examining linear restrictions on variables in a VAR that are not co-integrated with the rest of the model [Sims, Stock and Watson (1990)].

3.1 Time-varying parameters.

In this example, we analyze the effect on VAR forecast error of misspecifying time-varying coefficients as fixed coefficients.

Consider the following model:

(4.1)

(4.2)

(4.3)

and .

Estimating (4.1) by OLS yields:

(4.4)

The k-step ahead forecast is:

(4.5)

The k-step ahead forecast error is:

(4.6)

Decomposing the forecast error, we first analyze :

(4.7)

can now be represented as:

(4.7a)

Since is a covariance stationary stochastic process,

(4.7b)

now, decomposing :

= (4.8)

Therefore, (4.6) becomes:

(4.9)

Recall that . The best forecast for with d known is

(4.10)

Substituting yields:

(4.11)

Rearranging (4.11), we may now write the forecast error as:

Now, decomposing the forecast error

(4.12a)

+ (4.12b)

+ (4.12c)

Analyzing the first component of (4.12c):

This term explodes in k unless goes to zero and goes to k less than infinity or goes to 1 faster than goes to infinity. Constraining to 1 (random walk) when it is greater than 1 reduces this term and unequivocally reduces mean square forecast error. If is less than 1, constraining it to be 1 increases the mean forecast error, but this is less serious because the error is nonexplosive. The second term of (4.12c) is reduced by setting to zero. The third term of (4.12c) is explosive, but no model parameters will reduce it.
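A minimal sketch of why an autoregressive coefficient above one is dangerous at long horizons, assuming (the expression itself is missing from the text) that the first component of (4.12c) scales like the estimated coefficient raised to the power k, as it does in a first-order autoregression without a constant, where the k-step forecast is $\hat a^k y_T$. The function name is illustrative and not taken from the derivation above.

```python
def forecast_scale(a_hat, horizons):
    """Factor a_hat**k multiplying the last observation y_T in the k-step
    forecast of a first-order autoregression without a constant."""
    return {k: a_hat**k for k in horizons}


horizons = [1, 10, 20, 40]
for a_hat in (1.05, 1.0, 0.9):
    path = forecast_scale(a_hat, horizons)
    print(a_hat, [round(path[k], 3) for k in horizons])
# 1.05 grows without bound (about 7.04 at k = 40), 1.0 stays at 1,
# and 0.9 decays toward 0.
```

Under this assumption, constraining an estimate above one to exactly one caps the factor at 1 for every horizon, while an estimate below one produces a bounded, decaying factor.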

3.2 Serially correlated residuals.

In this example, we analyze the effect of serial correlation misspecification on VAR forecast error. In particular, we examine the case in which substantial serial correlation in the error term is mistaken for serial correlation in the 's.

Consider the following autoregression with an AR(1) disturbance term:

(5.1)

(5.2)

(5.3)

but instead, white noise errors are assumed:

(5.4)

While (5.1) can be written as an autoregression with a white noise error term, this requires the following second-order terms and nonlinear restrictions:

(5.1a)

A simple comparison between (5.1a) and (5.4) clearly indicates that ignoring serially correlated residuals significantly changes the structure of the VAR. In particular, note that under white noise errors, (5.1a) requires a coefficient of $a + c$, rather than $a$, on the first lag of the dependent variable for optimal prediction of future Y. As Griliches (1967) has pointed out, OLS estimation of (5.4) yields a biased and inconsistent estimate of a. The large-sample bias in the OLS estimate of a has been calculated by Griliches (1967) as:

(5.5)

As long as , and a is less than one in absolute value, the OLS estimate will overestimate a. This seems to be a fortunate result from a forecasting viewpoint, since we wish to estimate $a + c$ rather than a. To determine whether the OLS estimate of a approximates $a + c$, we calculate its large-sample value using the following parameter values:

Since our interest in this example is to determine the effect of substantial serial correlation misspecification on forecast error, we choose a and c to be positive, with a small relative to c. Note, however, that is stationary, but close to the unit root specification that Nelson and Plosser (1982) found to be common in aggregate time series. Similarly, we choose also to be stationary but with a root close to the unit circle. Thus, will reasonably approximate many macroeconomic time series. The assumption that is arbitrary; the larger the variance in relative to , the smaller the large-sample bias in the OLS estimate of a.

Calculating the large-sample value of the OLS estimate under these parameter values yields . Thus, the OLS estimate is biased substantially upward from the true value of 0.15, but it remains well below the optimal predictive value of 0.95.
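The direction of this bias can be checked numerically in a simplified version of the model that drops the second regressor, so that $Y_t = aY_{t-1} + u_t$ and $u_t = cu_{t-1} + e_t$. Reading the text's optimal value 0.95 as $a + c$ and its value 0.12 as $ac$ gives $a = 0.15$ and $c = 0.80$; that reading is an inference, not a set of values stated in the text. In the simplified model $Y_t$ is an AR(2), $Y_t = (a+c)Y_{t-1} - acY_{t-2} + e_t$, and OLS on a single lag converges to the first autocorrelation $(a+c)/(1+ac)$. Because the second regressor is dropped, the resulting figure is not the value implied by (5.5); the sketch only illustrates that the estimate lands between $a$ and $a + c$. Function names are hypothetical.

```python
import random


def plim_ols_one_lag(a, c):
    """Probability limit of OLS of Y_t on Y_{t-1} when
    Y_t = a*Y_{t-1} + u_t and u_t = c*u_{t-1} + e_t (no other regressors).
    Y_t is then AR(2) with phi1 = a + c, phi2 = -a*c, and the OLS slope
    converges to rho_1 = phi1 / (1 - phi2)."""
    return (a + c) / (1.0 + a * c)


def simulated_ols_one_lag(a, c, n=200_000, burn=1_000, seed=0):
    """Simulate the same model with N(0, 1) innovations and return the
    no-intercept OLS slope of Y_t on Y_{t-1}."""
    rng = random.Random(seed)
    y_prev = u_prev = 0.0
    sxy = sxx = 0.0
    for t in range(n + burn):
        u = c * u_prev + rng.gauss(0.0, 1.0)  # AR(1) disturbance
        y = a * y_prev + u                     # autoregression with that disturbance
        if t >= burn:                          # discard start-up values
            sxy += y * y_prev
            sxx += y_prev * y_prev
        y_prev, u_prev = y, u
    return sxy / sxx


a, c = 0.15, 0.80
print(round(a + c, 2), round(a * c, 2))  # 0.95 0.12
print(round(plim_ols_one_lag(a, c), 3))  # 0.848
print(round(simulated_ols_one_lag(a, c), 2))
```

In this simplified setting the estimate sits far above the true a and below a + c, which is the qualitative pattern the text describes.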

Consider the imposition of the Litterman prior on (5.4). First, note that setting the coefficient on the first lag of the dependent variable approximately equal to one brings it much closer to the optimal value of 0.95 than the OLS estimate. Second, note that pushing the other terms to zero does not seriously increase forecast error. To see this, write (5.1a) as:

(5.1b)

For the parameter values we choose, note that $ac$ is close to zero (0.12), and the term is also close to zero (0.05), so a prior that sets these terms approximately to zero is not damaging. Other information ignored under the Litterman prior is , the innovation. In contrast, note that OLS estimation of (5.4) not only yields a downward-biased estimate of $a + c$, but also incorrectly estimates the influence of the 's on , since it ignores the second-order term from (5.1a). The BVAR model will outperform the VAR in this case, since the random walk prior is closer to the optimal value of 0.95 than the OLS estimate, and since OLS estimation of the VAR ignores more information about the 's than the BVAR.

This example illustrates that the imposition of the Litterman prior in a VAR with serial correlation misspecification improves forecast accuracy relative to OLS estimation. Specifically, the random walk prior reduces coefficient bias from misspecification.

While the specific results of this example will not necessarily generalize, we can make the following observations. First, in VAR's with highly serially correlated residuals, the OLS estimate of the coefficient on the lagged dependent variable will be biased and inconsistent, and may not be optimal from a forecasting viewpoint. In this case, the imposition of a unit coefficient on the lagged dependent variable will correspond more closely to the coefficient for optimal prediction than the OLS estimate. Second, correcting for serial correlation in VAR's, as was done in (5.1a), results in higher order autoregressions with nonlinear restrictions. For VAR's with more general orders of serial correlation, reducing the disturbance terms to white noise may significantly complicate the structure of the model. In these cases, a prior of pushing other coefficients to zero may be less harmful than attempting to estimate the additional parameters associated with higher order models and nonlinearities.
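The second observation, that removing serial correlation raises the autoregressive order and ties the new coefficients together nonlinearly, can be sketched for a single equation by multiplying lag polynomials: an AR(p) with AR(q) errors becomes an AR(p + q) whose coefficients are products of the original ones. This univariate sketch is only an illustration; in a VAR the same step involves matrix lag polynomials. The function name is hypothetical.

```python
def implied_ar_coefficients(phi, rho):
    """White-noise autoregression implied by
        Y_t = sum_i phi[i] * Y_{t-1-i} + u_t,   u_t = sum_j rho[j] * u_{t-1-j} + e_t.
    Multiplies (1 - phi_1 L - ... - phi_p L^p)(1 - rho_1 L - ... - rho_q L^q)
    and returns the coefficients on Y_{t-1}, ..., Y_{t-p-q}."""
    left = [1.0] + [-v for v in phi]
    right = [1.0] + [-v for v in rho]
    product = [0.0] * (len(left) + len(right) - 1)
    for i, li in enumerate(left):
        for j, rj in enumerate(right):
            product[i + j] += li * rj   # polynomial convolution
    return [-v for v in product[1:]]    # move lagged terms to the right-hand side


print([round(v, 4) for v in implied_ar_coefficients([0.15], [0.80])])  # [0.95, -0.12]
print(len(implied_ar_coefficients([0.5, 0.2], [0.6, 0.1])))            # 4
```

Each implied coefficient is a sum of products of the original parameters, so estimating the original parameters from the expanded autoregression requires nonlinear restrictions.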

3.3 Inclusion of an irrelevant unit root.

In this example, we analyze the effect of including an irrelevant unit root process on VAR forecast error. We choose a unit root variable since it is well known that nonstationary time series can lead to problems in estimation and inference, and since unit root specifications appear to be good approximations for many macroeconomic time series (Nelson and Plosser, 1982).

Consider the following (true) model:

(6.1)

Instead, the estimated model includes an irrelevant random walk, :

(6.1a)

(6.2)

For stationary , standard testing procedures are valid for testing whether in (6.1a). However, testing is more difficult when contains a unit root. While Sims, Stock, and Watson (1990) demonstrate that standard estimation and testing procedures are valid in many situations with nonstationary but co-integrated regressors, tests for Granger causality and neutrality (testing for in (6.1a)) will have nonstandard limiting distributions if the model is not co-integrated, as in (6.1a). In these cases, classical tests will fail to reject too often.

Given that we may include with a non-zero coefficient in (6.1a), the forecast error for grows without bound because of the presence of the unit root. The k-step ahead forecast conditional on information at from (6.1a) is given by:

(6.3)

We also have:

(6.4)

Subtracting the forecast of (6.3) from (6.4), and focusing on the spurious random walk , we obtain:

(6.5)

Since is conditional on information at , the forecast error (6.5) is equal to

(6.6)

The variance of (6.6) is equal to . This term is unbounded in k , and results in forecast error variance of growing without bound. The imposition of the Litterman prior, however, will reduce forecast error variance, because it pushes the coefficient b to zero.
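The growth of this variance with the horizon can be checked by simulation under an explicit assumption about (6.6), whose expression is missing from the text: that the spurious variable is a driftless random walk with innovation variance $\sigma_v^2$, and that its contribution to the k-step forecast error is $b(z_{T+k} - z_T)$, i.e. b times the sum of its next k innovations. Under that assumption the variance is $k b^2 \sigma_v^2$, linear and hence unbounded in k, and it vanishes when $b = 0$. The symbols $z$ and $\sigma_v$ and the function name are hypothetical labels, not the paper's notation.

```python
import random
import statistics


def spurious_rw_error_variance(b, k, sigma_v=1.0, reps=20_000, seed=1):
    """Monte Carlo variance of b * (z_{T+k} - z_T) for a driftless random walk z
    with N(0, sigma_v**2) innovations. Since E_T[z_{T+k}] = z_T, this is the
    forecast-error component attributed to the spurious regressor under the
    assumed form of (6.6)."""
    rng = random.Random(seed)
    draws = [b * sum(rng.gauss(0.0, sigma_v) for _ in range(k)) for _ in range(reps)]
    return statistics.pvariance(draws)


for k in (1, 4, 16):
    mc = spurious_rw_error_variance(b=0.5, k=k)
    print(k, round(mc, 2), 0.5**2 * k)  # Monte Carlo vs. analytic k * b**2 * sigma_v**2
print(spurious_rw_error_variance(b=0.0, k=16))  # zero: with b = 0 the spurious term drops out
```

Shrinking b toward zero, as the Litterman prior does for cross-variable lags, scales this variance down by b² at every horizon.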

4. Summary and Conclusions.

The purpose of this paper was to demonstrate that the success of the Litterman prior in VAR forecasting is not due to the realism of the prior, but rather because the prior conveniently reduces forecast error variance in common cases of misspecification. Specifically, it was shown that the imposition of a random walk prior reduces forecast error variance in misspecifications involving (1) time-varying coefficients misspecified as constant coefficients, (2) serially correlated residuals misspecified as white noise, and (3) the inclusion of an irrelevant unit root process in the VAR.

Given the lack of consensus in macroeconomic theory and the minimal structure imposed in VAR's, it seems to us that these and other types of misspecification will often be present in empirical macroeconomic models. Therefore, this analysis suggests that the Litterman prior is a useful way of restricting VAR's for economic forecasting. The substantial value added from the prior in generating forecasts, however, should not be mistaken as a signal of the prior's realism. Rather, it should indicate to the forecaster that specification uncertainty is a fundamental component of forecast error in macroeconomic models, and that a random walk prior is a useful approach in these types of situations.

REFERENCES

Belsley, D.A. and E. Kuh, 1973, Time-varying parameter structures: An overview, Annals of Economic and Social Measurement 2, 375-380.

Doan, T., R. Litterman and C. Sims, 1984, Forecasting and conditional projection using realistic prior distributions, Econometric Reviews 3, 1-100.

Fair, R.C., 1979, An analysis of the accuracy of four macroeconometric models, Journal of Political Economy 87, 701-718.

Griliches, Z., 1967, Distributed lags: A survey, Econometrica 35, 16-49.

Litterman, R., 1979, Techniques for forecasting using vector autoregressions, Ph.D. Thesis, University of Minnesota.

Litterman, R., 1986a, Specifying vector autoregressions for macroeconomic forecasting, in: P. Goel and A. Zellner, eds., Bayesian Inference and Decision Techniques with Applications: Essays in Honor of Bruno De Finetti (North-Holland, Amsterdam) 79-84.

Litterman, R., 1986b, Forecasting with Bayesian vector autoregressions: Five years of experience, Journal of Business and Economic Statistics 4, 25-37.

Lucas, R. E., 1976, Econometric Policy Evaluation: A Critique, in: K. Brunner and A. Meltzer, eds., The Phillips Curve and Labor Markets, 19-46.

McNees, S., 1986, Forecasting accuracy of alternative techniques: A comparison of U.S. macroeconomic forecasts, Journal of Business and Economic Statistics 4, 5-16.

Nelson, C. R and C.I. Plosser, 1982, Trends and random walks in macroeconomic time series: Some evidence and implications, Journal of Monetary Economics 10, 139-162.

Rosenberg, B., 1973, A survey of stochastic parameter regression, Annals of Economic and Social Measurement 2, 381-397.

Sims, C., 1980, Macroeconomics and reality, Econometrica 48, 1-48.

Sims, C., 1982, Policy Analysis with Econometric Models, Brookings Papers on Economic Activity, 107-152.

Sims, C., 1986, Are forecasting models usable for policy analysis?, Federal Reserve Bank of Minneapolis Quarterly Review, Winter, 1-10.

Sims, C., J. H. Stock, and M. W. Watson, 1990, Inference in linear time series with some unit roots, Econometrica 58, 113-144.

Zellner, A., 1962, An efficient method of estimating seemingly unrelated regressions and tests for aggregation bias, Journal of the American Statistical Association 57, 348-368.