University of California, Santa Cruz

Department of Economics

Working Paper # 286, April 1994

JEL: C22, E32, O47

HOW SENSITIVE ARE ESTIMATED TRENDS TO DATA DEFINITIONS?

RESULTS FOR EAST ASIAN AND G-5 COUNTRIES

Yin-Wong Cheung, Menzie David Chinn

and Tuan Tran

Department of Economics

University of California

Santa Cruz, CA 95064

March 29, 1994

ABSTRACT: This paper examines whether test results characterizing per capita output as either trend or difference stationary are sensitive to whether output is valued in domestic currency terms, or in some international numeraire, such as the Summers and Heston (1991) international dollar. Using the conventional ADF test, and the Kwiatkowski et al. (1992) test with a trend stationary null, we find that for economies such as those of the East Asian countries, the best description of the persistence of the data does depend upon the valuation of output. No such discrepancy is found for the output series of the G-5 countries. We conclude that researchers should be extremely cautious about making generalizations regarding the time series properties of output.

Acknowledgements: We thank the Pacific Rim Project of the University of California for financial support.

1. INTRODUCTION

The issue of whether aggregate output contains a unit root has occupied a central role in macroeconomic debate over the past decade. This concern has risen in tandem with a resurgence in interest in long run growth. As a consequence, the Penn World Table Mark 5 (PWT5) of Summers and Heston (1991) has become a fixture in empirical analyses of growth and convergence. The PWT5 data set attempts to control for differences in systems of national accounts and, more importantly, deviations from purchasing power parity, so that incomes can be compared across countries (or "interspatially") as well as intertemporally.

Recent work has made it apparent that the inferred time series characteristics of per capita output are dependent upon the data quality. Cheung and Chinn (1994) find that the higher the data quality, the more likely one is to be able to characterize data as either trend stationary or difference stationary. Since different imputation procedures are used for high quality data countries versus low ones, it may prove useful to investigate the sensitivity of test results to data definitions.

In this paper we focus on the Pacific Basin countries for two reasons. First, these countries have been the center of much discussion about the sources of rapid economic growth. Second, we expect the difference between growth rates expressed in domestic currency and international dollar valuations to be most pronounced in countries experiencing rapid productivity growth and rapid changes in relative prices.

We apply two types of tests -- a test with a unit root null [the ADF test of Fuller (1976)] and a test with a trend stationarity null [the KPSS test of Kwiatkowski, Phillips, Schmidt and Shin (1992)] -- to data on the East Asian countries: China, Hong Kong, Indonesia, Korea, Malaysia, Philippines, Singapore, Taiwan and Thailand. We compare the results obtained from the PWT5 data set to those obtained from the conventional country sources. We then repeat the exercise for the G-5 countries (US, UK, Germany, Japan and France).

We find that for the East Asian countries one obtains very different statistical results depending upon the data set: more evidence for unit roots is found in the domestic currency series than in the PPP valued series. In contrast, for the developed countries the purchasing power parity imputation does not appear to make any substantial difference for statistical inferences.

This paper is organized as follows. In Section 2, we discuss the econometric methodology used. The data and the ADF and KPSS results are presented in Section 3. Section 4 concludes.

2. ECONOMETRIC METHODOLOGY

2.1. Overview

As mentioned in the introduction, GDP will be subjected to both the ADF and the KPSS tests. The results of these two tests are used to determine the nature of persistence in each GDP series.

The null and alternative hypotheses of the respective tests can be summarized as follows:

Test      H₀        H_A
──────────────────────────
ADF       I(1)      I(0)
KPSS      I(0)      I(1)

Hence, if the ADF test rejects the null while the KPSS test fails to reject the null, then this outcome is considered strong evidence in favor of a trend stationary process. If in contrast the ADF test fails to reject, while the KPSS test rejects, one has strong evidence in favor of a difference stationary process. The failure of both tests to reject could be attributed to the low power of the tests. Rejection of both null hypotheses could be due to a data generating process more complex than those considered.
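The decision rule in the preceding paragraph can be written out as a small lookup. This is only an illustrative sketch: the function name `classify` and the wording of the returned labels are ours, not part of the paper.

```python
def classify(adf_rejects: bool, kpss_rejects: bool) -> str:
    """Combine the ADF and KPSS outcomes into the four cases described in the text."""
    if adf_rejects and not kpss_rejects:
        # Unit root null rejected, trend stationary null retained.
        return "trend stationary"
    if kpss_rejects and not adf_rejects:
        # Unit root null retained, trend stationary null rejected.
        return "difference stationary"
    if not adf_rejects and not kpss_rejects:
        # Neither null rejected: possibly low power of both tests.
        return "inconclusive (low power)"
    # Both nulls rejected: the process may be more complex than either hypothesis.
    return "both rejected (more complex process)"


for adf in (False, True):
    for kpss in (False, True):
        print(f"ADF rejects={adf}, KPSS rejects={kpss} -> {classify(adf, kpss)}")
```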

2.2. The ADF Test

Let {y_t} be the GDP series. The ADF test for unit roots is based on the regression

$$\Delta y_t = \mu + \beta t + \alpha y_{t-1} + \sum_{j=1}^{k-1} \gamma_j \Delta y_{t-j} + \varepsilon_t \tag{1}$$

where \Delta is the first difference operator, k is the lag parameter, and \varepsilon_t is the regression error.

The unit root null hypothesis is rejected if \alpha is significantly less than zero. While it is typical to conduct the tests using the standard critical values given in Fuller (1976), which control only for sample size, we choose to use the more appropriate finite sample critical values calculated from simulated distributions, which control for both sample size and lag structure (see Appendix 1 for details).
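As a rough illustration of the mechanics, the following sketch computes the ADF t-ratio by ordinary least squares, regressing the first difference of y_t on a constant, a linear trend, y_{t-1}, and lagged differences. Several things here are assumptions rather than the authors' procedure: the helper names `solve` and `adf_tstat` are hypothetical, the lag parameter k is taken to imply k - 1 lagged differences (so that N - k observations are usable, matching the definition of T in Appendix 1), and the simulated series is artificial. The statistic would still have to be compared with finite sample critical values, which this sketch does not compute.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def adf_tstat(y, k):
    """t-ratio on the coefficient of y_{t-1}, with constant, trend and k-1 lagged differences."""
    dy = [y[t] - y[t - 1] for t in range(1, len(y))]   # dy[t-1] holds y_t - y_{t-1}
    X, z = [], []
    for t in range(k, len(y)):                          # N - k usable observations
        lags = [dy[t - 1 - j] for j in range(1, k)]     # dy_{t-1}, ..., dy_{t-k+1}
        X.append([1.0, float(t), y[t - 1]] + lags)
        z.append(dy[t - 1])
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xtz = [sum(r[i] * zi for r, zi in zip(X, z)) for i in range(p)]
    coef = solve(xtx, xtz)
    resid = [zi - sum(c * v for c, v in zip(coef, r)) for r, zi in zip(X, z)]
    sigma2 = sum(e * e for e in resid) / (len(z) - p)
    unit = [1.0 if i == 2 else 0.0 for i in range(p)]
    var_alpha = sigma2 * solve(xtx, unit)[2]            # sigma^2 times the (alpha, alpha) element of (X'X)^-1
    return coef[2] / math.sqrt(var_alpha)


# Artificial trend stationary series: linear trend plus independent Gaussian noise.
random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(100)]
trend_stationary = [0.5 * t + noise[t] for t in range(100)]
print(round(adf_tstat(trend_stationary, 2), 2))
```

For a series of this kind the t-ratio is strongly negative, which is the pattern that leads to rejection of the unit root null.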

2.3. Trend Stationarity Tests

Assume that the time series is the sum of a deterministic trend, a random walk, and a stationary error. The KPSS test is a Lagrange Multiplier test for the null hypothesis that the error variance in the random walk component of the series is zero.

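To conduct the test, we first obtain the residuals e_t from the regression of y_t on a constant and a trend. The KPSS statistic \eta_\tau is then given by

$$\eta_\tau = T^{-2} \sum_{t=1}^{T} S_t^2 \Big/ s^2(l) \tag{2}$$

where S_t is the partial sum process defined by

$$S_t = \sum_{i=1}^{t} e_i, \qquad t = 1, \ldots, T \tag{3}$$

and s²(l) is the serial correlation and heteroskedasticity consistent variance estimator given by

$$s^2(l) = T^{-1} \sum_{t=1}^{T} e_t^2 + 2 T^{-1} \sum_{s=1}^{l} w(s,l) \sum_{t=s+1}^{T} e_t e_{t-s} \tag{4}$$

Here T is the number of observations, l is the lag truncation parameter, and w(s,l) is an optimal weighting function corresponding to the choice of a spectral window.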

The null of trend stationarity is rejected in favor of the unit root alternative if the KPSS statistic is larger than the critical values provided by KPSS. Based on simulation results, KPSS assert that their test has good size and power characteristics. Since the KPSS test is a relatively new tool, we focus on the results based on the finite sample critical values (see Appendix 1 for details of how these values were generated). Following KPSS's suggestion, we adopt the l8 rule, which sets l = INT[8(T/100)^{1/4}] in equation (4).
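The steps of this subsection (detrending, partial sums, the variance estimator, and the l8 rule) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names `l8` and `kpss_trend` are ours, and the weighting function is taken to be the Bartlett window w(s,l) = 1 - s/(l+1) used by KPSS, which the text above does not state explicitly. The two simulated series are artificial and serve only to exercise the code.

```python
import random


def l8(T):
    """Lag truncation from the l8 rule: INT[8 (T/100)^(1/4)]."""
    return int(8 * (T / 100) ** 0.25)


def kpss_trend(y, l):
    """KPSS statistic for the trend stationary null, with lag truncation l."""
    T = len(y)
    t = list(range(1, T + 1))
    tbar, ybar = sum(t) / T, sum(y) / T
    slope = (sum((ti - tbar) * (yi - ybar) for ti, yi in zip(t, y))
             / sum((ti - tbar) ** 2 for ti in t))
    # Residuals from the regression of y_t on a constant and a trend.
    e = [yi - ybar - slope * (ti - tbar) for ti, yi in zip(t, y)]
    # Partial sum process S_t.
    S, running = [], 0.0
    for ei in e:
        running += ei
        S.append(running)
    # Long-run variance estimator s^2(l) with Bartlett weights (an assumption).
    s2 = sum(ei * ei for ei in e) / T
    for s in range(1, l + 1):
        w = 1 - s / (l + 1)
        s2 += 2 * w * sum(e[i] * e[i - s] for i in range(s, T)) / T
    return sum(v * v for v in S) / (T * T * s2)


# Artificial data with 39 observations, the longest sample length used in the paper.
random.seed(1)
shocks = [random.gauss(0.0, 1.0) for _ in range(39)]
stationary = [0.3 * (i + 1) + shocks[i] for i in range(39)]   # trend plus noise
walk, level = [], 0.0
for v in shocks:                                              # random walk built from the same shocks
    level += v
    walk.append(level)

lag = l8(39)
print(lag, kpss_trend(stationary, lag), kpss_trend(walk, lag))
```

Because the statistic is built from detrended residuals and is a ratio of quadratic forms in those residuals, it does not change when a linear trend is added to the series or when the series is rescaled.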

3. DATA AND TEST RESULTS

3.1. Data Description

We use two types of output data. The first type is real per capita GNP or GDP, valued in real domestic currency terms. These data are drawn from either the IMF's International Financial Statistics or directly from the domestic statistical agencies. Such series are appropriate for intertemporal comparisons.

The second type is GDP per capita in real 1985 international dollar terms obtained from the Summers and Heston (1991) Penn World Tables, Mark 5. The variable (RGDPCH, using the Summers and Heston mnemonic) is calculated using a chain index to minimize base year problems.

Since the specifics of the Summers and Heston methodology for imputing international prices are not well known, it is useful to review the mechanics of the procedure. For the countries involved in the UN's International Comparison Project (ICP), prices of hundreds of identically specified goods and services are collected. The resulting price parities and subaggregate and total aggregate levels are used to convert the countries' national currency expenditures into a common numeraire. This common numeraire is defined such that US GDP in 1985 US$ and in 1985 international dollars are the same. In principle, this makes units of GDP comparable across countries as well as across time.

The sample period covers the post-War era up to 1988 for the developed countries, and somewhat shorter periods for some of the less-developed countries. The data span at most 39 observations and at minimum 19 (Malaysia). These sample sizes may appear small, and hence likely to weaken the power of the ADF test. However, as pointed out by Shiller and Perron (1985), the power of tests for stationarity depends mainly on the length of the time span of the data and not on the number of observations. That is, the power of the test is essentially the same for a sample containing 39 annual data points and a sample containing, say, 39 x 12 monthly observations.

3.2. Overview of Econometric Results

The summary results are presented in the following format:

                    ┌──────────────┬──────────────┐
                    │FAIL TO REJECT│    REJECT    │
                    │     KPSS     │     KPSS     │
┌───────────────────┼──────────────┼──────────────┤
│  FAIL TO REJECT   │    Cell 1    │    Cell 2    │
│        ADF        │              │              │
├───────────────────┼──────────────┼──────────────┤
│      REJECT       │    Cell 3    │    Cell 4    │
│        ADF        │              │              │
└───────────────────┴──────────────┴──────────────┘

Cell 1 reports the cases in which both the ADF and KPSS tests fail to reject their respective null hypotheses. Cell 2 reports the cases in which the ADF test fails to reject the unit root null while the KPSS test rejects the trend stationary null in favor of the unit root hypothesis. Cell 3 reports the cases in which the ADF test rejects the unit root null in favor of the stationary alternative and the KPSS test fails to reject the trend stationary null. Cell 4 reports the cases in which both the ADF and KPSS tests reject their respective null hypotheses. The rejections are based on the 5% significance level.

The interpretation of results from cells 2 and 3 is quite straightforward. Series that fall in cell 2 show strong evidence of the presence of a unit root in the data, while those in cell 3 show strong evidence of trend stationarity. The cell 1 classification can be explained by the low power of both tests, so that neither null is rejected: the data do not contain sufficient information to discriminate between the trend stationary and difference stationary hypotheses. On the other hand, the rejection of both the unit root and trend stationary null hypotheses, as in cell 4, cannot be attributed to the low power of one or both of the tests. One possible interpretation of such cases is that the data are governed by a more complex data generating process than either a deterministic trend or a unit root process.

3.3. Empirical Results

The summary results for the trend- and difference-stationarity tests for the East Asian countries are presented in Chart 1; the detailed results are in Table 1. We first discuss the conventional (domestic currency) valuation series. The most common outcome is "Fail to reject-fail to reject": Indonesia, Malaysia, Philippines, Singapore and Taiwan. Three series fail to reject the ADF null and reject the KPSS null (China, Korea and Thailand). Only the Hong Kong series falls into the apparently trend stationary category.

[Chart 1 about here]

[Table 1 about here]

One might expect that the time series characteristics of output series would be the same regardless of the valuation method -- that is, a shock to output is permanent regardless of whether it is valued in domestic currency terms or international dollars. When using the Summers and Heston data set, we find that four out of the nine series have switched classification. Six series fall into the ambiguous "Fail to reject-fail to reject" category, including China and Korea, while the Philippines and Thailand now fall into the "Reject-fail to reject" category.
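The count of switched classifications can be checked directly against Chart 1. The snippet below encodes each country's (ADF, KPSS) outcome under the two valuations as reported in that chart; the dictionary layout and variable names are ours and purely illustrative.

```python
FF = ("fail to reject", "fail to reject")   # (ADF outcome, KPSS outcome)
FR = ("fail to reject", "reject")
RF = ("reject", "fail to reject")

# Classifications under domestic currency valuation (Chart 1, upper panel).
domestic = {
    "China": FR, "Hong Kong": RF, "Indonesia": FF, "Korea": FR, "Malaysia": FF,
    "Philippines": FF, "Singapore": FF, "Taiwan": FF, "Thailand": FR,
}

# Classifications under Summers and Heston valuation (Chart 1, lower panel).
summers_heston = {
    "China": FF, "Hong Kong": RF, "Indonesia": FF, "Korea": FF, "Malaysia": FF,
    "Philippines": RF, "Singapore": FF, "Taiwan": FF, "Thailand": RF,
}

switched = sorted(c for c in domestic if domestic[c] != summers_heston[c])
print(len(switched), "of", len(domestic), "series switch:", switched)
# prints: 4 of 9 series switch: ['China', 'Korea', 'Philippines', 'Thailand']
```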

The results become more compelling when the conclusions are restated: if the two test results agree, then in the domestic currency series a difference stationary process is a more common finding, while in the Summers and Heston data set, the only unambiguous finding is that of trend stationarity.

To examine whether this contrast in results was common, we repeated the exercise for the G-5 countries (US, UK, Japan, Germany, and France) which presumably have the best price and quantity data. The results from this exercise are reported in Table 2. We find that the results do not differ between the two output series.

[Table 2 about here]

Next we examined whether the change in results for the Asian countries was due to the sensitivity of one or the other test. The ADF switches from "fail to reject" to "reject" for the Philippines and Thailand when moving from domestic data to PWT5 data. In the latter case, the SIC chooses a different lag. In the former, the same lag length is chosen, and yet the ADF statistics are very different (-2.02 versus -3.61). The KPSS switches from "reject" to "fail to reject" in the cases of China, South Korea and Thailand (using the same lag window). Thus both tests tend to provide more evidence of trend stationarity when output is valued at international prices.

Summers and Heston (1991) point out that the difference in calculated growth rates using either domestic currency or international dollars is most pronounced when there are drastic changes in relative prices. East Asian countries appear to be likely candidates for this effect. However, it is surprising that the mode of valuation affects the persistence characteristics of output. If indeed the Summers and Heston method does provide an adequate measure of "quantities" of production, this suggests that findings of difference stationary output in LDCs are likely due to persistent relative price changes.

4. CONCLUSIONS

In this paper we have assessed the persistence in output for nine East Asian countries, using a test with a unit root null, as well as a test with a trend stationary null. We use output data in both domestic real terms, as well as in international dollar terms.

We obtain several interesting results. First, for about two-thirds of the East Asian countries, the tests have inadequate power to reject their respective null hypotheses. Second, and more importantly, whether the evidence favors trend stationarity or difference stationarity depends on the type of data used. In general, output measured in a common international numeraire appears more trend stationary than its domestic currency counterpart. Both tests appear to detect this effect.

This set of results suggests that researchers should be careful about extrapolating results from one series to another. Moreover, it suggests caution in indiscriminately using either time series. For instance, while the Summers and Heston series may have the desirable attribute of providing a "real" quantity comparable across countries, it is important to remember that agents in these economies do not face "international" prices, and instead may base their decisions on domestic currency prices.

References

Cheung, Yin-Wong, and Menzie D. Chinn, 1994, "Deterministic, Stochastic, and Segmented Trends in Aggregate Output: A Cross-Country Analysis," GICES/Dept. of Economics Working Paper #282, University of California, Santa Cruz, February.

Cheung, Yin-Wong, and Kon S. Lai, 1993, "Lag Order and the Finite Sample Behavior of the Augmented Dickey-Fuller Test," GICES/Dept. of Economics Working Paper #269, University of California, Santa Cruz, September.

Fischer, Stanley, 1991, "Growth, Macroeconomics and Development," in O.J. Blanchard and Stanley Fischer (eds.) NBER Macroeconomics Annual, 1991, Cambridge: MIT Press, 329-363.

Fuller, Wayne, 1976, Introduction to Statistical Time Series, New York: John Wiley.

Hall, Alastair, 1994, "Testing for a Unit Root in Time Series with Pretest Data based Model Selection," Journal of Business and Economic Statistics April.

Kwiatkowski, Denis, P.C.B. Phillips, Peter Schmidt and Yongcheol Shin, 1992, "Testing the Null Hypothesis of Stationarity Against the Alternative of a Unit Root: How Sure Are We That Economic Time Series Have a Unit Root," Journal of Econometrics, 54, 159-178.

Levine, Ross and David Renelt, 1992, "A Sensitivity Analysis of Cross-Country Growth Regressions," American Economic Review, 82: 942-963.

Riezman, Raymond G. and Charles H. Whiteman, 1990, "Worldwide Persistence, Business Cycles, and Economic Growth," Social Science Working Paper #719, California Inst. of Technology, February.

Romer, Paul, 1990, "Capital, Labor, Productivity," Brookings Papers on Economic Activity: Microeconomics 1990, 337-369.

Shiller, Robert J., and Pierre Perron, 1985, "Testing the Random Walk Hypothesis: Power versus Frequency of Observation," Economics Letters, 18: 381-386.

Summers, R. and A. Heston, 1991, "The Penn World Table (Mark 5): An Expanded Set of International Comparisons," Quarterly Journal of Economics, May.

CHART 1

ADF AND KPSS RESULTS FOR EAST ASIAN COUNTRIES

┌──────────────────────────────────┐
│        Domestic Currency         │
├────────┬───────────────┬─────────┤
│        │ FAIL TO REJECT│ REJECT  │
│        │ KPSS          │ KPSS    │
├────────┼───────────────┼─────────┤
│ FAIL   │ Indonesia     │ China   │
│ TO     │ Malaysia      │ Korea   │
│ REJECT │ Philippines   │ Thailand│
│ ADF    │ Singapore     │         │
│        │ Taiwan        │         │
├────────┼───────────────┼─────────┤
│ REJECT │ Hong Kong     │         │
│ ADF    │               │         │
└────────┴───────────────┴─────────┘

┌──────────────────────────────────┐
│        Summers and Heston        │
├────────┬───────────────┬─────────┤
│        │ FAIL TO REJECT│ REJECT  │
│        │ KPSS          │ KPSS    │
├────────┼───────────────┼─────────┤
│ FAIL   │ China         │         │
│ TO     │ Indonesia     │         │
│ REJECT │ Korea         │         │
│ ADF    │ Malaysia      │         │
│        │ Singapore     │         │
│        │ Taiwan        │         │
├────────┼───────────────┼─────────┤
│ REJECT │ Hong Kong     │         │
│ ADF    │ Philippines   │         │
│        │ Thailand      │         │
└────────┴───────────────┴─────────┘

Notes: Results using the 5% significance level. "Fail to reject ADF and fail to reject KPSS" indicates failure to reject both the unit root null and the trend stationary null hypotheses. "Fail to reject ADF and reject KPSS" indicates the failure to reject unit root null, but rejection of trend stationary null. "Reject ADF and fail to reject KPSS" indicates rejection of unit root null and failure to reject the trend stationary null. "Reject ADF and reject KPSS" indicates rejection of both the unit root and trend stationary null hypotheses. KPSS results refer to use of l8 rule.

APPENDIX 1

Description of Calculation of Finite Sample Critical Values

A. ADF Test Statistics

In generating the critical values controlling for both sample size and lag structure, a response surface analysis was used. This term applies to a system where the response of some variable depends on a set of other variables that can be controlled and measured in experiments. The surface is usually fitted by regression analysis.

Cheung and Lai (1993) report details of the simulation for the ADF critical values. The control variables are the sample size (N) and the lag parameter (k). A factorial experimental design is used, with 200 total combinations of N = {27, 30, 33, 36, 39, 42, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100, 150, 200, 250, 300, 350, 400, 500} and k = {1, 2, 3, 4, 5, 6, 7, 9}. The DGP is specified as:

$$y_t = y_{t-1} + e_t \tag{A1}$$

with e_t distributed n.i.d. with mean zero. Each experiment contains 30,000 replications.
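The factorial design can be enumerated directly from the two sets of control values listed above, which confirms the count of 200 experiments. The variable names are illustrative; the effective number of observations is computed as T = N - k, following the definition used for the ADF response surface in this appendix.

```python
from itertools import product

sample_sizes = [27, 30, 33, 36, 39, 42, 45, 50, 55, 60, 65, 70, 75,
                80, 85, 90, 95, 100, 150, 200, 250, 300, 350, 400, 500]
lag_parameters = [1, 2, 3, 4, 5, 6, 7, 9]

# One experiment per (N, k) pair, with the effective sample size T = N - k.
design = [(N, k, N - k) for N, k in product(sample_sizes, lag_parameters)]

print(len(sample_sizes), "x", len(lag_parameters), "=", len(design))
# prints: 25 x 8 = 200
```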

The following response surface is fitted:

where CR_{N,k} is the finite sample simulation estimate of the critical value for a sample size N and lag order k, T = N - k is the effective number of observations, and w_{N,k} is the random error term.

The results are reported in Table A1.

B. The KPSS Finite Sample Test Statistics

The generation of the KPSS response surface follows in principle that set forth above for the ADF. The two variables controlled in the experiments are the total number of observations, N, and the lag truncation parameter, l. The DGP is specified as:

where ε is an error n.i.d. with mean zero. There are 30,000 replications in each experiment.

The following response surface is fitted:

Using the effective number of observations, T, does not make a substantial difference in the results. The estimated response surface is reported in Table A1.

TABLE A1

Response Surface Estimation of Critical Values

─────────────────────────────────────────────────────────────────
Reg    ADF with Trend            Reg    KPSS
Coef   10%         5%            Coef   10%            5%
─────────────────────────────────────────────────────────────────
τ₀     -3.1219*    -3.4013*      δ       0.1198*        0.1489*
       (0.0021)    (0.0025)             (0.0001)       (0.0001)
τ₁     -4.5243*    -6.8786*      Θ₁     -0.0397*       -0.1986*
       (0.2304)    (0.2675)             (0.0165)       (0.0245)
                                 Θ₂     -3.8261*       -6.0331*
                                        (0.5040)       (0.7467)
φ₁      1.0301*     1.2123*      Ψ₁     -0.0211*       -0.1320*
       (0.0544)    (0.0614)             (0.0037)       (0.0055)
φ₂     -0.9518*    -1.7218*      Ψ₂      0.2665*        0.2459*
       (0.1454)    (0.1562)             (0.0392)       (0.0581)
                                 Ψ₃      1.2195*        1.9767*
                                        (0.1099)       (0.1629)
─────────────────────────────────────────────────────────────────
R²      .8230       .8701                .9886          .9485
MSE     .0003       .0004                4.9 x 10^-7    1.1 x 10^-6
─────────────────────────────────────────────────────────────────

Notes: The response surface for the ADF with trend is given by equation (A2). The response surface for the KPSS is given by equation (A4). Estimates for critical values are obtained from simulation with 30,000 replications. The numbers in parentheses are heteroskedasticity-consistent standard errors. Statistical significance at the 5% level is indicated by an asterisk (*). MSE is mean squared error.