BOOTSTRAP METHODS FOR COVARIANCE STRUCTURES
by
Joel L. Horowitz
Department of Economics
University of Iowa
Iowa City, IA 52242
October 1996
ABSTRACT
The optimal minimum distance (OMD) estimator for models of covariance structures is asymptotically efficient but has much worse finite-sample properties than does the equally-weighted minimum distance (EWMD) estimator. This paper shows how the bootstrap can be used to improve the finite-sample performance of the OMD estimator. The theory underlying the bootstrap's ability to reduce the bias of estimators and errors in the coverage probabilities of confidence intervals is summarized. The results of numerical experiments and an empirical example show that the bootstrap often essentially eliminates the bias of the OMD estimator. The finite-sample estimation efficiency of the bias-corrected OMD estimator often exceeds that of the EWMD estimator. Moreover, the true coverage probabilities of confidence intervals based on the OMD estimator with bootstrap critical values are very close to the nominal coverage probabilities.
I thank Joseph Altonji for attracting my attention to the problem addressed in this paper and for making his data available. Gene Savin provided comments on an earlier draft. This research was supported in part by NSF grant SBR-9307677.
BOOTSTRAP METHODS FOR COVARIANCE STRUCTURES
1. INTRODUCTION
Estimates of covariance structures are important in the analysis of a variety of economic processes. See, for example, Abowd and Card (1987, 1989), Behrman et al. (1994), Griliches (1979), and Hall and Mishkin (1982). Estimates of covariance structures can be obtained by minimizing the weighted distance between sample moments and the estimated population moments. Weighting all sample moments equally produces the equally-weighted minimum distance (EWMD) estimator, whereas choosing the weights to maximize asymptotic estimation efficiency produces the optimal minimum distance (OMD) estimator.
The OMD estimator dominates the EWMD estimator in terms of asymptotic efficiency, but it has been found to have poor finite-sample properties in applications (Abowd and Card 1989). Altonji and Segal (1994, 1996) carried out an extensive Monte Carlo investigation of the finite-sample performance of the OMD estimator. They found that the estimator is badly biased with samples of the sizes often found in applications and that its finite-sample root-mean-square estimation error (RMSE) often greatly exceeds the RMSE of the asymptotically inefficient EWMD estimator. Altonji and Segal also found that the true coverage probabilities of asymptotic confidence intervals based on the OMD estimator tend to be much lower than the nominal coverage probabilities. Thus, estimation and inference based on the OMD estimator can be highly misleading with finite samples.
This paper shows how the bootstrap can be used to overcome these problems. The bootstrap is a method for estimating the distribution of an estimator or test statistic by resampling one's data. It amounts to treating the data as if they were the population for the purpose of evaluating the distribution of interest. If the data were, in fact, the population, the coverage probability of a confidence interval and the bias of an estimator could be computed with arbitrary accuracy by Monte Carlo simulation. Moreover, subtracting the bias from the estimator would yield an unbiased estimator with increased finite-sample efficiency. Since the data are not the population, the bootstrap provides only an approximation to the coverage probability of a confidence interval or the bias of an estimator. It turns out, however, that the bootstrap approximation is often more accurate than the approximation obtained from asymptotic distribution theory. As a result, the bootstrap often provides estimates of coverage probabilities of confidence intervals that are more accurate than estimates obtained from asymptotic distribution theory. Moreover, the use of the bootstrap to reduce the bias of an estimator often produces very large increases in finite-sample estimation efficiency.
Section 2 of this paper explains why the bootstrap often provides approximations that are more accurate than the approximations of asymptotic distribution theory. Section 3 presents the results of Monte Carlo experiments in which the bootstrap is applied to the estimation of covariance structures using simulated data. In experiments in which the kurtosis of the sampled population is low, the bootstrap essentially eliminates the finite-sample bias of the OMD estimator and the errors in the coverage probabilities of confidence intervals based on the OMD estimator. Moreover, with low-kurtosis populations, the finite-sample RMSE of the bias-corrected OMD estimator is less than that of the EWMD estimator. The bootstrap is less successful in removing bias and correcting coverage probabilities when the kurtosis of the sampled population is high. Section 4 shows that this problem can be overcome by modifying the OMD estimator in a way that improves the estimate of the OMD weight matrix. In the Monte Carlo experiments with high-kurtosis populations, application of the bootstrap to the modified OMD estimator produces estimates with little bias and RMSE's that are equal to or smaller than those of the EWMD estimates. The true coverage probabilities of confidence intervals obtained from the bootstrap and the modified OMD estimator are very close to the nominal probabilities. Section 5 presents an empirical example based on the data used by Altonji and Segal (1994, 1996). Section 6 presents concluding comments.
Sections 2c, 2d, and 4b include step-by-step instructions for implementing the bootstrap procedures that are described. These instructions may be carried out without detailed study of the theoretical arguments that justify them.
2. THE THEORY OF THE BOOTSTRAP
This section explains how the bootstrap can be used to reduce the bias of an estimator and why the bootstrap often provides an approximation to the coverage probability of a confidence interval that is more accurate than the approximation of asymptotic distribution theory. Most of the material in this section is not new, and the discussion is informal. Mathematically rigorous treatments are available in Beran (1988), Beran and Ducharme (1991), and Hall (1986, 1992).
To provide focus for the discussion, Section 2a provides a brief description of the EWMD and OMD estimators of covariance structures. See Abowd and Card (1989) and Altonji and Segal (1996) for more detailed presentations. Section 2b describes the bootstrap sampling and estimation procedures. The use of the bootstrap to reduce bias is explained in Section 2c. Section 2d explains how the bootstrap can be used to obtain an improved approximation to the coverage probability of a confidence interval.
a. The EWMD and OMD estimators
Let X be an ℓ×1 random variable with mean μ and covariance matrix V. In general, V has ℓ(ℓ + 1)/2 unique elements. However, some of these elements may be assumed to equal zero a priori. Let q ≤ ℓ(ℓ + 1)/2 be the number of unique elements of V that are not assumed to equal zero. In the applications of interest in this paper, further a priori restrictions on the covariance structure of X reduce the number of unique, non-zero elements of V to r < q. These restrictions set some elements of V equal to others but do not specify their numerical values. For example, the ℓ components of X may be assumed to have equal but unspecified variances. Let θ denote the r×1 vector of unique elements of V under the a priori restrictions that are imposed, excluding any elements of V that are assumed to equal zero. The statistical problem of interest in this paper is to obtain point and interval estimates of θ.
To obtain the EWMD and OMD estimators, let {X_i: i = 1,...,n} be a random sample of X. Let V_n denote the sample covariance matrix:
    V_n = (n - 1)^{-1} Σ_{i=1}^n (X_i - X̄)(X_i - X̄)',    (2.1)
where X̄ = n^{-1} Σ_{i=1}^n X_i. Let S be a q×1 vector containing the unique elements of V_n corresponding to elements of V that are not a priori zero. Let e be the q×r matrix of zeros and ones that maps elements of θ into the corresponding elements of S, so that E(S) = eθ at the population value of θ, and e'e = I_{r×r}, the r×r identity matrix. The EWMD and OMD estimators of θ have the form
    θ_n = (e'W_n e)^{-1} e'W_n S,    (2.2)
where W_n is a q×q, positive-definite matrix. The EWMD estimator, θ_{n,EWMD}, is obtained by setting W_n = I_{q×q}, the q×q identity matrix. The OMD estimator, θ_{n,OMD}, is obtained by setting W_n equal to the inverse of an estimator of the population covariance matrix of S. One such estimator is W_n = S_n^{-1}, where S_n is a matrix with elements of the form
    n^{-1} Σ_{i=1}^n (X_i - X̄)^{(r)}(X_i - X̄)^{(s)}(X_i - X̄)^{(t)}(X_i - X̄)^{(u)}
        - [n^{-1} Σ_{i=1}^n (X_i - X̄)^{(r)}(X_i - X̄)^{(s)}][n^{-1} Σ_{i=1}^n (X_i - X̄)^{(t)}(X_i - X̄)^{(u)}],    (2.3)
where (X_i - X̄)^{(r)} denotes the r'th element of X_i - X̄, and (r,s) and (t,u) are the indices of non-zero elements of V.
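As a concrete illustration, the estimator (2.2) and the covariance estimator (2.3) can be sketched in a few lines. The sketch is illustrative only: the function names (`md_estimate`, `fourth_moment_cov`) are hypothetical, and NumPy is assumed.

```python
import numpy as np

def md_estimate(S, e, W):
    """Minimum distance estimator (2.2): theta = (e'We)^(-1) e'W S.
    EWMD sets W to the identity; OMD sets W to the inverse of an
    estimate of Cov(S)."""
    A = e.T @ W @ e
    return np.linalg.solve(A, e.T @ W @ S)

def fourth_moment_cov(X, pairs):
    """Estimate Cov(S) elementwise as in (2.3): fourth central sample
    moments minus products of second central moments.  `pairs` lists
    the index pairs (r, s) defining the elements of S."""
    Xc = X - X.mean(axis=0)                      # deviations from the sample mean
    m2 = {p: (Xc[:, p[0]] * Xc[:, p[1]]).mean() for p in pairs}
    q = len(pairs)
    Sig = np.empty((q, q))
    for a, (r, s) in enumerate(pairs):
        for b, (t, u) in enumerate(pairs):
            m4 = (Xc[:, r] * Xc[:, s] * Xc[:, t] * Xc[:, u]).mean()
            Sig[a, b] = m4 - m2[(r, s)] * m2[(t, u)]
    return Sig
```

Setting W = np.eye(q) gives the EWMD estimate; W = np.linalg.inv(fourth_moment_cov(X, pairs)) gives the OMD estimate.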
The EWMD estimator is unbiased, though inefficient, because S is an unbiased estimator of the unique, nonzero elements of V. The OMD estimator is biased in finite samples, though it is asymptotically unbiased and efficient, because the OMD version of W_n is a random variable that is not independent of S. The bias of the OMD estimator is a source of finite-sample (but not asymptotic) inefficiency. The finite-sample efficiency of the OMD estimator could be increased if its bias could be reduced without greatly increasing its variance. Section 2c shows how this can be accomplished with the bootstrap.
The EWMD and OMD estimators are both asymptotically normally distributed about the true parameter value, with covariance matrices that can be estimated consistently. Accordingly, each can be used to construct asymptotic confidence intervals for components of θ. However, Monte Carlo evidence presented by Altonji and Segal (1994, 1996) shows that the true coverage probabilities of asymptotic confidence intervals based on the OMD estimator often are far below the nominal coverage probabilities. Section 2d shows how the bootstrap can be used to alleviate this problem.
b. The Bootstrap Sampling and Estimation Procedures
The bootstrap treats the estimation data as if they were the population. Bootstrap "data" are generated by sampling the estimation data randomly with replacement. Let {X_i*: i = 1,..., n} be a bootstrap sample that is obtained this way. A bootstrap OMD estimator of θ is an estimator that is obtained from the bootstrap sample and that mimics the distributional properties of θ_{n,OMD}. An obvious way to obtain such an estimator is to replace {X_i} with {X_i*} in (2.1)-(2.3). This method is unsatisfactory, however.
To see why, observe that the population value of θ is identified by the moment condition E(S - eθ) = 0, which holds when θ = θ_0, the population value of θ, but not otherwise. In bootstrap sampling, the population expectation, E, is replaced by E*, the expectation with respect to the probability distribution that places mass 1/n at each data point X_i. The population parameter value θ_0 is replaced by θ_{n,OMD}. It is not difficult to show that E*S* = [(n - 1)/n]S. Therefore, the bootstrap analog of the population moment condition that identifies θ is E*(S* - eθ) = [(n - 1)/n]S - eθ = 0. It is clear from (2.2), however, that [(n - 1)/n]S - eθ_{n,OMD} ≠ 0 except, possibly, in special cases. Therefore, replacing {X_i} with {X_i*} in (2.1)-(2.3) yields a bootstrap estimator of θ that is based on a moment condition that does not hold in the population that the bootstrap samples. As a result, this version of the bootstrap does not provide reductions in bias or in the differences between the true and nominal coverage probabilities of confidence intervals.
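The identity E*S* = [(n - 1)/n]S is easy to verify by simulation for a single variance moment; a minimal check (illustrative, NumPy assumed):

```python
import numpy as np

# E* of the (n-1)-divisor sample variance over bootstrap resamples equals
# the EDF variance, i.e. (n-1)/n times the original sample variance.
rng = np.random.default_rng(0)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
n = x.size
boot_vars = [x[rng.integers(0, n, size=n)].var(ddof=1) for _ in range(20000)]
lhs = np.mean(boot_vars)               # Monte Carlo approximation to E*S*
rhs = (n - 1) / n * x.var(ddof=1)      # (n-1)/n times the sample variance
```

With 20,000 resamples the two quantities agree to Monte Carlo accuracy.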
This problem can be solved by recentering the bootstrap moment condition so that it holds at θ = θ_{n,OMD}. The recentered moment condition is
    E*(S* - eθ - R_n) = 0,    (2.4)
where R_n = [(n - 1)/n]S - eθ_{n,OMD}. OMD estimation of θ based on (2.4) yields the following bootstrap OMD estimator:
    θ_{n,OMD}* = (e'W_n*e)^{-1} e'W_n*(S* - R_n),    (2.5)
where W_n* is obtained from (2.3) by replacing X_i with X_i* and X̄ with X̄* = n^{-1} Σ_{i=1}^n X_i*.
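In code, one recentred bootstrap draw might look as follows. This is a sketch under stated assumptions: `moments` and `weight` are user-supplied functions returning S and W for a data matrix, and the function name is hypothetical.

```python
import numpy as np

def recentred_bootstrap_omd(X, e, moments, weight, theta_omd, rng):
    """One draw of the bootstrap OMD estimator (2.5), using the
    recentred moment condition (2.4)."""
    n = X.shape[0]
    S = moments(X)
    R = (n - 1) / n * S - e @ theta_omd      # R_n recentres the moment condition
    Xb = X[rng.integers(0, n, size=n)]       # resample rows with replacement
    Sb, Wb = moments(Xb), weight(Xb)
    A = e.T @ Wb @ e
    return np.linalg.solve(A, e.T @ Wb @ (Sb - R))
```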
c. Using the Bootstrap to Reduce Bias
This section shows how the bootstrap can be used to reduce the finite-sample bias of θ_{n,OMD}.
It follows from (2.2) that
    θ_{n,OMD} - θ_0 = (e'W_n e)^{-1} e'W_n (S - eθ_0).
Let θ_{nj,OMD} and θ_{0j}, respectively, denote the j'th components of θ_{n,OMD} and θ_0. Then θ_{nj,OMD} - θ_{0j} can be written in the form
    θ_{nj,OMD} - θ_{0j} = g_j(Z)(S - eθ_0),    (2.6)
where Z is a vector consisting of components of X and products of up to four components of X, and g_j is a differentiable 1×q vector-valued function. Let μ_Z = E(Z), and let G_j(·) be the matrix of derivatives of g_j with respect to the components of Z. Under regularity conditions, a Taylor series expansion of (2.6) yields
    θ_{nj,OMD} - θ_{0j} = g_j(μ_Z)(S - eθ_0) + (Z - μ_Z)'G_j(μ_Z)(S - eθ_0) + ε_{nj},
where ε_{nj} is a remainder term with the property that E(ε_{nj}) = o(n^{-1}) as n → ∞. Therefore,
    E(θ_{nj,OMD} - θ_{0j}) = E[(Z - μ_Z)'G_j(μ_Z)(S - eθ_0)] + o(n^{-1}).    (2.7)
Thus, through O(n^{-1}) the bias of θ_{nj,OMD} is
    B_j ≡ E[(Z - μ_Z)'G_j(μ_Z)(S - eθ_0)]
        = E[(Z - μ_Z)'G_j(μ_Z)S].    (2.8)
If B_j were known, θ_{nj,OMD} - B_j would be a reduced-bias OMD estimator of θ_{0j}.
Of course, B_j is not known in applications, but it can be estimated by the bootstrap. To do this, observe that the bootstrap analog of (2.6) is
    θ_{nj,OMD}* - θ_{nj,OMD} = g_j(Z*)(S* - eθ_{n,OMD} - R_n),
where Z* is a vector of bootstrap sample moments. The bootstrap estimate of the bias of θ_{nj,OMD} is B_j* = E*(θ_{nj,OMD}* - θ_{nj,OMD}). The bootstrap reduced-bias OMD estimator of θ_{0j} is θ_{nj,OMD} - B_j*. B_j* can be computed with arbitrary accuracy in an application by the following Monte Carlo procedure:
B1. Use the estimation data to compute θ_{nj,OMD}.
B2. Generate a bootstrap sample of size n by sampling the estimation data randomly with replacement. Compute θ_{nj,OMD}* by applying (2.5) to the bootstrap sample.
B3. Compute E*θ_{nj,OMD}* by averaging the results of many repetitions of step B2. Set B_j* = E*θ_{nj,OMD}* - θ_{nj,OMD}.
It remains to determine how B_j* and B_j are related. A Taylor series argument similar to that leading to (2.7) shows that under regularity conditions
    B_j* = E*[(Z* - Z̄)'G_j(Z̄)(S* - eθ_{n,OMD} - R_n)] + o(n^{-1})
         = E*[(Z* - Z̄)'G_j(Z̄)S*] + o(n^{-1})    (2.9)
almost surely, where Z̄ = E*Z* is the vector of sample moments corresponding to μ_Z. By comparing (2.8) with (2.9), it can be seen that the only differences between B_j and the leading term of B_j* are that Z̄ replaces μ_Z in B_j*, and the empirical expectation, E*, replaces the population expectation, E. This observation, together with arguments based on the strong law of large numbers, leads to the result that B_j* = B_j + o(n^{-1}) almost surely. Therefore, use of the bootstrap bias estimate B_j* almost surely provides the same bias reduction that would be obtained if the infeasible population value B_j could be used. This is the source of the bootstrap's ability to reduce the bias of the OMD estimator.
d. The Bootstrap and Confidence Intervals
This section explains how the bootstrap can be used to reduce the difference between the true and nominal coverage probabilities of a confidence interval for θ_{0j}.
Let s_{nj} be the standard error of θ_{nj,OMD}. The t statistic based on θ_{n,OMD} for testing the hypothesis H_0: θ_j = θ_{0j} is T_{nj} = (θ_{nj,OMD} - θ_{0j})/s_{nj}. Let t_{nα} be the exact, finite-sample critical value of a symmetrical t test of H_0 based on T_{nj}. Then t_{nα} is the 1 - α quantile of the finite-sample distribution of |T_{nj}|. An exact 1 - α symmetrical confidence interval for θ_{0j} is
    θ_{nj,OMD} - t_{nα}s_{nj} ≤ θ_{0j} ≤ θ_{nj,OMD} + t_{nα}s_{nj}.    (2.10)
This confidence interval cannot be computed in applications because t_{nα} is unknown. A feasible asymptotic 1 - α confidence interval can be obtained by replacing t_{nα} in (2.10) with the asymptotic critical value of the symmetrical t test. The asymptotic critical value is the 1 - α/2 quantile of the standard normal distribution, which will be denoted by z_{α/2}. The resulting asymptotic confidence interval is
    θ_{nj,OMD} - z_{α/2}s_{nj} ≤ θ_{0j} ≤ θ_{nj,OMD} + z_{α/2}s_{nj}.    (2.11)
Asymptotic distribution theory ensures that the true coverage probability of (2.11) approaches 1 - α, the nominal coverage probability, as n → ∞. However, the Monte Carlo results of Altonji and Segal (1994, 1996) show that the true and nominal coverage probabilities can be very different with samples of the sizes encountered in applications. The bootstrap alleviates this problem by providing a feasible critical value that is more accurate than the asymptotic one.
To see how the bootstrap does this, let the exact finite-sample cumulative distribution function (CDF) of T_{nj} be H_n(z,F) ≡ P(T_{nj} ≤ z), where F is the CDF of X. Let F_n be the empirical distribution function (EDF) of the estimation data {X_i}. The finite-sample critical value t_{nα} satisfies H_n(t_{nα},F) - H_n(-t_{nα},F) = 1 - α. Under regularity conditions, H_n has an Edgeworth expansion of the form
    H_n(z,F) = Φ(z) + n^{-1/2}p_1(z,F) + n^{-1}p_2(z,F) + o(n^{-1})    (2.11)
uniformly over z, where Φ is the standard normal CDF, p_1 and p_2 are functionals of (z,F), p_1(z,F) is an even function of z for each F, p_2(z,F) is an odd function of z, and p_2(z,F_n) → p_2(z,F) almost surely as n → ∞ uniformly over z. It follows from (2.11) and the symmetry of Φ, p_1, and p_2 that for any z > 0
    P(|T_{nj}| > z) = 2[1 - Φ(z) - n^{-1}p_2(z,F)] + o(n^{-1}).    (2.12)
Now consider the bootstrap. The bootstrap samples a population whose CDF is F_n. Let T_{nj}* be the t statistic based on θ_{nj,OMD}* for testing the bootstrap hypothesis H_0*: θ_j = θ_{nj,OMD}. That is, T_{nj}* = (θ_{nj,OMD}* - θ_{nj,OMD})/s_{nj}*, where s_{nj}* is the standard error that is obtained by replacing sample quantities with their bootstrap analogs in the formula for s_{nj}. Denote the CDF of T_{nj}* conditional on the estimation data by H_n*(z,F_n). Under regularity conditions, H_n*(z,F_n) has the following asymptotic expansion:
    H_n*(z,F_n) = Φ(z) + n^{-1/2}p_1(z,F_n) + n^{-1}p_2(z,F_n) + o(n^{-1})    (2.13)
uniformly over z almost surely. The leading terms of (2.11) and (2.13) are identical. Therefore, because p_1 is even and p_2(z,F_n) → p_2(z,F) almost surely, the distributions of |T_{nj}| and |T_{nj}*| are almost surely identical through O(n^{-1}). This is the source of the bootstrap's ability to provide improved critical values for T_{nj} and improved coverage probabilities for confidence intervals.
To obtain the improved critical value for the symmetrical t test, let t_{nα}* denote the α-level critical value for testing H_0*. This can be computed with arbitrary accuracy by the following Monte Carlo procedure:
C1. Use the estimation data to compute θ_{nj,OMD}.
C2. Generate a bootstrap sample of size n by sampling the estimation data randomly with replacement. Compute T_{nj}* from this sample.
C3. Use the results of many repetitions of C2 to compute the EDF of |T_{nj}*|. Set t_{nα}* equal to the 1 - α quantile of this EDF.
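Steps C1-C3 can be sketched as follows; `estimate_and_se` is a user-supplied function returning the point estimate and its standard error for one component, and the names are hypothetical.

```python
import numpy as np

def bootstrap_t_critical(X, estimate_and_se, alpha=0.05, n_boot=500, seed=0):
    """Steps C1-C3: the 1 - alpha quantile of |T*| over bootstrap samples,
    where T* is centred at the estimate from the original data."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    theta_hat, _ = estimate_and_se(X)                     # step C1
    t_abs = []
    for _ in range(n_boot):                               # step C2
        tb, sb = estimate_and_se(X[rng.integers(0, n, size=n)])
        t_abs.append(abs(tb - theta_hat) / sb)
    return np.quantile(t_abs, 1 - alpha)                  # step C3

def bootstrap_t_interval(X, estimate_and_se, alpha=0.05, **kw):
    """Symmetrical confidence interval with the bootstrap critical value."""
    theta_hat, se = estimate_and_se(X)
    t_star = bootstrap_t_critical(X, estimate_and_se, alpha, **kw)
    return theta_hat - t_star * se, theta_hat + t_star * se
```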
It follows from (2.13) and p_2(t_{nα}*,F_n) - p_2(t_{nα}*,F) → 0 almost surely that
    2[1 - Φ(t_{nα}*) - n^{-1}p_2(t_{nα}*,F)] = α + o(n^{-1})    (2.14)
almost surely. Combining (2.12) and (2.14) yields the result that
    t_{nα}* = t_{nα} + o(n^{-1})    (2.15)
almost surely. Thus, the bootstrap and true finite-sample critical values of the symmetrical t test differ by o(n^{-1}) almost surely. In contrast, t_{nα} - z_{α/2} = O(n^{-1}). Therefore, the bootstrap approximation to t_{nα} is more accurate than the approximation provided by asymptotic distribution theory.
Equation (2.15) can also be used to show that
    P(θ_{nj,OMD} - t_{nα}*s_{nj} ≤ θ_{0j} ≤ θ_{nj,OMD} + t_{nα}*s_{nj}) = 1 - α + O(n^{-2}).
Thus the difference between the true and nominal coverage probabilities of a symmetrical confidence interval for θ_{0j} based on the bootstrap critical value is O(n^{-2}) as n → ∞. In contrast, the difference is O(n^{-1}) when the asymptotic critical value is used. Therefore, the use of the bootstrap critical value reduces the error in the coverage probability.
The foregoing results depend critically on the fact that T_{nj} is an asymptotically pivotal statistic under H_0. That is, its asymptotic distribution does not depend on unknown population parameters. The bootstrap reduces the error in the coverage probability of a symmetrical confidence interval only if the interval is based on an asymptotically pivotal statistic. If a statistic that is not asymptotically pivotal is used (θ_{nj,OMD} - θ_{0j}, for example), the error in the coverage probability with a bootstrap critical value is O(n^{-1}), which is the same as the error with the asymptotic critical value.
3. MONTE CARLO EXPERIMENTS
This section reports the results of a Monte Carlo investigation of the numerical accuracy of the bootstrap as a means of reducing the bias of the OMD estimator and the differences between the true and nominal coverage probabilities of confidence intervals based on this estimator.
The theory outlined in Section 2 shows that the bootstrap can be used to improve the finite-sample performance of the OMD estimator. However, the bootstrap bias correction and critical value are only approximations to the exact, finite-sample bias and critical value. Although the bootstrap approximations are more accurate than those of asymptotic theory (in the sense that the errors of the bootstrap approximations converge to zero more rapidly as n ® ¥), nothing in the theory guarantees that the bootstrap will provide high numerical accuracy with a sample of fixed size. The Monte Carlo results provide information about numerical accuracy.
The data-generation processes used in the experiments are taken from the correlated-moments processes of Altonji and Segal (1994). These are used instead of the independent-moments processes of Altonji and Segal (1994, 1996) because θ_{n,OMD} is asymptotically more efficient than θ_{n,EWMD} with the correlated-moments processes but not the independent-moments ones. In each experiment, X has 10 components, and n = 500. The j'th component of X, X^{(j)} (j = 1,...,10), is generated by X^{(j)} = (Z^{(j)} + ρZ^{(j+1)})/(1 + ρ^2)^{1/2}, where Z^{(1)},..., Z^{(11)} are iid random variables with means of zero and variances of 1, and ρ = 0.5. The Z's are sampled from five different distributions, depending on the experiment. These are the uniform, normal, Student t with 10 degrees of freedom, exponential, and lognormal, all standardized to have means of zero and variances of one. It is assumed that ρ is known and that the components of X are known to be identically distributed and to follow MA(1) processes. Therefore, the estimation problem in the experiments is to infer the scalar parameter θ that is identified by the moment conditions Var(X^{(j)}) = θ (j = 1,..., 10) and Cov(X^{(j)}, X^{(j-1)}) = ρθ/(1 + ρ^2) (j = 2,..., 10). Accordingly, S is a vector containing the 10 sample variances and 9 sample covariances of X^{(1)},..., X^{(10)}.
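The correlated-moments design can be simulated directly; a sketch follows (function name hypothetical, NumPy assumed, three of the five distributions shown):

```python
import numpy as np

def generate_X(n, rho=0.5, dist="normal", seed=0):
    """Section 3 design: X(j) = (Z(j) + rho*Z(j+1)) / (1 + rho^2)^(1/2),
    j = 1,...,10, with Z(1),...,Z(11) iid, mean 0, variance 1."""
    rng = np.random.default_rng(seed)
    if dist == "normal":
        Z = rng.standard_normal((n, 11))
    elif dist == "uniform":
        Z = rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), (n, 11))  # variance 1
    elif dist == "exponential":
        Z = rng.exponential(1.0, (n, 11)) - 1.0                # mean 0, variance 1
    else:
        raise ValueError(dist)
    return (Z[:, :10] + rho * Z[:, 1:]) / np.sqrt(1.0 + rho ** 2)
```

Under this design θ = 1, so each variance is 1 and each lag-1 covariance is ρ/(1 + ρ^2) = 0.4.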
Experiments were carried out using both the EWMD and OMD estimators. Since the EWMD estimator is unbiased, the experiments with it consisted of computing its empirical RMSE and the empirical coverage probability of a nominal 95% confidence interval for θ based on the asymptotic critical value of the t statistic. In the experiments with the OMD estimator, the empirical bias, RMSE, and coverage probability of a nominal 95% confidence interval based on the asymptotic critical value were computed. In addition, the empirical bias and RMSE of the OMD estimator following bootstrap bias reduction were computed, as was the empirical coverage probability of a nominal 95% confidence interval based on the bootstrap critical value. The experiments with bootstrapping were carried out by applying steps B1-B3 and C1-C3 in Sections 2c and 2d at each Monte Carlo replication of the data-generation process. There were 1000 Monte Carlo replications per experiment. Bootstrap bias corrections and critical values were computed using 500 bootstrap resamples at each replication.
The results of the experiments are shown in Table 1. θ_{n,OMD} is biased, and its RMSE exceeds that of θ_{n,EWMD} for all distributions of Z except the uniform. Moreover, the coverage probabilities of confidence intervals based on θ_{n,OMD} with asymptotic critical values are far below the nominal value of 0.95 except in the experiment with uniform Z's. In contrast, the coverage probabilities of confidence intervals based on θ_{n,EWMD} with asymptotic critical values are close to nominal in all of the experiments except the one with lognormal Z's. In the lognormal experiment, the error in coverage probability is much smaller with θ_{n,EWMD} than with θ_{n,OMD}. These results are consistent with those of Altonji and Segal (1994, 1996).
Table 1 also shows that bootstrap bias reduction greatly reduces both the bias and RMSE of θ_{n,OMD}. In addition, the use of bootstrap critical values greatly reduces the errors in the coverage probabilities of confidence intervals based on θ_{n,OMD}. In the experiments with normal, Student t, or uniform Z's, the bootstrap essentially eliminates the bias of θ_{n,OMD} and the errors in the coverage probabilities of the confidence intervals. Moreover, the RMSE of the bias-corrected θ_{n,OMD} in these experiments is 12-50% less than that of θ_{n,EWMD}. In the experiments with exponential or lognormal Z's, the bootstrap reduces but does not eliminate the bias of θ_{n,OMD} and the errors in the coverage probabilities of the confidence intervals. The bootstrap also reduces the RMSE of θ_{n,OMD}. However, with exponential or lognormal Z's, the RMSE and the error in coverage probability are larger for θ_{n,OMD} with bootstrap bias reduction and bootstrap critical values than for θ_{n,EWMD}.
The relatively poor performance of the bootstrap in the experiments with exponential or lognormal Z's is due to the imprecision of S_n in (2.3) as an estimator of the covariance matrix of S and, therefore, of W_n as an estimator of the inverse covariance matrix, when Z is exponentially or lognormally distributed. This problem is illustrated by the coefficient of variation of e'W_n e in the experiments: it is 0.066, 0.083, 0.10, 0.16, and 0.35 in the experiments with uniform, normal, Student t, exponential, and lognormal Z's, respectively. Section 4 describes an estimator of the covariance matrix of S that is more precise in finite samples. The numerical performance of the bootstrap in the experiments with exponential and lognormal Z's is greatly improved when this estimator is used in place of S_n.
4. AN IMPROVED COVARIANCE-MATRIX ESTIMATOR
a. Description of the Estimator
The exponential and lognormal distributions both have high kurtosis (excess kurtosis of 6 and 111, respectively, compared with -1.2, 0, and 1 for the uniform, normal, and Student t distributions). In the experiments based on the exponential and lognormal distributions, S_n is an imprecise estimator of the covariance matrix of S because it is strongly influenced by outlier values of X. Koenker et al. (1994) provide a theoretical analysis of this problem. The problem can be alleviated by trimming observations of X to reduce the influence of outliers on S_n.
To describe the resulting covariance-matrix estimator, let {a_n} be a sequence of positive real numbers that increases to ∞ at a suitable rate as n → ∞. The required rate of increase is discussed in Section 4b. Let X_i^{(j)} denote the j'th component of the i'th observation of X, and let X̄^{(j)} be the sample average of the j'th component; that is, X̄^{(j)} = n^{-1} Σ_{i=1}^n X_i^{(j)}. Define
    d_i(a_n) = 1 if sup_j |X_i^{(j)} - X̄^{(j)}| ≤ a_n, and d_i(a_n) = 0 otherwise,
and
    ñ = Σ_{i=1}^n d_i(a_n).
The trimmed covariance estimator is the version of S_n that is obtained by applying (2.3), with n replaced by ñ, to the observations X_i for which d_i(a_n) = 1. Denote this estimator by S̃_n. S̃_n consists of terms of the form
    ñ^{-1} Σ_{i=1}^n d_i(a_n)(X_i - X̄)^{(r)}(X_i - X̄)^{(s)}(X_i - X̄)^{(t)}(X_i - X̄)^{(u)}
        - [ñ^{-1} Σ_{i=1}^n d_i(a_n)(X_i - X̄)^{(r)}(X_i - X̄)^{(s)}][ñ^{-1} Σ_{i=1}^n d_i(a_n)(X_i - X̄)^{(t)}(X_i - X̄)^{(u)}].    (4.1)
Let θ̃_{n,OMD} denote the OMD estimator that is obtained by using S̃_n instead of S_n to estimate the covariance matrix of S. θ̃_{n,OMD} is asymptotically equivalent to θ_{n,OMD} but has better finite-sample properties. The arguments leading to this conclusion are outlined in the Appendix.
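The trimming rule and the trimmed estimator (4.1) can be sketched as follows. The names are hypothetical, and deviations are taken from the full-sample mean, as in the definition of d_i(a_n).

```python
import numpy as np

def trim_indicator(X, a_n):
    """d_i(a_n) = 1 iff sup_j |X_i(j) - Xbar(j)| <= a_n."""
    dev = np.abs(X - X.mean(axis=0))
    return dev.max(axis=1) <= a_n

def trimmed_fourth_moment_cov(X, pairs, a_n):
    """Trimmed covariance estimator (4.1): the formula (2.3) applied to
    the kept observations, with n replaced by n_tilde = sum_i d_i(a_n)."""
    keep = trim_indicator(X, a_n)
    Xc = (X - X.mean(axis=0))[keep]          # deviations from the full-sample mean
    nt = keep.sum()
    m2 = {p: (Xc[:, p[0]] * Xc[:, p[1]]).sum() / nt for p in pairs}
    q = len(pairs)
    Sig = np.empty((q, q))
    for a, (r, s) in enumerate(pairs):
        for b, (t, u) in enumerate(pairs):
            m4 = (Xc[:, r] * Xc[:, s] * Xc[:, t] * Xc[:, u]).sum() / nt
            Sig[a, b] = m4 - m2[(r, s)] * m2[(t, u)]
    return Sig
```

With a_n large enough that no observation is trimmed, this reduces to the untrimmed estimator (2.3).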
The usefulness of the trimmed covariance-matrix estimator is illustrated in Table 2, which shows the results of the experiments with exponential and lognormal Z's when a_n is selected by Monte Carlo to minimize the RMSE of the estimator. With exponential Z's, the estimator obtained using θ̃_{n,OMD}, bootstrap bias reduction, and bootstrap critical values is essentially unbiased, and its RMSE is the same as that of θ_{n,EWMD}. The error in the coverage probability of a nominal 95% confidence interval is not statistically significantly different from zero. With lognormal Z's, the RMSE of the estimator based on θ̃_{n,OMD} is 9% below that of θ_{n,EWMD}. The coverage probability of the OMD-based confidence interval is close to the nominal value and closer to nominal than the coverage probability of the EWMD-based interval. Thus, the bootstrap combined with the improved covariance-matrix estimator largely eliminates the poor finite-sample performance of the OMD estimator.
The values of a_n in Table 2 are in the range 2-2.5. Since each component of X has a standard deviation of 1, only observations that are more than 2-2.5 standard deviations from the mean are trimmed in the computation of S̃_n. Nonetheless, trimming eliminates 21% of the observations from the computation of S̃_n when Z is exponentially distributed and 27% when Z is lognormally distributed. This reflects the relatively high frequency of outliers with these high-kurtosis distributions.
Of course, the Monte Carlo method used to choose a_n in the experiments is not available in applications. However, it is possible to construct a Monte Carlo procedure that mimics the one used in the experiments and is feasible in applications. This procedure is described in Section 4b.
b. Choosing a_n in Applications
This section describes an empirical method for selecting the trimming parameter a_n in S̃_n. The discussion is informal and the arguments heuristic.
The procedure consists of using Monte Carlo simulation to compute a_m for some m < n and then multiplying a_m by a scale factor to obtain a_n. Suppose for the moment that θ is scalar. Then a_m is obtained by solving the problem
    minimize over a: |B_m(a)|,    (4.2)
where Bm(a) is obtained as follows:
A1. Use the full estimation data set to compute θ_{n,EWMD}.
A2. Generate m observations of X by sampling the estimation data randomly with replacement. Compute θ̃_{m,OMD}, the trimmed OMD estimator with trimming parameter a, from the m observations.
A3. Average the values of θ̃_{m,OMD} that are obtained by repeating step A2 many times. Denote the resulting average by E_m(θ̃_{m,OMD}).
A4. Set B_m(a) = E_m(θ̃_{m,OMD}) - θ_{n,EWMD}.
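Steps A1-A4, combined with a grid search over candidate values of a, can be sketched as follows. `ewmd` and `trimmed_omd` are user-supplied estimators, and the names are hypothetical.

```python
import numpy as np

def select_trimming_parameter(X, ewmd, trimmed_omd, grid, m, n_rep=200, seed=0):
    """Steps A1-A4: pick a_m on `grid` to minimise |B_m(a)|, the bias of
    the trimmed OMD estimator in subsamples of size m relative to the
    full-sample EWMD estimate."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    theta_ewmd = ewmd(X)                                  # step A1
    best_a, best_bias = None, np.inf
    for a in grid:
        draws = [trimmed_omd(X[rng.integers(0, n, size=m)], a)
                 for _ in range(n_rep)]                   # step A2
        B = np.mean(draws, axis=0) - theta_ewmd           # steps A3-A4
        if np.abs(B).max() < best_bias:
            best_a, best_bias = a, np.abs(B).max()
    return best_a  # scale to a_n using (4.3): a_m <= a_n < (n/m)^{1/4} a_m
```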
To understand the rationale for this procedure, note that each sample obtained in step A2 is a random sample of size m from the true but unknown process that generated the estimation data. Therefore, for the determination of a_m, the only differences between steps A1-A4 and the infeasible Monte Carlo procedure used in Section 4a are that (a) A1-A4 chooses a_m to minimize the bias of θ̃_{m,OMD} relative to θ_{n,EWMD} rather than relative to the true but unknown population value of θ, and (b) E_m is an average over finitely many, non-independent samples of size m. Accordingly, a_m reflects random sampling error in θ_{n,EWMD} and in E_m. However, the sampling error in θ_{n,EWMD} becomes negligible as n → ∞, and the sampling error in E_m becomes negligible as m → ∞ in such a way that m/n → 0. Thus, the procedure for selecting a_m is justified asymptotically. A similar argument would hold if θ_{n,OMD} were used instead of θ_{n,EWMD}. The random sampling error of the computed a_m would be larger, however, because θ_{n,OMD} has a larger finite-sample RMSE than does θ_{n,EWMD}.
If θ is a vector, so is B_m(a), and (4.2) can be replaced by minimization of the distance of B_m(a) from zero in any desired metric.
It remains to determine how a_m should be scaled to yield an estimate of a_n. The Appendix presents an argument showing that the rate of increase of a_n as n → ∞ is less than n^{1/4}. Therefore,
    a_m ≤ a_n < (n/m)^{1/4} a_m.    (4.3)
Choosing m requires balancing two considerations. If m is too large, the random sampling error in a_m is large, whereas if m is too small, the asymptotic approximations that justify (4.3) are inaccurate. I have carried out Monte Carlo experiments based on the exponential and lognormal designs of Section 3 with n = 1500 (approximately the sample size used in the empirical example in Section 5) and m = 500. The resulting average values of a_m were in the range 2.5-3 with standard deviations of 0.5-0.6. Thus, the values of a_m obtained from steps A1-A4 are close to those obtained with the infeasible Monte Carlo procedure of Section 4a. With n/m = 3, (4.3) yields a_m ≤ a_n < 1.32 a_m. This range of uncertainty in a_n is acceptable in the empirical example of Section 5: there is little variation in the results as a_n varies over this range.
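The bound (4.3) can be evaluated directly. The helper below is a minimal sketch; with n = 1500 and m = 500 as above, it reproduces the scale factor (n/m)^(1/4) = 3^(1/4) ≈ 1.32.

```python
def a_n_bounds(a_m, n, m):
    """Return the interval [a_m, (n/m)^(1/4) * a_m) that contains a_n
    according to (4.3)."""
    factor = (n / m) ** 0.25
    return a_m, factor * a_m

# Values from the empirical example of Section 5: a_m = 0.5, n = 1500, m = 500.
lo, hi = a_n_bounds(0.5, n=1500, m=500)   # lo = 0.5, hi ~ 0.66
```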
5. AN EMPIRICAL EXAMPLE
This section illustrates the bootstrap methods discussed in Sections 2-4 by using them to estimate the covariance structure of year-to-year changes in the logarithms of annual earnings and working hours of male heads of households in the Panel Study of Income Dynamics (PSID). The data are those used by Altonji and Segal (1994, 1996) and are similar to those of Abowd and Card (1987, 1989), who also used data from the PSID. The data are based on observations of annual earnings and hours worked by 1536 individuals over an 11-year period (1969-1979). The logarithms of earnings and hours have been adjusted for labor-market experience and year effects, and dollar values have been adjusted to 1967 levels using the Consumer Price Index. See Altonji and Segal (1994, 1996) for further details on the preparation of the data.
The data provide 10 observations per individual on year-to-year changes in the logarithms of earnings and hours. Accordingly, the matrix V in (2.1) contains 210 separate moments for the various years and lags. However, Abowd and Card (1987, 1989) found that the covariances of observations separated by more than two years are negligible. Accordingly, the estimates reported here are based on a vector S consisting of q = 98 variances and covariances with time lags of up to 2 years. The estimated model is stationary, so θ consists of r = 11 parameters. These are the hours variance, the earnings variance, 2 hours autocovariances, 2 earnings autocovariances, and 5 hours/earnings covariances.
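The counts q = 98 and r = 11 can be verified by enumeration. The sketch below labels the 20 observed changes by series and year and counts the distinct moments at lags of at most two years; the labeling scheme is mine, chosen for the sketch, not notation from the paper.

```python
T = 10                      # year-to-year changes per individual per series
elements = [(s, t) for s in ("E", "H") for t in range(T)]   # earnings, hours

moments = set()             # distinct variances/covariances, lags <= 2
params = set()              # distinct parameters under stationarity
for i, (s1, t1) in enumerate(elements):
    for s2, t2 in elements[i:]:
        lag = abs(t1 - t2)
        if lag <= 2:
            moments.add(((s1, t1), (s2, t2)))
            if s1 == s2:
                # same-series moment: depends only on the series and the lag
                params.add((s1, s2, lag))
            else:
                # cross moment: also depends on which series leads
                lead = s1 if t1 >= t2 else s2
                params.add(("EH", lag, lead))

q = len(moments)   # 98 variances and covariances
r = len(params)    # 11 stationary parameters
```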
Point estimates and confidence intervals for θ were obtained using EWMD and OMD with asymptotic critical values, OMD with bootstrap bias reduction and bootstrap critical values, and the modified OMD estimator of Section 4 with bootstrap bias reduction and bootstrap critical values. Five hundred bootstrap replications were used to compute bootstrap estimates of bias and critical values. The trimming parameter a_n required for the modified OMD estimator was computed using the procedure described in Section 4b with m = 500. This yielded a_m = 0.5, so 0.5 ≤ a_n < 0.66 by (4.3). The sample standard deviations of the changes in the logarithms of earnings and hours are in the range 0.28-0.47. Thus, in the computation of W_n, an observation is trimmed away if it is 1.06-2.36 standard deviations from the mean. The exact trimming threshold in standard-deviation units depends on the year, on whether the observation is of earnings or hours, and on the value of a_n within the range 0.5-0.66.
The estimation results are shown in Table 3. The EWMD and unbootstrapped OMD estimates (columns 1 and 2) are similar to those of Altonji and Segal (1994, 1996) but not identical because Altonji and Segal used all available lags in estimation. As in Altonji and Segal (1994, 1996), the EWMD and OMD point estimates are very different. Moreover, the half-widths of the OMD confidence intervals are only about half those of the EWMD intervals, which suggests that the OMD estimates are much more precise than the EWMD estimates. In fact, the OMD parameter estimates and the half-widths of the OMD confidence intervals are misleading. In a Monte Carlo experiment in which the estimation data were sampled with replacement, Altonji and Segal found that the OMD estimator is badly biased and that the true coverage probabilities of confidence intervals based on asymptotic critical values are far below the nominal probabilities.
The results of OMD estimation with bootstrap bias reduction and critical values but without trimming are shown in column 3 of Table 3. The OMD parameter estimates with bias reduction are closer to the EWMD estimates than are the OMD estimates without bias reduction. However, the OMD estimates with bias reduction are closer to those without bias reduction than to the EWMD estimates. This suggests that the bootstrap has reduced but not eliminated the bias of the OMD estimates. This finding is not surprising: the sample kurtoses of the changes in the logarithms of earnings and hours are in the range 17-47, and the Monte Carlo results discussed in Section 3 indicate that the bootstrap, by itself, cannot eliminate bias in the presence of such high kurtosis. The half-widths of the OMD-based confidence intervals are much larger with bootstrap critical values than with asymptotic ones. However, the high kurtosis of the estimation data and the bootstrap's inability to remove the bias of the estimates suggest that the coverage probabilities of these intervals may be below the nominal 0.95.
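The bias reduction referred to here is the standard additive bootstrap correction, in which the bootstrap estimate of bias, E*(θ̂*) − θ̂, is subtracted from θ̂, giving the corrected estimate 2θ̂ − E*(θ̂*). A minimal sketch, with a simple variance estimator standing in for the OMD estimator (an illustrative assumption):

```python
import random
import statistics

def bootstrap_bias_corrected(data, estimator, reps=500, seed=0):
    """Additive bootstrap bias correction: return 2*t - mean of the
    bootstrap replicates, where t is the estimate from the original data."""
    rng = random.Random(seed)
    t = estimator(data)
    boots = []
    for _ in range(reps):
        sample = [rng.choice(data) for _ in range(len(data))]
        boots.append(estimator(sample))
    return 2.0 * t - statistics.mean(boots)
```

Applied to an estimator with a downward finite-sample bias, such as the divide-by-n sample variance, the corrected estimate is pushed back up toward the population value.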
Columns 4 and 5 of Table 3 show the results of OMD estimation with the modified weight matrix and bootstrap bias reduction and critical values. Estimates are shown for values of a_n at the lower and upper ends of the range given by (4.3). There is little difference between the estimates obtained with the two values of a_n. Moreover, both sets of values are close to the EWMD estimates. Most of the differences between the EWMD and modified OMD estimates are less than the half-widths of the confidence intervals of either. Moreover, most of the modified OMD estimates are much closer to the EWMD estimates than to the unmodified OMD estimates with or without bootstrap bias reduction. The half-widths of the confidence intervals based on the modified OMD estimator with bootstrap critical values are slightly larger than the half-widths based on the EWMD estimator. The estimation results do not reveal the reason for this. One possibility is that the finite-sample efficiency of OMD is less than that of EWMD with these data. Another possibility is that the EWMD confidence intervals with asymptotic critical values are too narrow. This interpretation is suggested by the Monte Carlo results of Section 3, in which the true coverage probability of the EWMD confidence interval is less than the nominal probability when the kurtosis of the population distribution is high. Altonji and Segal (1994, 1996) obtained similar results in the experiments in which they resampled the estimation data randomly with replacement.
In summary, the estimation results suggest that with the PSID data, there is no great advantage to using OMD instead of EWMD. The results do suggest, however, that using the modified OMD weight matrix together with bootstrap bias reduction and critical values overcomes the poor finite-sample performance of the standard OMD estimator. This finding is consistent with the Monte Carlo results reported in Section 4.
6. CONCLUSIONS
The OMD estimator of covariance structures is asymptotically efficient, but its finite-sample performance is poor. With samples of the sizes encountered in applications, it is often badly biased, its RMSE can be much larger than that of the asymptotically inefficient EWMD estimator, and there can be large differences between the true and nominal coverage probabilities of confidence intervals.
This paper has demonstrated the ability of the bootstrap to reduce the finite-sample bias and RMSE of the OMD estimator and to reduce the differences between the true and nominal coverage probabilities of confidence intervals based on the OMD estimator. In numerical experiments in which the sampled population had low kurtosis, the bootstrap essentially eliminated the finite-sample bias of the OMD estimator and the errors in the coverage probabilities of confidence intervals. Moreover, the RMSE of the OMD estimator after bootstrap bias reduction was below that of the EWMD estimator. Similar results were obtained in experiments with high-kurtosis populations when the estimator of the OMD weight matrix was modified to reduce the influence of outlier observations. Thus, in the data-generation processes used in the experiments, the bootstrap combined with the modified weight-matrix estimator overcomes the poor finite-sample performance of the OMD estimator.
These results are subject to the usual qualifications about attempting to generalize from numerical evidence. Certainly, there is no guarantee that the OMD estimator with bootstrapping and the modified weight-matrix estimator will always perform as well as they did in the experiments described here. There is also no guarantee that bootstrapping and weight-matrix modification will always produce an OMD estimator that is more efficient in finite samples than the EWMD estimator. The results presented here do show, however, that there are situations in which OMD estimation with bootstrapping is considerably more efficient in finite samples than EWMD estimation.
Altonji and Segal (1994, 1996) concluded that EWMD estimation is almost always preferable to standard OMD estimation with samples of the sizes found in applications. Nothing in this paper contradicts that conclusion. This paper has shown, however, that there are situations in which OMD estimation with bootstrapping may be preferable to EWMD.
APPENDIX
This appendix outlines the arguments leading to (4.3). To minimize the complexity of the discussion, it is assumed that the components of X are known by the analyst to be independently distributed with means of zero and common variance θ. Only the scalar parameter θ has to be estimated, and the OMD weight matrix is diagonal. The arguments for the general case are similar, but the notation and algebra are much more complex.
Under the foregoing assumptions, the j'th component of the q-vector S is

    S^(j) = n^(-1) Σ_{i=1}^{n} (X_i^(j))^2.
W_n is the q×q diagonal matrix whose (j,j) component is ν_j^(-1), where ν_j = β_nj - γ_nj^2,

    β_nj = n^(-1) Σ_{i=1}^{n} d_i(a_n)(X_i^(j))^4,

    γ_nj = n^(-1) Σ_{i=1}^{n} d_i(a_n)(X_i^(j))^2,

d_i(a_n) = I(sup_j |X_i^(j)| ≤ a_n), and I(·) is the indicator function. The modified OMD estimator of θ satisfies
    θ_n,OMD - θ_0 = [Σ_{j=1}^{q} (β_nj - γ_nj^2)^(-1)]^(-1) Σ_{j=1}^{q} (β_nj - γ_nj^2)^(-1) (S^(j) - θ_0).    (A1)
Assume that the components of X have moments through order 8. Suppose for the moment that a_n = ∞ for all n. Then, it follows from the theory of Edgeworth expansions (see, e.g., Hall, 1992) that the bias and variance of n^(1/2) θ_n,OMD have asymptotic expansions of the forms b/n^(1/2) + O(n^(-3/2)) and v_0 + v_1/n + O(n^(-2)), respectively, where b, v_0, and v_1 are constants. Therefore, the mean-square error of n^(1/2) θ_n,OMD has the expansion

    MSE = v_0 + (v_1 + b^2)/n + O(n^(-2)).                                (A2)
Making a_n a finite, increasing function of n adds an extra term to (A2). A Taylor-series approximation to (A1) shows that the extra term has the form

    Δ(a_n) = Σ_{j=1}^{q} c_j β_nj*,

where c_j (j = 1,...,q) is a constant and

    β_nj* = E{(X_1^(j))^4 [1 - d_1(a_n)]}.

Let f_j(·) denote the probability density function of X^(j), and suppose that f_j(x) ~ |x|^(-δ) as |x| → ∞ for some δ > 5. Then β_nj* = O(a_n^(5-δ)) as n → ∞.
The procedure of Section 4 aims at selecting a_n to minimize the O(n^(-1)) term in the asymptotic expansion (A2). This requires Δ(a_n) = O(n^(-1)) and, therefore, β_nj* = O(n^(-1)), which happens if a_n ~ n^(1/(δ-5)) as n → ∞. Existence of the expansion (A2) requires the components of X to have moments through order 8, so δ > 9. Accordingly, the rate of increase of a_n as n → ∞ is slower than n^(1/4). If f_j(x) decreases at an exponential or faster rate as |x| → ∞, then a similar argument shows that a_n has a logarithmic rate of increase as n → ∞. The rate of increase of a_n is still slower than n^(1/4).
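The tail-rate calculation can be made concrete with a density whose tail is exactly a power law. The density below, f(x) = ((δ-1)/2)|x|^(-δ) on |x| ≥ 1, is my illustrative choice, not one used in the paper; for it, the truncated fourth moment β_nj* has the closed form ((δ-1)/(δ-5)) a^(5-δ), so setting a_n ~ n^(1/(δ-5)) makes n·β_nj* constant in n, i.e., β_nj* = O(n^(-1)) as required.

```python
def beta_star(a, delta):
    """E{X^4 * 1(|X| > a)} for the illustrative density
    f(x) = ((delta-1)/2)|x|^(-delta) on |x| >= 1:
    integrating (delta-1) * x^(4-delta) from a to infinity gives
    ((delta-1)/(delta-5)) * a^(5-delta), finite only for delta > 5."""
    assert a >= 1.0 and delta > 5.0
    return (delta - 1.0) / (delta - 5.0) * a ** (5.0 - delta)

delta = 9.5   # moments through order 8 require delta > 9
for n in (10, 100, 1000):
    a_n = n ** (1.0 / (delta - 5.0))
    check = n * beta_star(a_n, delta)   # constant in n: beta_star = O(1/n)
```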
TABLE 1: RESULTS OF MONTE CARLO EXPERIMENTS(1)

              EWMD Estimator          OMD Estimator              OMD Estimator
                                      without Bootstrap          with Bootstrap

                     Coverage Prob.          Coverage Prob.            Coverage Prob.
                     of Nominal 95%          of Nominal 95%            of Nominal 95%
                     Conf. Interval          Conf. Interval            Conf. Interval
                     with Asymptotic         with Asymptotic           with Bootstrap
Distribution  RMSE   Critical Value   Bias   RMSE  Critical Value Bias  RMSE  Critical Value

Uniform       0.019  0.96             0.005  0.015 0.93          0.002  0.014 0.96
Normal        0.024  0.96             0.016  0.025 0.85          0.0    0.021 0.95
Student t     0.029  0.94             0.024  0.034 0.79          0.002  0.026 0.95
Exponential   0.042  0.95             0.061  0.073 0.54          0.014  0.048 0.91
Lognormal     0.138  0.86             0.136  0.285 0.03          0.136  0.173 0.76

(1) Based on 1000 replications. Empirical coverage probabilities between 0.94 and 0.96 are not statistically significantly different from the nominal probability at the 0.05 level.
TABLE 2: RESULTS OF MONTE CARLO EXPERIMENTS WITH TRIMMED COVARIANCE ESTIMATOR(1)

              OMD Estimator with Bootstrap

                                         Coverage Prob. of Nominal
                                         95% Conf. Interval with
Distribution  a_n    /n    Bias   RMSE   Bootstrap Critical Value

Exponential   2.5    0.78  0.004  0.042  0.96
Lognormal     2.0    0.73  0.046  0.126  0.91

(1) Based on 1000 replications. Empirical coverage probabilities between 0.94 and 0.96 are not statistically significantly different from the nominal probability at the 0.05 level.
TABLE 3: ESTIMATES OF COVARIANCE STRUCTURE OF CHANGES IN LOGARITHMS OF EARNINGS AND HOURS
(Quantities in parentheses are half-widths of nominal 95% symmetrical confidence intervals)

                                            OMD with         Modified OMD with
                                            Bootstrap Bias   Bootstrap Bias Reduction
              EWMD with     OMD with        Reduction and    and Bootstrap Crit. Vals.
Covariance    Asymp.        Asymp.          Bootstrap
Parameter     Crit. Vals.   Crit. Vals.     Crit. Vals.      a_n = 0.5    a_n = 0.66
                 (1)           (2)             (3)              (4)          (5)
E(t), E(t) 0.175 0.107 0.129 0.173 0.167
(0.020) (0.011) (0.050) (0.025) (0.024)
E(t), E(t-1) -0.060 -0.035 -0.043 -0.056 -0.057
(0.011) (0.005) (0.021) (0.012) (0.011)
E(t), E(t-2) -0.008 -0.009 -0.009 -0.010 -0.013
(0.005) (0.003) (0.006) (0.008) (0.008)
H(t), H(t) 0.131 0.072 0.088 0.118 0.114
(0.016) (0.008) (0.035) (0.018) (0.019)
H(t), H(t-1) -0.047 -0.027 -0.032 -0.039 -0.040
(0.008) (0.004) (0.014) (0.009) (0.009)
H(t), H(t-2) -0.006 -0.005 -0.006 -0.012 -0.010
(0.004) (0.002) (0.005) (0.007) (0.008)
E(t), H(t) 0.081 0.036 0.046 0.077 0.074
(0.014) (0.007) (0.029) (0.016) (0.015)
E(t), H(t-1) -0.024 -0.008 -0.011 -0.021 -0.021
(0.008) (0.003) (0.011) (0.005) (0.008)
E(t), H(t-2) -0.002 -0.004 -0.005 -0.001 -0.008
(0.002) (0.003) (0.005) (0.008) (0.008)
H(t), E(t-1) -0.026 -0.012 -0.015 -0.023 -0.020
(0.008) (0.004) (0.029) (0.010) (0.009)
H(t), E(t-2) -0.008 -0.006 -0.007 -0.015 -0.015
(0.004) (0.003) (0.006) (0.007) (0.008)
REFERENCES
Abowd, J.M. and D. Card (1987). Intertemporal Labor Supply and Long-Term Employment Contracts, American Economic Review, 77, 50-68.
Abowd, J.M. and D. Card (1989). On the Covariance Structure of Earnings and Hours Changes, Econometrica, 57, 411-445.
Altonji, J.G. and L.M. Segal (1994). Small Sample Bias in GMM Estimation of Covariance Structures, NBER Technical Working Paper no. 156, National Bureau of Economic Research, Cambridge, MA.
Altonji, J.G. and L.M. Segal (1996). Small Sample Bias in GMM Estimation of Covariance Structures, Journal of Business and Economic Statistics, 14, 353-366.
Behrman, J., M. Rosenzweig, and P. Taubman (1994). Endowments and the Allocation of Schooling in the Family and in the Marriage Market: The Twins Experiment, Journal of Political Economy, 102, 1131-1174.
Beran, R. (1988). Prepivoting Test Statistics: A Bootstrap View of Asymptotic Refinements, Journal of the American Statistical Association, 83, 687-697.
Beran, R. and G.R. Ducharme (1991). Asymptotic Theory for Bootstrap Methods in Statistics, Les Publications CRM, Centre de recherches mathématiques, Université de Montréal, Montréal, Canada.
Griliches, Z. (1979). Sibling Models and Data in Economics: Beginnings of a Survey, Journal of Political Economy, 87, S37-S64.
Hall, P. (1986). On the Bootstrap and Confidence Intervals, Annals of Statistics, 14, 1431-1452.
Hall, P. (1992). The Bootstrap and Edgeworth Expansion. New York: Springer-Verlag.
Hall, R.E. and F. Mishkin (1982). The Sensitivity of Consumption to Transitory Income: Estimates from Panel Data on Households, Econometrica, 50, 461-481.
Koenker, R., J.A.F. Machado, C.L. Skeels, and A.H. Welsh (1994). Momentary Lapses: Moment Expansions and the Robustness of Minimum Distance Estimation, Econometric Theory, 10, 172-197.