by
Joel L. Horowitz
Department of Economics
University of Iowa
Iowa City, IA 52242
August 1996
ABSTRACT
The least-absolute-deviations (LAD) estimator for a median-regression model does not
satisfy the standard conditions for obtaining asymptotic refinements through use of the bootstrap
because the LAD objective function is not smooth. This paper overcomes this problem by
smoothing the objective function so that it becomes differentiable. The smoothed estimator is
asymptotically equivalent to the standard LAD estimator. With bootstrap critical values, the
levels of symmetrical t and χ² tests based on the smoothed estimator are correct through O(n^{-γ}),
where γ < 1 but can be arbitrarily close to 1. In contrast, first-order asymptotic approximations
make an error of size O(n^{-γ}). The bootstrap accounts for terms of size O(n^{-γ}) in the asymptotic
expansions of the test statistics, whereas first-order approximations ignore these terms. These
results also hold for symmetrical t and χ² tests for censored median regression models.
KEY WORDS: Asymptotic expansion, smoothing, L1 regression,
least absolute deviations
Research supported in part by NSF grant SBR-9307677. I thank Peter Bickel, Moshe Buchinsky,
Oliver Linton, Paul Ruud, and Gene Savin for helpful
comments and discussions.
BOOTSTRAP METHODS FOR MEDIAN REGRESSION MODELS
1. INTRODUCTION
A linear median regression model has the form

(1.1)  Y = Xβ + U,

where Y is an observed scalar dependent variable, X is a 1×q vector of observed explanatory
variables, β is a q×1 vector of constant parameters, and U is an unobserved random variable that
satisfies median(U|X = x) = 0 almost surely. The parameters β may be estimated by the method of
least absolute deviations (LAD). Bassett and Koenker (1978) and Koenker and Bassett (1982)
give conditions under which the LAD estimator is n^{1/2}-consistent and asymptotically normal.
Koenker and Bassett (1978) treat quantile regressions, which generalize (1.1) by specifying that a
quantile of the conditional distribution of U (not necessarily the median) is zero. Bloomfield and
Steiger (1983), Koenker (1982), and Koenker and Bassett (1978), among others, discuss the
robustness properties of the LAD estimator.
The asymptotic normality of the LAD estimator makes it possible to form asymptotic t and
χ² statistics for testing hypotheses about β in (1.1). However, first-order asymptotic
approximations can be inaccurate with samples of the sizes encountered in applications. As a
result, the true and nominal levels of t and χ² tests and the true and nominal coverage probabilities
of confidence intervals for components of β can be very different when critical values based on
first-order asymptotic approximations are used. Buchinsky (1995), De Angelis et al. (1993),
Dielman and Pfaffenberger (1984, 1986, 1988), and Monte Carlo results that are presented later in
this paper provide numerical evidence on the accuracy of first-order approximations.
This paper shows that the bootstrap provides asymptotic refinements to the levels of t and
χ² tests of hypotheses about β in (1.1). That is, as the sample size, n, increases, the differences
between the true and nominal levels of the tests converge to zero more rapidly with critical values
obtained from the bootstrap than with critical values obtained from first-order asymptotic theory.
It is well known that under suitable conditions the bootstrap provides asymptotic refinements to
the levels of tests and coverage probabilities of confidence intervals (see, e.g., Beran 1988; Hall
1986, 1992; Horowitz 1996). However, the standard theory of the bootstrap does not apply to t
and χ² statistics based on the LAD estimator. This theory is based on an Edgeworth expansion of
the distribution of the statistic of interest. The validity of the expansion is usually established by
using a Taylor series to approximate the statistic by a smooth function of sample moments that
satisfies conditions given, for example, by Bhattacharya and Ghosh (1978) for the existence of an
Edgeworth expansion. The LAD objective function is not smooth, however, and Taylor series
methods cannot be used to approximate the LAD estimator by a smooth function of sample
moments. Indeed, De Angelis et al. (1993) have shown that the distribution of the LAD estimator
has a non-standard and very complicated asymptotic expansion.
This paper solves these problems by smoothing the LAD objective function to make it
differentiable. The resulting estimator will be called the smoothed LAD (SLAD) estimator. It is
first-order asymptotically equivalent to the standard LAD estimator but has much simpler higher-
order asymptotics. Use of the SLAD estimator greatly eases the task of obtaining asymptotic
refinements to levels of tests and, thereby, makes it possible to obtain results that go well beyond
those obtained in previous research.
Previous research by De Angelis et al. (1993) has shown that when U is independent of X
and certain other conditions are satisfied, the error in the bootstrap approximation to the
cumulative distribution function (CDF) of the LAD estimator is o(n^{-2/5}). Hahn (1995) showed
consistency of a bootstrap approximation to the CDF without assuming independence of U and X,
but he did not investigate the size of the approximation error. Neither De Angelis et al. nor Hahn
investigated the bootstrap's ability to correct the levels of t and χ² tests based on the LAD
estimator.
Janas (1993) investigated the related but simpler problem of testing a hypothesis about a
population median (no covariates). He showed that when a suitable version of the bootstrap is
used to obtain the critical value, the difference between the true and nominal levels of a
symmetrical t test of a hypothesis about a population median is o(n^{-γ}), where γ < 1 but can be
arbitrarily close to 1 if the underlying population density is sufficiently smooth. By contrast, first-
order approximations make an error of size O(n^{-γ}). The bootstrap accounts for a term of size
O(n^{-γ}) in the asymptotic expansion of the distribution of the test statistic, whereas first-order
approximations ignore this term.
This paper extends the results of previous research in three ways. First, it gives conditions
under which the bootstrap provides asymptotic refinements to the levels of t and χ² tests of
hypotheses about β in (1.1). Second, in contrast to De Angelis et al. (1993), it is not assumed that
U and X are independent. Any form of dependence is permitted as long as median(U|X = x) = 0
almost surely and mild regularity conditions are satisfied. Third, it is shown that the bootstrap
also provides asymptotic refinements for t and χ² tests of hypotheses about β in the censored
median regression model of Powell (1984). Under the conditions that are given here, the
differences between the true and nominal levels of symmetrical t and χ² tests with bootstrap
critical values are o(n^{-γ}) for a suitable γ satisfying 7/9 < γ < 1. By contrast, the differences between
the true and nominal levels are O(n^{-γ}) with critical values based on first-order approximations. As
in Janas (1993), the bootstrap accounts for a term of size O(n^{-γ}) in the asymptotic expansion of the
t or χ² statistic, whereas first-order approximations ignore this term. The value of γ depends on the
smoothness of the conditional density of U at zero and can be arbitrarily close to 1 if the density is
sufficiently smooth.
Although this paper treats explicitly only the levels of symmetrical t and χ² tests, it will be
clear that the results also apply to coverage probabilities of symmetrical confidence intervals and,
with suitable modifications, to equal-tailed and one-sided t tests and confidence intervals. In
addition, the methods used here can easily be extended to show that the bootstrap provides
asymptotic refinements for tests and confidence intervals based on smoothed versions of the
quantile-regression estimator of Koenker and Bassett (1978) and the censored quantile-regression
estimator of Powell (1986).
The remainder of the paper is organized as follows. Section 2 describes the smoothed LAD
estimator and gives its first-order asymptotic distribution. Section 3 describes the test statistics
and procedures that are used to obtain bootstrap critical values. Section 4 presents theorems
giving conditions under which the bootstrap provides asymptotic refinements to the levels of
symmetrical t and χ² tests. Section 4 also describes the extension to censored median regressions.
Section 5 presents the results of a small Monte Carlo investigation of the numerical performance
of the bootstrap, and Section 6 gives concluding comments. The proofs of theorems are in the
Appendix.
2. THE SMOOTHED LAD ESTIMATOR
This section describes the smoothed LAD estimator and establishes its asymptotic
equivalence to the standard LAD estimator.
Let {Y_i, X_i: i = 1,…,n} be a random sample of (Y, X) in (1.1). The standard LAD estimator
solves

(2.1)  minimize over b ∈ B:  H̃_n(b) ≡ n^{-1} Σ_{i=1}^n |Y_i − X_i b| = n^{-1} Σ_{i=1}^n (Y_i − X_i b)[2I(Y_i − X_i b > 0) − 1],

where B is the parameter set and I(·) is the indicator function. H̃_n(b) has cusps and, therefore, is not
differentiable at points b such that Y_i − X_i b = 0 for some i. The SLAD estimator smooths these
cusps by replacing the indicator function in H̃_n with a smooth function.
To do this, let K be a bounded, differentiable function satisfying K(v) = 0 if v ≤ −1 and K(v)
= 1 if v ≥ 1. Additional requirements that K must satisfy are given in Section 4a. Let {h_n} be a
sequence of positive real numbers (bandwidths) that converges to zero as n → ∞. The SLAD
estimator solves

(2.2)  minimize over b ∈ B:  H_n(b) ≡ n^{-1} Σ_{i=1}^n (Y_i − X_i b){2K[(Y_i − X_i b)/h_n] − 1}.

K is analogous to the integral of a kernel function for nonparametric estimation. K is not a kernel
function itself.
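To make the estimator concrete, here is a minimal sketch in Python (our own illustration; the paper's computations were done in GAUSS). It evaluates and minimizes H_n(b); the smoothing function K coded here anticipates the specific integrated fourth-order kernel used in the Monte Carlo study of Section 5, and the names slad_objective and slad_estimate are ours:

import numpy as np
from scipy.optimize import minimize

def K(v):
    # Integral of a 4th-order kernel (the choice used in Section 5):
    # K(v) = 0 for v <= -1 and K(v) = 1 for v >= 1.
    v = np.clip(v, -1.0, 1.0)
    return 0.5 + (105.0/64.0)*(v - (5.0/3.0)*v**3 + (7.0/5.0)*v**5 - (3.0/7.0)*v**7)

def slad_objective(b, Y, X, h):
    # Smoothed LAD objective H_n(b) of equation (2.2).
    u = Y - X @ b
    return np.mean(u * (2.0*K(u/h) - 1.0))

def slad_estimate(Y, X, h):
    # Minimize H_n over b, starting from the least-squares fit.  The
    # objective need not be globally convex, so this is only a sketch.
    b0, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return minimize(slad_objective, b0, args=(Y, X, h), method="BFGS").x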
It may appear that the presence of a smoothing parameter h_n in (2.2) is a disadvantage of
SLAD relative to LAD, but this appearance is misleading. With median regression models,
smoothing and the introduction of smoothing parameters are unavoidable for obtaining
satisfactory performance of the bootstrap. Under assumptions stronger than those made here, De
Angelis et al. (1993) found that the error in the bootstrap approximation to the distribution of the
LAD estimator converges to zero more slowly than the error made by first-order asymptotic
theory unless the bootstrap samples a smoothed version of the data. Janas (1993) smooths the data
to obtain bootstrap refinements for a test of a hypothesis about a population median. The
smoothing methods of De Angelis et al. and Janas do not extend easily to models with
heteroskedasticity or censoring. In this paper, smoothing the objective function replaces
smoothing the data. The resulting SLAD estimator is useful because it enables asymptotic
refinements to coverage probabilities of confidence intervals and levels of tests to be obtained
easily. The SLAD estimator is not needed if the only objective is to obtain a point estimate of β.
Let b̃_n be a LAD estimator (a solution to (2.1)) and b_n be a SLAD estimator (a solution to
(2.2)). Intuition suggests that b̃_n and b_n are asymptotically equivalent if h_n converges to zero
sufficiently rapidly. Theorem 2.1 below shows that this intuition is correct. Regularity conditions
for the theorem are given in Section 4a. They are stated in the form that is used to obtain this
paper's main objective, which is to show that the bootstrap provides asymptotic refinements for
tests based on the SLAD estimator. The regularity conditions are stronger than would be needed
if the only objective were to prove that b̃_n and b_n are asymptotically equivalent.
Theorem 2.1: Under Assumptions 1-6 of Section 4a, n^{1/2}(b_n − b̃_n) = o_p(1). ∎
To state the asymptotic distribution of n^{1/2}(b_n − β), let f(·|x) denote the density of U in (1.1)
conditional on X = x. Assume that f(0|x) exists for almost all x. Define D =
2E[X'X f(0|X)], and assume that D is nonsingular. It follows from Theorem 2.1 and asymptotic
normality of the LAD estimator (see, e.g., Buchinsky 1995) that n^{1/2}(b_n − β) →_d N(0, V), where V =
D^{-1}E(X'X)D^{-1}. To obtain a consistent estimator of V, let K^{(1)}(v) = dK(v)/dv. Define

(2.3)  D_n(b) = 2(nh_n)^{-1} Σ_{i=1}^n X_i'X_i K^{(1)}[(Y_i − X_i b)/h_n].

It is not difficult to show that D_n(b_n) →_p D under the conditions given in Section 4a. E(X'X) can be
estimated consistently by the sample average of X'X. However, for purposes of obtaining
asymptotic refinements, it is more convenient to use an estimator of the exact finite-sample
variance of the first derivative of H_n(b) at b = β. This estimator is T_n(b_n), where

T_n(b) = n^{-1} Σ_{i=1}^n X_i'X_i {2K[(Y_i − X_i b)/h_n] − 1 + 2[(Y_i − X_i b)/h_n]K^{(1)}[(Y_i − X_i b)/h_n]}².

Under the conditions given in Section 4a, T_n(b_n) →_p E(X'X). It follows that V is estimated
consistently by V_n ≡ D_n(b_n)^{-1}T_n(b_n)D_n(b_n)^{-1}.
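The estimators D_n, T_n, and V_n translate directly into code. Continuing the sketch above (variance_estimate and K1 are our names, with K1 the derivative K^{(1)} of the smoothing function used there):

def K1(v):
    # K^(1)(v): a 4th-order kernel with support [-1, 1].
    inside = np.abs(v) <= 1.0
    return np.where(inside, (105.0/64.0)*(1.0 - 5.0*v**2 + 7.0*v**4 - 3.0*v**6), 0.0)

def variance_estimate(b, Y, X, h):
    # V_n = D_n(b)^{-1} T_n(b) D_n(b)^{-1}, with D_n from equation (2.3).
    n = len(Y)
    u = (Y - X @ b) / h
    Dn = 2.0 * (X * K1(u)[:, None]).T @ X / (n * h)
    w = 2.0*K(u) - 1.0 + 2.0*u*K1(u)       # summand of the score of H_n
    Tn = (X * (w**2)[:, None]).T @ X / n
    Dinv = np.linalg.inv(Dn)
    return Dinv @ Tn @ Dinv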
3. TESTING A HYPOTHESIS ABOUT β
a. The Symmetrical t and Chi-Square Tests
Let b_{ni} and β_i, respectively, be the i'th components of b_n and β (i = 1,…,q). Let V_{ni} be the (i,i)
component of V_n. The t statistic for testing the hypothesis H_0: β_i = β_{0i} is t ≡ n^{1/2}(b_{ni} − β_{0i})/V_{ni}^{1/2}. If H_0
is true, then t →_d N(0, 1). The symmetrical t test rejects H_0 at the asymptotic α level if |t| > z_{α/2},
where z_{α/2}, the asymptotic critical value, is the 1 − α/2 quantile of the standard normal distribution.
Now let R be an ℓ×q matrix with ℓ ≤ q, and let c be an ℓ×1 vector of constants. Consider
a test of the hypothesis H_0: Rβ = c. Assume that the matrix RD^{-1}E(X'X)D^{-1}R' is nonsingular. Then
under H_0, the statistic

χ² ≡ n(Rb_n − c)'(RV_nR')^{-1}(Rb_n − c)

is asymptotically chi-square distributed with ℓ degrees of freedom. H_0 is rejected at the
asymptotic α level if χ² exceeds the asymptotic critical value consisting of the 1 − α quantile of the
chi-square distribution.
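In code, using b and V from the sketches above (the function names are ours):

def t_statistic(b, V, i, beta0_i, n):
    # Symmetrical t statistic for H_0: beta_i = beta0_i.
    return np.sqrt(n) * (b[i] - beta0_i) / np.sqrt(V[i, i])

def chi2_statistic(b, V, R, c, n):
    # Wald chi-square statistic for H_0: R beta = c (ell degrees of freedom).
    d = R @ b - c
    return n * d @ np.linalg.solve(R @ V @ R.T, d)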
Section 4 gives conditions under which the bootstrap provides asymptotic refinements to
critical values and levels of the symmetrical t and χ² tests.
b. The Bootstrap Procedure
The bootstrap estimates the distribution of a test statistic by treating the estimation data as if
they were the population. Thus, the bootstrap distribution of a statistic is the distribution induced
by sampling the estimation data randomly with replacement. The α-level bootstrap critical value
of the symmetrical t test is the 1 − α quantile of the bootstrap distribution of |t|. The α-level
bootstrap critical value of a test based on χ² is the 1 − α quantile of the bootstrap distribution of χ².
The bootstrap distributions of |t| and χ² can be estimated with arbitrary accuracy by Monte
Carlo simulation. To specify the Monte Carlo procedure, let the bootstrap sample be denoted by
{Y_i*, X_i*: i = 1,…,n}. Define the following bootstrap analogs of H_n(b), D_n(b), and T_n(b):

H_n*(b) ≡ n^{-1} Σ_{i=1}^n (Y_i* − X_i*b){2K[(Y_i* − X_i*b)/h_n] − 1},

D_n*(b) = 2(nh_n)^{-1} Σ_{i=1}^n X_i*'X_i* K^{(1)}[(Y_i* − X_i*b)/h_n],

and

T_n*(b) = n^{-1} Σ_{i=1}^n X_i*'X_i* {2K[(Y_i* − X_i*b)/h_n] − 1 + 2[(Y_i* − X_i*b)/h_n]K^{(1)}[(Y_i* − X_i*b)/h_n]}².

Let b_n* be a solution to (2.2) with H_n replaced by H_n*. Let V_{ni}* be the (i,i) component of the
matrix D_n*(b_n*)^{-1}T_n*(b_n*)D_n*(b_n*)^{-1}.
The Monte Carlo procedure for estimating the bootstrap critical value of the symmetrical t
test is as follows. The procedure for estimating the bootstrap critical value of χ² is similar.
1. Generate a bootstrap sample {Y_i*, X_i*: i = 1,…,n} by sampling the estimation data
randomly with replacement.
2. Using the bootstrap sample, compute the bootstrap t statistic for testing the
hypothesis H_0*: β_i = b_{ni}, where b_n solves (2.2). The bootstrap t statistic is t* ≡ n^{1/2}(b_{ni}* −
b_{ni})/(V_{ni}*)^{1/2}, where b_{ni}* is the i'th component of b_n*.
3. Estimate the bootstrap distribution of |t*| by the empirical distribution that is
obtained by repeating steps 1 and 2 many times. The bootstrap critical value of the symmetrical t
test is estimated by the 1 − α quantile of this empirical distribution.
Because the bootstrap critical value can be estimated with arbitrary accuracy by repeating
steps 1 and 2 sufficiently many times, the results presented in Section 4 pertain to the true
bootstrap critical value, not its Monte Carlo estimator.
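A minimal sketch of steps 1-3 for the t test, reusing the illustrative slad_estimate and variance_estimate functions defined earlier (bootstrap_critical_value is our name):

def bootstrap_critical_value(Y, X, h, i, alpha=0.05, n_boot=100, rng=None):
    # Monte Carlo estimate of the 1 - alpha quantile of the bootstrap
    # distribution of |t*| (steps 1-3 of Section 3b).
    rng = np.random.default_rng(rng)
    n = len(Y)
    b_n = slad_estimate(Y, X, h)            # centers the bootstrap hypothesis
    t_abs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)    # step 1: resample with replacement
        Ys, Xs = Y[idx], X[idx]
        b_star = slad_estimate(Ys, Xs, h)   # step 2: bootstrap t statistic
        V_star = variance_estimate(b_star, Ys, Xs, h)
        t_abs.append(abs(np.sqrt(n)*(b_star[i] - b_n[i]) / np.sqrt(V_star[i, i])))
    return np.quantile(t_abs, 1.0 - alpha)  # step 3: 1 - alpha quantile of |t*|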
4. MAIN RESULTS
This section presents theorems giving conditions under which the bootstrap provides
asymptotic refinements to the levels of symmetrical t and χ² tests based on the SLAD estimator.
As in other applications (see, e.g., Beran 1988, Hall 1992), the proof that the bootstrap provides
asymptotic refinements is based on showing that the distributions of the test statistics and their
bootstrap analogs have asymptotic expansions that are identical to sufficiently high order. The
main technical problem that must be solved is establishing conditions under which these
expansions exist. This is done in Theorems 4.1 and 4.2. Once the existence of the expansions is
established, it is a relatively easy matter to show that the use of bootstrap critical values provides
asymptotic refinements to the levels of symmetrical t and χ² tests. This is done in Theorem 4.3.
a. Assumptions
This subsection presents the assumptions under which it is proved that the bootstrap
provides asymptotic refinements for symmetrical t and χ² tests based on the SLAD estimator. Let
r ≥ 4 be an even integer. Let K^{(i)}(v) = d^iK(v)/dv^i. The assumptions are:
1. {Y_i, X_i: i = 1,…,n} is a random sample of (Y, X), where Y = Xβ + U, X is a 1×q vector of
observed random variables, U is an unobserved random scalar, and β is a q×1 constant vector.
2. β is an interior point of B, which is a compact subset of ℝ^q.
3. The support of the distribution of X is bounded, and E(X'X) is positive definite.
4. Let F(·|x) and f(·|x), respectively, denote the CDF and density of U conditional on X
= x. (a) F(0|x) = 0.50 for almost every x. (b) For all u in a neighborhood of 0 and almost every
x, f(u|x) exists, is bounded away from zero, and is r − 1 times continuously differentiable with
respect to u.
5. (a) K(·) is bounded, K(v) = 0 if v ≤ −1, and K(v) = 1 if v ≥ 1. (b) K is 4-times
differentiable everywhere, K^{(1)}(v) is symmetrical about v = 0, and K^{(i)} (i = 1,…,4) is bounded and
Lipschitz continuous on (−∞, ∞). (c) Let ψ(v) be a vector whose components are [2K(v) − 1] and its
derivatives through order 3, vK^{(1)}(v) and its derivatives through order 3, and [2K(v) − 1 +
2vK^{(1)}(v)]² and its first derivative. For any θ ∈ ℝ^{10} satisfying ‖θ‖ = 1, there is a partition of [−1, 1],
−1 = a_1 < a_2 < … < a_{L(θ)} = 1, such that θ'ψ(v) is either strictly increasing or strictly decreasing on
(a_{ℓ−1}, a_ℓ) (ℓ = 2,…,L(θ)). (d) For each integer i (1 ≤ i ≤ r),

∫_{−1}^{1} v^i K^{(1)}(v) dv = 0 if i < r, and = C_K (nonzero) if i = r.

6. h_n ∝ n^{−κ}, where 2/(2r + 1) < κ < 1/3.
Assumptions 1-5b define the model and insure that β is identified, n^{1/2}(b_n − β) is
asymptotically normal, and the Taylor series expansions used to obtain higher-order asymptotic
approximations to t and χ² exist. The assumption that X has bounded support is not essential and
can be dropped at the expense of more complex proofs. Assumption 5c is used to establish a
modified form of the Cramér condition of Edgeworth analysis (Lemma 9 of the Appendix).
Assumption 5d, which requires K^{(1)} to be a "higher-order" kernel, and Assumption 6 insure that
the (first-order) asymptotic distribution of n^{1/2}(b_n − β) has mean zero and that Taylor series
remainder terms are negligibly small. Functions K satisfying assumption 5 can be constructed by
integrating kernels given by Müller (1984).
b. Theorems
This section gives theorems that establish conditions under which the bootstrap provides
asymptotic refinements for symmetrical t and χ² tests based on the SLAD estimator. Theorems
4.1 and 4.2 give conditions under which the sample and bootstrap versions of |t| and χ² have
Edgeworth-type asymptotic expansions. Theorem 4.3 shows that the bootstrap provides
asymptotic refinements under the same conditions.
The following additional notation is used. Let Φ and φ, respectively, denote the standard
normal distribution and density functions. Let P_n* denote the bootstrap probability measure. This
measure places mass 1/n at each data point (Y_i, X_i). The cumulants of t through order 4 can be
approximated with an accuracy of O[(nh_n)^{-1}] by using Taylor-series expansions that are described
in the Appendix. Denote the approximate cumulants by the vector ν_n. The first four cumulants of
t* conditional on the estimation sample can also be approximated with an accuracy of O[(nh_n)^{-1}]
almost surely. Let ν_n* be the vector containing the approximate bootstrap cumulants. Define d =
dim(ν_n) = dim(ν_n*).
The following theorem establishes the existence of Edgeworth-type expansions of the
distributions of |t| and |t*|.
Theorem 4.1: Let assumptions 1-6 hold. Let ν be an arbitrary vector with dimension d.
There is a function q(τ, ν) such that: (a) q(·, ν) is a polynomial; (b) q(τ, ν_n) and q(τ, ν_n*) consist of
terms whose sizes are O[(nh_n)^{-1}] (almost surely in the case of q(τ, ν_n*)); (c)

(4.1)  P(|t| ≤ τ) = 2Φ(τ) − 1 + q(τ, ν_n)φ(τ) + o[(nh_n)^{-1}]

uniformly over τ, and (d)

P_n*(|t*| ≤ τ) = 2Φ(τ) − 1 + q(τ, ν_n*)φ(τ) + o[(nh_n)^{-1}]

uniformly over τ almost surely. ∎
The coefficients of τ in q are functions of the approximate cumulants of t and t*. These, in
turn, are functions of asymptotic forms of moments of products of derivatives of H_n(β), D_n(β), and
T_n(β) with respect to the components of β. Because the number of such moments is very large,
obtaining an analytic expression for q is not feasible. It is possible, however, to calculate the rates
at which the moments converge to zero, and this is sufficient to prove the theorem.
The proof of Theorem 4.1 takes place in two main steps. The first step consists of showing
that t and t* can be approximated up to asymptotically negligible remainder terms by functionals
of derivatives of H_n(β), D_n(β), and T_n(β) (or their bootstrap analogs in the case of t*). This is done
in Propositions 1 and 2 of the Appendix. The second step is to show that the distributions of the
approximations to t and t* have asymptotic expansions through order (nh_n)^{-1}. This step is carried
out using methods similar to those used to prove Theorems 5.5 and 5.6 of Hall (1992).
Now consider the χ² test. Let χ²* be the bootstrap version of the χ² statistic. The first two
moments of χ² and χ²* can be approximated through O[(nh_n)^{-1}]. Let ν_{nχ} and ν_{nχ}* denote the
vectors of approximate moments. Let F_{χ,ℓ} denote the chi-square distribution function with ℓ
degrees of freedom. The following theorem, which is a modified version of Theorem 1b of
Chandra and Ghosh (1979), gives conditions under which the distributions of χ² and χ²* have
Edgeworth expansions through O[(nh_n)^{-1}].
Theorem 4.2: Let assumptions 1-6 hold. Let ν be an arbitrary 2×1 vector. There is a
function q_χ(τ, ν) such that q_χ(τ, ν_{nχ}) and q_χ(τ, ν_{nχ}*) consist of terms whose sizes are O[(nh_n)^{-1}] (almost
surely in the case of q_χ(τ, ν_{nχ}*)),

(4.2)  P(χ² < z) = ∫_{−∞}^{z} d{[1 + q_χ(ξ, ν_{nχ})]F_{χ,ℓ}(ξ)} + o[(nh_n)^{-1}]

uniformly over z, and

P_n*(χ²* < z) = ∫_{−∞}^{z} d{[1 + q_χ(ξ, ν_{nχ}*)]F_{χ,ℓ}(ξ)} + o[(nh_n)^{-1}]

uniformly over z almost surely. ∎
The final theorem shows that the use of bootstrap critical values yields asymptotic
refinements to the levels of symmetrical t and χ² tests. Let t_α* denote the α-level critical value of
the bootstrap symmetrical t test. That is, t_α* is the 1 − α quantile of the bootstrap distribution of
|t*|. Let c_α* denote the α-level critical value of the bootstrap χ² test. That is, c_α* is the 1 − α quantile
of the bootstrap distribution of χ²*.
Theorem 4.3: Let assumptions 1-6 hold. Under H_0: β_i = β_{0i},
a. P(|t| > t_α*) = α + o[(nh_n)^{-1}].
If RD^{-1}E(X'X)D^{-1}R' is nonsingular, then under H_0: Rβ = c,
b. P(χ² > c_α*) = α + o[(nh_n)^{-1}]. ∎
First-order asymptotic approximations drop the terms qφ and q_χF_{χ,ℓ} in (4.1) and (4.2). The
resulting approximation errors are O[(nh_n)^{-1}].
c. Censored Median Regressions
This section describes the extension of the foregoing results to the censored median
regression model of Powell (1984). The model is

(4.3)  Y = max(0, Xβ + U),

where X, β, and U are as defined in (1.1). The censored LAD (CLAD) estimator of β, c_n, solves

minimize over b ∈ B:  n^{-1} Σ_{i=1}^n |Y_i − max(0, X_i b)|,

where B is the parameter set. Equivalently, c_n solves

minimize over b ∈ B:  H̃_{cn}(b) ≡ n^{-1} Σ_{i=1}^n {(Y_i − X_i b)[2I(Y_i − X_i b > 0) − 1] − Y_i}I(X_i b > 0).

Under regularity conditions, n^{1/2}(c_n − β) →_d N(0, V_c), where V_c = D_c^{-1}T_cD_c^{-1}, D_c = 2E[X'X f(0|X)I(Xβ
> 0)], and T_c = E[X'X I(Xβ > 0)] (Powell 1984).
Like the objective function of the LAD estimator, H̃_{cn} has cusps. The smoothed CLAD
estimator (SCLAD) removes them by replacing the indicator functions in H̃_{cn} with smooth
functions. The SCLAD estimator, b_{cn}, solves

minimize over b ∈ B:  H_{cn}(b) ≡ n^{-1} Σ_{i=1}^n g_c(Y_i, X_i, h_n, b),

where

g_c(y, x, h, b) = {(y − xb)[2K((y − xb)/h) − 1] − y}K(xb/h − 2),

and K and h_n are as in (2.2). The smoothed version of I(xb > 0) is K(xb/h − 2) instead of K(xb/h)
for technical reasons relating to prevention of asymptotic bias. Under conditions stated below,
n^{1/2}(b_{cn} − c_n) = o_p(1) as n → ∞.
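The SCLAD objective is as easy to code as the SLAD objective. A sketch in the style of the earlier snippets, with g_c written directly from its definition (sclad_objective is our name):

def sclad_objective(b, Y, X, h):
    # Smoothed censored LAD objective H_cn(b): the sample average of g_c.
    u = Y - X @ b
    xb = X @ b
    # K(xb/h - 2) smooths I(xb > 0); the shift by 2 prevents asymptotic bias.
    return np.mean((u*(2.0*K(u/h) - 1.0) - Y) * K(xb/h - 2.0))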
To form t and χ² statistics based on b_{cn}, it is necessary to have consistent estimators of D_c
and T_c. Define

D_{cn}(b) = 2(nh_n)^{-1} Σ_{i=1}^n X_i'X_i K^{(1)}[(Y_i − X_i b)/h_n]I(Y_i > 0).

It is not difficult to show that D_{cn}(b_{cn}) →_p D_c. T_c can be estimated consistently by the sample
average of X'X I(Xb_{cn} > 0) (Powell 1984). As in SLAD estimation, however, for purposes of
obtaining asymptotic refinements it is more convenient to use an estimator of the exact finite-
sample variance of the first derivative of H_{cn}(b) at b = β. This estimator is T_{cn}(b_{cn}), where

T_{cn}(b) = n^{-1} Σ_{i=1}^n [∂g_c(Y_i, X_i, h_n, b)/∂b][∂g_c(Y_i, X_i, h_n, b)/∂b]'.

V_c is estimated consistently by V_{cn} ≡ D_{cn}(b_{cn})^{-1}T_{cn}(b_{cn})D_{cn}(b_{cn})^{-1}.
The formulae for t and χ² statistics for testing hypotheses about β in (4.3) are the same as in
Section 3a but with V_n replaced by V_{cn}. The procedure for obtaining bootstrap critical values for
these statistics is the same as in Section 3b but with D_n, T_n, D_n*, and T_n* replaced by D_{cn}, T_{cn},
and their bootstrap analogs.
To establish the ability of the bootstrap to provide asymptotic refinements for t and χ² tests
based on the SCLAD estimator, it is necessary to modify Assumptions 1 and 3 as follows:
1'. {Y_i, X_i: i = 1,…,n} is a random sample of (Y, X), where Y = max(0, Xβ + U), X is a 1×q
vector of observed random variables, U is an unobserved random scalar, and β is a q×1 constant
vector.
3'. The support of the distribution of X is bounded, P(Xβ = 0) = 0, and E[(X'X)I(Xb > ε)] is
positive definite for some ε > 0 and all b in a neighborhood of β.
The following theorem shows that the SCLAD and CLAD estimators are asymptotically
equivalent and that the bootstrap provides asymptotic refinements to the levels of symmetrical t
and χ² tests based on the SCLAD estimator.
Theorem 4.4: Let assumptions 1', 2, 3', and 4-6 hold. Then
a. n^{1/2}(b_{cn} − c_n) = o_p(1) as n → ∞.
Let t_α* and c_α*, respectively, denote the α-level bootstrap critical values of the SCLAD
symmetrical t and χ² tests. Under H_0: β_i = β_{0i},
b. P(|t| > t_α*) = α + o[(nh_n)^{-1}].
If RD_c^{-1}T_cD_c^{-1}R' is nonsingular, then under H_0: Rβ = c,
c. P(χ² > c_α*) = α + o[(nh_n)^{-1}]. ∎
5. MONTE CARLO EXPERIMENTS
This section describes the results of a small Monte Carlo investigation of the finite-sample
level of the SLAD t test with bootstrap critical values. The numbers of experiments and
replications per experiment are small because of the very long computing times they entail, even
on a fast computer.
Each experiment evaluates the level of a symmetrical t test using asymptotic or bootstrap
critical values. The hypothesis being tested is H_0: β_1 = 1 in the model Y = β_0 + β_1X + U, where β_0
and β_1 are scalar parameters whose true values are (β_0, β_1) = (1, 1) (so H_0 is true), and X ~ U[1, 5].
There are 3 different distributions of U. In the first experiment, U ~ N(0, 2). In the second, U ~
Student t with 3 degrees of freedom, scaled to have a variance of 2. In the third experiment, U =
0.25(1 + X)V, where V ~ N(0, 1). Thus, U is heteroskedastic. The smoothing function K is
K(v) = 0 if v < −1,
K(v) = 0.5 + (105/64)[v − (5/3)v³ + (7/5)v⁵ − (3/7)v⁷] if |v| ≤ 1,
K(v) = 1 if v > 1.

K is the integral of a 4th-order kernel for nonparametric density estimation (Müller 1984).
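A quick numerical check (ours, not the paper's) that this K satisfies assumption 5d with r = 4: the moments of K^{(1)} of orders 1-3 vanish, and the 4th-order moment is a nonzero constant C_K (about −0.03):

import numpy as np

v = np.linspace(-1.0, 1.0, 200_001)
k1 = (105.0/64.0)*(1.0 - 5.0*v**2 + 7.0*v**4 - 3.0*v**6)   # K^(1)(v)
for i in range(5):
    print(i, np.trapz(v**i * k1, v))   # approx. 1, 0, 0, 0, C_K for i = 0,...,4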
The experiments with the SLAD estimator consisted of computing the empirical level of the
nominal 0.05-level symmetrical t test of H_0 with bootstrap critical values. To provide a basis for
evaluating the performance of the bootstrap, experiments were also carried out with the
unsmoothed LAD estimator. These consisted of computing the empirical level of the nominal
0.05-level symmetrical t test of H_0 with the asymptotic critical value. The LAD estimator was
studentized by using the consistent variance estimator D_n(b̃_n)^{-1}E_n[(1, X)'(1, X)]D_n(b̃_n)^{-1}, where b̃_n is the
LAD estimator of (β_0, β_1), D_n is as in (2.3), and E_n(·) is the sample average. D_n for the LAD
estimator was computed using the 2nd-order kernel K_2(v) = (15/16)(1 − v²)²I(|v| ≤ 1).
Computation of the SLAD and LAD t statistics requires choosing the value of a bandwidth
parameter for each. Existing theory provides little guidance on how this should be done in finite
samples, so experiments were carried out using a range of bandwidth values.
The experiments used a sample size of n = 50 and were carried out with a program written
in GAUSS with GAUSS pseudo-random number generators. There were 500 Monte Carlo
replications per experiment with the SLAD estimator and 1000 with the LAD estimator. There
were fewer replications in the SLAD experiments because of the long computing times required
for Monte Carlo simulations with bootstrapping. Each experiment consisted of repeating the
following steps 500 or 1000 times (a code sketch of the loop follows step C):
A. Generate an estimation data set of size n = 50 by randomly sampling (Y, X) from the model
under consideration. Obtain the SLAD or LAD estimate of (β_0, β_1), and compute the t statistic for
testing H_0: β_1 = 1. Call its value t_S if it is based on the SLAD estimator and t_L if it is based on the
LAD estimator.
B. In experiments with t_S, compute the bootstrap critical value by following steps 1-3 in
Section 3b. Bootstrap samples were obtained by sampling the estimation data generated in step A
randomly with replacement. Denote the 0.05-level bootstrap critical value of the SLAD
symmetrical t test by t_{0.05}*. t_{0.05}* was computed from 100 bootstrap samples.
C. Reject H_0 at the nominal 0.05 level based on t_S if |t_S| > t_{0.05}*. Reject H_0 at the
nominal 0.05 level based on t_L if |t_L| > 1.96, the asymptotic critical value.
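A compact rendering of this loop for the SLAD test, with make_data standing in (hypothetically) for one of the three designs described above; it reuses the illustrative functions sketched in Sections 2 and 3:

def mc_level(make_data, h, n_reps=500, alpha=0.05, seed=0):
    # Empirical level of the nominal alpha-level SLAD t test of H_0: beta_1 = 1.
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_reps):
        Y, X = make_data(rng)               # step A: X has a constant column
        b = slad_estimate(Y, X, h)
        V = variance_estimate(b, Y, X, h)
        t_S = np.sqrt(len(Y))*(b[1] - 1.0)/np.sqrt(V[1, 1])
        t_crit = bootstrap_critical_value(Y, X, h, i=1, alpha=alpha,
                                          n_boot=100)       # step B
        rejections += abs(t_S) > t_crit     # step C
    return rejections / n_reps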
The results of the experiments are summarized in Figures 1-3, which show the empirical
levels of the SLAD t test with bootstrap critical values and the LAD t test with the asymptotic
critical value as functions of the bandwidth. In the experiments, the empirical and nominal levels
of the LAD test can be made equal by choosing the bandwidth appropriately. The empirical level
is very sensitive to the bandwidth, however, and it is an open question whether the "optimal"
bandwidth can be estimated precisely in applications. In contrast, the empirical level of the SLAD
test with bootstrap critical values is close to the nominal level over a wide range of bandwidths.
Thus, use of the SLAD test with bootstrap critical values greatly decreases the importance of
precisely estimating an "optimal" bandwidth. Obtaining precise bandwidth estimates is difficult
even in relatively simple settings such as nonparametric density estimation, so the SLAD test's
relative insensitivity to the bandwidth is an important practical advantage of this test.
6. CONCLUSIONS
This paper has shown how the bootstrap can be used to obtain asymptotic refinements for
tests of hypotheses about the parameters of uncensored and censored linear median regression
models with or without heteroskedasticity of unknown form. The method is based on smoothing
the objective function of the relevant estimator. This approach contrasts with previous research on
bootstrap methods for median regressions, which has achieved less general results under more
restrictive assumptions by smoothing the data instead of the objective function. This paper has not
addressed the problem of how to choose the bandwidth parameter required for smoothing. It is
likely that this can also be done with the bootstrap, but the technical details are sufficiently
complex and lengthy to require treatment in a separate paper.
APPENDIX
This Appendix provides proofs of the theorems stated in the text. It is assumed unless
otherwise stated that assumptions 1-6 hold. Define U_i = Y_i − X_iβ and

G_n(b) ≡ n^{-1} Σ_{i=1}^n {(Y_i − X_i b)[2K((Y_i − X_i b)/h_n) − 1] − |U_i|}.

The SLAD estimator minimizes both H_n(b) and G_n(b) over b ∈ B. G_n is used for the proofs
because it is a sum of bounded terms.
Let ‖·‖ denote the Euclidean norm. Let X^{(j)} denote the j'th component of X. For b ∈ B,
define G(b) = E(|Y − Xb| − |U|) and

Ḡ_n(b) = n^{-1} Σ_{i=1}^n (|Y_i − X_i b| − |U_i|).
a. Step 1: Approximating t and t*
Lemma 1:

sup_{b∈B} |G_n(b) − G(b)| ≤ o(n^{-1/2} log n) + 2h_n

almost surely.
Proof: It follows from Lemma 22 of Nolan and Pollard (1987) and Theorem 2.37 of Pollard
(1984) that |Ḡ_n(b) − G(b)| = o(n^{-1/2} log n) almost surely uniformly over b ∈ B. Also,

G_n(b) − Ḡ_n(b) = 2n^{-1} Σ_{i=1}^n (Y_i − X_i b){K[(Y_i − X_i b)/h_n] − I(Y_i − X_i b > 0)}.

The summand differs from zero only if |Y_i − X_i b| ≤ h_n. Therefore,

|G_n(b) − Ḡ_n(b)| ≤ 2n^{-1} Σ_{i=1}^n |Y_i − X_i b|I(|Y_i − X_i b| ≤ h_n) ≤ 2h_n.

The lemma now follows from the triangle inequality. Q.E.D.
Lemma 2: Given any r > 0, ‖b_n − β‖ ≤ r almost surely for all sufficiently large n.
Proof: Let N_r = {b ∈ B: ‖b − β‖ > r}. By assumptions 3 and 4, β uniquely minimizes G(b)
over B. Therefore, G(b) > G(β) + δ for all b ∈ N_r and some δ > 0. By Lemma 1 and h_n → 0, there
is a finite n_0 such that G_n(b) > G_n(β) + δ/2 > G_n(β) almost surely for all b ∈ N_r if n > n_0. But G_n(b_n)
≤ G_n(β). Therefore, b_n ∉ N_r almost surely if n > n_0. Q.E.D.
For i, j, k, ℓ = 1,…,q, define G_{ni}(b) = ∂G_n(b)/∂b_i, G_{nij}(b) = ∂²G_n(b)/∂b_i∂b_j, G_{nijk}(b) =
∂³G_n(b)/∂b_i∂b_j∂b_k, and G_{nijkℓ}(b) = ∂⁴G_n(b)/∂b_i∂b_j∂b_k∂b_ℓ. Also, define D_{ni}(b) =
∂D_n(b)/∂b_i, D_{nij}(b) = ∂²D_n(b)/∂b_i∂b_j, and T_{ni}(b) = ∂T_n(b)/∂b_i.
Lemma 3: For all i, j, k, ℓ = 1,…,q, the following relations hold almost surely as n → ∞:
(a) sup_{b∈B} |G_{ni}(b) − EG_{ni}(b)| = o[(log n)/n^{1/2}]
(b) sup_{b∈B} |G_{nij}(b) − EG_{nij}(b)| = o[(log n)/(nh_n)^{1/2}]
(c) sup_{b∈B} |G_{nijk}(b) − EG_{nijk}(b)| = o[(log n)/(nh_n³)^{1/2}]
(d) sup_{b∈B} |G_{nijkℓ}(b) − EG_{nijkℓ}(b)| = o[(log n)/(nh_n⁵)^{1/2}]
(e) sup_{b∈B} |D_n(b) − ED_n(b)| = o[(log n)/(nh_n)^{1/2}]
(f) sup_{b∈B} |D_{ni}(b) − ED_{ni}(b)| = o[(log n)/(nh_n³)^{1/2}]
(g) sup_{b∈B} |D_{nij}(b) − ED_{nij}(b)| = o[(log n)/(nh_n⁵)^{1/2}]
(h) sup_{b∈B} |T_n(b) − ET_n(b)| = o[(log n)/n^{1/2}]
(i) sup_{b∈B} |T_{ni}(b) − ET_{ni}(b)| = o[(log n)/(nh_n)^{1/2}],
where (e)-(i) apply to the individual components of the matrices D_n, D_{ni}, D_{nij}, T_n, and T_{ni}. In
addition, for all i, j, k, ℓ = 1,…,q:
(j) EG_{ni}(β) = 2[(1 − r)/r!]C_K h_n^r E[X^{(i)}f^{(r−1)}(0|X)] + o(h_n^r)
(k) n^{1/2}G_{nj}(β) = −n^{-1/2} Σ_{i=1}^n X_i^{(j)}[2I(U_i > 0) − 1] + O_p(n^{1/2}h_n^r + h_n^{1/2})
(l) EG_{nij}(β) = 2E[X^{(i)}X^{(j)}f(0|X)] + O(h_n^r)
(m) EG_{nijk}(b), EG_{nijkℓ}(b), ED_n(b), ED_{ni}(b), ED_{nij}(b), ET_n(b), and ET_{ni}(b) are O(1) as n →
∞ for all b in a neighborhood of β.
Proof: Parts (a)-(i) are proved by using Lemma 2.14 of Pakes and Pollard (1989) and Lemma 22 of
Nolan and Pollard (1987) to show that the summands of the relevant G, D, and T functions form
Euclidean classes and then applying Theorem 2.37 of Pollard (1984). To prove (j), write G_{nj}(β) =
G_{nj}^{(1)} + G_{nj}^{(2)}, where

G_{nj}^{(1)} = −n^{-1} Σ_{i=1}^n X_i^{(j)}[2K(U_i/h_n) − 1],

G_{nj}^{(2)} = −2n^{-1} Σ_{i=1}^n (U_i/h_n)X_i^{(j)}K^{(1)}(U_i/h_n),

and U_i = Y_i − X_iβ. Then

EG_{nj}^{(1)} = −∫∫_{−∞}^{∞} x^{(j)}[2K(u/h_n) − 1]f(u|x) du dP(x).

Since 2K(u/h_n) − 1 = ±1 unless |u/h_n| < 1, a change of variables gives

(A1)  EG_{nj}^{(1)} = −∫ x^{(j)}[1 − F(h_n|x) − F(−h_n|x)] dP(x) − h_n ∫∫_{−1}^{1} x^{(j)}[2K(ζ) − 1]f(h_nζ|x) dζ dP(x).

Integration by parts yields

∫_{−1}^{1} ζ^k[2K(ζ) − 1] dζ = [2/(k + 1)]{1 − ∫_{−1}^{1} ζ^{k+1}K^{(1)}(ζ) dζ}δ_k ≡ [2/(k + 1)](1 − c_k)δ_k

for each k = 0,…,r − 1, where δ_k = 0 if k is even and 1 if k is odd, and c_k = 0 unless k = r − 1.
Therefore, Taylor series expansions of the integrands in (A1) about h_n = 0 yield

(A2)  EG_{nj}^{(1)} = h_n Σ_{k=0}^{r−1} {[h_n^k − (−h_n)^k]/(k + 1)!}E[X^{(j)}f^{(k)}(0|X)]
− 2h_n Σ_{k=0}^{r−1} [(1 − c_k)δ_k/(k + 1)!]h_n^k E[X^{(j)}f^{(k)}(0|X)] + o(h_n^r)
= 2(r!)^{-1}C_K h_n^r E[X^{(j)}f^{(r−1)}(0|X)] + o(h_n^r).

In addition,

EG_{nj}^{(2)} = −2∫∫_{−∞}^{∞} x^{(j)}(u/h_n)K^{(1)}(u/h_n)f(u|x) du dP(x) = −2h_n ∫∫_{−1}^{1} x^{(j)}ζK^{(1)}(ζ)f(h_nζ|x) dζ dP(x).

A Taylor series expansion of the integrand about h_n = 0 yields

(A3)  EG_{nj}^{(2)} = −2h_n Σ_{k=0}^{r−1} ∫_{−1}^{1} ζ^{k+1}K^{(1)}(ζ) dζ (h_n^k/k!)E[X^{(j)}f^{(k)}(0|X)] + o(h_n^r)
= −2C_K[(r − 1)!]^{-1}h_n^r E[X^{(j)}f^{(r−1)}(0|X)] + o(h_n^r).

Part (j) follows by combining (A2) and (A3).
To prove (k), observe that

n^{1/2}G_{nj}^{(1)} = −n^{-1/2} Σ_{i=1}^n X_i^{(j)}[2I(U_i > 0) − 1] − 2n^{-1/2} Σ_{i=1}^n X_i^{(j)}[K(U_i/h_n) − I(U_i > 0)].

The variance of the second term is O(h_n), and methods similar to those used to prove (j) show that
its mean is O(n^{1/2}h_n^r). Similarly, En^{1/2}G_{nj}^{(2)} = O(n^{1/2}h_n^r), and Var(n^{1/2}G_{nj}^{(2)}) = O(h_n). Part (k) now
follows from Chebyshev's inequality.
To prove (l), write G_{njk}(β) = G_{njk}^{(1)} + G_{njk}^{(2)}, where

G_{njk}^{(1)} = 4(nh_n)^{-1} Σ_{i=1}^n X_i^{(j)}X_i^{(k)}K^{(1)}(U_i/h_n)

and

G_{njk}^{(2)} = 2(nh_n)^{-1} Σ_{i=1}^n X_i^{(j)}X_i^{(k)}(U_i/h_n)K^{(2)}(U_i/h_n).

Arguments similar to those applied to EG_{nj}^{(1)} yield

(A4)  EG_{njk}^{(1)} = 4E[X^{(j)}X^{(k)}f(0|X)] + O(h_n^r).

Similarly,

EG_{njk}^{(2)} = 2h_n^{-1}∫∫_{−∞}^{∞} x^{(j)}x^{(k)}(u/h_n)K^{(2)}(u/h_n)f(u|x) du dP(x)
= 2 Σ_{i=0}^{r} ∫∫_{−1}^{1} ζ^{i+1}K^{(2)}(ζ)(h_n^i/i!)f^{(i)}(0|x) dζ dP(x) + o(h_n^r)

by a change of variables and a Taylor series expansion. Integration by parts shows that

(A5)  ∫_{−1}^{1} ζ^{i+1}K^{(2)}(ζ) dζ = −1 if i = 0; 0 if 1 ≤ i < r; −(r + 1)C_K if i = r.

Therefore,

(A6)  EG_{njk}^{(2)} = −2E[X^{(j)}X^{(k)}f(0|X)] + O(h_n^r).

Part (l) follows by combining (A4) and (A6).
To prove (m), consider EG_{njkℓ}(b). Let Δb = b − β. Write G_{njkℓ}(b) = G_{njkℓ}^{(1)}(b) + G_{njkℓ}^{(2)}(b),
where

G_{njkℓ}^{(1)}(b) = −6(nh_n²)^{-1} Σ_{i=1}^n X_i^{(j)}X_i^{(k)}X_i^{(ℓ)}K^{(2)}[(U_i − X_iΔb)/h_n]

and

G_{njkℓ}^{(2)}(b) = −2(nh_n²)^{-1} Σ_{i=1}^n X_i^{(j)}X_i^{(k)}X_i^{(ℓ)}[(U_i − X_iΔb)/h_n]K^{(3)}[(U_i − X_iΔb)/h_n].

Now

EG_{njkℓ}^{(1)}(b) = −6h_n^{-2} E{X^{(j)}X^{(k)}X^{(ℓ)} ∫_{−∞}^{∞} K^{(2)}[(u − XΔb)/h_n]f(u|X) du}.

A change of variables, a Taylor series expansion, and (A5) yield

EG_{njkℓ}^{(1)}(b) = −6E{X^{(j)}X^{(k)}X^{(ℓ)} ∫_{−1}^{1} ζK^{(2)}(ζ)f^{(1)}(h̄ζ + XΔb|X) dζ}

for h̄ between 0 and h_n, which is bounded uniformly over Δb in a neighborhood of 0 by assumption
4. Similar arguments apply to EG_{njkℓ}^{(2)}(b) and the remaining G, D, and T functions. Q.E.D.
Define S_{nG} to be a vector containing the unique components of G_{ni}(β), G_{nij}(β), G_{nijk}(β), and
G_{nijkℓ}(β) (i, j, k, ℓ = 1,…,q). Order the components of S_{nG} so that the first q are the G_{ni}(β).
Lemma 4: Let S_G = plim_{n→∞} S_{nG}. There is a function Λ_β(S_{nG}) taking values in ℝ^q such that
Λ_β(S_G) = 0 and

b_n − β = Λ_β(S_{nG}) + o[1/(n^{3/2}h_n)]

almost surely as n → ∞.
Proof: Define δ_n = b_n − β and δ_{ni} = b_{ni} − β_i (i = 1,…,q). Let G_{n·}(β) be the vector whose
components are G_{ni}(β) (i = 1,…,q). For fixed j, k, and ℓ, define G_{n·j}(β), G_{n·jk}(β), and G_{n·jkℓ}(β),
respectively, to be the q-dimensional vectors whose components are G_{nij}(β), G_{nijk}(β), and G_{nijkℓ}(β)
(i = 1,…,q). Let Q_n be the matrix whose (i, j) element is G_{nij}(β). By Lemma 2, b_n satisfies the
first-order condition G_{n·}(b_n) = 0 almost surely for all sufficiently large n. By assumptions 3-4 and
Lemma 3, Q_n has an inverse almost surely for all sufficiently large n. Therefore, a Taylor series
expansion of G_{n·}(b_n) = 0 about b_n = β yields

(A7)  b_n − β = −Q_n^{-1}[G_{n·}(β) + (1/2)G_{n·jk}(β)δ_{nj}δ_{nk} + (1/6)G_{n·jkℓ}(β)δ_{nj}δ_{nk}δ_{nℓ} + R_n]

almost surely for all sufficiently large n, where the summation convention is used,

R_n = (1/6)[G_{n·jkℓ}(b̄_n) − G_{n·jkℓ}(β)]δ_{nj}δ_{nk}δ_{nℓ},

and b̄_n is between b_n and β. By using arguments similar to those used to prove Lemma 3(m), it may
be shown that E[G_{n·jkℓ}(b) − G_{n·jkℓ}(β)] = O(b − β) for b in a neighborhood of β. This result and
Lemma 3(d) imply that

‖R_n‖ ≤ {o[(log n)/(nh_n⁵)^{1/2}] + O(‖b_n − β‖)}‖b_n − β‖³

almost surely. Given any δ > 0 and c > 0, suppose that ‖δ_n‖ ≤ cn^{−1/2+δ}. Then it follows from
Lemma 3 that the right-hand side of (A7) is less than cn^{−1/2+δ} almost surely for all sufficiently large
n. In addition, Lemma 3(b) and assumptions 3-4 imply that the consistent solution to G_{n·}(b) = 0 is
almost surely unique for all sufficiently large n. Therefore, application of the Brouwer fixed point
theorem to the right-hand side of (A7) shows that for any c > 0, δ > 0,

(A8)  ‖b_n − β‖ ≤ cn^{−1/2+δ}

almost surely for all sufficiently large n. Application of the implicit function theorem to (A7)
shows that there is almost surely a differentiable function Λ_β such that Λ_β(S_G) = 0 and

(A9)  b_n − β = Λ_β(S_{nG} + ρ_n),

where ρ_n is a vector such that dim(ρ_n) = dim(S_{nG}), R_n forms the first q components of ρ_n, and the
remaining components of ρ_n are 0. Application of the mean value theorem to (A9) combined with
(A8) shows that

(A10)  b_n − β = Λ_β(S_{nG}) + O[(log n)(n⁴h_n⁵)^{−1/2}n^{3δ}]

almost surely for any δ > 0. The lemma now follows from assumption 6 by making δ sufficiently
small. Q.E.D.
Proof of Theorem 2.1: It follows from Lemma 3 that Q_n → D almost surely. Therefore, by
(A7), (A8), and a further application of Lemma 3,

(A11)  n^{1/2}(b_n − β) = −D^{-1}n^{1/2}G_{n·}(β) + o_p(1) = D^{-1}n^{-1/2} Σ_{i=1}^n X_i'[2I(U_i > 0) − 1] + o_p(1).

The theorem follows by observing that (A11) is the Bahadur representation of the LAD estimator.
Q.E.D.
Let S_n denote the vector consisting of the unique components of S_{nG}, D_n(β), D_{ni}(β), D_{nij}(β),
T_n(β), and T_{ni}(β).
Lemma 5: For each i = 1,…,q, there is a real-valued function Λ_{Vi}(S_n) such that

V_{ni}^{1/2} = Λ_{Vi}(S_n) + ζ_n,

where ζ_n = o[(nh_n)^{-1}] almost surely.
Proof: Expand D_n(b_n) and T_n(b_n) in Taylor series about b_n = β through orders ‖b_n − β‖² and
‖b_n − β‖, respectively, and use (A10) to obtain

(A12)  V_{ni}^{1/2} = Λ̃_{Vi}(S_n + τ_n) + o[(nh_n)^{-1}]

almost surely for a suitable differentiable function Λ̃_{Vi}, where τ_n = o[(nh_n)^{-1}]. The lemma follows
by applying the mean value theorem to (A12). Q.E.D.
Proposition 1: Define Λ(S_n) = Λ_{βi}(S_{nG})/Λ_{Vi}(S_n), where Λ_{βi} is the i'th component of Λ_β. Then

lim_{n→∞} sup_z (nh_n)|P(t ≤ z) − P[n^{1/2}Λ(S_n) ≤ z]| = 0.

Proof: By Lemmas 4 and 5,

(A13)  t = [n^{1/2}Λ_{βi}(S_{nG}) + ε_n]/[Λ_{Vi}(S_n) + η_n],

where ε_n and η_n are o[(nh_n)^{-1}] almost surely. Define Δ_n = t − n^{1/2}Λ(S_n). A Taylor series
approximation applied to (A13) yields Δ_n = o[(nh_n)^{-1}] almost surely. Choose the sequence {ω_n}
such that ω_n = o[(nh_n)^{-1}] and Δ_n/ω_n = o(1) almost surely. Then

P[n^{1/2}Λ(S_n) ≤ z − ω_n] − P[n^{1/2}Λ(S_n) ≤ z] − P(|Δ_n| > ω_n)
≤ P(t ≤ z) − P[n^{1/2}Λ(S_n) ≤ z]
≤ P[n^{1/2}Λ(S_n) ≤ z + ω_n] − P[n^{1/2}Λ(S_n) ≤ z] + P(|Δ_n| > ω_n)

for every z. Therefore, since Δ_n = o[(nh_n)^{-1}] and Δ_n/ω_n = o(1) almost surely,

(A14)  P(t ≤ z) − P[n^{1/2}Λ(S_n) ≤ z] = o[(nh_n)^{-1}]

uniformly over z. The proposition follows by multiplying both sides of (A14) by nh_n and taking
the limit as n → ∞. Q.E.D.
Let E_n denote the expectation with respect to P_n*. Define G_n*(b) by replacing (Y_i, X_i) with
(Y_i*, X_i*) in the definition of G_n(b).
Lemma 6: For any b ∈ B, define U_b = Y − Xb and

W_n(b) = n^{-1} Σ_{i=1}^n [(U_{bi}*/h_n)^d g(X_i*)λ(U_{bi}*/h_n) − E_n(U_b/h_n)^d g(X)λ(U_b/h_n)],

where g is bounded for bounded values of its argument, d = 0 or 1, and λ is a bounded, Lipschitz
continuous function of bounded variation with support [−1, 1]. (a) Define ξ_n = [(h_n/n) log n]^{1/2}.
There is a finite C_0 > 0 such that for all C > C_0 and any γ ≥ 0,

lim_{n→∞} (nh_n)^γ P_n*(sup_{b∈B} |W_n(b)| > Cξ_n) = 0

almost surely (P).
(b) Define ξ_n = [(log n)/n]^{1/2}. There is a finite C_0 > 0 such that for all C > C_0 and any γ ≥
0,

lim_{n→∞} (nh_n)^γ P_n*(sup_{b∈B} |G_{ni}*(b) − E_nG_{ni}*(b)| > Cξ_n) = 0

and

lim_{n→∞} (nh_n)^γ P_n*(sup_{b∈B} |T_n*(b) − E_nT_n*(b)| > Cξ_n) = 0

almost surely (P).
(c) For any γ ≥ 0 and η > 0,

lim_{n→∞} (nh_n)^γ P_n*(sup_{b∈B} |G_n*(b) − G_n(b)| > η) = 0

almost surely (P).
Proof: Only part (a) is proved. The proofs of parts (b) and (c) are similar. Partition B into
subsets {B_j: j = 1,…,J} such that ‖b_1 − b_2‖ < ξ_n² whenever b_1 and b_2 are in the same subset. For
each j = 1,…,J, let b_j be a point in B_j. Observe that J = O(ξ_n^{−2q}). Then

(A15)  P_n*(sup_{b∈B} |W_n(b)| > Cξ_n) = P_n*(∪_{j=1}^J {sup_{b∈B_j} |W_n(b)| > Cξ_n})
≤ Σ_{j=1}^J P_n*(sup_{b∈B_j} |W_n(b)| > Cξ_n).

Because g is bounded, X has bounded support, and λ is bounded and Lipschitz continuous, there is
an M < ∞ such that

sup_{b∈B_j} |W_n(b)| ≤ 2M(log n)/n + |W_n(b_j)|.

Therefore, for all sufficiently large n,

(A16)  P_n*(sup_{b∈B_j} |W_n(b)| > Cξ_n) ≤ P_n*(|W_n(b_j)| > Cξ_n/2).

By using Lemma 22 of Nolan and Pollard (1987) and Theorem 2.37 of Pollard (1984), it can be
shown that E_n[nW_n(b_j)²] ≤ c_1h_n almost surely (P) for some c_1 < ∞ and all sufficiently large n.
Therefore, by Bernstein's inequality,

(A17)  P_n*(|W_n(b_j)| > Cξ_n/2) ≤ 2 exp(−Cd log n) = 2n^{−Cd}

for some finite d > 0 and all sufficiently large n. Combining (A15)-(A17) yields

(nh_n)^γ P_n*(sup_{b∈B} |W_n(b)| > Cξ_n) ≤ 2(nh_n)^γ n^{−Cd}O(ξ_n^{−2q}) = o(1)

as n → ∞ for all sufficiently large C. Q.E.D.
The following lemma gives the bootstrap version of Lemma 2.
Lemma 7: For any γ > 0 and ε > 0,

lim_{n→∞} (nh_n)^γ P_n*(‖b_n* − b_n‖ > ε) = 0

almost surely (P).
Proof: Given any η > 0, suppose that |G_n*(b) − G_n(b)| ≤ η and |G_n(b) − G(b)| ≤ η for all b ∈
B. Then since b_n* minimizes G_n*, G_n(b_n) + η ≥ G_n*(b_n) ≥ G_n*(b_n*). Also, G_n*(b_n*) ≥ G_n(b_n*) − η,
so G_n(b_n) + η ≥ G_n*(b_n*) ≥ G_n(b_n*) − η, and G_n(b_n) − G_n(b_n*) ≥ −2η. By a similar argument, G(β) −
G(b_n) ≥ −2η. Therefore, G(β) − G(b_n*) = [G(β) − G(b_n)] + [G(b_n) − G_n(b_n)] + [G_n(b_n) − G_n(b_n*)] +
[G_n(b_n*) − G(b_n*)] ≥ −6η. Because G(b) is continuous on B with a unique minimum at β, it is
possible to choose η such that G(β) − G(b_n*) ≥ −6η implies ‖b_n* − β‖ ≤ ε/2. By Lemma 2 and the
triangle inequality, ‖b_n* − β‖ ≤ ε/2 implies that ‖b_n* − b_n‖ ≤ ε for all sufficiently large n almost
surely. Therefore, |G_n*(b) − G_n(b)| ≤ η and |G_n(b) − G(b)| ≤ η for all b ∈ B imply that ‖b_n* − b_n‖ ≤
ε for all sufficiently large n almost surely. The lemma follows by combining this result with
Lemmas 1 and 6(c). Q.E.D.
For i, j, k, ℓ = 1,…,q, define G_{ni}*(b) = ∂G_n*(b)/∂b_i, G_{nij}*(b) = ∂²G_n*(b)/∂b_i∂b_j, G_{nijk}*(b) =
∂³G_n*(b)/∂b_i∂b_j∂b_k, G_{nijkℓ}*(b) = ∂⁴G_n*(b)/∂b_i∂b_j∂b_k∂b_ℓ, D_{ni}*(b) = ∂D_n*(b)/∂b_i, D_{nij}*(b) =
∂²D_n*(b)/∂b_i∂b_j, and T_{ni}*(b) = ∂T_n*(b)/∂b_i. The bootstrap version of Lemma 3 is:
Lemma 8: For all i, j, k, ℓ = 1,…,q, any γ > 0, and all sufficiently large C > 0,
lim_{n→∞} (nh_n)^γ P_n*(A_n) = 0 almost surely (P), where A_n is any of the events:
(a) sup_{b∈B} |G_{ni}*(b) − E_nG_{ni}*(b)| > C[(log n)/n^{1/2}]
(b) sup_{b∈B} |G_{nij}*(b) − E_nG_{nij}*(b)| > C[(log n)/(nh_n)^{1/2}]
(c) sup_{b∈B} |G_{nijk}*(b) − E_nG_{nijk}*(b)| > C[(log n)/(nh_n³)^{1/2}]
(d) sup_{b∈B} |G_{nijkℓ}*(b) − E_nG_{nijkℓ}*(b)| > C[(log n)/(nh_n⁵)^{1/2}]
(e) sup_{b∈B} |D_n*(b) − E_nD_n*(b)| > C[(log n)/(nh_n)^{1/2}]
(f) sup_{b∈B} |D_{ni}*(b) − E_nD_{ni}*(b)| > C[(log n)/(nh_n³)^{1/2}]
(g) sup_{b∈B} |D_{nij}*(b) − E_nD_{nij}*(b)| > C[(log n)/(nh_n⁵)^{1/2}]
(h) sup_{b∈B} |T_n*(b) − E_nT_n*(b)| > C[(log n)/n^{1/2}]
(i) sup_{b∈B} |T_{ni}*(b) − E_nT_{ni}*(b)| > C[(log n)/(nh_n)^{1/2}],
where (e)-(i) apply to the individual components of the matrices D_n*, D_{ni}*, D_{nij}*, T_n*, and T_{ni}*. In
addition, for all i, j, k, ℓ = 1,…,q:
(j) E_nG_{ni}*(b_n) = 0 with probability 1 − o[(nh_n)^{−γ}].
(k) E_nG_{nij}*(b), E_nG_{nijk}*(b), E_nG_{nijkℓ}*(b), E_nD_n*(b), E_nD_{ni}*(b), E_nD_{nij}*(b), E_nT_n*(b), and
E_nT_{ni}*(b) are O(1) almost surely (P) as n → ∞ for all b in a neighborhood of β.
Proof: Parts (a)-(i) are immediate consequences of Lemma 6. Part (j) is the first-order
condition for the bootstrap estimation problem. Part (k) follows from Lemma 3. Q.E.D.
Define S_{nG}* and S_n* as S_{nG} and S_n except with (Y_i, X_i) replaced by (Y_i*, X_i*) and β replaced
by b_n.
Proposition 2: Let Λ be the function defined in Proposition 1. Then

lim_{n→∞} sup_z (nh_n)|P_n*(t* ≤ z) − P_n*[n^{1/2}Λ(S_n*) ≤ z]| = 0

almost surely (P).
Proof: This is the bootstrap version of Proposition 1. It is proved using the same arguments
that are used to prove Lemmas 4-5 and Proposition 1 but with S_{nG}, S_n, b_n, and β, respectively,
replaced by S_{nG}*, S_n*, b_n*, and b_n. Q.E.D.
b. Step 2: Asymptotic Expansions
For h > 0, let W(u, x, h) be a vector whose components are terms of the form g(x)ψ_j(u/h),
where g(x) is the product of (not necessarily distinct) components of x that may be different in
each use of g, and ψ_j is the j'th component of the vector ψ defined in assumption 5. The following
lemma gives a modified version of the Cramér condition of Edgeworth analysis.
Lemma 9: Let τ be a vector with the same dimension as W. Define R_W(τ, h) =
E{exp[iτ'W(U, X, h)]}, where i = (−1)^{1/2}. For any ε > 0, some C > 0, all τ satisfying ‖τ‖ > ε, and all
sufficiently small h,

|R_W(τ, h)| < 1 − Ch.

Proof: Let r index the components of ψ and W. Each component of ψ satisfies |ψ_r(v)| = 0 or 1 if
|v| ≥ 1. Let δ_r^− = ψ_r(v) if v ≤ −1 and δ_r^+ = ψ_r(v) if v ≥ 1. Then, using the summation convention,

R_W(τ, h) = ∫∫ exp[iτ_r g_r(x)ψ_r(u/h)]f(u|x) du dP(x)
= ∫{∫_{−∞}^{−h} exp[iτ_r g_r(x)δ_r^−]f(u|x) du + ∫_{h}^{∞} exp[iτ_r g_r(x)δ_r^+]f(u|x) du
+ ∫_{−h}^{h} exp[iτ_r g_r(x)ψ_r(u/h)]f(u|x) du} dP(x)
= A_1(h) + A_2(h),

where

A_1(h) = E{F(−h|X) exp[iτ_r g_r(X)δ_r^−] + [1 − F(h|X)] exp[iτ_r g_r(X)δ_r^+]}

and

A_2(h) = ∫{∫_{−h}^{h} exp[iτ_r g_r(x)ψ_r(u/h)]f(u|x) du} dP(x).

Consider A_1(h). |A_1(h)| ≤ E|A_1(h, X)|, where

A_1(h, x) = F(−h|x) exp[iτ_r g_r(x)δ_r^−] + [1 − F(h|x)] exp[iτ_r g_r(x)δ_r^+].

Let ψ̄_r = δ_r^+ if δ_r^+ = −δ_r^− = 1. Note that δ_r^+ = δ_r^− otherwise. Therefore,

|A_1(h, x)| = |F(−h|x) exp[iτ_r g_r(x)ψ̄_r] + [1 − F(h|x)] exp[−iτ_r g_r(x)ψ̄_r]|
= {[1 − F(h|x) + F(−h|x)]² − 4[1 − F(h|x)]F(−h|x) sin²[τ_r g_r(x)ψ̄_r]}^{1/2}
≤ 1 − F(h|x) + F(−h|x)
= 1 − 2hf(0|x) − (1/2)h²[f^{(1)}(h̄_1|x) − f^{(1)}(h̄_2|x)],

where h̄_1 and h̄_2 are between 0 and h, and the last line is obtained by a Taylor series expansion.
Let Ef(0|X) = C_1. By assumption 4(b), C_1 > 0 and E|f^{(1)}(h̄_1|X) − f^{(1)}(h̄_2|X)| < M for some finite
M and all sufficiently small h. Therefore,

|A_1(h)| ≤ E|A_1(h, X)| ≤ 1 − C_1h

for all sufficiently small h. Now consider A_2(h). By a change of variables,

A_2(h) = h∫{∫_{−1}^{1} exp[iτ_r g_r(x)ψ_r(ζ)]f(hζ|x) dζ} dP(x).

Given ε > 0, choose h sufficiently small that

∫∫_{−1}^{1} |f(hζ|x) − f(0|x)| dζ dP(x) ≤ ε∫∫_{−1}^{1} f(0|x) dζ dP(x) = 2εC_1.

Then

(A18)  |R_W(τ, h)| ≤ 1 − hC_1(1 − 2ε) + |A_3(τ, h)|

for all τ, ε > 0, and sufficiently small h > 0, where

A_3(τ, h) = h∫{∫_{−1}^{1} exp[iτ_r g_r(x)ψ_r(ζ)]f(0|x) dζ} dP(x).

Since g_r(x) = 0 for every r only if x = 0 and P(X = 0) < 1, there are η > 0 and γ_1 < 1 such that

2∫_{‖x‖<η} f(0|x) dP(x) = γ_1C_1.

Suppose, as will be proved presently, that for some C_2 < 1,

(A19)  sup_{‖τ‖≥ε} |∫_{−1}^{1} exp[iτ_r g_r(x)ψ_r(ζ)] dζ| = C_2

uniformly over x such that ‖x‖ ≥ η. Then for ‖τ‖ ≥ ε,

(A20)  |A_3(τ, h)| ≤ h[γ_1C_1 + (1 − γ_1)C_1C_2] = hγ_2C_1,

where γ_2 = [γ_1 + (1 − γ_1)C_2] < 1. Combining (A18) with (A20) yields

sup_{‖τ‖>ε} |R_W(τ, h)| ≤ 1 − hC_1(1 − 2ε − γ_2) ≡ 1 − Ch

for all sufficiently small h > 0 and ε > 0, thereby establishing the lemma.
It remains to prove (A19). To do this, define t = ‖τ‖. Fix τ/‖τ‖ and x with ‖x‖ ≥ η. For
the specified τ/‖τ‖ and x, and using the summation convention, define f̃(ζ) = τ_r g_r(x)ψ_r(ζ)/‖τ‖. Let
−1 = a_1 < … < a_L = 1 be a partition of [−1, 1] that satisfies assumption 5c when θ is proportional to
the vector with components τ_r g_r(x)/‖τ‖. Then

R*(τ) ≡ ∫_{−1}^{1} exp[itf̃(ζ)] dζ = Σ_{ℓ=2}^{L} ∫_{a_{ℓ−1}}^{a_ℓ} exp[itf̃(ζ)] dζ.

It suffices to prove that for any ε > 0 and some C_3 < 1 that does not depend on x or τ/‖τ‖,

(A21)  sup_{|t|>ε} (a_ℓ − a_{ℓ−1})^{-1}|∫_{a_{ℓ−1}}^{a_ℓ} exp[itf̃(ζ)] dζ| ≤ C_3.

To do this, make the change of variables ξ = f̃(ζ) in (A21) and set v(ξ) = 1/{df̃[ζ(ξ)]/dζ}. Then

R**(t) ≡ ∫_{a_{ℓ−1}}^{a_ℓ} exp[itf̃(ζ)] dζ = ∫_{f̃(a_{ℓ−1})}^{f̃(a_ℓ)} e^{itξ}v(ξ) dξ.

Observe that |R**(t)| ≤ a_ℓ − a_{ℓ−1}, so the right-hand integral is bounded. The right-hand integral
can be approximated arbitrarily accurately by replacing v(·) with a step function. Therefore, it is
enough to prove that

sup_{|t|>ε} |∫_{α_1}^{α_2} e^{itξ} dξ| ≤ (α_2 − α_1)C_3

for all α_1 < α_2 and some C_3 < 1 that does not depend on α_1 or α_2. But

|∫_{α_1}^{α_2} e^{itξ} dξ|² ≤ (α_2 − α_1)² sin²[0.5t(α_2 − α_1)]/[0.5t(α_2 − α_1)]².

The proof is completed by setting C_3 = sup_{|t|>ε} |(sin t)/t|. Q.E.D.
Define W* as in Lemma 9 except with β replaced by b_n. Define R_W*(τ, h_n) =
E_n{exp[iτ'W*(U, X, h_n)]}. The bootstrap version of Lemma 9 is:
Lemma 10: For any ε > 0 and c > 0, some C* > 0, all τ satisfying ε < ‖τ‖ ≤ n^c, and all
sufficiently large n,

|R_W*(τ, h_n)| < 1 − C*h_n

almost surely (P).
Proof: Let B_{nτ} = {τ: ε < ‖τ‖ ≤ n^c}. Then

sup_{τ∈B_{nτ}} |R_W*(τ, h_n)| ≤ sup_{τ∈B_{nτ}} |R_W(τ, h_n)| + sup_{τ∈B_{nτ}} |R_W*(τ, h_n) − R_W(τ, h_n)|.

By arguments similar to those used to prove Lemma 6, together with the Borel-Cantelli lemma,
|R_W*(τ, h_n) − R_W(τ, h_n)| = o(h_n) almost surely uniformly over τ ∈ B_{nτ}. Let C be as in Lemma 9.
Then Lemma 10 follows by letting C* be any number satisfying 0 < C* < C. Q.E.D.
Let W_{n1} be a column vector consisting of the unique components of n^{1/2}[G_{ni}(β) − EG_{ni}(β)] (i
= 1,…,q) and n^{1/2}[T_n(β) − ET_n(β)]. Let W_{n2} be a column vector consisting of the unique
components of (nh_n)^{1/2}[G_{nij}(β) − EG_{nij}(β)], (nh_n³)^{1/2}[G_{nijk}(β) − EG_{nijk}(β)], (nh_n⁵)^{1/2}[G_{nijkℓ}(β) − EG_{nijkℓ}(β)],
(nh_n)^{1/2}[D_n(β) − ED_n(β)], (nh_n³)^{1/2}[D_{ni}(β) − ED_{ni}(β)], (nh_n⁵)^{1/2}[D_{nij}(β) − ED_{nij}(β)], and (nh_n)^{1/2}[T_{ni}(β) −
ET_{ni}(β)] (i, j, k, ℓ = 1,…,q). Set W_n = [W_{n1}', W_{n2}']'. Define W_n*, W_{n1}*, and W_{n2}* similarly except
with (Y_i, X_i) replaced by (Y_i*, X_i*) and β replaced by b_n. Order the components of S_n and S_n*
conformably with those of W_n and W_n*. Let V_n be the covariance matrix of [W_{n1}', W_{n2}'/h_n]' and
V_n* be the covariance matrix of [W_{n1}*', W_{n2}*'/h_n]' relative to P_n*. Let w_{n1}, w_{n2}, w_{n1}*, and w_{n2}*,
respectively, be the summands of the components of W_{n1}, W_{n2}, W_{n1}*, and W_{n2}*. These have the
forms g_j(X)ψ_j(U/h_n) and g_j(X*)ψ_j(U_n*/h_n), where U_n* = Y* − X*b_n. For any τ = (τ_1', τ_2')' conformable
with (w_{n1}', w_{n2}')', define

(A22)  D_1(τ) = −[i/(6h_n)]E[(τ_2'w_{n2})³],

(A23)  D_2(τ) = −(i/6){E[(τ_1'w_{n1})³] + (3/h_n)E[(τ_1'w_{n1})(τ_2'w_{n2})²]},

(A24)  D_3(τ) = −[i/(2h_n)]E[(τ_1'w_{n1})²(τ_2'w_{n2})],

and

(A25)  D_4(τ) = [1/(24h_n)]E[(τ_2'w_{n2})⁴] + (1/72){h_n^{-1}E[(τ_2'w_{n2})³]}².

Define D_i*(τ) (i = 1,…,4) by replacing w_n with w_n* and E with E_n in (A22)-(A25). Let B_i (i =
1,…,4) be the signed measures whose Fourier-Stieltjes transforms are

(A26)  ∫ exp(iτ'ξ) dB_i(ξ) = exp(−0.5τ'V_nτ)D_i(τ).

Define B_i* (i = 1,…,4) analogously by using V_n* and D_i* in place of V_n and D_i. Let d_W = dim(W_n).
For any set A in d_W-dimensional Euclidean space, let ∂A denote the boundary of A and (∂A)^ε
denote the set of all points whose distance from ∂A does not exceed ε. Let Φ_{V_n} denote the probability
measure corresponding to the normal distribution with mean 0 and covariance matrix V_n. Define Φ_{V_n*}
analogously.
Lemma 11: Let 𝒜 denote a class of Borel sets in d_W-dimensional Euclidean space that
satisfy

sup_{A∈𝒜} ∫_{(∂A)^ε} exp(−0.5‖ξ‖²) dξ = O(ε)

as ε → 0+. Then

sup_{A∈𝒜} |P(W_n ∈ A) − Φ_{V_n}(A) − (nh_n)^{−1/2}B_1(A) − n^{−1/2}B_2(A)
− (h_n/n)^{1/2}B_3(A) − (nh_n)^{−1}B_4(A)| = o[(nh_n)^{−1}],

and, almost surely (P),

sup_{A∈𝒜} |P_n*(W_n* ∈ A) − Φ_{V_n*}(A) − (nh_n)^{−1/2}B_1*(A) − n^{−1/2}B_2*(A)
− (h_n/n)^{1/2}B_3*(A) − (nh_n)^{−1}B_4*(A)| = o[(nh_n)^{−1}].

Proof: This is a slightly modified version of Theorem 5.8 of Hall (1992) and is proved
using the same arguments as in Hall's proof after replacing Hall's Lemma 5.6 with Lemmas 9 and
10 above. Q.E.D.
Proof of Theorem 4.1: Only parts (a), (c) and the part of (b) pertaining to q(J,<n) are proved
here. The proofs of the remaining parts are similar. To begin, invert (A26) to obtain
(A27) B (>) = (>)N (>),
i ni V
n
where for each n and i, ni(C) is a multivariate polynomial, and NVn is the multivariate normal
density with mean 0 and covariance matrix Vn. Let Sn(Wn) be the mapping from Wn to Sn. Define
(Wn) = (nhn)1/27[Sn(Wn)]. By Proposition 1 it suffices to consider P( # J). Define a1 = (nhn)-1/2, a2 =
n-1/2, a3 = (hn/n)1/2, and a4 = (nhn)-1. By Lemma 11 and (A27)
4
!
P( # J) = # d[M (>) + 3 a (>)N (>)]
" V i ni V
n i=1 n
{>: (>) # J}
-1
(A28) + o[(nh ) ].
n
uniformly over $\tau$. Order the components of $\xi$ and $W_n$ so that the first component corresponds to $(nh_n)^{1/2}[G_{ni}(\beta) - EG_{ni}(\beta)]$, where $i$ is the component of $\beta$ for which $t$ is the $t$ statistic. Let $\zeta$ denote the vector consisting of all components of $\xi$ except the first, $\xi_1$. Change variables in the integral of (A28) so that the variable of integration is $(v,\zeta')'$, where $v$ denotes the value of $f$, thereby obtaining

(A29) $P(f \le \tau) = \int_{v \le \tau} dv \int d\zeta\, J[\xi_1(v,\zeta),\zeta]\,\Bigl\{\phi_{V_n}[\xi_1(v,\zeta),\zeta] + \sum_{i=1}^{4} a_i\, p_{ni}[\xi_1(v,\zeta),\zeta]\,\phi_{V_n}[\xi_1(v,\zeta),\zeta]\Bigr\} + o[(nh_n)^{-1}]$

uniformly over $\tau$, where $J(\cdot)$ is the inverse Jacobian term associated with the change of variables.
Taylor-series expansions in powers of $n^{-1}$ of the terms involving $\xi_1(v,\zeta)$ in (A29) yield

(A30) $P(f \le \tau) = \Phi(\tau) + \sum_{i=1}^{5} c_{ni}\, p_i(\tau)\,\phi(\tau) + o[(nh_n)^{-1}] \equiv G_n(\tau) + o[(nh_n)^{-1}]$

uniformly over $\tau$, where $\Phi$ and $\phi$, respectively, are the univariate standard normal distribution and density functions, the $p_i$'s are polynomial functions of one variable, $c_{n1} = n^{-1/2}$, $c_{n2} = (nh_n)^{-1/2}$, $c_{n3} = h_n n^{-1/2}$, $c_{n4} = (nh_n^{3/2})^{-1}$, and $c_{n5} = (nh_n)^{-1}$. Let $\psi$ and $\psi_G$, respectively, denote the characteristic functions of the distributions of $f$ and $G_n$. Then $|\psi(\tau) - \psi_G(\tau)| = o[(nh_n)^{-1}]$. A Taylor-series expansion shows that $f$ in (A30) can be replaced by a multivariate polynomial in the components of $S_n - E(S_n)$. The cumulants through order 4 of this polynomial may be approximated through $O[(nh_n)^{-1}]$ using standard Taylor-series methods of kernel estimation. Let $k_{nj}$ denote the approximate $j$th cumulant. Expressing $\psi$ in terms of the approximate cumulants yields $\psi(\tau) = \tilde\psi(\tau) + o[(nh_n)^{-1}]$ uniformly over $\tau$, where
(A31) $\tilde\psi(\tau) = \exp(-\tau^2/2)\bigl\{1 + i\tau k_{n1} + \tfrac{1}{2}(i\tau)^2(k_{n2} - 1) + \tfrac{1}{6}(i\tau)^3 k_{n3} + \tfrac{1}{24}(i\tau)^4 k_{n4} + \tfrac{1}{2}[i\tau k_{n1} + \tfrac{1}{6}(i\tau)^3 k_{n3}]^2\bigr\}$.
Setting $\psi_G = \tilde\psi$, taking the inverse Fourier transform of the result, and setting $P(|f| \le \tau) = P(f \le \tau) - P(f \le -\tau)$ yields (4.1) with

$q(\tau,\nu_n) = -\tau\bigl[k_{n1}^2 + (k_{n2} - 1) + \tfrac{1}{12}(4k_{n1}k_{n3} + k_{n4})(\tau^2 - 3) + \tfrac{1}{36}k_{n3}^2(\tau^4 - 10\tau^2 + 15)\bigr]$.

A straightforward but lengthy calculation shows that $k_{n1}^2$, $k_{n1}k_{n3}$, and $k_{n3}^2$ are $o[(nh_n)^{-1}]$, whereas $k_{n2} - 1$ and $k_{n4}$ are $O[(nh_n)^{-1}]$ and consist of linear combinations of the terms shown in Table I. Q.E.D.
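As an expository check on the step from (A31) to $q(\tau,\nu_n)$, recall two standard facts about the probabilists' Hermite polynomials $He_k$:

$\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-i\tau x}(i\tau)^k e^{-\tau^2/2}\,d\tau = He_k(x)\phi(x)$ and $\int_{-\infty}^{x} He_k(y)\phi(y)\,dy = -He_{k-1}(x)\phi(x)$ for $k \ge 1$.

Inverting (A31) term by term with these identities and forming $P(|f| \le \tau) = \tilde G(\tau) - \tilde G(-\tau)$, where $\tilde G$ is the distribution function whose characteristic function is $\tilde\psi$, cancels the even-degree Hermite contributions because $\phi$ is even. The surviving odd-degree polynomials $He_1(\tau) = \tau$, $He_3(\tau) = \tau(\tau^2 - 3)$, and $He_5(\tau) = \tau(\tau^4 - 10\tau^2 + 15)$ generate exactly the three groups of terms in the expression for $q(\tau,\nu_n)$ above.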
Proof of Theorem 4.2: Under $H_0$, $c = R\beta$, so

$\chi^2 = (nh_n)(b_n - \beta)'R'(RV_nR')^{-1}R(b_n - \beta)$.

By arguments similar to those used to prove Propositions 1 and 2, followed by a Taylor-series expansion, there is a multivariate polynomial $\pi_\chi$ such that

$P(\chi^2 \le z) - P[(nh_n)\pi_\chi(S_n) \le z] = o[(nh_n)^{-1}]$

uniformly over $z$, and

$\lim_{n \to \infty} \sup_z\, (nh_n)\bigl\{P_n^*(\chi^{2*} \le z) - P_n^*[(nh_n)\pi_\chi(S_n^*) \le z]\bigr\} = 0$
almost surely ($P$). Set $f(W_n) = (nh_n)\pi_\chi[S_n(W_n)]$. By arguments similar to those used to obtain (A28),

$P(f \le z) = \int_{\{\xi:\, f(\xi) \le z\}} d\Bigl[\Phi_{V_n}(\xi) + \sum_{i=1}^{4} a_i\, p_{ni}(\xi)\,\phi_{V_n}(\xi)\Bigr] + o[(nh_n)^{-1}]$.

Now transform to polar coordinates and proceed as in the proof of Theorem 1b of Chandra and Ghosh (1979). A similar argument applies to $P_n^*(\chi^{2*} \le z)$. Q.E.D.
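Heuristically, and again as an expository aside, the polar-coordinate step works because $f$ is, to leading order, a quadratic form in $\xi$, so the region of integration is symmetric about the origin. Writing $\xi = r\omega$ with $r \ge 0$ and $\|\omega\| = 1$ gives

$\int_{\{\xi:\, f(\xi) \le z\}} p_{ni}(\xi)\,\phi_{V_n}(\xi)\,d\xi \approx \int_0^{r(z)}\int_{\|\omega\|=1} p_{ni}(r\omega)\,\phi_{V_n}(r\omega)\,r^{d_W-1}\,d\omega\,dr$,

and the inner integral of any component of $p_{ni}$ that is odd in $\omega$ vanishes. Chandra and Ghosh (1979) show that the lower-order perturbations of the quadratic form do not upset this cancellation, which is, roughly, why the expansion of $P(\chi^2 \le z)$ contains no $(nh_n)^{-1/2}$ term.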
Proof of Theorem 4.3: Only part (a) is proved here. The proof of part (b) is similar. Let $t_\alpha$ and $t_\alpha^*$, respectively, denote the exact and bootstrap $\alpha$-level critical values of the symmetrical $t$ test. Let $k_{ni}^*$ denote the bootstrap version of $k_{ni}$ ($i = 2$ or 4); it is obtained from $k_{ni}$ by replacing $\beta$ with $b_n$ and expected values with sample averages. By Theorem 4.1,

$|P(|t| > t_\alpha^*) - \alpha| \le \sup_\tau |P(|t| > \tau) - P_n^*(|t^*| > \tau)|$
$\le \sup_\tau |[q(\tau,\nu_n) - q(\tau,\nu_n^*)]\phi(\tau)| + o[(nh_n)^{-1}]$
$= O(k_{n2}^* - k_{n2}) + O(k_{n4}^* - k_{n4})$.

The proof is completed by using methods similar to those used in proving Lemma 3 to show that the difference between each of the terms in Table I and its bootstrap analog is $o[(nh_n)^{-1}]$ almost surely. Q.E.D.
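As a numerical companion to Theorems 4.1-4.3, the following Python sketch illustrates the bootstrap procedure whose refinements are established there: estimate $\beta$ by minimizing a smoothed LAD objective, form a symmetrical $t$ statistic for one coefficient, and take the critical value from the bootstrap distribution of $|t^*|$ obtained by resampling $(Y,X)$ pairs and recentering at $b_n$. The smoothing function (the integral of a quartic kernel), the fixed bandwidth h = 0.5, the generic M-estimator sandwich variance, the number of bootstrap replications, and the data-generating process are all illustrative assumptions, not the paper's exact constructions.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

def K(v):
    # Illustrative smoothing function: the integral of the quartic kernel,
    # so K(v) = 0 for v <= -1, K(v) = 1 for v >= 1, and K is smooth between.
    w = np.clip(v, -1.0, 1.0)
    return 0.5 + (15.0 / 16.0) * (w - 2.0 * w**3 / 3.0 + w**5 / 5.0)

def kk(v):
    # K' is the quartic (biweight) kernel itself.
    return np.where(np.abs(v) <= 1.0, (15.0 / 16.0) * (1.0 - v**2) ** 2, 0.0)

def objective(b, y, x, h):
    # Smoothed LAD criterion: u*[2K(u/h) - 1] approximates |u| as h -> 0.
    u = y - x @ b
    return np.mean(u * (2.0 * K(u / h) - 1.0))

def scores(b, y, x, h):
    # Per-observation gradients of the smoothed criterion (an n x q array).
    u = y - x @ b
    psi = (2.0 * K(u / h) - 1.0) + (2.0 * u / h) * kk(u / h)
    return -x * psi[:, None]

def fit(y, x, h):
    b0 = np.linalg.lstsq(x, y, rcond=None)[0]   # least-squares starting value
    jac = lambda b, *args: scores(b, *args).mean(axis=0)
    return minimize(objective, b0, args=(y, x, h), jac=jac, method="BFGS").x

def t_statistic(y, x, h, j, beta_j0):
    # t statistic for H0: beta_j = beta_j0, studentized with a generic
    # M-estimator sandwich variance (an illustrative choice only).
    n, q = x.shape
    b = fit(y, x, h)
    eps = 1e-5
    H = np.empty((q, q))                        # numerical Hessian of the criterion
    for m in range(q):
        e = np.zeros(q)
        e[m] = eps
        H[:, m] = (scores(b + e, y, x, h).mean(0)
                   - scores(b - e, y, x, h).mean(0)) / (2.0 * eps)
    g = scores(b, y, x, h)
    Hinv = np.linalg.inv(H)
    V = Hinv @ (g.T @ g / n) @ Hinv / n         # sandwich covariance of b
    return b, (b[j] - beta_j0) / np.sqrt(V[j, j])

# Simulated median-regression data with heteroskedastic t(3) errors.
n, h, j, B, alpha = 200, 0.5, 1, 199, 0.05
x = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta = np.array([1.0, 1.0, 0.0])
y = x @ beta + (1.0 + 0.5 * np.abs(x[:, 1])) * rng.standard_t(3, size=n)

b_n, t_obs = t_statistic(y, x, h, j, beta[j])   # test the true H0: beta_1 = 1
t_boot = np.empty(B)
for r in range(B):
    idx = rng.integers(n, size=n)               # resample (Y, X) pairs
    _, t_boot[r] = t_statistic(y[idx], x[idx], h, j, b_n[j])  # recenter at b_n
t_crit = np.quantile(np.abs(t_boot), 1.0 - alpha)  # symmetrical critical value
print(f"|t| = {abs(t_obs):.3f}, bootstrap 5% critical value = {t_crit:.3f}")
```

Because the smoothed objective is differentiable, the statistic behaves like a smooth function of sample moments, which is the property the refinements in this paper exploit; with the unsmoothed LAD objective the same resampling scheme would not satisfy the conditions of Theorems 4.1-4.3.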
Proof of Theorem 4.4: The proof consists of repeating each step of the proofs of Lemmas 1-11 and Theorems 4.1-4.3 with $H_{cn}(b)$ in place of $H_n(b)$ and Assumptions 1' and 3' in place of Assumptions 1 and 3. Q.E.D.
REFERENCES
Bassett, G. and R. Koenker (1978). Asymptotic theory of least absolute error regression, Journal of the American Statistical Association, 73, 618-621.
Beran, R. (1988). Prepivoting test statistics: a bootstrap view of asymptotic refinements, Journal of the American Statistical Association, 83, 687-697.
Bhattacharya, R.N. and J.K. Ghosh (1978). On the validity of the formal Edgeworth expansion, Annals of Statistics, 6, 434-451.
Bloomfield, P. and W.L. Steiger (1983). Least Absolute Deviations: Theory, Applications, and Algorithms, Boston: Birkhäuser.
Buchinsky, M. (1995). Estimating the asymptotic covariance matrix for quantile regression models: a Monte Carlo study, Journal of Econometrics, 68, 303-338.
Chandra, T.K. and J.K. Ghosh (1979). Valid asymptotic expansions for the likelihood ratio statistic and other perturbed chi-square variables, Sankhya, Series A, 41, 22-47.
De Angelis, D., P. Hall, and G.A. Young (1993). Analytical and bootstrap approximations to estimator distributions in L1 regression, Journal of the American Statistical Association, 88, 1310-1316.
Dielman, T. and R. Pfaffenberger (1984). Tests of linear hypotheses and L1 estimation: a Monte Carlo comparison, American Statistical Association Business and Economic Statistics Section Proceedings, 644-647.
Dielman, T. and R. Pfaffenberger (1988a). Bootstrapping in least absolute value regression: an application to hypothesis testing, Communications in Statistics - Simulation and Computation, 17, 843-856.
Dielman, T. and R. Pfaffenberger (1988b). Least absolute value regression: necessary sample sizes to use normal theory inference procedures, Decision Sciences, 19, 734-743.
Hahn, J. (1995). Bootstrapping quantile regression estimators, Econometric Theory, 11, 105-121.
Hall, P. (1986). On the bootstrap and confidence intervals, Annals of Statistics, 14, 1431-1452.
Hall, P. (1992). The Bootstrap and Edgeworth Expansion, New York: Springer-Verlag.
Hall, P. and J.L. Horowitz (1990). Bandwidth selection in semiparametric estimation of censored linear regression models, Econometric Theory, 6, 123-150.
Horowitz, J.L. (1996). Bootstrap methods in econometrics: theory and numerical performance, in Advances in Economics and Econometrics: 7th World Congress, D. Kreps and K.W. Wallis, eds., Cambridge: Cambridge University Press, forthcoming.
Janas, D. (1993). A smoothed bootstrap estimator for a studentized sample quantile, Annals of the Institute of Statistical Mathematics, 45, 317-329.
Koenker, R. (1982). Robust methods in econometrics, Econometric Reviews, 1, 213-255.
Koenker, R. and G. Bassett (1978). Regression quantiles, Econometrica, 46, 33-50.
Koenker, R. and G. Bassett (1982). Robust tests for heteroscedasticity based on regression quantiles, Econometrica, 50, 43-61.
Müller, H.-G. (1984). Smooth optimum kernel estimators of densities, regression curves and modes, Annals of Statistics, 12, 766-774.
Nolan, D. and D. Pollard (1987). U-processes: rates of convergence, Annals of Statistics, 15, 780-799.
Pakes, A. and D. Pollard (1989). Simulation and the asymptotics of optimization estimators, Econometrica, 57, 1027-1057.
Pollard, D. (1984). Convergence of Stochastic Processes, New York: Springer-Verlag.
Powell, J.L. (1984). Least absolute deviations estimation for the censored regression model, Journal of Econometrics, 25, 303-325.
Powell, J.L. (1986). Censored regression quantiles, Journal of Econometrics, 32, 143-155.
TABLE I: TERMS OF APPROXIMATE CUMULANTS

Notation: $g_j(x)$, $j$ an integer, is a product of components of $x$ that may be different in different occurrences. $m_{1j}(x,u) = n^{-1/2}g_j(x)\{[2K(u/h_n) - 1] - 2(u/h_n)K^{(1)}(u/h_n)\}$. For $i = 2$ or 3, $m_{ij}(x,u) = \partial^{i-1}m_{1j}(x,u)/\partial u^{i-1}$. In addition, $m_{4j}(x,u) = n^{-1/2}g_j(x)K^{(1)}(u/h_n)$, $m_{5j}(x,u) = \partial m_{4j}(x,u)/\partial u$, $\mu_{ij} = Em_{ij}(X,U)$, and $\nu_{ij} = m_{ij}(X,U) - \mu_{ij}$.

Cumulant        Terms
$k_{n2} - 1$    $nE(\nu_{11}\nu_{12})E(\nu_{13}\nu_{34})$, $nE(\nu_{11}\nu_{12})E(\nu_{13}\nu_{54})$,
                $nE(\nu_{11}\nu_{12})E(\nu_{23}\nu_{24})$, $nE(\nu_{11}\nu_{12})E(\nu_{23}\nu_{44})$,
                $nE(\nu_{11}\nu_{12})E(\nu_{43}\nu_{44})$
$k_{n4}$        $n^2E(\nu_{11}\nu_{12})E(\nu_{13}\nu_{14})E(\nu_{15}\nu_{36})$, $n^2E(\nu_{11}\nu_{12})E(\nu_{13}\nu_{14})E(\nu_{25}\nu_{26})$,
                $n^2E(\nu_{11}\nu_{12})E(\nu_{13}\nu_{14})E(\nu_{25}\nu_{46})$, $n^2E(\nu_{11}\nu_{12})E(\nu_{13}\nu_{14})E(\nu_{45}\nu_{46})$,
                $n^2E(\nu_{11}\nu_{12})E(\nu_{13}\nu_{14})E(\nu_{15}\nu_{56})$
FOOTNOTES
1. When the $t$ statistic can be approximated by a smooth function of sample moments, the difference between the true and nominal levels of a symmetrical $t$ test with bootstrap critical values is typically $O(n^{-2})$. With critical values based on first-order asymptotic theory, the difference is typically $O(n^{-1})$. See, e.g., Hall (1992). The larger approximation errors in the case of a $t$ statistic for a median are due to the median estimator's non-smooth objective function.
2. De Angelis et al. (1993) implement the bootstrap by sampling smoothed LAD residuals. In contrast to sampling $(Y,X)$ pairs, this method does not generalize easily to heteroskedastic or censored models.
3. $K$ does not satisfy Assumption 5b because it has only two derivatives at $v = \pm 1$. This problem can be overcome by smoothing $K$ in neighborhoods of $v = \pm 1$, but doing so has no effect on the results of the experiments.
4. Hall and Horowitz (1990) derived the bandwidth that minimizes the asymptotic mean-square error of the variance estimator in a homoskedastic quantile regression and suggested a plug-in estimator of this bandwidth. However, the bandwidth that is optimal for estimating the variance is not necessarily optimal for computing test statistics, and little is known about the numerical performance of the Hall-Horowitz estimator in testing.