
BOOTSTRAP METHODS FOR MEDIAN REGRESSION MODELS

by

Joel L. Horowitz

Department of Economics

University of Iowa

Iowa City, IA 52242

August 1996




ABSTRACT

The least-absolute-deviations (LAD) estimator for a median-regression model does not satisfy the standard conditions for obtaining asymptotic refinements through use of the bootstrap because the LAD objective function is not smooth. This paper overcomes this problem by smoothing the objective function so that it becomes differentiable. The smoothed estimator is asymptotically equivalent to the standard LAD estimator. With bootstrap critical values, the levels of symmetrical $t$ and $\chi^2$ tests based on the smoothed estimator are correct through $O(n^{-\gamma})$, where $\gamma < 1$ but can be arbitrarily close to 1. In contrast, first-order asymptotic approximations make an error of size $O(n^{-\gamma})$. The bootstrap accounts for terms of size $O(n^{-\gamma})$ in the asymptotic expansions of the test statistics, whereas first-order approximations ignore these terms. These results also hold for symmetrical $t$ and $\chi^2$ tests for censored median regression models.

KEY WORDS: Asymptotic expansion, smoothing, $L_1$ regression, least absolute deviations










Research supported in part by NSF grant SBR-9307677. I thank Peter Bickel, Moshe Buchinsky, Oliver Linton, Paul Ruud, and Gene Savin for helpful comments and discussions.

BOOTSTRAP METHODS FOR MEDIAN REGRESSION MODELS

1. INTRODUCTION

A linear median regression model has the form

(1.1) $Y = X\beta + U$,

where $Y$ is an observed scalar dependent variable, $X$ is a $1\times q$ vector of observed explanatory variables, $\beta$ is a $q\times 1$ vector of constant parameters, and $U$ is an unobserved random variable that satisfies $\mathrm{median}(U|X=x) = 0$ almost surely. The parameters $\beta$ may be estimated by the method of least absolute deviations (LAD). Bassett and Koenker (1978) and Koenker and Bassett (1982) give conditions under which the LAD estimator is $n^{1/2}$-consistent and asymptotically normal. Koenker and Bassett (1978) treat quantile regressions, which generalize (1.1) by specifying that a quantile of the conditional distribution of $U$ (not necessarily the median) is zero. Bloomfield and Steiger (1983), Koenker (1982), and Koenker and Bassett (1978), among others, discuss the robustness properties of the LAD estimator.

The asymptotic normality of the LAD estimator makes it possible to form asymptotic $t$ and $\chi^2$ statistics for testing hypotheses about $\beta$ in (1.1). However, first-order asymptotic approximations can be inaccurate with samples of the sizes encountered in applications. As a result, the true and nominal levels of $t$ and $\chi^2$ tests and the true and nominal coverage probabilities of confidence intervals for components of $\beta$ can be very different when critical values based on first-order asymptotic approximations are used. Buchinsky (1995), de Angelis, et al. (1993), Dielman and Pfaffenberger (1984, 1986, 1988), and Monte Carlo results that are presented later in this paper provide numerical evidence on the accuracy of first-order approximations.

This paper shows that the bootstrap provides asymptotic refinements to the levels of $t$ and $\chi^2$ tests of hypotheses about $\beta$ in (1.1). That is, as the sample size, $n$, increases, the differences between the true and nominal levels of the tests converge to zero more rapidly with critical values obtained from the bootstrap than with critical values obtained from first-order asymptotic theory. It is well known that under suitable conditions the bootstrap provides asymptotic refinements to the levels of tests and coverage probabilities of confidence intervals (see, e.g., Beran 1988; Hall 1986, 1992; Horowitz 1996). However, the standard theory of the bootstrap does not apply to $t$ and $\chi^2$ statistics based on the LAD estimator. This theory is based on an Edgeworth expansion of the distribution of the statistic of interest. The validity of the expansion is usually established by using a Taylor series to approximate the statistic by a smooth function of sample moments that satisfies conditions given, for example, by Bhattacharya and Ghosh (1978) for the existence of an Edgeworth expansion. The LAD objective function is not smooth, however, and Taylor series methods cannot be used to approximate the LAD estimator by a smooth function of sample moments. Indeed, de Angelis, et al. (1993) have shown that the distribution of the LAD estimator has a non-standard and very complicated asymptotic expansion.

This paper solves these problems by smoothing the LAD objective function to make it differentiable. The resulting estimator will be called the smoothed LAD (SLAD) estimator. It is first-order asymptotically equivalent to the standard LAD estimator but has much simpler higher-order asymptotics. Use of the SLAD estimator greatly eases the task of obtaining asymptotic refinements to levels of tests and, thereby, makes it possible to obtain results that go well beyond those obtained in previous research.

Previous research by de Angelis, et al. (1993) has shown that when $U$ is independent of $X$ and certain other conditions are satisfied, the error in the bootstrap approximation to the cumulative distribution function (CDF) of the LAD estimator is $o(n^{-2/5})$. Hahn (1995) showed consistency of a bootstrap approximation to the CDF without assuming independence of $U$ and $X$, but he did not investigate the size of the approximation error. Neither de Angelis, et al. nor Hahn investigated the bootstrap's ability to correct the levels of $t$ and $\chi^2$ tests based on the LAD estimator.

Janas (1993) investigated the related but simpler problem of testing a hypothesis about a population median (no covariates). He showed that when a suitable version of the bootstrap is used to obtain the critical value, the difference between the true and nominal levels of a symmetrical $t$ test of a hypothesis about a population median is $o(n^{-\gamma})$, where $\gamma < 1$ but can be arbitrarily close to 1 if the underlying population density is sufficiently smooth. By contrast, first-order approximations make an error of size $O(n^{-\gamma})$. The bootstrap accounts for a term of size $O(n^{-\gamma})$ in the asymptotic expansion of the distribution of the test statistic, whereas first-order approximations ignore this term.

This paper extends the results of previous research in three ways. First, it gives conditions under which the bootstrap provides asymptotic refinements to the levels of $t$ and $\chi^2$ tests of hypotheses about $\beta$ in (1.1). Second, in contrast to de Angelis, et al. (1993), it is not assumed that $U$ and $X$ are independent. Any form of dependence is permitted as long as $\mathrm{median}(U|X=x) = 0$ almost surely and mild regularity conditions are satisfied. Third, it is shown that the bootstrap also provides asymptotic refinements for $t$ and $\chi^2$ tests of hypotheses about $\beta$ in the censored median regression model of Powell (1984). Under the conditions that are given here, the differences between the true and nominal levels of symmetrical $t$ and $\chi^2$ tests with bootstrap critical values are $o(n^{-\gamma})$ for a suitable $\gamma$ satisfying $7/9 < \gamma < 1$. By contrast, the differences between the true and nominal levels are $O(n^{-\gamma})$ with critical values based on first-order approximations. As in Janas (1993), the bootstrap accounts for a term of size $O(n^{-\gamma})$ in the asymptotic expansion of the $t$ or $\chi^2$ statistic, whereas first-order approximations ignore this term. The value of $\gamma$ depends on the smoothness of the conditional density of $U$ at zero and can be arbitrarily close to 1 if the density is sufficiently smooth.

Although this paper treats explicitly only the levels of symmetrical $t$ and $\chi^2$ tests, it will be clear that the results also apply to coverage probabilities of symmetrical confidence intervals and, with suitable modifications, to equal-tailed and one-sided $t$ tests and confidence intervals. In addition, the methods used here can easily be extended to show that the bootstrap provides asymptotic refinements for tests and confidence intervals based on smoothed versions of the quantile-regression estimator of Koenker and Bassett (1978) and the censored quantile-regression estimator of Powell (1986).

The remainder of the paper is organized as follows. Section 2 describes the smoothed LAD estimator and gives its first-order asymptotic distribution. Section 3 describes the test statistics and procedures that are used to obtain bootstrap critical values. Section 4 presents theorems giving conditions under which the bootstrap provides asymptotic refinements to the levels of symmetrical $t$ and $\chi^2$ tests. Section 4 also describes the extension to censored median regressions. Section 5 presents the results of a small Monte Carlo investigation of the numerical performance of the bootstrap, and Section 6 gives concluding comments. The proofs of theorems are in the Appendix.

2. THE SMOOTHED LAD ESTIMATOR

This section describes the smoothed LAD estimator and establishes its asymptotic equivalence to the standard LAD estimator.

Let $\{Y_i, X_i: i = 1,\dots,n\}$ be a random sample of $(Y,X)$ in (1.1). The standard LAD estimator solves

(2.1) minimize$_{b\in B}$: $\tilde H_n(b) \equiv n^{-1}\sum_{i=1}^n |Y_i - X_ib| = n^{-1}\sum_{i=1}^n (Y_i - X_ib)[2I(Y_i - X_ib > 0) - 1],$

where $B$ is the parameter set and $I(\cdot)$ is the indicator function. $\tilde H_n(b)$ has cusps and, therefore, is not differentiable at points $b$ such that $Y_i - X_ib = 0$ for some $i$. The SLAD estimator smooths these cusps by replacing the indicator function in $\tilde H_n$ with a smooth function.

To do this, let $K$ be a bounded, differentiable function satisfying $K(v) = 0$ if $v \le -1$ and $K(v) = 1$ if $v \ge 1$. Additional requirements that $K$ must satisfy are given in Section 4a. Let $\{h_n\}$ be a sequence of positive real numbers (bandwidths) that converges to zero as $n \to \infty$. The SLAD estimator solves

(2.2) minimize$_{b\in B}$: $H_n(b) \equiv n^{-1}\sum_{i=1}^n (Y_i - X_ib)\left\{2K\!\left(\frac{Y_i - X_ib}{h_n}\right) - 1\right\}.$

$K$ is analogous to the integral of a kernel function for nonparametric estimation. $K$ is not a kernel function itself.
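To fix ideas, here is a minimal numerical sketch of (2.2). The paper's own computations were done in GAUSS (Section 5); the Python below and all names in it are ours, and the particular $K$ is the one used in the Monte Carlo experiments of Section 5.

```python
import numpy as np

def K(v):
    """Integrated 4th-order kernel from Section 5 (Mueller 1984):
    K(v) = 0 for v <= -1 and K(v) = 1 for v >= 1, smooth in between."""
    v = np.clip(v, -1.0, 1.0)
    return 0.5 + (105.0 / 64.0) * (v - (5.0 / 3.0) * v**3
                                   + (7.0 / 5.0) * v**5 - (3.0 / 7.0) * v**7)

def slad_objective(b, y, x, h):
    """H_n(b) = n^{-1} sum_i (Y_i - X_i b)[2K((Y_i - X_i b)/h) - 1], eq. (2.2)."""
    u = y - x @ b
    return np.mean(u * (2.0 * K(u / h) - 1.0))
```

Because $H_n$ is smooth in $b$, a standard numerical optimizer can be applied to it directly; this is the practical payoff of replacing the indicator in (2.1) with $K$.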

It may appear that the presence of a smoothing parameter $h_n$ in (2.2) is a disadvantage of SLAD relative to LAD, but this appearance is misleading. With median regression models, smoothing and the introduction of smoothing parameters are unavoidable for obtaining satisfactory performance of the bootstrap. Under assumptions stronger than those made here, de Angelis, et al. (1993) found that the error in the bootstrap approximation to the distribution of the LAD estimator converges to zero more slowly than the error made by first-order asymptotic theory unless the bootstrap samples a smoothed version of the data. Janas (1993) smooths the data to obtain bootstrap refinements for a test of a hypothesis about a population median. The smoothing methods of de Angelis, et al. and Janas do not extend easily to models with heteroskedasticity or censoring. In this paper, smoothing the objective function replaces smoothing the data. The resulting SLAD estimator is useful because it enables asymptotic refinements to coverage probabilities of confidence intervals and levels of tests to be obtained easily. The SLAD estimator is not needed if the only objective is to obtain a point estimate of $\beta$.

Let $\tilde b_n$ be a LAD estimator (a solution to (2.1)) and $b_n$ be a SLAD estimator (a solution to (2.2)). Intuition suggests that $\tilde b_n$ and $b_n$ are asymptotically equivalent if $h_n$ converges to zero sufficiently rapidly. Theorem 2.1 below shows that this intuition is correct. Regularity conditions for the theorem are given in Section 4a. They are stated in the form that is used to obtain this paper's main objective, which is to show that the bootstrap provides asymptotic refinements for tests based on the SLAD estimator. The regularity conditions are stronger than would be needed if the only objective were to prove that $\tilde b_n$ and $b_n$ are asymptotically equivalent.

Theorem 2.1: Under Assumptions 1-6 of Section 4a, $n^{1/2}(b_n - \tilde b_n) = o_p(1)$. □

To state the asymptotic distribution of $n^{1/2}(b_n - \beta)$, let $f(\cdot|x)$ denote the density of $U$ in (1.1) conditional on $X = x$. Assume that $f(u|x)$ exists at $u = 0$ for almost all $x$. Define $D = 2E[X'Xf(0|X)]$, and assume that $D$ is nonsingular. It follows from Theorem 2.1 and asymptotic normality of the LAD estimator (see, e.g., Buchinsky 1995) that $n^{1/2}(b_n - \beta) \to^d N(0,V)$, where $V = D^{-1}E(X'X)D^{-1}$. To obtain a consistent estimator of $V$, let $K^{(1)}(v) = dK(v)/dv$. Define

(2.3) $D_n(b) = 2(nh_n)^{-1}\sum_{i=1}^n X_i'X_iK^{(1)}\!\left(\frac{Y_i - X_ib}{h_n}\right).$

It is not difficult to show that $D_n(b_n) \to^p D$ under the conditions given in Section 4a. $E(X'X)$ can be estimated consistently by the sample average of $X'X$. However, for purposes of obtaining asymptotic refinements, it is more convenient to use an estimator of the exact finite-sample variance of the first derivative of $H_n(b)$ at $b = \beta$. This estimator is $T_n(b_n)$, where

$T_n(b) = n^{-1}\sum_{i=1}^n X_i'X_i\left\{2K\!\left(\frac{Y_i - X_ib}{h_n}\right) - 1 + 2\left(\frac{Y_i - X_ib}{h_n}\right)K^{(1)}\!\left(\frac{Y_i - X_ib}{h_n}\right)\right\}^2.$

Under the conditions given in Section 4a, $T_n(b_n) \to^p E(X'X)$. It follows that $V$ is estimated consistently by $V_n \equiv D_n(b_n)^{-1}T_n(b_n)D_n(b_n)^{-1}$.
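The following sketch, continuing the hypothetical Python setup above (names ours), computes $D_n$, $T_n$, and $V_n$; $K^{(1)}$ is the derivative of the Section 5 smoothing function.

```python
def K1(v):
    """K'(v) = (105/64)(1 - 5v^2 + 7v^4 - 3v^6) on [-1, 1], zero elsewhere."""
    p = (105.0 / 64.0) * (1.0 - 5.0 * v**2 + 7.0 * v**4 - 3.0 * v**6)
    return np.where(np.abs(v) <= 1.0, p, 0.0)

def D_n(b, y, x, h):
    """Eq. (2.3): D_n(b) = 2(nh)^{-1} sum_i X_i'X_i K1((Y_i - X_i b)/h)."""
    w = 2.0 * K1((y - x @ b) / h) / h
    return (x * w[:, None]).T @ x / len(y)

def T_n(b, y, x, h):
    """T_n(b) = n^{-1} sum_i X_i'X_i {2K(u_i) - 1 + 2 u_i K1(u_i)}^2,
    with u_i = (Y_i - X_i b)/h."""
    u = (y - x @ b) / h
    w = (2.0 * K(u) - 1.0 + 2.0 * u * K1(u)) ** 2
    return (x * w[:, None]).T @ x / len(y)

def V_n(b, y, x, h):
    """V_n = D_n^{-1} T_n D_n^{-1}, the estimated covariance of n^{1/2}(b_n - beta)."""
    Dinv = np.linalg.inv(D_n(b, y, x, h))
    return Dinv @ T_n(b, y, x, h) @ Dinv
```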

3. TESTING A HYPOTHESIS ABOUT $\beta$

a. The Symmetrical t and Chi-Square Tests

Let $b_{ni}$ and $\beta_i$, respectively, be the $i$'th components of $b_n$ and $\beta$ ($i = 1,\dots,q$). Let $V_{ni}$ be the $(i,i)$ component of $V_n$. The $t$ statistic for testing the hypothesis $H_0: \beta_i = \beta_{0i}$ is $t \equiv n^{1/2}(b_{ni} - \beta_{0i})/V_{ni}^{1/2}$. If $H_0$ is true, then $t \to^d N(0,1)$. The symmetrical $t$ test rejects $H_0$ at the asymptotic $\alpha$ level if $|t| > z_{\alpha/2}$, where $z_{\alpha/2}$, the asymptotic critical value, is the $1 - \alpha/2$ quantile of the standard normal distribution.

Now let $R$ be an $\ell\times q$ matrix with $\ell \le q$, and let $c$ be an $\ell\times 1$ vector of constants. Consider a test of the hypothesis $H_0: R\beta = c$. Assume that the matrix $RD^{-1}E(X'X)D^{-1}R'$ is nonsingular. Then under $H_0$, the statistic

$\chi^2 \equiv n(Rb_n - c)'(RV_nR')^{-1}(Rb_n - c)$

is asymptotically chi-square distributed with $\ell$ degrees of freedom. $H_0$ is rejected at the asymptotic $\alpha$ level if $\chi^2$ exceeds the asymptotic critical value consisting of the $1 - \alpha$ quantile of the chi-square distribution.

Section 4 gives conditions under which the bootstrap provides asymptotic refinements to critical values and levels of the symmetrical $t$ and $\chi^2$ tests.

b. The Bootstrap Procedure

The bootstrap estimates the distribution of a test statistic by treating the estimation data as if they were the population. Thus, the bootstrap distribution of a statistic is the distribution induced by sampling the estimation data randomly with replacement. The $\alpha$-level bootstrap critical value of the symmetrical $t$ test is the $1 - \alpha$ quantile of the bootstrap distribution of $|t|$. The $\alpha$-level bootstrap critical value of a test based on $\chi^2$ is the $1 - \alpha$ quantile of the bootstrap distribution of $\chi^2$.

The bootstrap distributions of $|t|$ and $\chi^2$ can be estimated with arbitrary accuracy by Monte Carlo simulation. To specify the Monte Carlo procedure, let the bootstrap sample be denoted by $\{Y_i^*, X_i^*: i = 1,\dots,n\}$. Define the following bootstrap analogs of $H_n(b)$, $D_n(b)$ and $T_n(b)$:

$H_n^*(b) \equiv n^{-1}\sum_{i=1}^n (Y_i^* - X_i^*b)\left\{2K\!\left(\frac{Y_i^* - X_i^*b}{h_n}\right) - 1\right\},$

$D_n^*(b) = 2(nh_n)^{-1}\sum_{i=1}^n X_i^{*\prime}X_i^*K^{(1)}\!\left(\frac{Y_i^* - X_i^*b}{h_n}\right),$

and

$T_n^*(b) = n^{-1}\sum_{i=1}^n X_i^{*\prime}X_i^*\left\{2K\!\left(\frac{Y_i^* - X_i^*b}{h_n}\right) - 1 + 2\left(\frac{Y_i^* - X_i^*b}{h_n}\right)K^{(1)}\!\left(\frac{Y_i^* - X_i^*b}{h_n}\right)\right\}^2.$

Let $b_n^*$ be a solution to (2.2) with $H_n$ replaced by $H_n^*$. Let $V_{ni}^*$ be the $(i,i)$ component of the matrix $D_n^*(b_n^*)^{-1}T_n^*(b_n^*)D_n^*(b_n^*)^{-1}$.

The Monte Carlo procedure for estimating the bootstrap critical value of the symmetrical $t$ test is as follows. The procedure for estimating the bootstrap critical value of $\chi^2$ is similar.

1. Generate a bootstrap sample $\{Y_i^*, X_i^*: i = 1,\dots,n\}$ by sampling the estimation data randomly with replacement.

2. Using the bootstrap sample, compute the bootstrap $t$ statistic for testing the hypothesis $H_0^*: \beta_i = b_{ni}$, where $b_n$ solves (2.2). The bootstrap $t$ statistic is $t^* \equiv n^{1/2}(b_{ni}^* - b_{ni})/(V_{ni}^*)^{1/2}$, where $b_{ni}^*$ is the $i$'th component of $b_n^*$.

3. Estimate the bootstrap distribution of $|t^*|$ by the empirical distribution that is obtained by repeating steps 1 and 2 many times. The bootstrap critical value of the symmetrical $t$ test is estimated by the $1 - \alpha$ quantile of this empirical distribution.

Because the bootstrap critical value can be estimated with arbitrary accuracy by repeating steps 1 and 2 sufficiently many times, the results presented in Section 4 pertain to the true bootstrap critical value, not its Monte Carlo estimator.
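Steps 1-3 are easy to code. The sketch below continues the hypothetical Python setup of Section 2 (the names and the choice of scipy's Nelder-Mead minimizer are ours, not the paper's):

```python
from scipy.optimize import minimize

def slad_fit(y, x, h, b_start):
    """Minimize H_n(b) numerically from a starting value (e.g., the OLS estimate)."""
    return minimize(slad_objective, b_start, args=(y, x, h),
                    method="Nelder-Mead").x

def bootstrap_critical_value(y, x, h, b_n, i, alpha=0.05, n_boot=100, seed=None):
    """1 - alpha quantile of the bootstrap distribution of |t*| (steps 1-3)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    t_star = np.empty(n_boot)
    for m in range(n_boot):
        idx = rng.integers(0, n, size=n)       # step 1: resample with replacement
        yb, xb = y[idx], x[idx]
        b_star = slad_fit(yb, xb, h, b_n)      # step 2: bootstrap SLAD estimate
        v_ii = V_n(b_star, yb, xb, h)[i, i]
        t_star[m] = abs(np.sqrt(n) * (b_star[i] - b_n[i]) / np.sqrt(v_ii))
    return np.quantile(t_star, 1.0 - alpha)    # step 3: 1 - alpha empirical quantile
```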

4. MAIN RESULTS

This section presents theorems giving conditions under which the bootstrap provides asymptotic refinements to the levels of symmetrical $t$ and $\chi^2$ tests based on the SLAD estimator. As in other applications (see, e.g., Beran 1988, Hall 1992), the proof that the bootstrap provides asymptotic refinements is based on showing that the distributions of the test statistics and their bootstrap analogs have asymptotic expansions that are identical to sufficiently high order. The main technical problem that must be solved is establishing conditions under which these expansions exist. This is done in Theorems 4.1 and 4.2. Once the existence of the expansions is established, it is a relatively easy matter to show that the use of bootstrap critical values provides asymptotic refinements to the levels of symmetrical $t$ and $\chi^2$ tests. This is done in Theorem 4.3.

a. Assumptions

This subsection presents the assumptions under which it is proved that the bootstrap provides asymptotic refinements for symmetrical $t$ and $\chi^2$ tests based on the SLAD estimator. Let $r \ge 4$ be an even integer. Let $K^{(i)}(v) = d^iK(v)/dv^i$. The assumptions are:

1. $\{Y_i, X_i: i = 1,\dots,n\}$ is a random sample of $(Y,X)$, where $Y = X\beta + U$, $X$ is a $1\times q$ vector of observed random variables, $U$ is an unobserved random scalar, and $\beta$ is a $q\times 1$ constant vector.

2. $\beta$ is an interior point of $B$, which is a compact subset of $\mathbb{R}^q$.

3. The support of the distribution of $X$ is bounded, and $E(X'X)$ is positive definite.

4. Let $F(\cdot|x)$ and $f(\cdot|x)$, respectively, denote the CDF and density of $U$ conditional on $X = x$. (a) $F(0|x) = 0.50$ for almost every $x$. (b) For all $u$ in a neighborhood of 0 and almost every $x$, $f(u|x)$ exists, is bounded away from zero, and is $r - 1$ times continuously differentiable with respect to $u$.

5. (a) $K(\cdot)$ is bounded, $K(v) = 0$ if $v \le -1$, and $K(v) = 1$ if $v \ge 1$. (b) $K$ is 4-times differentiable everywhere, $K^{(1)}(v)$ is symmetrical about $v = 0$, and $K^{(i)}$ ($i = 1,\dots,4$) is bounded and Lipschitz continuous on $(-\infty,\infty)$. (c) Let $\omega(v)$ be a vector whose components are $[2K(v) - 1]$ and its derivatives through order 3, $vK^{(1)}(v)$ and its derivatives through order 3, and $[2K(v) - 1 + 2vK^{(1)}(v)]^2$ and its first derivative. For any $\theta \in \mathbb{R}^{10}$ satisfying $\|\theta\| = 1$, there is a partition of $[-1,1]$, $-1 = a_1 < a_2 < \dots < a_{L(\theta)} = 1$, such that $\theta'\omega(v)$ is either strictly increasing or strictly decreasing on $(a_{\ell-1}, a_\ell)$ ($\ell = 2,\dots,L(\theta)$). (d) For each integer $i$ ($1 \le i \le r$),

$\int_{-1}^{1}v^iK^{(1)}(v)\,dv = \begin{cases} 0 & \text{if } i < r \\ C_K \text{ (nonzero)} & \text{if } i = r. \end{cases}$

6. $h_n \propto n^{-\kappa}$, where $2/(2r + 1) < \kappa < 1/3$.

Assumptions 1-5b define the model and insure that $\beta$ is identified, $n^{1/2}(b_n - \beta)$ is asymptotically normal, and the Taylor series expansions used to obtain higher-order asymptotic approximations to $t$ and $\chi^2$ exist. The assumption that $X$ has bounded support is not essential and can be dropped at the expense of more complex proofs. Assumption 5c is used to establish a modified form of the Cramér condition of Edgeworth analysis (Lemma 9 of the Appendix). Assumption 5d, which requires $K^{(1)}$ to be a "higher-order" kernel, and Assumption 6 insure that the (first-order) asymptotic distribution of $n^{1/2}(b_n - \beta)$ has mean zero and that Taylor series remainder terms are negligibly small. Functions $K$ satisfying Assumption 5 can be constructed by integrating kernels given by Müller (1984).

b. Theorems

This section gives theorems that establish conditions under which the bootstrap provides asymptotic refinements for symmetrical $t$ and $\chi^2$ tests based on the SLAD estimator. Theorems 4.1 and 4.2 give conditions under which the sample and bootstrap versions of $|t|$ and $\chi^2$ have Edgeworth-type asymptotic expansions. Theorem 4.3 shows that the bootstrap provides asymptotic refinements under the same conditions.

The following additional notation is used. Let $\Phi$ and $\phi$, respectively, denote the standard normal distribution and density functions. Let $P_n^*$ denote the bootstrap probability measure. This measure places mass $1/n$ at each data point $(Y_i, X_i)$. The cumulants of $t$ through order 4 can be approximated with an accuracy of $O[(nh_n)^{-1}]$ by using Taylor-series expansions that are described in the Appendix. Denote the approximate cumulants by the vector $\kappa_n$. The first four cumulants of $t^*$ conditional on the estimation sample can also be approximated with an accuracy of $O[(nh_n)^{-1}]$ almost surely. Let $\kappa_n^*$ be the vector containing the approximate bootstrap cumulants. Define $d = \dim(\kappa_n) = \dim(\kappa_n^*)$.

The following theorem establishes the existence of Edgeworth-type expansions of the distributions of $|t|$ and $|t^*|$.

Theorem 4.1: Let Assumptions 1-6 hold. Let $\kappa$ be an arbitrary vector with dimension $d$. There is a function $q(\tau,\kappa)$ such that: (a) $q(\cdot,\kappa)$ is a polynomial; (b) $q(\tau,\kappa_n)$ and $q(\tau,\kappa_n^*)$ consist of terms whose sizes are $O[(nh_n)^{-1}]$ (almost surely in the case of $q(\tau,\kappa_n^*)$); (c)

(4.1) $P(|t| \le \tau) = 2\Phi(\tau) - 1 + q(\tau,\kappa_n)\phi(\tau) + o[(nh_n)^{-1}]$

uniformly over $\tau$; and (d)

$P_n^*(|t^*| \le \tau) = 2\Phi(\tau) - 1 + q(\tau,\kappa_n^*)\phi(\tau) + o[(nh_n)^{-1}]$

uniformly over $\tau$ almost surely. □

The coefficients of $\tau$ in $q$ are functions of the approximate cumulants of $t$ and $t^*$. These, in turn, are functions of asymptotic forms of moments of products of derivatives of $H_n(\beta)$, $D_n(\beta)$, and $T_n(\beta)$ with respect to the components of $\beta$. Because the number of such moments is very large, obtaining an analytic expression for $q$ is not feasible. It is possible, however, to calculate the rates at which the moments converge to zero, and this is sufficient to prove the theorem.

The proof of Theorem 4.1 takes place in two main steps. The first step consists of showing that $t$ and $t^*$ can be approximated up to asymptotically negligible remainder terms by functionals of derivatives of $H_n(\beta)$, $D_n(\beta)$, and $T_n(\beta)$ (or their bootstrap analogs in the case of $t^*$). This is done in Propositions 1 and 2 of the Appendix. The second step is to show that the distributions of the approximations to $t$ and $t^*$ have asymptotic expansions through order $(nh_n)^{-1}$. This step is carried out using methods similar to those used to prove Theorems 5.5 and 5.6 of Hall (1992).

Now consider the $\chi^2$ test. Let $\chi^{2*}$ denote the bootstrap version of the $\chi^2$ statistic. The first two moments of $\chi^2$ and $\chi^{2*}$ can be approximated through $O[(nh_n)^{-1}]$. Let $\kappa_{n\chi}$ and $\kappa_{n\chi}^*$ denote the vectors of approximate moments. Let $F_{\chi,\ell}$ denote the chi-square distribution function with $\ell$ degrees of freedom. The following theorem, which is a modified version of Theorem 1b of Chandra and Ghosh (1979), gives conditions under which the distributions of $\chi^2$ and $\chi^{2*}$ have Edgeworth expansions through $O[(nh_n)^{-1}]$.

Theorem 4.2: Let Assumptions 1-6 hold. Let $\kappa$ be an arbitrary $2\times 1$ vector. There is a function $q_\chi(\tau,\kappa)$ such that $q_\chi(\tau,\kappa_{n\chi})$ and $q_\chi(\tau,\kappa_{n\chi}^*)$ consist of terms whose sizes are $O[(nh_n)^{-1}]$ (almost surely in the case of $q_\chi(\tau,\kappa_{n\chi}^*)$),

(4.2) $P(\chi^2 < z) = \int_{-\infty}^{z}d\{[1 + q_\chi(\xi,\kappa_{n\chi})]F_{\chi,\ell}(\xi)\} + o[(nh_n)^{-1}]$

uniformly over $z$, and

$P_n^*(\chi^{2*} < z) = \int_{-\infty}^{z}d\{[1 + q_\chi(\xi,\kappa_{n\chi}^*)]F_{\chi,\ell}(\xi)\} + o[(nh_n)^{-1}]$

uniformly over $z$ almost surely. □

The final theorem shows that the use of bootstrap critical values yields asymptotic refinements to the levels of symmetrical $t$ and $\chi^2$ tests. Let $t_\alpha^*$ denote the $\alpha$-level critical value of the bootstrap symmetrical $t$ test. That is, $t_\alpha^*$ is the $1 - \alpha$ quantile of the bootstrap distribution of $|t^*|$. Let $c_\alpha^*$ denote the $\alpha$-level critical value of the bootstrap $\chi^2$ test. That is, $c_\alpha^*$ is the $1 - \alpha$ quantile of the bootstrap distribution of $\chi^{2*}$.

Theorem 4.3: Let Assumptions 1-6 hold. Under $H_0: \beta_i = \beta_{0i}$,

a. $P(|t| > t_\alpha^*) = \alpha + o[(nh_n)^{-1}]$.

If $RD^{-1}E(X'X)D^{-1}R'$ is nonsingular, then under $H_0: R\beta = c$,

b. $P(\chi^2 > c_\alpha^*) = \alpha + o[(nh_n)^{-1}]$. □

First-order asymptotic approximations drop the terms $q\phi$ and $q_\chi F_{\chi,\ell}$ in (4.1) and (4.2). The resulting approximation errors are $O[(nh_n)^{-1}]$.

c. Censored Median Regressions

This section describes the extension of the foregoing results to the censored median regression model of Powell (1984). The model is

(4.3) $Y = \max(0, X\beta + U),$

where $X$, $\beta$, and $U$ are as defined in (1.1). The censored LAD (CLAD) estimator of $\beta$, $\tilde b_{cn}$, solves

minimize$_{b\in B}$: $n^{-1}\sum_{i=1}^n |Y_i - \max(0, X_ib)|,$

where $B$ is the parameter set. Equivalently, $\tilde b_{cn}$ solves

minimize$_{b\in B}$: $\tilde H_{cn}(b) \equiv n^{-1}\sum_{i=1}^n \{(Y_i - X_ib)[2I(Y_i - X_ib > 0) - 1] - Y_i\}I(X_ib > 0).$

Under regularity conditions, $n^{1/2}(\tilde b_{cn} - \beta) \to^d N(0,V_c)$, where $V_c = D_c^{-1}T_cD_c^{-1}$, $D_c = 2E[X'Xf(0|X)I(X\beta > 0)]$, and $T_c = E[X'XI(X\beta > 0)]$ (Powell 1984).

Like the objective function of the LAD estimator, $\tilde H_{cn}$ has cusps. The smoothed CLAD (SCLAD) estimator removes them by replacing the indicator functions in $\tilde H_{cn}$ with smooth functions. The SCLAD estimator, $b_{cn}$, solves

minimize$_{b\in B}$: $H_{cn}(b) \equiv n^{-1}\sum_{i=1}^n g_c(Y_i,X_i,h_n,b),$

where

$g_c(y,x,h,b) = \left\{(y - xb)\left[2K\!\left(\frac{y - xb}{h}\right) - 1\right] - y\right\}K\!\left(\frac{xb}{h} - 2\right),$

and $K$ and $h_n$ are as in (2.2). The smoothed version of $I(xb > 0)$ is $K(xb/h - 2)$ instead of $K(xb/h)$ for technical reasons relating to prevention of asymptotic bias. Under conditions stated below, $n^{1/2}(b_{cn} - \tilde b_{cn}) = o_p(1)$ as $n \to \infty$.
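In the same hypothetical Python setup used earlier (names ours), the smoothed censored objective is a one-line modification of the uncensored one:

```python
def g_c(y, x, h, b):
    """Summand of H_cn(b): {(y - xb)[2K((y - xb)/h) - 1] - y} K(xb/h - 2)."""
    xb = x @ b
    u = y - xb
    return (u * (2.0 * K(u / h) - 1.0) - y) * K(xb / h - 2.0)

def sclad_objective(b, y, x, h):
    """H_cn(b) = n^{-1} sum_i g_c(Y_i, X_i, h, b)."""
    return np.mean(g_c(y, x, h, b))
```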

To form $t$ and $\chi^2$ statistics based on $b_{cn}$, it is necessary to have consistent estimators of $D_c$ and $T_c$. Define

$D_{cn}(b) = 2(nh_n)^{-1}\sum_{i=1}^n X_i'X_iK^{(1)}\!\left(\frac{Y_i - X_ib}{h_n}\right)I(Y_i > 0).$

It is not difficult to show that $D_{cn}(b_{cn}) \to^p D_c$. $T_c$ can be estimated consistently by the sample average of $X'XI(Xb_{cn} > 0)$ (Powell 1984). As in SLAD estimation, however, for purposes of obtaining asymptotic refinements it is more convenient to use an estimator of the exact finite-sample variance of the first derivative of $H_{cn}(b)$ at $b = \beta$. This estimator is $T_{cn}(b_{cn})$, where

$T_{cn}(b) = n^{-1}\sum_{i=1}^n [\partial g_c(Y_i,X_i,h_n,b)/\partial b][\partial g_c(Y_i,X_i,h_n,b)/\partial b]'.$

$V_c$ is estimated consistently by $V_{cn} \equiv D_{cn}(b_{cn})^{-1}T_{cn}(b_{cn})D_{cn}(b_{cn})^{-1}$.

The formulae for $t$ and $\chi^2$ statistics for testing hypotheses about $\beta$ in (4.3) are the same as in Section 3a but with $V_n$ replaced by $V_{cn}$. The procedure for obtaining bootstrap critical values for these statistics is the same as in Section 3b but with $D_n$, $T_n$, $D_n^*$, and $T_n^*$ replaced by $D_{cn}$, $T_{cn}$, and their bootstrap analogs.

To establish the ability of the bootstrap to provide asymptotic refinements for $t$ and $\chi^2$ tests based on the SCLAD estimator, it is necessary to modify Assumptions 1 and 3 as follows:

1'. $\{Y_i, X_i: i = 1,\dots,n\}$ is a random sample of $(Y,X)$, where $Y = \max(0, X\beta + U)$, $X$ is a $1\times q$ vector of observed random variables, $U$ is an unobserved random scalar, and $\beta$ is a $q\times 1$ constant vector.

3'. The support of the distribution of $X$ is bounded, $P(X\beta = 0) = 0$, and $E[(X'X)I(Xb > \epsilon)]$ is positive definite for some $\epsilon > 0$ and all $b$ in a neighborhood of $\beta$.

The following theorem shows that the SCLAD and CLAD estimators are asymptotically equivalent and that the bootstrap provides asymptotic refinements to the levels of symmetrical $t$ and $\chi^2$ tests based on the SCLAD estimator.

Theorem 4.4: Let Assumptions 1', 2, 3', and 4-6 hold. Then

a. $n^{1/2}(b_{cn} - \tilde b_{cn}) = o_p(1)$ as $n \to \infty$.

Let $t_\alpha^*$ and $c_\alpha^*$, respectively, denote the $\alpha$-level bootstrap critical values of the SCLAD symmetrical $t$ and $\chi^2$ tests. Under $H_0: \beta_i = \beta_{0i}$,

b. $P(|t| > t_\alpha^*) = \alpha + o[(nh_n)^{-1}]$.

If $RD_c^{-1}T_cD_c^{-1}R'$ is nonsingular, then under $H_0: R\beta = c$,

c. $P(\chi^2 > c_\alpha^*) = \alpha + o[(nh_n)^{-1}]$. □

5. MONTE CARLO EXPERIMENTS

This section describes the results of a small Monte Carlo investigation of the finite-sample level of the SLAD $t$ test with bootstrap critical values. The numbers of experiments and replications per experiment are small because of the very long computing times they entail, even on a fast computer.

Each experiment evaluates the level of a symmetrical $t$ test using asymptotic or bootstrap critical values. The hypothesis being tested is $H_0: \beta_1 = 1$ in the model $Y = \beta_0 + \beta_1X + U$, where $\beta_0$ and $\beta_1$ are scalar parameters whose true values are $(\beta_0,\beta_1) = (1,1)$ (so $H_0$ is true), and $X \sim U[1,5]$. There are three different distributions of $U$. In the first experiment, $U \sim N(0,2)$. In the second, $U$ is Student $t$ with 3 degrees of freedom, scaled to have a variance of 2. In the third experiment, $U = 0.25(1 + X)V$, where $V \sim N(0,1)$; thus, $U$ is heteroskedastic. The smoothing function $K$ is

$K(v) = \begin{cases} 0 & \text{if } v < -1 \\ 0.5 + (105/64)[v - (5/3)v^3 + (7/5)v^5 - (3/7)v^7] & \text{if } |v| \le 1 \\ 1 & \text{if } v > 1. \end{cases}$

$K$ is the integral of a 4th-order kernel for nonparametric density estimation (Müller 1984).
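A quick numerical check (ours, not part of the paper) confirms that this $K$ satisfies Assumption 5(d) with $r = 4$: the first three moments of $K^{(1)}$ vanish and the fourth is nonzero; direct integration gives $C_K = -1/33$.

```python
from scipy.integrate import quad

# K'(v) = (105/64)(1 - 5v^2 + 7v^4 - 3v^6) on [-1, 1]
k1 = lambda v: (105.0 / 64.0) * (1.0 - 5.0 * v**2 + 7.0 * v**4 - 3.0 * v**6)

for i in range(1, 5):
    val, _ = quad(lambda v, i=i: v**i * k1(v), -1.0, 1.0)
    print(i, round(val, 10))
# prints 0.0 for i = 1, 2, 3 and C_K = -1/33 = -0.0303... for i = 4
```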

The experiments with the SLAD estimator consisted of computing the empirical level of the nominal 0.05-level symmetrical $t$ test of $H_0$ with bootstrap critical values. To provide a basis for evaluating the performance of the bootstrap, experiments were also carried out with the unsmoothed LAD estimator. These consisted of computing the empirical level of the nominal 0.05-level symmetrical $t$ test of $H_0$ with the asymptotic critical value. The LAD estimator was studentized by using the consistent variance estimator $D_n(\tilde b_n)^{-1}E_n[(1,X)'(1,X)]D_n(\tilde b_n)^{-1}$, where $\tilde b_n$ is the LAD estimator of $(\beta_0,\beta_1)$, $D_n$ is as in (2.3), and $E_n(\cdot)$ is the sample average. $D_n$ for the LAD estimator was computed using the 2nd-order kernel $K_2(v) = (15/16)(1 - v^2)^2I(|v| \le 1)$.

Computation of the SLAD and LAD $t$ statistics requires choosing the value of a bandwidth parameter for each. Existing theory provides little guidance on how this should be done in finite samples, so experiments were carried out using a range of bandwidth values.

The experiments used a sample size of $n = 50$ and were carried out with a program written in GAUSS with GAUSS pseudo-random number generators. There were 500 Monte Carlo replications per experiment with the SLAD estimator and 1000 with the LAD estimator. There were fewer replications in the SLAD experiments because of the long computing times required for Monte Carlo simulations with bootstrapping. Each experiment consisted of repeating the following steps 500 or 1000 times:

A. Generate an estimation data set of size $n = 50$ by randomly sampling $(Y,X)$ from the model under consideration. Obtain the SLAD or LAD estimate of $(\beta_0,\beta_1)$, and compute the $t$ statistic for testing $H_0: \beta_1 = 1$. Call its value $t_S$ if it is based on the SLAD estimator and $t_L$ if it is based on the LAD estimator.

B. In experiments with $t_S$, compute the bootstrap critical value by following steps 1-3 in Section 3b. Bootstrap samples were obtained by sampling the estimation data generated in step A randomly with replacement. Denote the 0.05-level bootstrap critical value of the SLAD symmetrical $t$ test by $t_{0.05}^*$. $t_{0.05}^*$ was computed from 100 bootstrap samples.

C. Reject $H_0$ at the nominal 0.05 level based on $t_S$ if $|t_S| > t_{0.05}^*$. Reject $H_0$ at the nominal 0.05 level based on $t_L$ if $|t_L| > 1.96$, the asymptotic critical value.
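The three designs in step A are straightforward to reproduce. A hypothetical Python version of the data-generating step (the original experiments were coded in GAUSS) might look like this:

```python
def simulate_data(n, design, rng):
    """One estimation data set: Y = beta0 + beta1*X + U with (beta0, beta1) = (1, 1)."""
    x1 = rng.uniform(1.0, 5.0, size=n)                      # X ~ U[1,5]
    if design == "normal":
        u = rng.normal(0.0, np.sqrt(2.0), size=n)           # U ~ N(0,2)
    elif design == "t3":
        u = rng.standard_t(3, size=n) * np.sqrt(2.0 / 3.0)  # t(3) scaled to variance 2
    else:
        u = 0.25 * (1.0 + x1) * rng.normal(size=n)          # heteroskedastic design
    y = 1.0 + x1 + u
    return y, np.column_stack([np.ones(n), x1])
```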

The results of the experiments are summarized in Figures 1-3, which show the empirical levels of the SLAD $t$ test with bootstrap critical values and the LAD $t$ test with the asymptotic critical value as functions of the bandwidth. In the experiments, the empirical and nominal levels of the LAD test can be made equal by choosing the bandwidth appropriately. The empirical level is very sensitive to the bandwidth, however, and it is an open question whether the "optimal" bandwidth can be estimated precisely in applications. In contrast, the empirical level of the SLAD test with bootstrap critical values is close to the nominal level over a wide range of bandwidths. Thus, use of the SLAD test with bootstrap critical values greatly decreases the importance of precisely estimating an "optimal" bandwidth. Obtaining precise bandwidth estimates is difficult even in relatively simple settings such as nonparametric density estimation, so the SLAD test's relative insensitivity to the bandwidth is an important practical advantage of this test.

6. CONCLUSIONS

This paper has shown how the bootstrap can be used to obtain asymptotic refinements for tests of hypotheses about the parameters of uncensored and censored linear median regression models with or without heteroskedasticity of unknown form. The method is based on smoothing the objective function of the relevant estimator. This approach contrasts with previous research on bootstrap methods for median regressions, which has achieved less general results under more restrictive assumptions by smoothing the data instead of the objective function. This paper has not addressed the problem of how to choose the bandwidth parameter required for smoothing. It is likely that this can also be done with the bootstrap, but the technical details are sufficiently complex and lengthy to require treatment in a separate paper.

APPENDIX

This Appendix provides proofs of the theorems stated in the text. It is assumed unless otherwise stated that Assumptions 1-6 hold. Define $U_i = Y_i - X_i\beta$ and

$G_n(b) \equiv n^{-1}\sum_{i=1}^n\left\{(Y_i - X_ib)\left[2K\!\left(\frac{Y_i - X_ib}{h_n}\right) - 1\right] - |U_i|\right\}.$

The SLAD estimator minimizes both $H_n(b)$ and $G_n(b)$ over $b \in B$. $G_n$ is used for the proofs because it is a sum of bounded terms.

Let $\|\cdot\|$ denote the Euclidean norm. Let $X^{(j)}$ denote the $j$'th component of $X$. For $b \in B$, define $G(b) = E[|Y - Xb| - |U|]$ and

$\tilde G_n(b) = n^{-1}\sum_{i=1}^n(|Y_i - X_ib| - |U_i|).$

a. Step 1: Approximating t and t*

Lemma 1:

$\sup_{b\in B}|G_n(b) - G(b)| \le o(n^{-1/2}\log n) + 2h_n$

almost surely.

Proof: It follows from Lemma 22 of Nolan and Pollard (1987) and Theorem 2.37 of Pollard (1984) that $|\tilde G_n(b) - G(b)| = o(n^{-1/2}\log n)$ almost surely uniformly over $b \in B$. Also,

$G_n(b) - \tilde G_n(b) = 2n^{-1}\sum_{i=1}^n(Y_i - X_ib)\left[K\!\left(\frac{Y_i - X_ib}{h_n}\right) - I(Y_i - X_ib > 0)\right].$

The summand differs from zero only if $|Y_i - X_ib| \le h_n$. Therefore,

$|G_n(b) - \tilde G_n(b)| \le 2n^{-1}\sum_{i=1}^n|Y_i - X_ib|I(|Y_i - X_ib| \le h_n) \le 2h_n.$

The lemma now follows from the triangle inequality. Q.E.D.

Lemma 2: Given any $r > 0$, $\|b_n - \beta\| \le r$ almost surely for all sufficiently large $n$.

Proof: Let $N_r = \{b \in B: \|b - \beta\| > r\}$. By Assumptions 3 and 4, $\beta$ uniquely minimizes $G(b)$ over $B$. Therefore, $G(b) > G(\beta) + \delta$ for all $b \in N_r$ and some $\delta > 0$. By Lemma 1 and $h_n \to 0$, there is a finite $n_0$ such that $G_n(b) > G_n(\beta) + \delta/2 > G_n(\beta)$ almost surely for all $b \in N_r$ if $n > n_0$. But $G_n(b_n) \le G_n(\beta)$. Therefore, $b_n \notin N_r$ almost surely if $n > n_0$. Q.E.D.

For $i,j,k,\ell,m = 1,\dots,q$, define $G_{ni}(b) = \partial G_n(b)/\partial b_i$, $G_{nij}(b) = \partial^2G_n(b)/\partial b_i\partial b_j$, $G_{nijk}(b) = \partial^3G_n(b)/\partial b_i\partial b_j\partial b_k$, and $G_{nijk\ell}(b) = \partial^4G_n(b)/\partial b_i\partial b_j\partial b_k\partial b_\ell$. Also, define $D_{ni}(b) = \partial D_n(b)/\partial b_i$, $D_{nij}(b) = \partial^2D_n(b)/\partial b_i\partial b_j$, and $T_{ni}(b) = \partial T_n(b)/\partial b_i$.

Lemma 3: For all $i,j,k,\ell = 1,\dots,q$, the following relations hold almost surely as $n \to \infty$:

(a) $\sup_{b\in B}|G_{ni}(b) - EG_{ni}(b)| = o[(\log n)/n^{1/2}]$

(b) $\sup_{b\in B}|G_{nij}(b) - EG_{nij}(b)| = o[(\log n)/(nh_n)^{1/2}]$

(c) $\sup_{b\in B}|G_{nijk}(b) - EG_{nijk}(b)| = o[(\log n)/(nh_n^3)^{1/2}]$

(d) $\sup_{b\in B}|G_{nijk\ell}(b) - EG_{nijk\ell}(b)| = o[(\log n)/(nh_n^5)^{1/2}]$

(e) $\sup_{b\in B}|D_n(b) - ED_n(b)| = o[(\log n)/(nh_n)^{1/2}]$

(f) $\sup_{b\in B}|D_{ni}(b) - ED_{ni}(b)| = o[(\log n)/(nh_n^3)^{1/2}]$

(g) $\sup_{b\in B}|D_{nij}(b) - ED_{nij}(b)| = o[(\log n)/(nh_n^5)^{1/2}]$

(h) $\sup_{b\in B}|T_n(b) - ET_n(b)| = o[(\log n)/n^{1/2}]$

(i) $\sup_{b\in B}|T_{ni}(b) - ET_{ni}(b)| = o[(\log n)/(nh_n)^{1/2}]$,

where (e)-(i) apply to the individual components of the matrices $D_n$, $D_{ni}$, $D_{nij}$, $T_n$, and $T_{ni}$. In addition, for all $i,j,k,\ell = 1,\dots,q$:

(j) $EG_{ni}(\beta) = 2[(1 - r)/r!]C_Kh_n^rE[X^{(i)}f^{(r-1)}(0|X)] + o(h_n^r)$

(k) $n^{1/2}G_{nj}(\beta) = -n^{-1/2}\sum_{i=1}^nX_i^{(j)}[2I(U_i > 0) - 1] + O_p(n^{1/2}h_n^r + h_n^{1/2})$

(l) $EG_{nij}(\beta) = 2E[X^{(i)}X^{(j)}f(0|X)] + O(h_n^r)$

(m) $EG_{nijk}(b)$, $EG_{nijk\ell}(b)$, $ED_n(b)$, $ED_{ni}(b)$, $ED_{nij}(b)$, $ET_n(b)$, and $ET_{ni}(b)$ are $O(1)$ as $n \to \infty$ for all $b$ in a neighborhood of $\beta$.

Proof: Parts (a)-(i) are proved by using Lemma 2.14 of Pakes and Pollard (1989) and Lemma 22 of Nolan and Pollard (1987) to show that the summands of the relevant $G$, $D$ and $T$ functions form Euclidean classes and then applying Theorem 2.37 of Pollard (1984). To prove (j), write $G_{nj}(\beta) = G_{nj}^{(1)} + G_{nj}^{(2)}$, where

$G_{nj}^{(1)} = -n^{-1}\sum_{i=1}^nX_i^{(j)}[2K(U_i/h_n) - 1],$

$G_{nj}^{(2)} = -2n^{-1}\sum_{i=1}^n(U_i/h_n)X_i^{(j)}K^{(1)}(U_i/h_n),$

and $U_i = Y_i - X_i\beta$. Then

$EG_{nj}^{(1)} = -\int\!\!\int_{-\infty}^{\infty}x^{(j)}[2K(u/h_n) - 1]f(u|x)\,du\,dP(x).$

Since $2K(u/h_n) - 1 = \pm 1$ unless $|u/h_n| < 1$, a change of variables gives

(A1) $EG_{nj}^{(1)} = -\int x^{(j)}[1 - F(h_n|x) - F(-h_n|x)]\,dP(x) - h_n\int\!\!\int_{-1}^{1}x^{(j)}[2K(\zeta) - 1]f(h_n\zeta|x)\,d\zeta\,dP(x).$

Integration by parts yields

$\int_{-1}^{1}\zeta^k[2K(\zeta) - 1]\,d\zeta = [2/(k + 1)]\left[1 - \int_{-1}^{1}\zeta^{k+1}K^{(1)}(\zeta)\,d\zeta\right]\delta_k \equiv [2/(k + 1)](1 - c_k)\delta_k$

for each $k = 0,\dots,r-1$, where $\delta_k = 0$ if $k$ is even and 1 if $k$ is odd, and $c_k = 0$ unless $k = r - 1$. Therefore, Taylor series expansions of the integrands in (A1) about $h_n = 0$ yield

(A2) $EG_{nj}^{(1)} = h_n\sum_{k=0}^{r-1}\{[(h_n)^k - (-h_n)^k]/(k + 1)!\}E[X^{(j)}f^{(k)}(0|X)] - 2h_n\sum_{k=0}^{r-1}[(1 - c_k)\delta_k/(k + 1)!]h_n^kE[X^{(j)}f^{(k)}(0|X)] + o(h_n^r) = 2(r!)^{-1}C_Kh_n^rE[X^{(j)}f^{(r-1)}(0|X)] + o(h_n^r).$

In addition,

$EG_{nj}^{(2)} = -2\int\!\!\int_{-\infty}^{\infty}x^{(j)}(u/h_n)K^{(1)}(u/h_n)f(u|x)\,du\,dP(x) = -2h_n\int\!\!\int_{-1}^{1}x^{(j)}\zeta K^{(1)}(\zeta)f(h_n\zeta|x)\,d\zeta\,dP(x).$

A Taylor series expansion of the integrand about $h_n = 0$ yields

(A3) $EG_{nj}^{(2)} = -2h_n\sum_{k=0}^{r-1}\int_{-1}^{1}\zeta^{k+1}K^{(1)}(\zeta)\,d\zeta\,(h_n^k/k!)E[X^{(j)}f^{(k)}(0|X)] + o(h_n^r) = -2C_K[(r - 1)!]^{-1}h_n^rE[X^{(j)}f^{(r-1)}(0|X)] + o(h_n^r).$

Part (j) follows by combining (A2) and (A3).

To prove (k), observe that

$n^{1/2}G_{nj}^{(1)} = -n^{-1/2}\sum_{i=1}^nX_i^{(j)}[2I(U_i > 0) - 1] - 2n^{-1/2}\sum_{i=1}^nX_i^{(j)}[K(U_i/h_n) - I(U_i > 0)].$

The variance of the second term is $O(h_n)$, and methods similar to those used to prove (j) show that its mean is $O(n^{1/2}h_n^r)$. Similarly, $En^{1/2}G_{nj}^{(2)} = O(n^{1/2}h_n^r)$ and $\mathrm{Var}(n^{1/2}G_{nj}^{(2)}) = O(h_n)$. Part (k) now follows from Chebyshev's inequality.

To prove (l), write $G_{njk}(\beta) = G_{njk}^{(1)} + G_{njk}^{(2)}$, where

$G_{njk}^{(1)} = 4(nh_n)^{-1}\sum_{i=1}^nX_i^{(j)}X_i^{(k)}K^{(1)}(U_i/h_n)$

and

$G_{njk}^{(2)} = 2(nh_n)^{-1}\sum_{i=1}^nX_i^{(j)}X_i^{(k)}(U_i/h_n)K^{(2)}(U_i/h_n).$

Arguments similar to those applied to $EG_{nj}^{(1)}$ yield

(A4) $EG_{njk}^{(1)} = 4E[X^{(j)}X^{(k)}f(0|X)] + O(h_n^r).$

Similarly,

$EG_{njk}^{(2)} = 2h_n^{-1}\int\!\!\int_{-\infty}^{\infty}x^{(j)}x^{(k)}(u/h_n)K^{(2)}(u/h_n)f(u|x)\,du\,dP(x) = 2\sum_{i=0}^{r}\int\!\!\int_{-1}^{1}x^{(j)}x^{(k)}\zeta^{i+1}K^{(2)}(\zeta)(h_n^i/i!)f^{(i)}(0|x)\,d\zeta\,dP(x) + o(h_n^r)$

by a change of variables and a Taylor series expansion. Integration by parts shows that

(A5) $\int_{-1}^{1}\zeta^{i+1}K^{(2)}(\zeta)\,d\zeta = \begin{cases} -1 & \text{if } i = 0 \\ 0 & \text{if } 1 \le i < r \\ -(r + 1)C_K & \text{if } i = r. \end{cases}$

Therefore,

(A6) $EG_{njk}^{(2)} = -2E[X^{(j)}X^{(k)}f(0|X)] + O(h_n^r).$

Part (l) follows by combining (A4) and (A6).

To prove (m), consider $EG_{njk\ell}(b)$. Let $\Delta b = b - \beta$. Write $G_{njk\ell}(b) = G_{njk\ell}^{(1)}(b) + G_{njk\ell}^{(2)}(b)$, where

$G_{njk\ell}^{(1)}(b) = -6(nh_n^2)^{-1}\sum_{i=1}^nX_i^{(j)}X_i^{(k)}X_i^{(\ell)}K^{(2)}\!\left(\frac{U_i - X_i\Delta b}{h_n}\right)$

and

$G_{njk\ell}^{(2)}(b) = -2(nh_n^2)^{-1}\sum_{i=1}^nX_i^{(j)}X_i^{(k)}X_i^{(\ell)}\,\frac{U_i - X_i\Delta b}{h_n}\,K^{(3)}\!\left(\frac{U_i - X_i\Delta b}{h_n}\right).$

Now

$EG_{njk\ell}^{(1)}(b) = -6h_n^{-2}E\left\{X^{(j)}X^{(k)}X^{(\ell)}\int_{-\infty}^{\infty}K^{(2)}\!\left(\frac{u - X\Delta b}{h_n}\right)f(u|X)\,du\right\}.$

A change of variables, a Taylor series expansion, and (A5) yield

$EG_{njk\ell}^{(1)}(b) = -6E\left\{X^{(j)}X^{(k)}X^{(\ell)}\int_{-1}^{1}\zeta K^{(2)}(\zeta)f^{(1)}(\tilde h\zeta + X\Delta b|X)\,d\zeta\right\}$

for $\tilde h$ between 0 and $h_n$, which is bounded uniformly over $\Delta b$ in a neighborhood of 0 by Assumption 4. Similar arguments apply to $EG_{njk\ell}^{(2)}(b)$ and the remaining $G$, $D$, and $T$ functions. Q.E.D.

Define $S_{nG}$ to be a vector containing the unique components of $G_{ni}(\beta)$, $G_{nij}(\beta)$, $G_{nijk}(\beta)$, and $G_{nijk\ell}(\beta)$ ($i,j,k,\ell = 1,\dots,q$). Order the components of $S_{nG}$ so that the first $q$ are the $G_{ni}(\beta)$.

Lemma 4: Let $S_G = \mathrm{plim}_{n\to\infty}S_{nG}$. There is a function $\lambda_\beta(S_{nG})$ taking values in $\mathbb{R}^q$ such that $\lambda_\beta(S_G) = 0$ and

$(b_n - \beta) = \lambda_\beta(S_{nG}) + o[1/(n^{3/2}h_n)]$

almost surely as $n \to \infty$.

Proof: Define $\delta_n = b_n - \beta$ and $\delta_{ni} = b_{ni} - \beta_i$ ($i = 1,\dots,q$). Let $G_{n\cdot}(\beta)$ be the vector whose components are the $G_{ni}(\beta)$ ($i = 1,\dots,q$). For fixed $j$, $k$, and $\ell$, define $G_{n\cdot j}(\beta)$, $G_{n\cdot jk}(\beta)$, and $G_{n\cdot jk\ell}(\beta)$, respectively, to be the $q$-dimensional vectors whose components are $G_{nij}(\beta)$, $G_{nijk}(\beta)$, and $G_{nijk\ell}(\beta)$ ($i = 1,\dots,q$). Let $Q_n$ be the matrix whose $(i,j)$ element is $G_{nij}(\beta)$. By Lemma 2, $b_n$ satisfies the first-order condition $G_{n\cdot}(b_n) = 0$ almost surely for all sufficiently large $n$. By Assumptions 3-4 and Lemma 3, $Q_n$ has an inverse almost surely for all sufficiently large $n$. Therefore, a Taylor series expansion of $G_{n\cdot}(b_n) = 0$ about $b_n = \beta$ yields

(A7) $(b_n - \beta) = -Q_n^{-1}[G_{n\cdot}(\beta) + (1/2)G_{n\cdot jk}(\beta)\delta_{nj}\delta_{nk} + (1/6)G_{n\cdot jk\ell}(\beta)\delta_{nj}\delta_{nk}\delta_{n\ell} + R_n]$

almost surely for all sufficiently large $n$, where the summation convention is used,

$R_n = (1/6)[G_{n\cdot jk\ell}(\bar b_n) - G_{n\cdot jk\ell}(\beta)]\delta_{nj}\delta_{nk}\delta_{n\ell},$

and $\bar b_n$ is between $b_n$ and $\beta$. By using arguments similar to those used to prove Lemma 3(m), it may be shown that $E[G_{n\cdot jk\ell}(b) - G_{n\cdot jk\ell}(\beta)] = O(b - \beta)$ for $b$ in a neighborhood of $\beta$. This result and Lemma 3(d) imply that

$\|R_n\| \le \{o[(\log n)/(nh_n^5)^{1/2}] + O(\|b_n - \beta\|)\}\|b_n - \beta\|^3$

almost surely. Given any $\nu > 0$ and $c > 0$, suppose that $\|\delta_n\| < cn^{-1/2+\nu}$. Then it follows from Lemma 3 that the right-hand side of (A7) is less than $cn^{-1/2+\nu}$ almost surely for all sufficiently large $n$. In addition, Lemma 3(b) and Assumptions 3-4 imply that the consistent solution to $G_{n\cdot}(b) = 0$ is almost surely unique for all sufficiently large $n$. Therefore, application of the Brouwer fixed point theorem to the right-hand side of (A7) shows that for any $c > 0$, $\nu > 0$,

(A8) $\|b_n - \beta\| \le cn^{-1/2+\nu}$

almost surely for all sufficiently large $n$. Application of the implicit function theorem to (A7) shows that there is almost surely a differentiable function $\lambda_\beta$ such that $\lambda_\beta(S_G) = 0$ and

(A9) $(b_n - \beta) = \lambda_\beta(S_{nG} + \rho_n),$

where $\rho_n$ is a vector such that $\dim(\rho_n) = \dim(S_{nG})$, $R_n$ forms the first $q$ components of $\rho_n$, and the remaining components of $\rho_n$ are 0. Application of the mean value theorem to (A9) combined with (A8) shows that

(A10) $(b_n - \beta) = \lambda_\beta(S_{nG}) + O[(\log n)(n^4h_n^5)^{-1/2}n^{3\nu}]$

almost surely for any $\nu > 0$. The lemma now follows from Assumption 6 by making $\nu$ sufficiently small. Q.E.D.

Proof of Theorem 2.1: It follows from Lemma 3 that $Q_n \to D$ almost surely. Therefore, by (A7), (A8) and a further application of Lemma 3,

(A11) $n^{1/2}(b_n - \beta) = -D^{-1}n^{1/2}G_{n\cdot}(\beta) + o_p(1) = D^{-1}n^{-1/2}\sum_{i=1}^nX_i'[2I(U_i > 0) - 1] + o_p(1).$

The theorem follows by observing that (A11) is the Bahadur representation of the LAD estimator. Q.E.D.

Let $S_n$ denote the vector consisting of the unique components of $S_{nG}$, $D_n(\beta)$, $D_{ni}(\beta)$, $D_{nij}(\beta)$, $T_n(\beta)$, and $T_{ni}(\beta)$.

Lemma 5: For each $i = 1,\dots,q$, there is a real-valued function $\lambda_{Vi}(S_n)$ such that

$V_{ni}^{1/2} = \lambda_{Vi}(S_n) + \zeta_n,$

where $\zeta_n = o[(nh_n)^{-1}]$ almost surely.

Proof: Expand $D_n(b_n)$ and $T_n(b_n)$ in Taylor series about $b_n = \beta$ through orders $\|b_n - \beta\|^2$ and $\|b_n - \beta\|$, respectively, and use (A10) to obtain

(A12) $V_{ni}^{1/2} = \lambda_{Vi}(S_n + \tau_n) + o[(nh_n)^{-1}]$

almost surely for a suitable differentiable function $\lambda_{Vi}$, where $\tau_n = o[(nh_n)^{-1}]$. The lemma follows by applying the mean value theorem to (A12). Q.E.D.

Proposition 1: Define $\lambda(S_n) = \lambda_\beta(S_{nG})/\lambda_{Vi}(S_n)$. Then

$\lim_{n\to\infty}\sup_z\,(nh_n)\{P(t \le z) - P[n^{1/2}\lambda(S_n) \le z]\} = 0.$

Proof: By Lemmas 4 and 5,

(A13) $t = \frac{n^{1/2}\lambda_\beta(S_{nG}) + \epsilon_n}{\lambda_{Vi}(S_n) + \nu_n},$

where $\epsilon_n$ and $\nu_n$ are $o[(nh_n)^{-1}]$ almost surely. Define $\Delta_n = t - n^{1/2}\lambda(S_n)$. A Taylor series approximation applied to (A13) yields $\Delta_n = o[(nh_n)^{-1}]$ almost surely. Choose the sequence $\{\tau_n\}$ such that $\tau_n = o[(nh_n)^{-1}]$ and $\Delta_n/\tau_n = o(1)$ almost surely. Then

$P[n^{1/2}\lambda(S_n) \le z - \tau_n] - P[n^{1/2}\lambda(S_n) \le z] - P(|\Delta_n| > \tau_n) \le P(t \le z) - P[n^{1/2}\lambda(S_n) \le z] \le P[n^{1/2}\lambda(S_n) \le z + \tau_n] - P[n^{1/2}\lambda(S_n) \le z] + P(|\Delta_n| > \tau_n)$

for every $z$. Therefore, since $\Delta_n = o[(nh_n)^{-1}]$ and $\Delta_n/\tau_n = o(1)$ almost surely,

(A14) $P(t \le z) - P[n^{1/2}\lambda(S_n) \le z] = o[(nh_n)^{-1}]$

uniformly over $z$. The proposition follows by multiplying both sides of (A14) by $nh_n$ and taking the limit as $n \to \infty$. Q.E.D.

Let $E_n$ denote the expectation with respect to $P_n^*$. Define $G_n^*(b)$ by replacing $(Y_i,X_i)$ with $(Y_i^*,X_i^*)$ in the definition of $G_n(b)$.

Lemma 6: For any $b \in B$, define $U_b = Y - Xb$ and

$W_n(b) = n^{-1}\sum_{i=1}^n[(U_{bi}^*/h_n)^dg(X_i^*)f(U_{bi}^*/h_n) - E_n(U_b/h_n)^dg(X)f(U_b/h_n)],$

where $g$ is bounded for bounded values of its argument, $d = 0$ or 1, and $f$ is a bounded, Lipschitz continuous function of bounded variation with support $[-1,1]$.

(a) Define $\xi_n = [(h_n/n)\log n]^{1/2}$. There is a finite $C_0 > 0$ such that for all $C > C_0$ and any $\gamma \ge 0$,

$\lim_{n\to\infty}(nh_n)^\gamma P_n^*\left(\sup_{b\in B}|W_n(b)| > C\xi_n\right) = 0$

almost surely ($P$).

(b) Define $\xi_n = [(\log n)/n]^{1/2}$. There is a finite $C_0 > 0$ such that for all $C > C_0$ and any $\gamma \ge 0$,

$\lim_{n\to\infty}(nh_n)^\gamma P_n^*\left(\sup_{b\in B}|G_{ni}^*(b) - E_nG_{ni}^*(b)| > C\xi_n\right) = 0$

and

$\lim_{n\to\infty}(nh_n)^\gamma P_n^*\left(\sup_{b\in B}|T_n^*(b) - E_nT_n^*(b)| > C\xi_n\right) = 0$

almost surely ($P$).

(c) For any $\gamma \ge 0$ and $\eta > 0$,

$\lim_{n\to\infty}(nh_n)^\gamma P_n^*\left(\sup_{b\in B}|G_n^*(b) - G_n(b)| > \eta\right) = 0$

almost surely ($P$).

Proof: Only part (a) is proved. The proofs of parts (b) and (c) are similar. Partition $B$ into subsets $\{B_j: j = 1,\dots,J\}$ such that $\|b_1 - b_2\| < \xi_n^2$ whenever $b_1$ and $b_2$ are in the same subset. For each $j = 1,\dots,J$, let $b_j$ be a point in $B_j$. Observe that $J = O(\xi_n^{-2q})$. Then

(A15) $P_n^*\left(\sup_{b\in B}|W_n(b)| > C\xi_n\right) = P_n^*\left(\bigcup_{j=1}^{J}\left\{\sup_{b\in B_j}|W_n(b)| > C\xi_n\right\}\right) \le \sum_{j=1}^{J}P_n^*\left(\sup_{b\in B_j}|W_n(b)| > C\xi_n\right).$

Because $g$ is bounded, $X$ has bounded support, and $f$ is bounded and Lipschitz continuous, there is an $M < \infty$ such that

$\sup_{b\in B_j}|W_n(b)| \le 2M(\log n)/n + |W_n(b_j)|.$

Therefore, for all sufficiently large $n$,

(A16) $P_n^*\left(\sup_{b\in B_j}|W_n(b)| > C\xi_n\right) \le P_n^*(|W_n(b_j)| > C\xi_n/2).$

By using Lemma 22 of Nolan and Pollard (1987) and Theorem 2.37 of Pollard (1984), it can be shown that $E_n[nW_n(b_j)^2] \le c_1h_n$ almost surely ($P$) for some $c_1 < \infty$ and all sufficiently large $n$. Therefore, by Bernstein's inequality,

(A17) $P_n^*(|W_n(b_j)| > C\xi_n/2) \le 2\exp(-Cd\log n) = 2n^{-Cd}$

for some finite $d > 0$ and all sufficiently large $n$. Combining (A15)-(A17) yields

$(nh_n)^\gamma P_n^*\left(\sup_{b\in B}|W_n(b)| > C\xi_n\right) \le 2(nh_n)^\gamma n^{-Cd}O(\xi_n^{-2q}) = o(1)$

as $n \to \infty$ for all sufficiently large $C$. Q.E.D.

The following lemma gives the bootstrap version of Lemma 2.

Lemma 7: For any $\gamma > 0$ and $\epsilon > 0$,

$\lim_{n\to\infty}(nh_n)^\gamma P_n^*(\|b_n^* - b_n\| > \epsilon) = 0$

almost surely ($P$).

Proof: Given any $\eta > 0$, suppose that $|G_n^*(b) - G_n(b)| \le \eta$ and $|G_n(b) - G(b)| \le \eta$ for all $b \in B$. Then since $b_n^*$ minimizes $G_n^*$, $G_n(b_n) + \eta \ge G_n^*(b_n) \ge G_n^*(b_n^*)$. Also, $G_n^*(b_n^*) \ge G_n(b_n^*) - \eta$, so $G_n(b_n) + \eta \ge G_n^*(b_n^*) \ge G_n(b_n^*) - \eta$, and $G_n(b_n) - G_n(b_n^*) \ge -2\eta$. By a similar argument, $G(\beta) - G(b_n) \ge -2\eta$. Therefore, $G(\beta) - G(b_n^*) = [G(\beta) - G(b_n)] + [G(b_n) - G_n(b_n)] + [G_n(b_n) - G_n(b_n^*)] + [G_n(b_n^*) - G(b_n^*)] \ge -6\eta$. Because $G(b)$ is continuous on $B$ with a unique minimum at $\beta$, it is possible to choose $\eta$ such that $G(\beta) - G(b_n^*) \ge -6\eta$ implies $\|b_n^* - \beta\| \le \epsilon/2$. By Lemma 2 and the triangle inequality, $\|b_n^* - \beta\| \le \epsilon/2$ implies that $\|b_n^* - b_n\| \le \epsilon$ for all sufficiently large $n$ almost surely. Therefore, $|G_n^*(b) - G_n(b)| \le \eta$ and $|G_n(b) - G(b)| \le \eta$ for all $b \in B$ imply that $\|b_n^* - b_n\| \le \epsilon$ for all sufficiently large $n$ almost surely. The lemma follows by combining this result with Lemmas 1 and 6(c). Q.E.D.

For $i,j,k,\ell = 1,\dots,q$, define $G_{ni}^*(b) = \partial G_n^*(b)/\partial b_i$, $G_{nij}^*(b) = \partial^2G_n^*(b)/\partial b_i\partial b_j$, $G_{nijk}^*(b) = \partial^3G_n^*(b)/\partial b_i\partial b_j\partial b_k$, $G_{nijk\ell}^*(b) = \partial^4G_n^*(b)/\partial b_i\partial b_j\partial b_k\partial b_\ell$, $D_{ni}^*(b) = \partial D_n^*(b)/\partial b_i$, $D_{nij}^*(b) = \partial^2D_n^*(b)/\partial b_i\partial b_j$, and $T_{ni}^*(b) = \partial T_n^*(b)/\partial b_i$. The bootstrap version of Lemma 3 is:

Lemma 8: For all $i,j,k,\ell = 1,\dots,q$, any $\gamma > 0$, and all sufficiently large $C > 0$, $\lim_{n\to\infty}(nh_n)^\gamma P_n^*(A_n) = 0$ almost surely ($P$), where $A_n$ is any of the events:

(a) $\sup_{b\in B}|G_{ni}^*(b) - E_nG_{ni}^*(b)| > C(\log n)/n^{1/2}$

(b) $\sup_{b\in B}|G_{nij}^*(b) - E_nG_{nij}^*(b)| > C(\log n)/(nh_n)^{1/2}$

(c) $\sup_{b\in B}|G_{nijk}^*(b) - E_nG_{nijk}^*(b)| > C(\log n)/(nh_n^3)^{1/2}$

(d) $\sup_{b\in B}|G_{nijk\ell}^*(b) - E_nG_{nijk\ell}^*(b)| > C(\log n)/(nh_n^5)^{1/2}$

(e) $\sup_{b\in B}|D_n^*(b) - E_nD_n^*(b)| > C(\log n)/(nh_n)^{1/2}$

(f) $\sup_{b\in B}|D_{ni}^*(b) - E_nD_{ni}^*(b)| > C(\log n)/(nh_n^3)^{1/2}$

(g) $\sup_{b\in B}|D_{nij}^*(b) - E_nD_{nij}^*(b)| > C(\log n)/(nh_n^5)^{1/2}$

(h) $\sup_{b\in B}|T_n^*(b) - E_nT_n^*(b)| > C(\log n)/n^{1/2}$

(i) $\sup_{b\in B}|T_{ni}^*(b) - E_nT_{ni}^*(b)| > C(\log n)/(nh_n)^{1/2}$,

where (e)-(i) apply to the individual components of the matrices $D_n^*$, $D_{ni}^*$, $D_{nij}^*$, $T_n^*$, and $T_{ni}^*$. In addition, for all $i,j,k,\ell = 1,\dots,q$:

(j) $E_nG_{ni}^*(b_n) = 0$ with probability $1 - o[(nh_n)^{-\gamma}]$.

(k) $E_nG_{nij}^*(b)$, $E_nG_{nijk}^*(b)$, $E_nG_{nijk\ell}^*(b)$, $E_nD_n^*(b)$, $E_nD_{ni}^*(b)$, $E_nD_{nij}^*(b)$, $E_nT_n^*(b)$, and $E_nT_{ni}^*(b)$ are $O(1)$ almost surely ($P$) as $n \to \infty$ for all $b$ in a neighborhood of $\beta$.

Proof: Parts (a)-(i) are immediate consequences of Lemma 6. Part (j) is the first-order condition for the bootstrap estimation problem. Part (k) follows from Lemma 3. Q.E.D.

Define $S_{nG}^*$ and $S_n^*$ as $S_{nG}$ and $S_n$ except with $(Y_i,X_i)$ replaced by $(Y_i^*,X_i^*)$ and $\beta$ replaced by $b_n$.

Proposition 2: Let $\lambda$ be the function defined in Proposition 1. Then

$\lim_{n\to\infty}\sup_z\,(nh_n)\{P_n^*(t^* \le z) - P_n^*[n^{1/2}\lambda(S_n^*) \le z]\} = 0$

almost surely ($P$).

Proof: This is the bootstrap version of Proposition 1. It is proved using the same arguments that are used to prove Lemmas 4-5 and Proposition 1 but with $S_{nG}$, $S_n$, $b_n$, and $\beta$, respectively, replaced by $S_{nG}^*$, $S_n^*$, $b_n^*$, and $b_n$. Q.E.D.

b. Step 2: Asymptotic Expansions

For $h > 0$, let $W(u,x,h)$ be a vector whose components are terms of the form $g(x)\omega_j(u/h)$, where $g(x)$ is the product of (not necessarily distinct) components of $x$ that may be different in each use of $g$, and $\omega_j$ is the $j$'th component of the vector $\omega$ defined in Assumption 5. The following lemma gives a modified version of the Cramér condition of Edgeworth analysis.

Lemma 9: Let $\tau$ be a vector with the same dimension as $W$. Define $\psi_W(\tau,h) = E\{\exp[i\tau'W(U,X,h)]\}$, where $i = (-1)^{1/2}$. For any $\epsilon > 0$, some $C > 0$, all $\tau$ satisfying $\|\tau\| > \epsilon$, and all sufficiently small $h$,

$|\psi_W(\tau,h)| < 1 - Ch.$

Proof: Let $r$ index the components of $\omega$ and $W$. Each component of $\omega$ satisfies $|\omega_r(v)| = 0$ or 1 if $|v| \ge 1$. Let $\delta_r^- = \omega_r(v)$ for $v \le -1$ and $\delta_r^+ = \omega_r(v)$ for $v \ge 1$. Then, using the summation convention,

$\psi_W(\tau,h) = \int\!\!\int\exp[i\tau_rg_r(x)\omega_r(u/h)]f(u|x)\,du\,dP(x)$

$= \int\left\{\int_{-\infty}^{-h}\exp[i\tau_rg_r(x)\delta_r^-]f(u|x)\,du + \int_h^{\infty}\exp[i\tau_rg_r(x)\delta_r^+]f(u|x)\,du + \int_{-h}^{h}\exp[i\tau_rg_r(x)\omega_r(u/h)]f(u|x)\,du\right\}dP(x)$

$= A_1(h) + A_2(h),$

where

$A_1(h) = E\{F(-h|X)\exp[i\tau_rg_r(X)\delta_r^-] + [1 - F(h|X)]\exp[i\tau_rg_r(X)\delta_r^+]\}$

and

$A_2(h) = \int\left\{\int_{-h}^{h}\exp[i\tau_rg_r(x)\omega_r(u/h)]f(u|x)\,du\right\}dP(x).$

Consider $A_1(h)$. $|A_1(h)| \le E|A_1(h,X)|$, where

$A_1(h,x) = F(-h|x)\exp[i\tau_rg_r(x)\delta_r^-] + [1 - F(h|x)]\exp[i\tau_rg_r(x)\delta_r^+].$

Let $\bar\delta_r = \delta_r^+$ if $\delta_r^+ = -\delta_r^- = 1$, and note that $\delta_r^+ = \delta_r^-$ otherwise. Therefore,

$|A_1(h,x)| = |F(-h|x)\exp[i\tau_rg_r(x)\bar\delta_r] + [1 - F(h|x)]\exp[-i\tau_rg_r(x)\bar\delta_r]|$

$= \{[1 - F(h|x) + F(-h|x)]^2 - 4[1 - F(h|x)]F(-h|x)\sin^2[\tau_rg_r(x)\bar\delta_r]\}^{1/2}$

$\le 1 - F(h|x) + F(-h|x)$

$= 1 - 2hf(0|x) - (1/2)h^2[f^{(1)}(h_1|x) - f^{(1)}(h_2|x)],$

where $h_1$ and $h_2$ lie between $-h$ and $h$, and the last line is obtained by a Taylor series expansion. Let $Ef(0|X) = C_1$. By Assumption 4(b), $C_1 > 0$ and $E|f^{(1)}(h_1|X) - f^{(1)}(h_2|X)| < M$ for some finite $M$ and all sufficiently small $h$. Therefore,

$|A_1(h)| \le E|A_1(h,X)| \le 1 - C_1h$

for all sufficiently small $h$. Now consider $A_2(h)$. By a change of variables,

$A_2(h) = h\int\left\{\int_{-1}^{1}\exp[i\tau_rg_r(x)\omega_r(\zeta)]f(h\zeta|x)\,d\zeta\right\}dP(x).$

Given $\epsilon > 0$, choose $h$ sufficiently small that

$\int\!\!\int_{-1}^{1}|f(h\zeta|x) - f(0|x)|\,d\zeta\,dP(x) \le \epsilon\int\!\!\int_{-1}^{1}f(0|x)\,d\zeta\,dP(x) = 2\epsilon C_1.$

Then

(A18) $|\psi_W(\tau,h)| \le 1 - hC_1(1 - 2\epsilon) + |A_3(\tau,h)|$

for all $\tau$, $\epsilon > 0$, and sufficiently small $h > 0$, where

$A_3(\tau,h) = h\int\left\{\int_{-1}^{1}\exp[i\tau_rg_r(x)\omega_r(\zeta)]f(0|x)\,d\zeta\right\}dP(x).$

Since $g_r(x) = 0$ for every $r$ only if $x = 0$ and $P(X = 0) < 1$, there are $\eta > 0$ and $\gamma_1 < 1$ such that

$2\int_{\|x\| < \eta}f(0|x)\,dP(x) = \gamma_1C_1.$

Suppose, as will be proved presently, that for some $C_2 < 1$,

(A19) $\sup_{\|\tau\| \ge \epsilon}\left|\int_{-1}^{1}\exp[i\tau_rg_r(x)\omega_r(\zeta)]\,d\zeta\right| \le C_2$

uniformly over $x$ such that $\|x\| \ge \eta$. Then for $\|\tau\| \ge \epsilon$,

(A20) $|A_3(\tau,h)| \le h[\gamma_1C_1 + (1 - \gamma_1)C_1C_2] = h\gamma_2C_1,$

where $\gamma_2 = [\gamma_1 + (1 - \gamma_1)C_2] < 1$. Combining (A18) with (A20) yields

$\sup_{\|\tau\| > \epsilon}|\psi_W(\tau,h)| \le 1 - hC_1(1 - 2\epsilon - \gamma_2) \equiv 1 - Ch$

for all sufficiently small $h > 0$ and $\epsilon > 0$, thereby establishing the lemma.

It remains to prove (A19). To do this, define $t = \|\tau\|$. Fix $\tau/\|\tau\|$ and $x$ with $\|x\| \ge \eta$. For the specified $\tau/\|\tau\|$ and $x$, and using the summation convention, define $\bar f(\zeta) = \tau_rg_r(x)\omega_r(\zeta)/\|\tau\|$. Let $-1 = a_1 < \dots < a_L = 1$ be a partition of $[-1,1]$ that satisfies Assumption 5c when $\theta$ is proportional to the vector whose $r$'th component is $\tau_rg_r(x)$. Then

$\bar R(\tau) \equiv \int_{-1}^{1}\exp[it\bar f(\zeta)]\,d\zeta = \sum_{\ell=2}^{L}\int_{a_{\ell-1}}^{a_\ell}\exp[it\bar f(\zeta)]\,d\zeta.$

It suffices to prove that for any $\epsilon > 0$ and some $C_3 < 1$ that does not depend on $x$ or $\tau/\|\tau\|$,

(A21) $\sup_{|t| > \epsilon}(a_\ell - a_{\ell-1})^{-1}\left|\int_{a_{\ell-1}}^{a_\ell}\exp[it\bar f(\zeta)]\,d\zeta\right| \le C_3.$

To do this, make the change of variables $\xi = \bar f(\zeta)$ in (A21) and set $v(\xi) = 1/\{d\bar f[\zeta(\xi)]/d\zeta\}$. Then

$\bar R_\ell(t) \equiv \int_{a_{\ell-1}}^{a_\ell}\exp[it\bar f(\zeta)]\,d\zeta = \int_{\bar f(a_{\ell-1})}^{\bar f(a_\ell)}e^{it\xi}v(\xi)\,d\xi.$

Observe that $|\bar R_\ell(t)| \le a_\ell - a_{\ell-1}$, so the right-hand integral is bounded. The right-hand integral can be approximated arbitrarily accurately by replacing $v(\cdot)$ with a step function. Therefore, it is enough to prove that

$\sup_{|t| > \epsilon}\left|\int_{\alpha_1}^{\alpha_2}e^{it\xi}\,d\xi\right| \le (\alpha_2 - \alpha_1)C_3$

for all $\alpha_1 < \alpha_2$ and some $C_3 < 1$ that does not depend on $\alpha_1$ or $\alpha_2$. But

$\left|\int_{\alpha_1}^{\alpha_2}e^{it\xi}\,d\xi\right| \le (\alpha_2 - \alpha_1)\frac{\sin^2[0.5t(\alpha_2 - \alpha_1)]}{[0.5t(\alpha_2 - \alpha_1)]^2}.$

The proof is completed by setting $C_3 = \inf_{|t| > \epsilon}[(\sin^2t)/t]$. Q.E.D.

Define $W^*$ as in Lemma 9 except with $\beta$ replaced by $b_n$. Define $\psi_{W^*}(\tau,h_n) = E_n\{\exp[i\tau'W^*(U,X,h_n)]\}$. The bootstrap version of Lemma 9 is:

Lemma 10: For any $\epsilon > 0$ and $c > 0$, some $C^* > 0$, all $\tau$ satisfying $\epsilon < \|\tau\| \le n^c$, and all sufficiently large $n$,

$|\psi_{W^*}(\tau,h_n)| < 1 - C^*h_n$

almost surely ($P$).

Proof: Let $B_{n\tau} = \{\tau: \epsilon < \|\tau\| \le n^c\}$. Then

$\sup_{\tau\in B_{n\tau}}|\psi_{W^*}(\tau,h_n)| \le \sup_{\tau\in B_{n\tau}}|\psi_W(\tau,h_n)| + \sup_{\tau\in B_{n\tau}}|\psi_{W^*}(\tau,h_n) - \psi_W(\tau,h_n)|.$

By arguments similar to those used to prove Lemma 6 together with the Borel-Cantelli lemma, $|\psi_{W^*}(\tau,h_n) - \psi_W(\tau,h_n)| = o(h_n)$ almost surely uniformly over $\tau \in B_{n\tau}$. Let $C$ be as in Lemma 9. Then Lemma 10 follows by letting $C^*$ be any number satisfying $0 < C^* < C$. Q.E.D.

Let $W_{n1}$ be a column vector consisting of the unique components of $n^{1/2}[G_{ni}(\beta) - EG_{ni}(\beta)]$ ($i = 1,\dots,q$) and $n^{1/2}[T_n(\beta) - ET_n(\beta)]$. Let $W_{n2}$ be a column vector consisting of the unique components of $(nh_n)^{1/2}[G_{nij}(\beta) - EG_{nij}(\beta)]$, $(nh_n^3)^{1/2}[G_{nijk}(\beta) - EG_{nijk}(\beta)]$, $(nh_n^5)^{1/2}[G_{nijk\ell}(\beta) - EG_{nijk\ell}(\beta)]$, $(nh_n)^{1/2}[D_n(\beta) - ED_n(\beta)]$, $(nh_n^3)^{1/2}[D_{ni}(\beta) - ED_{ni}(\beta)]$, $(nh_n^5)^{1/2}[D_{nij}(\beta) - ED_{nij}(\beta)]$, and $(nh_n)^{1/2}[T_{ni}(\beta) - ET_{ni}(\beta)]$ ($i,j,k,\ell = 1,\dots,q$). Set $W_n = [W_{n1}', W_{n2}']'$. Define $W_n^*$, $W_{n1}^*$, and $W_{n2}^*$ similarly except with $(Y_i,X_i)$ replaced by $(Y_i^*,X_i^*)$ and $\beta$ replaced by $b_n$. Order the components of $S_n$ and $S_n^*$ conformably with those of $W_n$ and $W_n^*$. Let $V_n$ be the covariance matrix of $[W_{n1}', W_{n2}'/h_n]'$ and $V_n^*$ the covariance matrix of $[W_{n1}^{*\prime}, W_{n2}^{*\prime}/h_n]'$ relative to $P_n^*$. Let $w_{n1}$, $w_{n2}$, $w_{n1}^*$, and $w_{n2}^*$, respectively, be the summands of the components of $W_{n1}$, $W_{n2}$, $W_{n1}^*$, and $W_{n2}^*$. These have the forms $g_j(X)\omega_j(U/h)$ and $g_j(X)\omega_j(U_n/h)$, where $U_n = Y - Xb_n$. For any $\tau = (\tau_1', \tau_2')'$ conformable with $(w_{n1}', w_{n2}')'$, define

(A22) $D_1(\tau) = -[i/(6h_n)]E[(\tau_2'w_{n2})^3],$

(A23) $D_2(\tau) = -(i/6)\{E[(\tau_1'w_{n1})^3] + (3/h_n)E[(\tau_1'w_{n1})(\tau_2'w_{n2})^2]\},$

(A24) $D_3(\tau) = -[i/(2h_n)]E[(\tau_2'w_{n2})^3],$

and

(A25) $D_4(\tau) = [1/(24h_n)]E[(\tau_2'w_{n2})^4] + (1/72)\{h_n^{-1}E[(\tau_2'w_{n2})^3]\}^2.$

Define $D_i^*(\tau)$ ($i = 1,\dots,4$) by replacing $w_n$ with $w_n^*$ and $E$ with $E_n$ in (A22)-(A25). Let $B_i$ ($i = 1,\dots,4$) be the signed measures whose Fourier-Stieltjes transforms are

(A26) $\int\exp(i\tau'\xi)\,dB_i(\xi) = \exp(-0.5\tau'V_n\tau)D_i(\tau).$

Define $B_i^*$ ($i = 1,\dots,4$) analogously by using $V_n^*$ and $D_i^*$ in place of $V_n$ and $D_i$. Let $d_W = \dim(W_n)$. For any set $A$ in $d_W$-dimensional Euclidean space, let $\partial A$ denote the boundary of $A$, and let $(\partial A)^\epsilon$ denote the set of all points whose distance from $\partial A$ does not exceed $\epsilon$. Let $\Phi_{V_n}$ denote the probability measure corresponding to the normal distribution with mean 0 and covariance matrix $V_n$. Define $\Phi_{V_n^*}$ analogously.

Lemma 11: Let A denote a class of Borel sets in d_W-dimensional Euclidean space that satisfy

\sup_{\Omega \in A} \int_{(\partial\Omega)^\epsilon} \exp(-0.5\|\xi\|^2)\, d\xi = O(\epsilon)

as ε → 0+. Then

\sup_{\Omega \in A} \Big| P(W_n \in \Omega) - \Phi_{V_n}(\Omega) - (nh_n)^{-1/2} B_1(\Omega) - n^{-1/2} B_2(\Omega) - (h_n/n)^{1/2} B_3(\Omega) - (nh_n)^{-1} B_4(\Omega) \Big| = o[(nh_n)^{-1}],

and almost surely (P)

\sup_{\Omega \in A} \Big| P_n^*(W_n^* \in \Omega) - \Phi_{V_n^*}(\Omega) - (nh_n)^{-1/2} B_1^*(\Omega) - n^{-1/2} B_2^*(\Omega) - (h_n/n)^{1/2} B_3^*(\Omega) - (nh_n)^{-1} B_4^*(\Omega) \Big| = o[(nh_n)^{-1}].

Proof: This is a slightly modified version of Theorem 5.8 of Hall (1992) and is proved

using the same arguments as in Hall's proof after replacing Hall's Lemma 5.6 with Lemmas 9 and

10 above. Q.E.D.

Proof of Theorem 4.1: Only parts (a), (c), and the part of (b) pertaining to q(τ,ν_n) are proved here. The proofs of the remaining parts are similar. To begin, invert (A26) to obtain

(A27)   B_i(\xi) = \pi_{ni}(\xi)\, \phi_{V_n}(\xi),

where for each n and i, π_{ni}(·) is a multivariate polynomial, and φ_{V_n} is the multivariate normal density with mean 0 and covariance matrix V_n. Let S_n(W_n) be the mapping from W_n to S_n. Define Z_n(W_n) = (nh_n)^{1/2} Λ[S_n(W_n)]. By Proposition 1 it suffices to consider P(Z_n ≤ τ). Define a_1 = (nh_n)^{-1/2}, a_2 = n^{-1/2}, a_3 = (h_n/n)^{1/2}, and a_4 = (nh_n)^{-1}. By Lemma 11 and (A27),

(A28)   P(Z_n \le \tau) = \int_{\{\xi:\, Z_n(\xi) \le \tau\}} d\Big[\Phi_{V_n}(\xi) + \sum_{i=1}^{4} a_i\, \pi_{ni}(\xi)\, \phi_{V_n}(\xi)\Big] + o[(nh_n)^{-1}]

uniformly over τ. Order the components of ξ and W_n so that the first components correspond to (nh_n)^{1/2}[G_{ni}(β) - EG_{ni}(β)], where i is the component of β for which t is the t statistic. Let ξ̃ denote the vector consisting of all components of ξ except the first, ξ_1. Change variables in the integral of (A28) so that the variable of integration is (z, ξ̃')', where z = Z_n(ξ), thereby obtaining

(A29)   P(Z_n \le \tau) = \int_{-\infty}^{\infty} d\tilde\xi \int_{z \le \tau} dz\, J[\xi_1(z,\tilde\xi),\tilde\xi] \Big\{ \phi_{V_n}[\xi_1(z,\tilde\xi),\tilde\xi] + \sum_{i=1}^{4} a_i\, \pi_{ni}[\xi_1(z,\tilde\xi),\tilde\xi]\, \phi_{V_n}[\xi_1(z,\tilde\xi),\tilde\xi] \Big\} + o[(nh_n)^{-1}]

uniformly over τ, where J(·) is the inverse Jacobian term associated with the change of variables.

Taylor series expansions in powers of n^{-1} of the terms involving ξ_1(z,ξ̃) in (A29) yield

(A30)   P(Z_n \le \tau) = \Phi(\tau) + \sum_{i=1}^{5} c_{ni}\, r_i(\tau)\, \phi(\tau) + o[(nh_n)^{-1}] \equiv G_n(\tau) + o[(nh_n)^{-1}]

uniformly over τ, where Φ and φ, respectively, are the univariate standard normal distribution and density functions, the r_i's are polynomial functions of one variable, c_{n1} = n^{-1/2}, c_{n2} = (nh_n)^{-1/2}, c_{n3} = h_n n^{-1/2}, c_{n4} = (nh_n^{3/2})^{-1}, and c_{n5} = (nh_n)^{-1}. Let R and R_G, respectively, denote the characteristic functions of the distributions of Z_n and G_n. Then |R(τ) - R_G(τ)| = o[(nh_n)^{-1}]. A Taylor series expansion shows that Z_n in (A30) can be replaced by a multivariate polynomial in components of S_n - E(S_n). The cumulants through order 4 of this polynomial may be approximated through O[(nh_n)^{-1}] using standard Taylor series methods of kernel estimation. Let k_{nj} denote the approximate j-th cumulant. Expressing R in terms of the approximate cumulants yields R(τ) = R̃(τ) + o[(nh_n)^{-1}] uniformly over τ, where

(A31)   \tilde R(\tau) = e^{-\tau^2/2}\Big\{1 + i\tau k_{n1} + \tfrac{1}{2}(i\tau)^2(k_{n2} - 1) + \tfrac{1}{6}(i\tau)^3 k_{n3} + \tfrac{1}{24}(i\tau)^4 k_{n4} + \tfrac{1}{2}\big[(i\tau)k_{n1} + \tfrac{1}{6}(i\tau)^3 k_{n3}\big]^2\Big\}.

Setting R_G = R̃, taking the inverse Fourier transform of the result, and setting P(|Z_n| ≤ τ) = P(Z_n ≤ τ) - P(Z_n ≤ -τ) yields (4.1) with

q(\tau,\nu_n) = -\tau\Big[k_{n1}^2 + (k_{n2} - 1) + \tfrac{1}{12}(4k_{n1}k_{n3} + k_{n4})(\tau^2 - 3) + \tfrac{1}{36} k_{n3}^2 (\tau^4 - 10\tau^2 + 15)\Big].

A straightforward but lengthy calculation shows that k_{n1}^2, k_{n1}k_{n3}, and k_{n3}^2 are o[(nh_n)^{-1}], whereas k_{n2} - 1 and k_{n4} are O[(nh_n)^{-1}] and consist of linear combinations of the terms shown in Table I. Q.E.D.

Proof of Theorem 4.2: Under H_0, c = Rβ, so

\chi^2 = (nh_n)(b_n - \beta)' R' (R V_n R')^{-1} R (b_n - \beta).

By arguments similar to those used to prove Propositions 1 and 2, followed by a Taylor series expansion, there is a multivariate polynomial Λ_P such that

P(\chi^2 \le z) - P[(nh_n)\Lambda_P(S_n) \le z] = o[(nh_n)^{-1}]

uniformly over z, and

\lim_{n \to \infty} \sup_z\, (nh_n)\{P_n^*(\chi^{2*} \le z) - P_n^*[(nh_n)\Lambda_P(S_n^*) \le z]\} = 0

almost surely (P). Set Z_n(W_n) = (nh_n)Λ_P[S_n(W_n)]. By arguments similar to those used to obtain (A28),

P(Z_n \le z) = \int_{\{\xi:\, Z_n(\xi) \le z\}} d\Big[\Phi_{V_n}(\xi) + \sum_{i=1}^{4} a_i\, \pi_{ni}(\xi)\, \phi_{V_n}(\xi)\Big] + o[(nh_n)^{-1}].

Now transform to polar coordinates and proceed as in the proof of Theorem 1b of Chandra and Ghosh (1979). A similar argument applies to P_n*(χ²* ≤ z). Q.E.D.
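For readers who want the test statistic in computable form, the display at the start of this proof is a standard Wald-type quadratic form. The sketch below transcribes it, writing Rb_n - c for R(b_n - β), which coincides with it under H_0; all inputs shown (R, c, b_n, V_n, n, h_n) are illustrative placeholders.

import numpy as np

def wald_chi2(b_n, c, R, V_n, n, h_n):
    """Chi-square statistic (nh_n)(R b_n - c)'(R V_n R')^{-1}(R b_n - c),
    equal to the displayed form when c = R beta under H0."""
    d = R @ b_n - c
    return n * h_n * d @ np.linalg.solve(R @ V_n @ R.T, d)

# Illustrative inputs (all made up): q = 3 parameters, 2 restrictions.
b_n = np.array([0.9, 1.1, -0.2])
R = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, -1.0]])
c = np.array([1.0, 1.0])
V_n = np.eye(3)
print(wald_chi2(b_n, c, R, V_n, n=500, h_n=0.2))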

Proof of Theorem 4.3: Only part (a) is proved here. The proof of part (b) is similar. Let t_α and t_α*, respectively, denote the exact and bootstrap α-level critical values of the symmetrical t test. Let k_{ni}* denote the bootstrap version of k_{ni} (i = 2 or 4). It is obtained from k_{ni} by replacing β with b_n and expected values with sample averages. By Theorem 4.1,

|P(|t| > t_\alpha^*) - \alpha| \le \sup_\tau |P(|t| > \tau) - P_n^*(|t^*| > \tau)|

\le \sup_\tau \big|[q(\tau,\nu_n) - q(\tau,\nu_n^*)]\,\phi(\tau)\big| + o[(nh_n)^{-1}]

= O(k_{n2}^* - k_{n2}) + O(k_{n4}^* - k_{n4}).

The proof is completed by using methods similar to those used in proving Lemma 3 to show that the difference between each of the terms in Table I and its bootstrap analog is o[(nh_n)^{-1}] almost surely. Q.E.D.
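For concreteness, the bootstrap critical value t_α* appearing in this proof can be computed by resampling (Y,X) pairs, as the paper's bootstrap does, re-estimating the coefficients, and taking the 1 - α quantile of the bootstrap |t*| draws. The Python sketch below is a minimal illustration only: the smoothed objective n^{-1} Σ u_i[2K(u_i/h_n) - 1] matches the summand structure in Table I but is not quoted from the text, the integrated Epanechnikov K and the constant standard error are stand-ins (the paper studentizes with a kernel-based variance estimator), and the optimizer is generic.

import numpy as np
from scipy.optimize import minimize

def K(v):
    # Integrated Epanechnikov kernel: a smooth approximation to 1(v >= 0).
    v = np.clip(v, -1.0, 1.0)
    return 0.5 + 0.75 * (v - v ** 3 / 3.0)

def slad(y, x, h):
    """Minimize the smoothed LAD objective n^{-1} sum u_i [2K(u_i/h) - 1]."""
    def obj(b):
        u = y - x @ b
        return np.mean(u * (2.0 * K(u / h) - 1.0))
    b0 = np.linalg.lstsq(x, y, rcond=None)[0]   # OLS starting values
    return minimize(obj, b0, method="BFGS").x

def boot_critical_value(y, x, h, j, alpha=0.05, B=199, seed=0):
    """1 - alpha quantile of |t*|, resampling (Y,X) pairs and centering
    the bootstrap estimates at the full-sample estimate b_n."""
    rng = np.random.default_rng(seed)
    n = len(y)
    b_n = slad(y, x, h)
    se = n ** -0.5                 # placeholder for the paper's kernel-based
    t_star = np.empty(B)           # standard-error estimator
    for r in range(B):
        idx = rng.integers(0, n, size=n)
        b_r = slad(y[idx], x[idx], h)
        t_star[r] = abs(b_r[j] - b_n[j]) / se
    return np.quantile(t_star, 1.0 - alpha)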

Proof of Theorem 4.4: The proof consists of repeating each step of the proofs of Lemmas 1-11 and Theorems 4.1-4.3 with H_{cn}(b) in place of H_n(b) and Assumptions 1' and 3' in place of Assumptions 1 and 3. Q.E.D.

REFERENCES

Bassett, G. and R. Koenker (1978). Asymptotic theory of least absolute error regression, Journal of the American Statistical Association, 73, 618-621.

Beran, R. (1988). Prepivoting test statistics: a bootstrap view of asymptotic refinements, Journal of the American Statistical Association, 83, 687-697.

Bhattacharya, R.N. and J.K. Ghosh (1978). On the validity of the formal Edgeworth expansion, Annals of Statistics, 6, 434-451.

Bloomfield, P. and W.L. Steiger (1983). Least Absolute Deviations: Theory, Applications, and Algorithms, Boston: Birkhäuser.

Buchinsky, M. (1995). Estimating the asymptotic covariance matrix for quantile regression models: a Monte Carlo study, Journal of Econometrics, 68, 303-338.

Chandra, T.K. and J.K. Ghosh (1979). Valid asymptotic expansions for the likelihood ratio statistic and other perturbed chi-square variables, Sankhya, Series A, 41, 22-47.

De Angelis, D., P. Hall, and G.A. Young (1993). Analytical and bootstrap approximations to estimator distributions in L1 regression, Journal of the American Statistical Association, 88, 1310-1316.

Dielman, T. and R. Pfaffenberger (1984). Tests of linear hypotheses and L1 estimation: a Monte Carlo comparison, American Statistical Association Business and Economic Statistics Section Proceedings, 644-647.

Dielman, T. and R. Pfaffenberger (1988a). Bootstrapping in least absolute value regression: an application to hypothesis testing, Communications in Statistics - Simulation and Computation, 17, 843-856.

Dielman, T. and R. Pfaffenberger (1988b). Least absolute value regression: necessary sample sizes to use normal theory inference procedures, Decision Sciences, 19, 734-743.

Hahn, J. (1995). Bootstrapping quantile regression estimators, Econometric Theory, 11, 105-121.

Hall, P. (1986). On the bootstrap and confidence intervals, Annals of Statistics, 14, 1431-1452.

Hall, P. (1992). The Bootstrap and Edgeworth Expansion, New York: Springer-Verlag.

Hall, P. and J.L. Horowitz (1990). Bandwidth selection in semiparametric estimation of censored linear regression models, Econometric Theory, 6, 123-150.

Horowitz, J.L. (1996). Bootstrap methods in econometrics: theory and numerical performance, in Advances in Economics and Econometrics: 7th World Congress, D. Kreps and K.W. Wallis, eds., Cambridge: Cambridge University Press, forthcoming.

Janas, D. (1993). A smoothed bootstrap estimator for a studentized sample quantile, Annals of the Institute of Statistical Mathematics, 45, 317-329.

Koenker, R. (1982). Robust methods in econometrics, Econometric Reviews, 1, 213-255.

Koenker, R. and G. Bassett (1978). Regression quantiles, Econometrica, 46, 33-50.

Koenker, R. and G. Bassett (1982). Robust tests for heteroscedasticity based on regression quantiles, Econometrica, 50, 43-61.

Müller, H.-G. (1984). Smooth optimum kernel estimators of densities, regression curves and modes, Annals of Statistics, 12, 766-774.

Nolan, D. and D. Pollard (1987). U-processes: rates of convergence, Annals of Statistics, 15, 780-799.

Pakes, A. and D. Pollard (1989). Simulation and the asymptotics of optimization estimators, Econometrica, 57, 1027-1057.

Pollard, D. (1984). Convergence of Stochastic Processes, New York: Springer-Verlag.

Powell, J.L. (1984). Least absolute deviations estimation for the censored regression model, Journal of Econometrics, 25, 303-325.

Powell, J.L. (1986). Censored regression quantiles, Journal of Econometrics, 32, 143-155.

TABLE I: TERMS OF APPROXIMATE CUMULANTS

Notation: g_j(x), j an integer, is a product of components of x that may be different in different occurrences;

m_{1j}(x,u) = n^{-1/2} g_j(x)\{[2K(u/h_n) - 1] - 2(u/h_n)K^{(1)}(u/h_n)\};
m_{ij}(x,u) = \partial^{i-1} m_{1j}(x,u)/\partial u^{i-1} for i = 2 or 3;
m_{4j}(x,u) = n^{-1/2} g_j(x) K^{(1)}(u/h_n);
m_{5j}(x,u) = \partial m_{4j}(x,u)/\partial u;
μ_{ij} = E m_{ij}(X,U); and ν_{ij} = m_{ij}(X,U) - μ_{ij}.

Cumulant      Terms

k_{n2} - 1    nE(ν_{11}ν_{12})E(ν_{13}ν_{34}),  nE(ν_{11}ν_{12})E(ν_{13}ν_{54}),
              nE(ν_{11}ν_{12})E(ν_{23}ν_{24}),  nE(ν_{11}ν_{12})E(ν_{23}ν_{44}),
              nE(ν_{11}ν_{12})E(ν_{43}ν_{44})

k_{n4}        n^2 E(ν_{11}ν_{12})E(ν_{13}ν_{14})E(ν_{15}ν_{36}),  n^2 E(ν_{11}ν_{12})E(ν_{13}ν_{14})E(ν_{25}ν_{26}),
              n^2 E(ν_{11}ν_{12})E(ν_{13}ν_{14})E(ν_{25}ν_{46}),  n^2 E(ν_{11}ν_{12})E(ν_{13}ν_{14})E(ν_{45}ν_{46}),
              n^2 E(ν_{11}ν_{12})E(ν_{13}ν_{14})E(ν_{15}ν_{56})
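The building blocks in this table can be transcribed directly. In the sketch below, the kernel, the g_j, and the data draws are illustrative stand-ins; m_{2j}, m_{3j}, and m_{5j} are derivatives of the displayed functions and can be obtained analytically or numerically.

import numpy as np

# Illustrative integrated-Epanechnikov smoother K and its derivative K1;
# any kernel satisfying the paper's assumptions could be used instead.
def K(v):
    v = np.clip(v, -1.0, 1.0)
    return 0.5 + 0.75 * (v - v ** 3 / 3.0)

def K1(v):
    return np.where(np.abs(v) <= 1.0, 0.75 * (1.0 - v ** 2), 0.0)

def m1(g_x, u, n, h):
    """m_{1j}(x,u) = n^{-1/2} g_j(x){[2K(u/h) - 1] - 2(u/h)K'(u/h)}."""
    v = u / h
    return n ** -0.5 * g_x * ((2.0 * K(v) - 1.0) - 2.0 * v * K1(v))

def m4(g_x, u, n, h):
    """m_{4j}(x,u) = n^{-1/2} g_j(x) K'(u/h)."""
    return n ** -0.5 * g_x * K1(u / h)

# nu_{ij} = m_{ij}(X,U) - E m_{ij}(X,U), estimated here by demeaning.
rng = np.random.default_rng(2)
n, h = 500, 0.3
x, u = rng.standard_normal(n), rng.standard_normal(n)
nu_1 = m1(x, u, n, h) - m1(x, u, n, h).mean()
print(n * (nu_1 * nu_1).mean())   # sample analog of one k_{n2}-1 type factor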

FOOTNOTES

1. When the t statistic can be approximated by a smooth function of sample moments, the difference between the true and nominal levels of a symmetrical t test with bootstrap critical values is typically O(n^{-2}). With critical values based on first-order asymptotic theory, the difference is typically O(n^{-1}). See, e.g., Hall (1992). The larger approximation errors in the case of a t statistic for a median are due to the median estimator's non-smooth objective function.

2. De Angelis, et al. (1993) implement the bootstrap by sampling smoothed LAD residuals. In contrast to sampling (Y,X) pairs, this method does not easily generalize to heteroskedastic or censored models.

3. K does not satisfy Assumption 5b because it has only two derivatives at v = ±1. This problem can be overcome by smoothing K in neighborhoods of v = ±1, but doing so has no effect on the results of the experiments.

4. Hall and Horowitz (1990) derived the bandwidth that minimizes the asymptotic mean-square error of the variance estimator in a homoskedastic quantile regression, and they suggested a plug-in estimator for this bandwidth. However, the bandwidth that optimizes the variance estimate is not necessarily optimal for computing test statistics, and little is known about the numerical performance of the Hall-Horowitz estimator in testing.