%Paper: ewp-em/9406004
%From: STolande@scout-po.biz.uiowa.edu
%Date: 24 Jun 94 10:29 CST
%Date (revised): 24 Jun 94 11:27 CST


                               ESTIMATION OF
                      EQUILIBRIUM WAGE DISTRIBUTIONS
                            WITH HETEROGENEITY


                             AUDRA  J. BOWLUS
                            NICHOLAS M. KIEFER
                            GEORGE R. NEUMANN 

                                June, 1994

                             1. Introduction.

One-sided wage search models have become the empirical workhorse of
labor economics
in the past decade.  Devine and Kiefer (1991) survey over 600
articles and books which
use the simple search model of Mortensen (1970) and McCall (1970)
as the framework
for discussing empirical work.  Despite its attractive features for
empirical work on
durations, there is an obvious limitation in that the form of the
wage offer distribution is
taken as given and consequently nothing can be said about why wage
distributions are
what they are. This is a shortcoming for any investigation of wage
policy, for example,
studying whether the extent of competition varies across markets. 


Equilibrium search models, e.g., Albrecht and Axell (1984),
Burdett (1990), Burdett and Mortensen (1989), and Mortensen (1990),
provide a structure where wage and duration data
can be interpreted as a general equilibrium outcome dependent upon
an underlying
matching technology.  The essential idea of equilibrium search
models is that wage policy
matters in the following way:  High wage firms attract labor
easily, and other things
equal retain workers longer.  Thus wage policy matters as a method
of balancing supply
and demand.  These ideas about wage policy are old ideas, yet
little formal content has
been given to them until recently.  Estimation of such models is in
its infancy, and the
appropriate approach to specification and estimation is an area of
much current
research.  Initial efforts at fitting such models (Eckstein and
Wolpin (1990), van den Berg and Ridder (1993, 1994), and Kiefer and
Neumann (1991, 1994)) have not been
completely successful.  Essentially, the tight theoretical
structure needed to generate
simple estimation strategies results in a very poor match between
theory and evidence. 
One explanation for this mismatch between theory and evidence is
that in this area, as in
others in economics, unmeasured differences across workers and jobs
cannot be ignored. 
In the first section of this paper we provide a quantitative
assessment of the amount of
wage variation that exists.  We show that between 20 and 50% of the
variation in weekly
earnings cannot be explained by standard measures of jobs and
workers.  This suggests
that search considerations potentially have a large role to play in
explaining the pattern
of wages.

A goal of this paper is to describe an approach to estimation of
equilibrium search
models in the presence of heterogeneity.  We show that this leads
to a non-standard
inference problem, and we provide an estimator appropriate for this
case.  The finite
sample performance of this estimator is examined via Monte Carlo
methods, and in the
penultimate section of the paper we employ these techniques to
analyze the labor market
history of a sample of the NLS Youth data used previously by
Eckstein and Wolpin and
by Kiefer and Neumann.  We show that the fundamental parameters are
identified from
wage data, and that the choice of heterogeneity can be made to fit
the wage data well.

                             2. Wage Variations.
Wages vary across individuals for a variety of reasons: differences
in productivity, differences in taste, and differences in luck.  In
one view, all wage differences are compensatory, making up for some
difference in the worker or the job.  Thus skilled workers must be
paid more in order to compensate for the costs of acquiring skills;
similarly, firemen and policemen are usually paid more because of
the risks that are faced in these occupations.

Search theory views differences in wages as the outcome of a wage
posting - employee
search process where firms that pay higher wages obtain a larger
labor force that turns
over less rapidly.  In the simplest version of an equilibrium
search model (Mortensen
(1990)) obtaining greater labor supply is the reason a firm pays
higher wages;
idiosyncratic differences in firm-specific training would produce
a similar pattern. 

To quantify how large these variations in wages are we looked at
the cross-section
distribution of weekly earnings, obtained from the outgoing
rotation groups of the
Current Population Survey (CPS) samples in 1990.  To standardize
the earnings data we
restricted attention to private sector workers, aged 18-65, engaged
in full-time work. 
There were 74,639 males and 51,960 females in the sample.  Row 1 of
Table 1 shows the
standard deviation of log weekly earnings for the raw data.  As is
usually found in
earnings data, male earnings on average are larger and more
dispersed than female
earnings; for our purposes, the standard deviation is the quantity
that needs accounting for.  We do this by fitting regressions that
include successively larger sets of controls.

                               Table 1
             Standard Deviation of log (Weekly Earnings)
                  1990 CPS Outgoing Rotation Groups

Group/Controls                               Males     Females

All Full-time, 18-65, Private sector         0.6127    0.5291
/ age, educ., race, union member, un. cov.   0.4931    0.4589
/ " + city size                              0.4851    0.4438
/ " + industry                               0.4734    0.4318
/ " + state indicators                       0.4690    0.4269
/ " + occupation                             0.4324    0.3860

Our decomposition of the variance is not intended to be structural.
Rather, it is a simple accounting convention to show how much
variance in wages remains unexplained after accounting for various
factors thought to influence earnings.  The second row of Table 1
shows the standard deviation of the residuals obtained from a
regression that included the usual set of "human capital"-type
variables: age, age squared, years of education, race, and whether
the individual is a union member or is covered by a union contract.
These factors account for about 35% of the variance in male earnings
[1 - (0.4931/0.6127)^2 = 35%] and about 25% of the variance in female
earnings [1 - (0.4589/0.5291)^2 = 25%].  Successive rows in the
table add controls for city size, one-digit industry (9), state of
residence (50), and occupation (45).  The final regression run is the
kitchen-sink prototype:  there are 110 variables included.
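
The accounting in each row of Table 1 is the same calculation: one minus the squared ratio of the residual standard deviation to the raw standard deviation.  A minimal sketch in Python, using the figures reported in Table 1; the function and variable names are illustrative and do not come from the paper:

```python
# Standard deviations of log weekly earnings from Table 1 (males, females).
# The first entry is the raw data; the rest are residual standard deviations.
TABLE1 = [
    ("raw data",                0.6127, 0.5291),
    ("age, educ., race, union", 0.4931, 0.4589),
    ("+ city size",             0.4851, 0.4438),
    ("+ industry",              0.4734, 0.4318),
    ("+ state indicators",      0.4690, 0.4269),
    ("+ occupation",            0.4324, 0.3860),
]


def explained_share(sd_resid, sd_raw):
    """Share of variance accounted for: 1 - (sd_resid / sd_raw)**2."""
    return 1.0 - (sd_resid / sd_raw) ** 2


_, raw_m, raw_f = TABLE1[0]
shares = [
    (label, explained_share(m, raw_m), explained_share(f, raw_f))
    for label, m, f in TABLE1[1:]
]

for label, share_m, share_f in shares:
    print(f"{label:26s} males {share_m:6.1%}  females {share_f:6.1%}")
```

The first and last entries of `shares` correspond to the 35%/25% and 50.2%/46.8% figures quoted in the text.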

Of course nothing prevents a more aggressive search for structure: 
a more detailed
industry description with over 500 sub-categories is available in
the CPS, as is a more
detailed occupation code.  However, the pattern in Table 1 is
clear:  the unexplained
variance in wages remains substantial even as we add more and more
measures specific
to jobs.  Thus, in row 6 the collection of usual and unusual
suspects that we have employed accounts for about 50.2% of the
variance in male earnings (1 - (0.4324/0.6127)^2 = 50.2%) and about
46.8% of the variance in female earnings.  In round numbers,
observable differences in workers explain about half of the
variation in wages.

It might be argued that the variability shown in Table 1 simply
reflects variability in job
characteristics that are not included in the regressions.  If we
were to compare similar
jobs, this variability would disappear.  The BLS attempts to do
exactly this in its Area
Wage Survey publication.  Here the sampling unit is the firm, and
only specific
standardized jobs are included.  In Table 2 we have summarized the
standard deviation
of weekly earnings for Secretaries, Guards, and Janitors in the St.
Louis, Missouri CMSA
in February of 1993, and for comparison purposes, March of 1984. 
We chose these
occupations because they are believed to be very standardized; we
chose the St. Louis
CMSA because it was the most current area wage survey available
from the BLS. 
These data report means, medians, 25th and 75th percentiles of the
cross-section wage
distribution in a specific locale for a number of narrowly defined
job descriptions.  The
data are also reported in grouped data form, so that, for example,
we know how many
secretary I's earned between $400 and $450 per week, how many
earned between $450
and $500, and so forth.  In contrast to CPS data, these data are
taken from firm records
and refer to straight-time earnings for full-time jobs.
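
As an illustration of how a standard deviation of log earnings can be recovered from data reported in this grouped form, the sketch below places every worker in a bin at the bin midpoint.  The midpoint rule is an assumption made for this sketch (the text does not say how the grouped data were handled), and the bin counts are hypothetical, not survey figures:

```python
import math

# HYPOTHETICAL grouped data: (lower bound, upper bound, number of workers).
# These counts are invented for illustration; they are not from the survey.
bins = [
    (350.0, 400.0, 10),
    (400.0, 450.0, 30),
    (450.0, 500.0, 40),
    (500.0, 550.0, 20),
]


def grouped_sd_log(bins):
    """SD of log earnings, treating each worker as earning the bin midpoint."""
    total = sum(count for _, _, count in bins)
    logs = [(math.log((lo + hi) / 2.0), count) for lo, hi, count in bins]
    mean = sum(x * c for x, c in logs) / total
    var = sum(c * (x - mean) ** 2 for x, c in logs) / total
    return math.sqrt(var)


sd = grouped_sd_log(bins)
print(f"SD of log weekly earnings: {sd:.4f}")
```

Narrower bins make the midpoint approximation less important; with wide bins the within-bin dispersion that this rule ignores can matter.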

Table 2 contains estimates of the standard deviations of these data
computed from the
grouped data. The jobs covered in the table are Secretary I to
Secretary V, Guard I -II,
and Janitor.  What Table 2 shows is that wage variation among
"identical" persons is substantial.  Even within narrowly defined
groups --Secretary I - V, and Guard I-II-- the variation in weekly
earnings remains large.  Using Table 1 as a base would imply that
unexplained variation in wages ranges from 12% for Secretary I's to
21% for janitors.  We also note that variation in earnings does not
appear to diminish significantly as we look across higher levels of
secretaries, or guards.  In other words, equalizing effects that
might be expected from firm-specific investments do not seem to be
that important, at least in these data.
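
The 12% and 21% figures can be reproduced as squared ratios of standard deviations.  The sketch below rests on an assumption made only for illustration, since the text does not state the base: Secretary I is compared with the raw female figure in Table 1 and Janitor with the raw male figure, both using the 1993 column of Table 2.

```python
# Raw standard deviations of log weekly earnings, row 1 of Table 1.
SD_RAW_MALES = 0.6127
SD_RAW_FEMALES = 0.5291

# 1993 within-occupation standard deviations from Table 2.
SD_SECRETARY_I = 0.1822
SD_JANITOR = 0.2791


def variance_ratio(sd_within, sd_base):
    """Within-job variance expressed as a share of the base variance."""
    return (sd_within / sd_base) ** 2


# ASSUMPTION: secretaries are benchmarked against females, janitors against males.
share_secretary = variance_ratio(SD_SECRETARY_I, SD_RAW_FEMALES)
share_janitor = variance_ratio(SD_JANITOR, SD_RAW_MALES)

print(f"Secretary I: {share_secretary:.1%}")
print(f"Janitor:     {share_janitor:.1%}")
```

Under this pairing the two ratios round to the 12% and 21% quoted above; a different choice of base would give different shares.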

 
                               Table 2
             Standard Deviation of Log(Weekly Earnings)
            1984 & 1993 Area Wage Survey - St. Louis, MO

Group/Controls          1993      1984

Secretary I             0.1822    0.2291
Secretary II            0.1891    0.1969
Secretary III           0.1787    0.2426
Secretary IV            0.1912    0.2528
Secretary V             0.1572    0.2229
Guard I                 0.2578    0.3340
Guard II                0.2441    0.4609
Janitor                 0.2791    0.4460

The results for 1984 show that the within occupation variation in
wages is a persistent
feature of this labor market.  Curiously, the variation in wages
was even larger in 1984
than in 1993, a result that runs counter to recent evidence for the
entire labor market.

We conclude from this examination of wage variation that the
unexplained component of
wages is substantial, arguably 12 to 21% of wage variation, and it
is therefore worthy of
study.
                3.  Empirical Models of Equilibrium Search.
                      A.  Review of Previous Research
There have been three approaches to analysis of equilibrium search
models.  One is the
equilibrium search model of Albrecht and Axell (1984).  In that
model workers differ in
their reservation wage and firms compete to find "bargains" in the
labor market.  With M
types of workers, the Nash equilibrium of the model has firms
offering one of M wages,
where each wage offered is the reservation wage of some worker
type.  If firms are all
one type, equilibrium is characterized by equal profits across all
firms; if there are
multiple types of firms, more productive firms offer higher wages
in order to obtain more
labor.  Eckstein and Wolpin (1990) implement this model assuming
that M is small but
that "measurement error" in recorded wages exists, and that it is
this measurement error
that confounds the sharp predictions of the model.  One difficulty
with this approach is
that all variation in outcomes --search durations, wage
distributions -- is attributed to
measurement error in wages or to unmeasured heterogeneity in
unobservable reservation
wages.

In an alternative approach Ridder and van den Berg (1993, 1994)
estimate variants of Mortensen's (1990) model.  In their 1993 paper
they consider using data on both job length and wages received by a
panel of Dutch workers, and in their 1994 paper they consider the
information that can be obtained from wages alone.  They use data on
observationally different individuals (different age, education,
etc.) and allow the fundamental parameters to vary in a
regression-like manner, i.e., they specify
$\lambda_{0i} = \exp(\beta_0' x_i)$, $\lambda_{1i} = \exp(\beta_1' x_i)$,
$k_i = \exp(\beta_k' x_i)$, where $\lambda_0$ is the arrival rate of
offers while unemployed, $\lambda_1$ is the arrival rate of offers
while employed, $k$ is the rate of job destruction, and $i$ indexes
individuals.  A term for "measurement error" in wages, assumed to be
log normally distributed, is also included.  In this approach there
is only measured heterogeneity:  all firms are identical, and, up to
a regression line, so too are all workers.  Measurement error in
wages is crucial for their specification and estimation technique.
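
A regression-like specification of this kind makes each rate the exponential of a linear index in observed characteristics, which keeps the rates positive for any coefficient values.  A minimal sketch; the covariates, coefficient values, and names below are hypothetical and are not taken from Ridder and van den Berg:

```python
import math


def rate(beta, x):
    """Exponential-index specification exp(beta'x); positive for any beta."""
    return math.exp(sum(b * xj for b, xj in zip(beta, x)))


# HYPOTHETICAL covariates (constant, age, years of education) and coefficients.
x_i = [1.0, 25.0, 12.0]
beta_lam0 = [-2.0, -0.01, 0.02]   # offer arrival rate while unemployed
beta_lam1 = [-3.0, -0.02, 0.03]   # offer arrival rate while employed
beta_k = [-3.5, -0.01, -0.02]     # job destruction rate

lam0_i = rate(beta_lam0, x_i)
lam1_i = rate(beta_lam1, x_i)
k_i = rate(beta_k, x_i)

print(f"lambda_0i = {lam0_i:.4f}, lambda_1i = {lam1_i:.4f}, k_i = {k_i:.4f}")
```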

Kiefer and Neumann (1991, 1994) estimate the homogeneous version of
Mortensen's
(1990) model.  They partition individual employment histories into
categories defined by
sex (2), completed education level (6), and race/ethnicity (3) and
study the wages and
durations of youths on their first job after completing formal
schooling. This grouping
strategy thus allows a fully nonparametric control for a limited
set of characteristics.  The
methods that Kiefer-Neumann propose make use of order statistics as
estimators for
reservation wages and (partially) for productivity.  Kiefer and
Neumann (1991) show via
Monte Carlo methods the properties of these estimators, both with
and without
measurement error of the classical sort. 

Unfortunately, all three approaches are unsatisfactory in that they
do not "fit" the data,
particularly the wage data.  Evidently, there are important sources
of heterogeneity that
are not adequately treated in the approaches that have been used. 
However, the theory
of equilibrium search is particularly rich in implications about
the effects of productivity
and supply differences on equilibrium prices and durations.  In the
following sections we
describe exactly what these implications are and what they imply
for estimation.

     B.    Estimation of the Homogeneous Model
The homogeneous version of the equilibrium search model is set out
in Mortensen (1990) and summarized in Kiefer and Neumann (1994).  To
fix ideas we summarize it here.  Workers have a reservation wage R,
which solves the usual search problem for wealth maximization.
Unemployed workers see jobs arrive at rate $\lambda_0$, and they
accept the first job that offers more than their reservation wage.
While employed at wage W a worker's reservation wage is also W.  Job
offers arrive at a rate $\lambda_1$ while employed and jobs
"disappear" at the rate k.  Firms are identical with productivity
level P, face constant returns to scale in production, and maximize
profits by choosing the wage to pay.  The balancing condition which
equates supply and demand is that firms will offer higher wages if
and only if they can expect to get an additional number of workers
to cover the lower per worker profits.  Higher wages attract more
workers to a firm and allow firms
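to retain the workers longer.  The unique equilibrium wage
distribution implied by this process of wage and employment
determination is:

     $F(w) = \frac{1+\kappa_1}{\kappa_1} \left[ 1 - \left( \frac{P-w}{P-R} \right)^{1/2} \right]$,   $R \le w \le w_h$,   (1)

with density

     $f(w) = \frac{1+\kappa_1}{2\kappa_1} \, \frac{1}{[(P-w)(P-R)]^{1/2}}$,   (2)

where $\kappa_1 = \lambda_1/k$ and $w_h = P - (P-R)/(1+\kappa_1)^2$ is
the highest wage offered.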
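
As a numerical sketch of this distribution, the code below assumes the standard closed form for the homogeneous model, F(w) = [(1+kappa1)/kappa1][1 - ((P-w)/(P-R))^(1/2)] on R <= w <= P - (P-R)/(1+kappa1)^2, with kappa1 = lambda1/k.  The parameter values and function names are hypothetical and chosen only for illustration.

```python
import math


def highest_wage(P, R, kappa1):
    """Upper end of the wage support, the point where F reaches 1."""
    return P - (P - R) / (1.0 + kappa1) ** 2


def wage_cdf(w, P, R, kappa1):
    """Equilibrium offer cdf F(w) for R <= w <= highest_wage(P, R, kappa1)."""
    return (1.0 + kappa1) / kappa1 * (1.0 - math.sqrt((P - w) / (P - R)))


def wage_pdf(w, P, R, kappa1):
    """Density f(w) = dF/dw on the same support."""
    return (1.0 + kappa1) / (2.0 * kappa1) / math.sqrt((P - w) * (P - R))


# HYPOTHETICAL parameter values chosen only for illustration.
P, R, kappa1 = 500.0, 200.0, 2.0
w_h = highest_wage(P, R, kappa1)

# Midpoint-rule check that the density integrates to one over [R, w_h].
n = 20000
step = (w_h - R) / n
area = sum(wage_pdf(R + (i + 0.5) * step, P, R, kappa1) for i in range(n)) * step

print(f"w_h = {w_h:.2f}, F(R) = {wage_cdf(R, P, R, kappa1):.3f}, "
      f"F(w_h) = {wage_cdf(w_h, P, R, kappa1):.3f}, area = {area:.4f}")
```

The density rises with the wage over the whole support, which is why the implied cdf is convex.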
This framework can be used to completely characterize labor market
histories.  To
identify the parameters of interest individual panel data must
contain information on a
spell of unemployment, the wage received on the job found, and the
length of time spent
on that job. Information on why the job was left -- whether because
the job disappeared
or because the worker quit -- aids identification.  This part of a
labor market history is
characterized by
$D_1$ = duration of unemployed search $\sim \lambda_0 \exp(-\lambda_0 D_1)$   (3a)
$w_1$ = wage received on first job $\sim f(w_1)$   (3b)
$D_2 \mid w_1$ = duration of job, conditional on earning $w_1$
     $\sim (k + \lambda_1[1-F(w_1)]) \exp(-(k + \lambda_1[1-F(w_1)]) D_2)$   (3c)
$C$ = 1(job is lost)
$\Pr(\text{job is lost} \mid w_1) = k/(k + \lambda_1[1-F(w_1)])$   (3d)
The likelihood function is the product of these four terms:
$L(\theta) = L(\lambda_0, \lambda_1, k, P, R)$
     $= \lambda_0 \exp(-\lambda_0 D_1) \, f(w_1) \, \exp(-(k + \lambda_1[1-F(w_1)]) D_2) \, k^C \, (\lambda_1[1-F(w_1)])^{1-C}$   (4)
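
The log of the likelihood contribution in (4) for a single worker can be sketched as follows.  The offer cdf and density are assumed to take the standard homogeneous closed form with kappa1 = lambda1/k; the function names and the data values in the usage example are hypothetical.

```python
import math


def wage_cdf(w, P, R, kappa1):
    """Assumed homogeneous-model offer cdf F(w)."""
    return (1.0 + kappa1) / kappa1 * (1.0 - math.sqrt((P - w) / (P - R)))


def wage_pdf(w, P, R, kappa1):
    """Assumed homogeneous-model offer density f(w)."""
    return (1.0 + kappa1) / (2.0 * kappa1) / math.sqrt((P - w) * (P - R))


def log_likelihood(d1, w1, d2, c, lam0, lam1, k, P, R):
    """Log of (4) for one worker with a wage strictly inside the support.

    d1: unemployment duration, w1: accepted wage, d2: job duration,
    c: 1 if the job was lost, 0 if the worker left for a better offer.
    """
    kappa1 = lam1 / k
    quit_rate = lam1 * (1.0 - wage_cdf(w1, P, R, kappa1))  # lam1 * [1 - F(w1)]
    exit_term = math.log(k) if c == 1 else math.log(quit_rate)
    return (
        math.log(lam0) - lam0 * d1                # unemployment spell, (3a)
        + math.log(wage_pdf(w1, P, R, kappa1))    # accepted wage, (3b)
        - (k + quit_rate) * d2                    # job survives to d2, (3c)
        + exit_term                               # route of exit, (3c)-(3d)
    )


# HYPOTHETICAL observation and parameter values.
ll = log_likelihood(d1=8.0, w1=300.0, d2=40.0, c=1,
                    lam0=0.1, lam1=0.05, k=0.025, P=500.0, R=200.0)
print(f"log-likelihood contribution: {ll:.4f}")
```

Summing this function over workers gives the sample log-likelihood; the hazard term in (3c) cancels against the denominator in (3d), which is why only k or the quit rate appears in the exit term.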
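Kiefer and Neumann (1994) propose the estimators
     $\hat{R} = \min\{w_i\}$,   $\hat{w}_h = \max\{w_i\}$   (5a)
     $\hat{P} = [(1+\gamma)/\gamma] \, \hat{w}_h - [1/\gamma] \, \hat{R}$   (5b)
     $\gamma = (1 + \lambda_1/k)^2 - 1 > 0$   (5c)
where $\hat{w}_h$ estimates the highest wage offered.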
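
A minimal sketch of the estimators in (5a)-(5c), assuming that the two estimators in (5a) are the lowest and highest observed wages and that the ratio lambda1/k is supplied separately (for example, from the duration data).  The function and variable names are hypothetical.

```python
def estimate_R_P(wages, lam1_over_k):
    """Order-statistic estimators in the spirit of (5a)-(5c).

    Returns (r_hat, wh_hat, p_hat): the lowest observed wage, the highest
    observed wage, and the implied productivity estimate.
    """
    r_hat = min(wages)                       # (5a): lowest observed wage
    wh_hat = max(wages)                      # (5a): highest observed wage
    gamma = (1.0 + lam1_over_k) ** 2 - 1.0   # (5c)
    p_hat = (1.0 + gamma) / gamma * wh_hat - r_hat / gamma  # (5b)
    return r_hat, wh_hat, p_hat


# HYPOTHETICAL wages whose extremes equal R = 200 and the highest wage
# implied by P = 500 and lambda1/k = 2, so p_hat should recover 500.
sample = [200.0, 300.0, 500.0 - 300.0 / 9.0]
r_hat, wh_hat, p_hat = estimate_R_P(sample, 2.0)
print(f"R_hat = {r_hat:.2f}, w_h_hat = {wh_hat:.2f}, P_hat = {p_hat:.2f}")
```

Because only the two extremes enter, a single mismeasured wage at either end of the sample moves all three estimates.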
The extreme order statistics $\hat{R} = \min\{w_i\}$ and
$\hat{w}_h = \max\{w_i\}$ are super-efficient estimators, and the
theory of local cuts (Christensen and Kiefer, 1994) justifies
conditioning on these estimates.  Denote the minimum of a sample of N
observations on a random variable with distribution function G and
density g by $w_{1:N}$, and the maximum by $w_{N:N}$.  Their joint
density is given by:
     $f(w_{1:N}, w_{N:N}) = N(N-1) \, [G(w_{N:N}) - G(w_{1:N})]^{N-2} \, g(w_{1:N}) \, g(w_{N:N})$   (6)
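
For reference, the normalizing constant in the joint density of the sample extremes follows from a standard argument, sketched here in LaTeX under the assumption of N independent draws from a continuous distribution function G with density g.  The symbols X, Y, and H are introduced only for this sketch.

```latex
% X = w_{1:N} is the sample minimum and Y = w_{N:N} the sample maximum
% of N independent draws from a continuous cdf G with density g.
\begin{align*}
\Pr(X > x,\; Y \le y) &= [G(y) - G(x)]^{N}, \qquad x < y, \\
H(x, y) \equiv \Pr(X \le x,\; Y \le y) &= G(y)^{N} - [G(y) - G(x)]^{N}, \\
\frac{\partial^{2} H(x, y)}{\partial x \, \partial y}
  &= N (N-1)\, [G(y) - G(x)]^{N-2}\, g(x)\, g(y).
\end{align*}
```

The first line says that all N draws fall in (x, y]; differentiating the joint cdf once in each argument gives the density.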

The joint density of the observations can be built up from the
joint distribution of these estimators of the sample extremes, which
serve as estimators of the reservation wage and the unknown
productivity, P.  However, because these estimators converge at rate
N, asymptotic inference about the remaining parameters
$(\lambda_0, \lambda_1, k)$ can proceed from the profile likelihood
with $\hat{R}$ and $\hat{P}$ substituted in for the true values.
Asymptotically, to order $N^{1/2}$, ignoring the variability in the
estimates of R and P is unimportant;  Kiefer and Neumann (1991) show
that the bias is ignorable for sample sizes over 200.  As Ridder and
van den Berg (1994) note, these estimators are sensitive to
measurement error, but at least for the case of classical measurement
error in small samples their performance actually is improved.

                 C. Estimation of the Heterogeneous Model.
Adding heterogeneity to this model poses difficulties for
estimation.  Heterogeneity can
be from either the supply side -- variations in R across workers --
or from the demand
side --variations in P across firms.  These have different
implications.  As Mortensen (1990) shows, differences in R across
workers leave "flats" in the distribution function and regions of
zero density in the wage pdf, while variations in P across firms
leave "kinks" in the distribution function and points of
discontinuity in the pdf.  Figure 1 shows
the distribution of wages for high school graduates from the NLS
Youth data.  The
topmost curve in the figure is the empirical cdf of weekly wages. 
The bottommost curve
in the figure is the predicted CDF of wages using the estimates
from Kiefer-Neumann
(1994).  Evidently, the empirical wage distribution is not very
close to that model's
prediction.  It is worth noting that the fitted wage cdf must have
the curvature that the bottommost line shows in Figure 1; it must
slope upwards to the right and be convex.  Note that the empirical
CDF shows no obvious signs of having "flats", which suggests that
unobserved worker heterogeneity may be of small importance.  The two
interior curves are based on heterogeneous models.  Each segment
retains the convex shape;  the first curve uses two types of firms
while the second uses three types.
It is somewhat
surprising to see how big an effect "small" adjustments can have. 
In contrast, Figure 2
shows that adding worker heterogeneity is unlikely to be helpful in
matching the
observed wage distribution because it forces the fitted wage
distribution to be even more
convex.

We focus on estimation of the wage distribution rather than on
estimating the full model
described in (3a)-(3d) because the new econometric issues arise
here.  We consider the