%Paper: ewp-em/9406004
%From: STolande@scout-po.biz.uiowa.edu
%Date: 24 Jun 94 10:29 CST
%Date (revised): 24 Jun 94 11:27 CST

ESTIMATION OF EQUILIBRIUM WAGE DISTRIBUTIONS WITH HETEROGENEITY

AUDRA J. BOWLUS
NICHOLAS M. KIEFER
GEORGE R. NEUMANN

June, 1994

1. Introduction.

One-sided wage search models have become the empirical workhorse of labor economics in the past decade. Devine and Kiefer (1991) survey over 600 articles and books which use the simple search model of Mortensen (1970) and McCall (1970) as the framework for discussing empirical work. Despite its attractive features for empirical work on durations, the model has an obvious limitation: the form of the wage offer distribution is taken as given, and consequently nothing can be said about why wage distributions are what they are. This is a shortcoming for any investigation of wage policy, for example, studying whether the extent of competition varies across markets.

Equilibrium search models, e.g., Albrecht and Axell (1984), Burdett (1990), Burdett and Mortensen (1989), and Mortensen (1990), provide a structure where wage and duration data can be interpreted as a general equilibrium outcome dependent upon an underlying matching technology. The essential idea of equilibrium search models is that wage policy matters in the following way: high wage firms attract labor easily and, other things equal, retain workers longer. Thus wage policy matters as a method of balancing supply and demand. These ideas about wage policy are old ideas, yet little formal content has been given to them until recently.

Estimation of such models is in its infancy, and the appropriate approach to specification and estimation is an area of much current research. Initial efforts at fitting such models (Eckstein and Wolpin (1990), van den Berg and Ridder (1993, 1994), and Kiefer and Neumann (1991, 1994)) have not been completely successful.
Essentially, the tight theoretical structure needed to generate simple estimation strategies results in a very poor match between theory and evidence. One explanation for this mismatch is that in this area, as in others in economics, unmeasured differences across workers and jobs cannot be ignored.

In the first section of this paper we provide a quantitative assessment of the amount of wage variation that exists. We show that between 20 and 50% of the variation in weekly earnings cannot be explained by standard measures of jobs and workers. This suggests that search considerations potentially have a large role to play in explaining the pattern of wages.

A goal of this paper is to describe an approach to estimation of equilibrium search models in the presence of heterogeneity. We show that this leads to a non-standard inference problem, and we provide an estimator appropriate for this case. The finite sample performance of this estimator is examined via Monte Carlo methods, and in the penultimate section of the paper we employ these techniques to analyze the labor market history of a sample of the NLS Youth data used previously by Eckstein and Wolpin and by Kiefer and Neumann. We show that the fundamental parameters are identified from wage data, and that the choice of heterogeneity can be made to fit the wage data well.

2. Wage Variations.

Wages vary across individuals for a variety of reasons: differences in productivity, differences in taste, and differences in luck. In one view, all wage differences are compensatory, making up for some difference in the worker or the job. Thus skilled workers must be paid more in order to compensate for the costs of acquiring skills; similarly, firemen and policemen are usually paid more because of the risks that are faced in these occupations.
Search theory views differences in wages as the outcome of a wage posting - employee search process where firms that pay higher wages obtain a larger labor force that turns over less rapidly. In the simplest version of an equilibrium search model (Mortensen (1990)) obtaining greater labor supply is the reason a firm pays higher wages; idiosyncratic differences in firm-specific training would produce a similar pattern.

To quantify how large these variations in wages are, we looked at the cross-section distribution of weekly earnings, obtained from the outgoing rotation groups of the Current Population Survey (CPS) samples in 1990. To standardize the earnings data we restricted attention to private sector workers, aged 18-65, engaged in full-time work. There were 74,639 males and 51,960 females in the sample. Row 1 of Table 1 shows the standard deviation of log weekly earnings for the raw data. As is usually found in earnings data, male earnings on average are larger and more dispersed than female earnings; for our purposes, the standard deviation is the unit that needs accounting. We do this by fitting regressions that include progressively more controls.

Table 1
Standard Deviation of log(Weekly Earnings)
1990 CPS Outgoing Rotation Groups

Group / Controls                                  Males     Females
All Full-time, 18-65, Private sector              0.6127    0.5291
 / age, educ., race, union member, un. cov.       0.4931    0.4589
 / " + city size                                  0.4851    0.4438
 / " + industry                                   0.4734    0.4318
 / " + state indicators                           0.4690    0.4269
 / " + occupation                                 0.4324    0.3860

Our decomposition of the variance is not intended to be structural. Rather, it is a simple accounting convention to show how much variance in wages remains unexplained after accounting for various factors thought to influence earnings.
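The accounting convention behind Table 1 can be sketched in a few lines of Python. This is only an illustration of the rule "share accounted for = 1 - (residual s.d. / raw s.d.)^2" applied to the figures printed in the table; the shortened row labels and the function and variable names are illustrative assumptions, not part of the original analysis.

```python
# Standard deviations of log weekly earnings from Table 1 (males, females).
# Row labels are shortened; all names here are illustrative assumptions.
TABLE1 = [
    ("raw data",           0.6127, 0.5291),
    ("human capital",      0.4931, 0.4589),
    ("+ city size",        0.4851, 0.4438),
    ("+ industry",         0.4734, 0.4318),
    ("+ state indicators", 0.4690, 0.4269),
    ("+ occupation",       0.4324, 0.3860),
]


def explained_share(sd_resid, sd_raw):
    """Share of variance accounted for: 1 - (residual s.d. / raw s.d.)^2."""
    return 1.0 - (sd_resid / sd_raw) ** 2


# The first row holds the raw (unconditional) standard deviations.
_, raw_m, raw_f = TABLE1[0]

# Map each set of controls to (male share, female share).
shares = {
    label: (explained_share(m, raw_m), explained_share(f, raw_f))
    for label, m, f in TABLE1[1:]
}

for label, (share_m, share_f) in shares.items():
    print(f"{label:20s} males {share_m:6.1%}   females {share_f:6.1%}")
```

Each additional block of controls raises the share accounted for only modestly, which is the pattern the table is meant to convey.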
The second row of Table 1 shows the standard deviation of the residuals obtained from a regression that included the usual set of "human capital"-type variables: age, age squared, years of education, race, and whether the individual is a union member or is covered by a union contract. These factors account for about 35% of the variance in male earnings [$1 - (0.4931/0.6127)^2 = 35\%$] and about 25% of the variance of female earnings. Successive rows in the table add controls for city size, one-digit industry (9), state of residence (50), and occupation (45). The final regression run is the kitchen-sink prototype: there are 110 variables included. Of course nothing prevents a more aggressive search for structure: a more detailed industry description with over 500 sub-categories is available in the CPS, as is a more detailed occupation code. However, the pattern in Table 1 is clear: the unexplained variance in wages remains substantial even as we add more and more measures specific to jobs. Thus, in row 6 the collection of usual and unusual suspects that we have employed accounts for about 50.2% of the variance in male earnings [$1 - (0.4324/0.6127)^2 = 50.2\%$] and about 46.8% of the variance in female earnings. In round numbers, observable differences in workers explain about half of the variation in wages.

It might be argued that the variability shown in Table 1 simply reflects variability in job characteristics that are not included in the regressions; if we were to compare similar jobs, this variability would disappear. The BLS attempts to do exactly this in its Area Wage Survey publication. Here the sampling unit is the firm, and only specific standardized jobs are included. In Table 2 we have summarized the standard deviation of weekly earnings for Secretaries, Guards, and Janitors in the St. Louis, Missouri CMSA in February of 1993, and for comparison purposes, March of 1984. We chose these occupations because they are believed to be very standardized; we chose the St.
Louis CMSA because it was the most current area wage survey available from the BLS. These data report means, medians, and 25th and 75th percentiles of the cross-section wage distribution in a specific locale for a number of narrowly defined job descriptions. The data are also reported in grouped form, so that, for example, we know how many secretary I's earned between $400 and $450 per week, how many earned between $450 and $500, and so forth. In contrast to CPS data, these data are taken from firm records and refer to straight-time earnings for full-time jobs. Table 2 contains estimates of the standard deviations computed from the grouped data. The jobs covered in the table are Secretary I to Secretary V, Guard I-II, and Janitor.

What Table 2 shows is that wage variation among "identical" persons is substantial. Even within narrowly defined groups -- Secretary I-V and Guard I-II -- the variation in weekly earnings remains large. Using Table 1 as a base would imply that unexplained variation in wages ranged from 12% for secretary I's to 21% for janitors. We also note that variation in earnings does not appear to diminish significantly as we look across higher levels of secretaries or guards. In other words, equalizing effects that might be expected from firm-specific investments do not seem to be that important, at least in these data.

Table 2
Standard Deviation of Log(Weekly Earnings), 1984 & 1993
Area Wage Survey - St. Louis, MO

Group / Controls    1993      1984
Secretary I         0.1822    0.2291
Secretary II        0.1891    0.1969
Secretary III       0.1787    0.2426
Secretary IV        0.1912    0.2528
Secretary V         0.1572    0.2229
Guard I             0.2578    0.3340
Guard II            0.2441    0.4609
Janitor             0.2791    0.4460

The results for 1984 show that the within-occupation variation in wages is a persistent feature of this labor market. Curiously, the variation in wages was even larger in 1984 than in 1993, a result that runs counter to recent evidence for the entire labor market.
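The 12% and 21% figures can be reproduced, at least approximately, by expressing a within-occupation variance from Table 2 as a fraction of a raw variance from Table 1. The sketch below assumes, as a reading of the text rather than a stated fact, that secretaries are compared with the female base (0.5291) and janitors with the male base (0.6127); the names are illustrative.

```python
# Raw standard deviations of log weekly earnings (Table 1, row 1).
RAW_SD = {"male": 0.6127, "female": 0.5291}

# 1993 within-occupation standard deviations (Table 2).
SD_1993 = {"Secretary I": 0.1822, "Janitor": 0.2791}

# ASSUMPTION: the pairing of occupation and Table 1 base is inferred,
# not stated in the text.
BASE = {"Secretary I": "female", "Janitor": "male"}


def unexplained_share(sd_within, sd_raw):
    """Within-occupation variance as a fraction of the raw variance."""
    return (sd_within / sd_raw) ** 2


residual = {
    job: unexplained_share(sd, RAW_SD[BASE[job]])
    for job, sd in SD_1993.items()
}

for job, share in residual.items():
    print(f"{job:12s} {share:5.1%}")
```

Under these assumed pairings the shares come out close to the 12% and 21% quoted above; a different choice of base would shift them.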
We conclude from this examination of wage variation that the unexplained component of wages is substantial, arguably 12 to 21% of wage variation, and it is therefore worthy of study.

3. Empirical Models of Equilibrium Search.

A. Review of Previous Research

There have been three approaches to analysis of equilibrium search models. One is the equilibrium search model of Albrecht and Axell (1984). In that model workers differ in their reservation wage and firms compete to find "bargains" in the labor market. With M types of workers, the Nash equilibrium of the model has firms offering one of M wages, where each wage offered is the reservation wage of some worker type. If firms are all one type, equilibrium is characterized by equal profits across all firms; if there are multiple types of firms, more productive firms offer higher wages in order to obtain more labor. Eckstein and Wolpin (1990) implement this model assuming that M is small but that "measurement error" in recorded wages exists, and that it is this measurement error that confounds the sharp predictions of the model. One difficulty with this approach is that all variation in outcomes -- search durations, wage distributions -- is attributed to measurement error in wages or to unmeasured heterogeneity in unobservable reservation wages.

In an alternative approach Ridder and van den Berg (1993, 1994) estimate variants of Mortensen's (1990) model. In their 1993 paper they consider using data on both job length and wages received by a panel of Dutch workers, and in their 1994 paper they consider the information that can be obtained from wages alone. Using data on observationally different individuals (different age, education, etc.)
and allowing the fundamental parameters to vary in a regression-like manner, they specify $\lambda_{0i} = \exp(\beta_0' x_i)$, $\lambda_{1i} = \exp(\beta_1' x_i)$, $k_i = \exp(\beta_k' x_i)$, where $\lambda_0$ is the arrival rate of offers while unemployed, $\lambda_1$ is the arrival rate of offers while employed, $k$ is the rate of job destruction, and $i$ indexes individuals. A term for "measurement error" in wages, assumed to be log normally distributed, is also included. In this approach there is only measured heterogeneity: all firms are identical, and, up to a regression line, so too are all workers. Measurement error in wages is crucial for their specification and estimation technique.

Kiefer and Neumann (1991, 1994) estimate the homogeneous version of Mortensen's (1990) model. They partition individual employment histories into categories defined by sex (2), completed education level (6), and race/ethnicity (3) and study the wages and durations of youths on their first job after completing formal schooling. This grouping strategy thus allows a fully nonparametric control for a limited set of characteristics. The methods that Kiefer-Neumann propose make use of order statistics as estimators for reservation wages and (partially) for productivity. Kiefer and Neumann (1991) show via Monte Carlo methods the properties of these estimators, both with and without measurement error of the classical sort.

Unfortunately, all three approaches are unsatisfactory in that they do not "fit" the data, particularly the wage data. Evidently, there are important sources of heterogeneity that are not adequately treated in the approaches that have been used. However, the theory of equilibrium search is particularly rich in implications about the effects of productivity and supply differences on equilibrium prices and durations. In the following sections we describe exactly what these implications are and what they imply for estimation.

B. Estimation of the Homogeneous Model

The homogeneous version of the equilibrium search model is set out in Mortensen (1990) and summarized in Kiefer and Neumann (1994). To fix ideas we summarize it here. Workers have a reservation wage $R$, which solves the usual search problem for wealth maximization. Unemployed workers see jobs arrive at rate $\lambda_0$, and they accept the first job that offers more than their reservation wage. While employed at wage $W$ a worker's reservation wage is also $W$. Job offers arrive at a rate $\lambda_1$ while employed and jobs "disappear" at the rate $k$. Firms are identical with productivity level $P$, face constant returns to scale in production, and maximize profits by choosing the wage to pay. The balancing condition which equates supply and demand is that firms will offer higher wages if and only if they can expect to get enough additional workers to cover the lower per-worker profits. Higher wages attract more workers to a firm and allow the firm to retain its workers longer. The unique equilibrium wage distribution implied by this process of wage and employment determination is:

$$F(w) = \frac{1+\kappa_1}{\kappa_1}\left[1 - \left(\frac{P-w}{P-R}\right)^{1/2}\right], \qquad R \le w \le h, \qquad (1)$$

with density

$$f(w) = \frac{1+\kappa_1}{2\kappa_1}\,(P-w)^{-1/2}\,(P-R)^{-1/2}, \qquad (2)$$

where $\kappa_1 = \lambda_1/k$ and $h = P - (P-R)/(1+\kappa_1)^2$ is the highest wage offered, i.e., the solution to $F(h) = 1$.

This framework can be used to completely characterize labor market histories. To identify the parameters of interest, individual panel data must contain information on a spell of unemployment, the wage received on the job found, and the length of time spent on that job. Information on why the job was left -- whether because the job disappeared or because the worker quit -- aids identification.
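As a rough illustration of the homogeneous model, the sketch below codes the offer distribution in the square-root form usual for this model, F(w) = ((1 + kappa1)/kappa1) [1 - ((P - w)/(P - R))^(1/2)] with kappa1 = lambda1/k, together with its density and inverse, and draws wages by the inverse-CDF method. The functional form as written here, the parameter values, and all function names are assumptions made for the example; they are not estimates or code from the paper.

```python
import math
import random


def upper_support(P, R, kappa1):
    """Highest wage offered, h, obtained by solving F(h) = 1."""
    return P - (P - R) / (1.0 + kappa1) ** 2


def wage_cdf(w, P, R, kappa1):
    """Assumed equilibrium offer distribution F(w) on [R, h]."""
    if w <= R:
        return 0.0
    if w >= upper_support(P, R, kappa1):
        return 1.0
    return (1.0 + kappa1) / kappa1 * (1.0 - math.sqrt((P - w) / (P - R)))


def wage_pdf(w, P, R, kappa1):
    """Density f(w) = F'(w); it rises with w, so F is convex on its support."""
    if w < R or w > upper_support(P, R, kappa1):
        return 0.0
    return (1.0 + kappa1) / (2.0 * kappa1) / math.sqrt((P - w) * (P - R))


def wage_quantile(u, P, R, kappa1):
    """Inverse of F on [0, 1], used to draw wages by the inverse-CDF method."""
    s = 1.0 - u * kappa1 / (1.0 + kappa1)
    return P - (P - R) * s * s


# Hypothetical parameter values, chosen only for illustration.
P, R, lam1, k = 10.0, 4.0, 0.3, 0.1
kappa1 = lam1 / k
h = upper_support(P, R, kappa1)

random.seed(0)
wages = [wage_quantile(random.random(), P, R, kappa1) for _ in range(1000)]
print(f"h = {h:.3f}, min wage = {min(wages):.3f}, max wage = {max(wages):.3f}")
```

Because the density is increasing, simulated wages pile up near the top of the support, and the sample minimum and maximum sit close to R and h even in moderate samples.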
This part of a labor market history is characterized by

$$D_1 = \text{duration of unemployed search} \sim \lambda_0 \exp(-\lambda_0 D_1) \qquad (3a)$$

$$w_1 = \text{wage received on first job} \sim f(w_1) \qquad (3b)$$

$$D_2 \mid w_1 = \text{duration of job, conditional on earning } w_1 \sim (k + \lambda_1[1-F(w_1)]) \exp\left(-(k + \lambda_1[1-F(w_1)])\,D_2\right) \qquad (3c)$$

$$C = 1(\text{job is lost}), \qquad \Pr(\text{job is lost} \mid w_1) = \frac{k}{k + \lambda_1[1-F(w_1)]} \qquad (3d)$$

The likelihood function is the product of these four terms:

$$L(\theta) = L(\lambda_0, \lambda_1, k, P, R) = \lambda_0 \exp(-\lambda_0 D_1)\, f(w_1)\, \exp\left(-(k + \lambda_1[1-F(w_1)])\,D_2\right) k^{C}\, (\lambda_1[1-F(w_1)])^{1-C} \qquad (4)$$

Kiefer and Neumann (1994) propose the estimators

$$\hat{R} = \min\{w_i\}, \qquad \hat{h} = \max\{w_i\} \qquad (5a)$$

$$\hat{P} = \frac{1+\gamma}{\gamma}\,\hat{h} - \frac{1}{\gamma}\,\hat{R} \qquad (5b)$$

$$\gamma = (1 + \lambda_1/k)^2 - 1 > 0 \qquad (5c)$$

where $h$ denotes the upper bound of the support of the wage distribution. The estimators $\hat{R}$ and $\hat{h}$ are super-efficient, and the theory of local cuts (Christensen and Kiefer, 1994) justifies conditioning on these estimates. Denoting the minimum of a sample of $N$ observations on a random variable distributed as $G$ by $w_{1:N}$ and the maximum by $w_{N:N}$, their joint density is given by:

$$f(w_{1:N}, w_{N:N}) = N(N-1)\,[G(w_{N:N}) - G(w_{1:N})]^{N-2}\, g(w_{N:N})\, g(w_{1:N}) \qquad (6)$$

The joint density of the observations can be built up from the joint distribution of the sample extremes, which serve as estimators of the reservation wage and the unknown productivity, $P$. However, because these estimators converge at rate $N$, asymptotic inference about the remaining parameters $(\lambda_0, \lambda_1, k)$ can proceed from the profile likelihood with $\hat{R}$ and $\hat{P}$ substituted in for the true values. Asymptotically, to order $N^{1/2}$, ignoring the variability in estimates of $R$ and $P$ is unimportant; Kiefer and Neumann (1991) show that the bias is ignorable for sample sizes over 200. As Ridder and van den Berg (1994) note, these estimators are sensitive to measurement error, but at least for the case of classical measurement error in small samples their performance actually is improved.

C. Estimation of the Heterogeneous Model.

Adding heterogeneity to this model poses difficulties for estimation.
Heterogeneity can come from either the supply side -- variations in R across workers -- or from the demand side -- variations in P across firms. These have different implications. As Mortensen (1990) shows, differences in R across workers leave "flats" in the distribution function and regions of zero density in the wage pdf, while variations in P across firms leave "kinks" in the distribution function and points of discontinuity in the pdf.

Figure 1 shows the distribution of wages for high school graduates from the NLS Youth data. The topmost curve in the figure is the empirical CDF of weekly wages. The bottommost curve in the figure is the predicted CDF of wages using the estimates from Kiefer-Neumann (1994). Evidently, the empirical wage distribution is not very close to that model's prediction. It is worth noting that the fitted wage CDF must have the curvature that the bottommost line shows in Figure 1; it must slope upwards to the right and be convex. Note that the empirical CDF shows no obvious signs of having "flats", which suggests that unobserved worker heterogeneity may be of small importance. The two interior curves are based on heterogeneous models. Each segment retains the convex shape; the first curve uses two types of firms while the second uses three types. It is somewhat surprising to see how big an effect "small" adjustments can have. In contrast, Figure 2 shows that adding worker heterogeneity is unlikely to be helpful in matching the observed wage distribution because it forces the fitted wage distribution to be even more convex.

We focus on estimation of the wage distribution rather than on estimating the full model described in (3a)-(3d) because the new econometric issues arise here. We consider the