*Bayard is a Ph.D. student in economics at the University of Maryland. Hellerstein is Assistant Professor of Economics at the University of Maryland, and a Faculty Research Fellow of the NBER. Neumark is Professor of Economics at Michigan State University, and a Research Associate of the NBER. Troske is Assistant Professor of Economics at the University of Missouri. This research was supported by NSF grant SBR95-10876 through the NBER. The research in this paper was conducted while the authors were research associates with the Center for Economic Studies, U.S. Bureau of the Census. Research results and conclusions expressed are those of the authors and do not necessarily indicate concurrence by the Bureau of the Census or the Center for Economic Studies.
Abstract
We examine the possible sources of the larger racial and ethnic wage gaps for men than for women in the U.S. Specifically, using a newly created employer-employee matched data set containing workers in essentially all occupations, industries, and regions, we examine whether these wage differences can be accounted for by differences between men and women in the patterns of racial and ethnic segregation within occupation, industry, establishments and occupation-establishment cells. To the best of our knowledge, this is the first paper to examine segregation by race and ethnicity at the level of establishment and job cell. Our results indicate that greater segregation between Hispanic men and white men than between Hispanic women and white women accounts for essentially all of the higher Hispanic-white wage gap for men. In addition, our estimates indicate that greater segregation between black and white men than between black and white women accounts for a sizable share (one-third to one-half) of the higher black-white wage gap for men. Our results imply that segregation is an important contributor to the lower wages paid to black and Hispanic men than to white men with similar individual characteristics. Our results also suggest that equal pay types of laws may offer some scope for reducing the black-white wage differential for men, but little scope for reducing the Hispanic-white wage differential for men.
I. Introduction
Labor economists have long been occupied with explorations of the sources of wage differences by sex, race, and ethnicity. It is well known that wages earned by minorities and by females fall short of wages earned by white males, after accounting for differences in standard human capital proxies and other variables for which measures are readily available in many micro-level data sets (schooling, age or experience, marital status, urban residence, region, etc.).
Aside from this general fact, an additional fact about racial and ethnic wage gaps is that they are considerably larger for men than for women. This is true in the raw data, as well as once we account for numerous determinants of wages or earnings. For example, based on 1981 CPS data, Cain (1986, Table 13.4) reports that for all workers, black-white earnings ratios are 0.67 for men vs. 0.97 for women, while Hispanic-white earnings ratios are 0.72 for men and 0.90 for women. For full-time, year-round workers, black-white earnings ratios are 0.69 for men vs. 0.90 for women, while Hispanic-white earnings ratios are 0.72 for men and 0.87 for women.(1) As a second example, as we report later in this paper, in log wage regressions including controls for schooling, age, etc., based on the 1990 Census of Population, the estimated black-white (actually, black vs. non-black, non-Hispanic) earnings differential is -0.121 for men vs. -0.022 for women, while the Hispanic-white differential is -0.115 for men vs. -0.045 for women. Finally, in a cross-section of 1990 and 1991 observations from the NLSY, in log wage regressions with no controls Neal and Johnson (1996) report that black men earn 24.4 percent less than white men vs. an 18.5 percent shortfall for black women, while Hispanic men earn 11.3 percent less than white men vs. a 2.8 percent (and insignificant) shortfall for Hispanic women.
When Neal and Johnson control for AFQT (interpreted as a catch-all for pre-market factors affecting wages), the black-white difference for men falls to -7.2 percent, while black women are estimated to earn 3.5 percent more than white women (an insignificant difference).(2) Thus, even if one believes the Neal and Johnson claim that pre-market factors account for a sizable fraction of racial and ethnic wage differences, the fact that the difference in the black-white wage gap between men and women persists suggests that this difference is a "labor market" rather than a "pre-market" phenomenon.
In our view the larger racial and ethnic wage gaps for men than for women are a rather striking set of stylized facts that have largely been ignored by researchers attempting to understand the sources of racial and ethnic wage differences. In this paper we examine more closely the possible sources of the differences in the wage gap, paying particular attention to whether these differences can be accounted for by differences between men and women in the patterns of racial and ethnic segregation.(3) More generally, we believe that research on why racial and ethnic wage gaps differ by sex may ultimately prove useful in helping to understand the sources of these gaps. For example, if one believes that the observed wage differentials are the result of employer or customer discrimination (e.g., Darity and Mason, 1998) then one needs to try to explain why this discrimination is apparently more severe with respect to male employees. In general, if one believes that some other unmeasured characteristic is responsible for these wage differences, then evidence that this characteristic is more important for men than for women would bolster one's case.
This inquiry fits into an extensive literature on the role of segregation in generating racial, ethnic, and sex differences in labor markets, but takes this literature in a new direction. In the literature on sex differences in wages, considerable attention has focused on the role of occupational segregation, in particular the concentration of women in low-wage occupations (e.g., Johnson and Solon, 1986; Sorensen, 1989; Macpherson and Hirsch, 1995). However, relatively little attention has been paid to the role of occupational segregation in generating racial and ethnic differences in wages (for an exception, see Sorensen, 1989), in part because occupational segregation between races and ethnic groups is much less pronounced than occupational segregation between the sexes (King, 1992; Watts, 1995).
Furthermore, even less attention has been paid to the role of segregation along other dimensions such as industry, employer, and job cell (occupation within employer). The main reason for the lack of such work is that the data sets labor economists typically use to study wage differences are household data sets, which allow one to measure the percent female or black in an occupation or industry, but not the sex, race, or ethnic composition of firms, establishments or jobs. Economists interested in studying these other dimensions of segregation have had to turn to other special data sources in which information on the workforce is available or can be constructed. For example, Groshen (1991) uses data from the Bureau of Labor Statistics Industry Wage Surveys, with which one can measure the percent female by establishment as well as job cell. Blau (1977) studies BLS Area Wage Surveys, which cover clerical, professional, and technical occupations, and which allow the estimation of percent female along the same dimensions. Bayard, et al. (1998) construct a data set (called the New Worker-Establishment Characteristics Database, or NWECD) based on a match of employees to their establishments, and carry out an analysis of the roles of sex segregation by occupation, industry, establishment, and job cell, similar to Groshen's. While there are differences in the findings reported in these studies, all find that in addition to being concentrated in low-wage occupations, women are also concentrated in low-wage establishments and low-wage job cells.(4)
In this paper, we use the NWECD to study the role of racial and ethnic segregation in generating wage differences between whites, blacks, and Hispanics. The NWECD is uniquely suited to this analysis, as the Industry and Area Wage Surveys contain no information on race and ethnicity. Thus, to the best of our knowledge, this is the first paper that looks at segregation by race and ethnicity at the level of the establishment and job cell.(5) We consider evidence on the effects of racial and ethnic segregation on wages, and the extent to which racial and ethnic wage differences remain after controlling for segregation. Such evidence helps to assess whether equal pay policies are likely to reduce these wage differences (assuming that these remaining differences reflect discrimination).(6) We are particularly interested in the question posed in the title of this paper, namely whether more severe racial and ethnic segregation among men can explain why racial and ethnic wage gaps are bigger among men than among women.
II. The Data
The NWECD is created from two data sources, the Sample Detail file (SDF), which contains all individual responses to the 1990 Decennial Census one-in-six Long Form, and the 1990 Standard Statistical Establishment List (SSEL), which is an administrative database containing information for all business establishments operating in the United States in 1990. We construct the NWECD by using detailed location and industry information available in both data sets to match worker records in the SDF to employer records in the SSEL. In this section we discuss the details of the matching process, assess the accuracy of the match, and discuss the representativeness of these matched data.
The Matching Process
Households receiving the 1990 Decennial Census Long Form were asked to report the name and address of the employer in the previous week for each employed member of the household. In addition, respondents were asked for the name and a brief (one or two word) description of the type of business or industry of the most recent employer for all members of the household. Based on the responses to these questions the Census Bureau assigned geographic and industry codes to each record in the data and it is these codes that are available in the SDF. In addition to this information, the SDF contains the standard set of demographic characteristics collected on the long-form of the Decennial Census. To construct the NWECD we first selected records for the slightly more than 17 million respondents who indicated they were employed in the previous week.
The SSEL is an annually updated list of all business establishments with one or more employees operating in the United States that the Census Bureau uses as a sampling frame for its various Economic Censuses and Surveys. As such, the SSEL contains the name and address of each establishment, geographic codes based on its location, and a four-digit SIC code. In addition, the SSEL contains data on the number of employees and total annual payroll for the establishment, a unique establishment identifier, as well as an identifier that allows the establishment to be linked to other establishments that are part of the same enterprise. To construct the NWECD, we selected the 5.6 million records from the 1990 SSEL. We focus on the private sector, excluding establishments in Public Administration.
Matching workers to employers proceeded in four steps. First, we standardized the geographic and industry codes in the two data sets. Next, we selected all establishments that were unique in an industry-location cell. Third, all workers who indicated they worked in the same industry-location cell as a unique establishment were matched to the establishment. Finally, we eliminated all matches based on imputed data. The resulting data set is what we call the NWECD.
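The four steps can be illustrated with a minimal sketch on toy records. The field names (`cell`, `imputed`) and the sample values are hypothetical and do not reflect the actual SDF or SSEL record layouts; the sketch also assumes the codes have already been standardized (step one).

```python
from collections import defaultdict

# Hypothetical toy records; "cell" stands for a standardized industry-location cell.
establishments = [
    {"id": "E1", "cell": ("MD", "033", "CIC-A"), "imputed": False},
    {"id": "E2", "cell": ("MD", "033", "CIC-B"), "imputed": False},
    {"id": "E3", "cell": ("MD", "033", "CIC-B"), "imputed": False},  # shares a cell with E2
    {"id": "E4", "cell": ("MO", "019", "CIC-C"), "imputed": True},   # imputed industry code
]
workers = [
    {"id": "W1", "cell": ("MD", "033", "CIC-A"), "imputed": False},
    {"id": "W2", "cell": ("MD", "033", "CIC-A"), "imputed": True},   # imputed location
    {"id": "W3", "cell": ("MD", "033", "CIC-B"), "imputed": False},  # cell is not unique
    {"id": "W4", "cell": ("MO", "019", "CIC-C"), "imputed": False},
]


def match_workers(establishments, workers):
    # Step 1 (standardizing codes) is assumed done: both sides share one cell format.
    # Step 2: keep establishments that are alone in their industry-location cell.
    by_cell = defaultdict(list)
    for est in establishments:
        by_cell[est["cell"]].append(est)
    unique = {cell: ests[0] for cell, ests in by_cell.items() if len(ests) == 1}
    # Step 3: attach each worker to the unique establishment in the same cell.
    pairs = [(w, unique[w["cell"]]) for w in workers if w["cell"] in unique]
    # Step 4: drop any match that relies on imputed data on either side.
    return [(w["id"], e["id"]) for w, e in pairs
            if not w["imputed"] and not e["imputed"]]


matched = match_workers(establishments, workers)
print(matched)
```

In this toy example only W1 survives: W2 and W4 are dropped because of imputed data, and W3 works in a cell that contains two establishments.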
There are a number of issues involved in the matching process that merit further discussion. The first set of issues concerns standardizing the geographic and industry codes. The Census Bureau divides the country into a hierarchy of geographic areas. For our purposes the relevant areas are state, county, place, tract, and block. The Census Bureau assigns a unique code to every state in the country. Within each state the Census Bureau assigns a unique code to every county. The Census Bureau also assigns a unique place code to population centers with 2,500 or more people. Because these population centers are unique within a state, but can cross county boundaries, we can distinguish between areas in the same place located in different counties. Finally, the Census Bureau divides up populated counties into unique tracts and divides tracts up into unique blocks.(7) Thus, for an establishment located in a metropolitan area, the Census Bureau assigns a unique geographic code which identifies the state, county, place, tract, and block of the establishment.
One problem with using these geographic codes is that while the Census Bureau consistently assigned all of these codes to the data in the 1990 SDF, prior to 1992 the Census Bureau only assigned state, county, and place codes to records in the SSEL. In addition, due to problems with addresses in the SSEL, even after 1992 the Census Bureau only assigns tract and block codes to a subset of records in the SSEL.(8) To assign tract and block codes to records in the 1990 SSEL, we matched these records with records in the 1992 SSEL and, when available, assigned the tract and block codes from the 1992 SSEL to establishment records in the 1990 SSEL. We assigned missing values for tract and block codes to establishments that were not in the 1992 SSEL and to establishments that had missing tract and block codes in the 1992 data.
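The backfilling of tract and block codes can be sketched as follows. This is an illustration only: it assumes the 1990 and 1992 records can be linked by a shared establishment identifier (here a hypothetical `id` field), and it treats the tract and block codes as a pair that is either fully present or set to missing.

```python
def backfill_tract_block(ssel_1990, ssel_1992):
    """Copy tract/block codes from 1992 records onto 1990 records.

    Assumption: records are linked by a shared establishment identifier.
    """
    codes_1992 = {rec["id"]: (rec.get("tract"), rec.get("block")) for rec in ssel_1992}
    filled = []
    for rec in ssel_1990:
        # Establishments absent from the 1992 file get missing codes.
        tract, block = codes_1992.get(rec["id"], (None, None))
        # Establishments with missing 1992 codes also get missing codes.
        if tract is None or block is None:
            tract, block = None, None
        filled.append({**rec, "tract": tract, "block": block})
    return filled


# Hypothetical toy data: E1 has 1992 codes, E2 has missing 1992 codes, E3 is not in 1992.
ssel_1990 = [{"id": "E1"}, {"id": "E2"}, {"id": "E3"}]
ssel_1992 = [
    {"id": "E1", "tract": "7001", "block": "101"},
    {"id": "E2", "tract": None, "block": None},
]
filled = backfill_tract_block(ssel_1990, ssel_1992)
```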
Industry codes must also be standardized because the industry code in the SSEL is based on the Standard Industrial Classification (SIC) system, while the Census Bureau assigns three-digit Census Industry Classification (CIC) codes to the SDF data. Since the CIC codes are more aggregated than the SIC codes, we use a concordance table to convert SIC codes to CIC codes.(9)
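Because the conversion runs from the finer classification to the coarser one, the concordance is a many-to-one lookup. The sketch below shows the idea; the code values are placeholders invented for illustration and are not taken from the actual SIC-CIC concordance.

```python
# Placeholder concordance: several four-digit SIC codes map to one three-digit CIC code.
# These values are hypothetical, not entries from the real concordance table.
SIC_TO_CIC = {
    "0001": "901",
    "0002": "901",
    "0003": "902",
}


def sic_to_cic(sic, concordance=SIC_TO_CIC):
    """Return the CIC code for a SIC code, or None if the SIC code is not in the table."""
    return concordance.get(sic)


# An establishment coded SIC "0002" and a worker coded CIC "901" fall in the same industry.
establishment_cic = sic_to_cic("0002")
worker_cic = "901"
same_industry = establishment_cic == worker_cic
```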
The next step in matching workers to employers is to keep only those establishments that are unique in an industry-location cell. Recall that for all establishments in the SSEL we have state, county, and place codes, while for a subset of establishments we also have tract and block codes. In order to select establishments that have unique industry-location information we first keep establishments that are unique in an industry-state-county-place cell. For the rest of the establishments (which may not be unique at the place level, but may be at the tract or block level), we first keep only those establishments in a place cell where all the establishments in this cell have non-missing tract and block codes. We then keep those establishments that are unique in an industry-state-county-place-tract-block cell. This produces a data set with 385,135 establishments available for matching. We then assign workers to industry-location cells, and match all workers who are in the same industry-location cell to the corresponding establishment.
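The two-stage selection of unique establishments can be sketched as below. The field names and toy records are hypothetical, and the sketch reads "place cell" as the industry-state-county-place cell, which is an interpretation of the description above rather than a confirmed detail.

```python
from collections import Counter, defaultdict


def place_key(est):
    return (est["industry"], est["state"], est["county"], est["place"])


def block_key(est):
    return place_key(est) + (est["tract"], est["block"])


def select_unique(establishments):
    groups = defaultdict(list)
    for est in establishments:
        groups[place_key(est)].append(est)

    selected = []
    for members in groups.values():
        # Stage 1: the establishment is alone in its industry-state-county-place cell.
        if len(members) == 1:
            selected.append(members[0]["id"])
            continue
        # Stage 2 applies only if every establishment in the cell has tract and block codes.
        if any(m["tract"] is None or m["block"] is None for m in members):
            continue
        counts = Counter(block_key(m) for m in members)
        selected.extend(m["id"] for m in members if counts[block_key(m)] == 1)
    return sorted(selected)


def est(id_, industry, place, tract, block):
    # Helper for building toy records; state and county are held fixed.
    return {"id": id_, "industry": industry, "state": "MD", "county": "033",
            "place": place, "tract": tract, "block": block}


toy = [
    est("E1", "901", "P1", None, None),     # alone at the place level
    est("E2", "902", "P1", "7001", "101"),  # shares a place cell with E3, distinct block
    est("E3", "902", "P1", "7001", "102"),
    est("E4", "903", "P1", "7002", "201"),  # shares a place cell with E5, which lacks codes
    est("E5", "903", "P1", None, None),
    est("E6", "904", "P1", "7003", "301"),  # same block as E7, so neither is unique
    est("E7", "904", "P1", "7003", "301"),
]
unique_ids = select_unique(toy)
print(unique_ids)
```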
We then take a number of steps to help ensure that workers are properly matched with employers. We begin by discarding all matches based on imputed data. Data can be imputed for a number of reasons. First, respondents may have not provided address or industry information for their employer (or may have provided unusable information). When this occurs the Census Bureau imputes the geographic or industry information for the worker. In addition, an establishment's record in the SSEL may have an incomplete SIC code, in which case the Census Bureau randomly assigns the additional digits necessary to create a complete SIC code. Whenever we have a match based on imputed information, that match is eliminated.(10)
We also discard matches when the number of workers matched to an establishment exceeds the number of employed workers as reported by the establishment in the SSEL. There are several reasons why the number of matched workers might exceed total employment, some reflecting errors and others not. First, there may be errors in the industry or geographic codes for some workers or establishments in the SDF or SSEL. Second, there is a time lag between when the Census Bureau surveys workers and employers. Census asks workers where they worked on April 1 and asks employers how many workers they employed as of March 12; total employment on April 1 may exceed total employment on March 12. A third problem is that workers may be incorrectly assigned to locations because of imprecise SDF questions. Because the SDF asks workers only where they worked in the past week, workers who were working at a site other than their primary employer's location may be improperly assigned to the establishment at which they were working that week. Fourth, in the SSEL total employment includes only an establishment's employees, not its owners. In the SDF, however, both owners and employees are assigned to a particular establishment. Thus, although there may be legitimate reasons for the number of matched workers to exceed reported establishment employment, because only long-form respondents to the Decennial Census are eligible for matching, cases where the number of matched workers exceeds employment reported in the SSEL likely reflect serious measurement or matching problems. To avoid potentially incorrect matches we discard cases where this occurs.(11) The resulting data set contains 1,056,635 workers matched to 153,291 establishments.
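The employment check amounts to counting matched workers per establishment and comparing the count with SSEL employment. A minimal sketch, with hypothetical identifiers and employment figures:

```python
from collections import Counter


def drop_overmatched(matches, ssel_employment):
    """Discard all matches for establishments with more matched workers than employees.

    matches: list of (worker_id, establishment_id) pairs.
    ssel_employment: dict of establishment_id -> employment reported in the SSEL.
    """
    counts = Counter(est_id for _, est_id in matches)
    return [(w, e) for w, e in matches if counts[e] <= ssel_employment[e]]


# Toy data: E1 reports 2 employees but has 3 matched workers; E2 reports 10 and has 1.
matches = [("W1", "E1"), ("W2", "E1"), ("W3", "E1"), ("W4", "E2")]
ssel_employment = {"E1": 2, "E2": 10}
kept = drop_overmatched(matches, ssel_employment)
print(kept)
```

Note that the whole establishment is dropped, not just the excess workers, since there is no way to tell which of the matched workers are the erroneous ones.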
Evaluating the Matched Data
One of the main uses of these data is to construct estimates of characteristics of establishments' workforces (such as the skill of workers within an establishment, or, in this particular paper, the percent black, etc.) using the worker data. Therefore, in evaluating these data, we would like to compare estimates of establishment characteristics based on worker data with estimates of the same characteristics based on establishment data. Unfortunately, the only information that is common to the worker and establishment data sets is worker earnings. As a result, in this section we focus on comparing estimates of worker earnings from the worker and establishment data. Row (1) in Table 1 presents the cross-establishment mean of worker earnings based on data from the SSEL. Using the SSEL data, per-worker earnings in an establishment are estimated by dividing the 1990 annual payroll for the establishment by the establishment's employment in the pay period including March 12, 1990. The numbers in row (1) are an average of this per-worker earnings estimate across all relevant establishments in the NWECD. We will refer to this number as SSEL worker earnings. Row (2) presents the cross-establishment mean of worker earnings based on the SDF data. Each worker in the SDF reports his or her total earnings in the previous year. Using the SDF data, per-worker earnings in an establishment are estimated by taking the average reported earnings for all workers matched to the establishment. The numbers in row (2) are then the average of this per-worker earnings estimate across all establishments in the data. We will refer to this number as SDF worker earnings. Row (3) presents the cross-establishment mean log difference in these two estimates of worker earnings, while row (4) presents the cross-establishment correlation of these two estimates of worker earnings.
Row (5) presents the cross-establishment mean of total employment in the establishments (based on SSEL data), while row (6) presents the average proportion of workers matched to the establishment. Column (1) in Table 1 presents numbers for all establishments and workers in the NWECD, column (2) presents numbers for establishments with fewer than 25 employees, column (3) presents results for establishments with 25 or more employees in the establishment, and column (4) presents results for establishments with 25 or more employees where we have matched at least five percent of the establishment's workforce.
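The quantities in the rows of Table 1 can be sketched on toy data as follows. All figures are invented for illustration. The sign convention for the log difference (SDF minus SSEL) and the use of a Pearson correlation in levels are assumptions, since the text does not specify them.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical establishment records (SSEL side) and matched worker earnings (SDF side).
ssel = {
    "E1": {"payroll": 600_000, "employment": 30},
    "E2": {"payroll": 2_000_000, "employment": 50},
    "E3": {"payroll": 450_000, "employment": 25},
}
sdf = {"E1": [18_000, 24_000], "E2": [35_000, 41_000, 44_000], "E3": [17_000, 16_000]}

ids = sorted(ssel)
# SSEL worker earnings: annual payroll divided by employment.
ssel_earn = [ssel[i]["payroll"] / ssel[i]["employment"] for i in ids]
# SDF worker earnings: average reported earnings of matched workers.
sdf_earn = [mean(sdf[i]) for i in ids]

row1 = mean(ssel_earn)                                   # mean SSEL worker earnings
row2 = mean(sdf_earn)                                    # mean SDF worker earnings
row3 = mean(math.log(s) - math.log(e)                    # mean log difference (SDF - SSEL)
            for s, e in zip(sdf_earn, ssel_earn))
row4 = pearson(ssel_earn, sdf_earn)                      # cross-establishment correlation
row5 = mean(ssel[i]["employment"] for i in ids)          # mean establishment employment
row6 = mean(len(sdf[i]) / ssel[i]["employment"] for i in ids)  # mean share matched
```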
The results in column (1) suggest that, by and large, workers are being matched to the correct establishments. The first two rows show that the establishment and worker data produce very similar estimates of average worker earnings. Row (3) shows that, on average, there is just a 2.7 percent difference in the two earnings estimates at the establishment-level, and row (4) shows these estimates are positively and significantly correlated across establishments.
The figures in columns (2) and (3) show that the quality of the matched data differs by the size of the establishment. Among establishments with fewer than 25 employees, the correlation of SSEL worker earnings and SDF worker earnings is 0.196, while among establishments with 25 or more employees this correlation is 0.436. Finally, among establishments with 25 or more employees and with at least five percent of the workforce matched to the establishment, the correlation between the two earnings measures is 0.536.
Table 2 breaks out the numbers in Table 1 by whether or not the establishment is located in an MSA (Panel A), by establishment size (Panel B), and by one-digit industry (Panel C). Panel A shows that the quality of the matched data does not appear to differ by the location of an establishment. However, the numbers in Panel B provide further evidence that the quality of the matched data varies by establishment size. While the numbers in column (3) show no systematic relationship between the cross-establishment differences in the two earnings estimates and size, the results in column (4) show that there is a strong positive relationship between the cross-establishment correlation of the two earnings estimates and establishment size.(12)
The results in Panel C show that, except for the construction industry, there does not appear to be any systematic difference in the quality of the matched data across industries. In the construction industry, since we are only able to match workers to 129 establishments, the cross-establishment correlation between the SSEL and SDF earnings measures is insignificantly different from zero. (Also, the difference between the two earnings estimates is large.) In all other industries we have over 1,000 establishments matched with workers (and in a number of industries over 10,000 establishments), and there always exists a positive and significant cross-establishment correlation between the SSEL and SDF earnings measures.
The results in Tables 1 and 2 suggest that, with the possible exception of some of the smaller establishments, workers are being matched to the correct establishments. Estimates of average worker earnings based on the SSEL and SDF data are very similar, and are positively and significantly correlated across establishments. In addition, there appears to be no systematic difference in the quality of the matched data across different industries (with the exception of the construction industry) nor by whether or not the establishment is located in an MSA. We now turn to examining the representativeness of these data.
Examining the Representativeness of the NWECD
To begin examining whether the NWECD data are representative of the underlying population of workers and establishments, Table 3 presents the number of and average employment for all SSEL establishments, unique establishments, and NWECD establishments for all establishments in the data (Panel A), by whether or not the establishment is located in an MSA (Panel B), by size (Panel C), and by industry (Panel D). "Unique establishments" are establishments that are unique in an industry-location cell. As mentioned earlier, only establishments that are unique in an industry-location cell are matched to workers. Establishments with workers matched to them are "NWECD establishments." Columns (1)-(3) present the number of SSEL establishments, unique establishments, and NWECD establishments, respectively. Column (4) presents the proportion of SSEL establishments that are unique, while column (5) presents the proportion of SSEL establishments in the NWECD. Columns (6)-(8) present mean employment for all SSEL establishments, unique establishments, and NWECD establishments, respectively.
Panel A shows the effect of the matching strategy on the overall sample of establishments. Of the 5.6 million establishments in the SSEL data, just 6.9 percent or 385,135 establishments can be assigned to a unique industry-location cell. In addition, these unique establishments are almost 80 percent larger than the typical establishment, averaging over 36 employees compared with 20 employees in the typical SSEL establishment. The numbers in this row also show that simply being unique does not guarantee that an establishment appears in the NWECD. The NWECD contains only 153,291 establishments, representing 2.7 percent of SSEL establishments. Matched establishments, averaging 72 employees, tend to be even larger than unique establishments. This increase in average size is the result of two factors: first, the fact that the long-form portion of the 1990 Decennial Census is a sample means that large establishments are more likely to contain employees receiving a long form; and second, smaller establishments are more likely to be eliminated from the data because the match was based on imputed data or because the number of matched workers exceeded reported employment.
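The shares quoted for Panel A can be checked with simple arithmetic. The sketch below uses the rounded SSEL total of 5.6 million and the rounded mean employment figures of 36 and 20 given in the text, so the results are approximate.

```python
ssel_total = 5_600_000   # "5.6 million" establishments (rounded in the text)
unique = 385_135         # establishments unique in an industry-location cell
nwecd = 153_291          # establishments with matched workers

share_unique = unique / ssel_total      # about 0.069, i.e. 6.9 percent
share_nwecd = nwecd / ssel_total        # about 0.027, i.e. 2.7 percent
share_of_unique = nwecd / unique        # about 0.40 of unique establishments are matched
size_ratio = 36 / 20                    # unique vs. typical SSEL establishment: 1.8

print(f"{share_unique:.1%} unique, {share_nwecd:.1%} in NWECD, "
      f"{share_of_unique:.1%} of unique matched, size ratio {size_ratio:.1f}")
```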
The numbers in Panels B and C in Table 3 show that the probability of being unique and the probability of appearing in the NWECD vary systematically with the location and size of the establishment. Panel B shows that establishments located outside of an MSA are more than twice as likely to be located in a unique industry-location cell and to appear in the final data set. The numbers in columns (6)-(8) show that, for establishments both within and outside of an MSA, the matching strategy produces a data set with establishments that are substantially larger on average than establishments in the SSEL. The results in Panel C show that the probability that an establishment is unique in an industry-location cell, and the probability that an establishment appears in the data, increase monotonically with the size of the establishment. Only 6.4 percent of establishments with fewer than 10 employees are located in a unique industry-location cell and only 1.8 percent of these establishments appear in the NWECD. In contrast, 20 percent of establishments with 500 or more employees are in a unique industry-location cell and 19 percent of these establishments appear in the NWECD.
Panel D in Table 3 shows that the match rate varies substantially by industry. Manufacturing establishments are the most likely to be located in a unique location-industry cell and are the most likely to appear in the NWECD. Construction establishments are the least likely to be either unique or in the final data set.(13) The effect of the matching process on the size of establishments in our final data set also varies by industry. Manufacturing, transportation, mining, and services establishments in the NWECD are all substantially larger on average than the typical establishment in these industries in the SSEL. However, Wholesale, Retail, and Finance, Insurance, and Real Estate (FIRE) establishments in the NWECD are approximately the same size on average as the average establishment in these industries in the SSEL. This latter phenomenon occurs because the establishments that are unique in an industry-location cell in these three industries are much smaller than the average establishment in the SSEL. We presume that this occurs because large establishments in these industries tend to be located in geographic proximity to other such establishments (such as at shopping malls), and therefore only smaller establishments tend to be unique in industry-location cells. Note that for these three industries, the establishments in the NWECD sample are nearly twice as large as the establishments in the "unique" sample. This is natural (and indeed occurs for almost all industries), as larger establishments in the "unique" sample have a higher probability of having matched workers.
Table 4 compares the number and annual earnings of workers in the SDF with workers in the NWECD for all workers (Panel A), by whether or not a worker's employer is located in an MSA (Panel B) and by one-digit industry (Panel C).(14) Columns (1) and (2) present the number of workers in the SDF and NWECD, respectively, while column (3) presents the proportion of workers matched to an establishment (column (2)/column (1)). Columns (4) and (5) present the mean of worker earnings in the SDF and NWECD, respectively, while column (6) presents the log difference in the average worker earnings estimates.
The numbers in Panel A show that, of the over 14 million workers in the original data with characteristics similar to those of the NWECD workers, we are able to match 1,056,363 workers to their employers, a match rate of 7.4 percent, indicating that we are able to match a larger percentage of workers than establishments. This is not surprising given the results in Table 3, which showed that the match rate is higher for large establishments than for small establishments. Columns (4)-(6) show that the average earnings of matched workers are quite similar to the average earnings of all workers in the SDF.
The numbers in Panels B and C show that the worker match rate varies across location and industry in a fashion similar to the establishment match rate. Panel B shows that we are much more likely to match workers who worked in establishments located outside of an MSA. Panel C shows that we are much more successful matching manufacturing workers and not very successful matching construction workers.
The results in Tables 3 and 4 do raise some concerns about the representativeness of the NWECD. In particular, we have seen that establishments in the NWECD are substantially larger than establishments in the SSEL. In addition, the results in these tables show that we are much more successful matching establishments and workers that are located outside of an MSA and matching establishments and workers that are in the manufacturing industry. However, the overall effects of this non-random matching clearly depend on the questions being addressed with these data. In particular, the non-representative nature of the NWECD may render it of little value in constructing population estimates, but may have little impact on estimated conditional means (or regression relationships). To further judge the usefulness of these data, we examine whether the NWECD data can replicate well-established relationships between establishment and worker characteristics and wages. We begin with Table 5, which compares the characteristics of workers in the SDF with the characteristics of workers in the NWECD. We make this comparison for all workers in both files (columns (1) and (2)), for workers who earn between $2.50 and $500 an hour (columns (3) and (4)), and for workers earning these wages who usually work over 30 hours a week and worked at least 30 weeks in the previous year, whom we call full-time workers (columns (5) and (6)).
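The two sample restrictions used in Table 5 can be sketched as filters on worker records. The construction of the hourly wage as annual earnings divided by weeks worked times usual weekly hours is an assumption made for this illustration, as are the field names and toy values.

```python
def hourly_wage(worker):
    """Assumed construction: annual earnings / (weeks worked * usual weekly hours)."""
    total_hours = worker["weeks"] * worker["hours"]
    return worker["earnings"] / total_hours if total_hours > 0 else None


def in_wage_sample(worker):
    # Restriction 1: hourly wage between $2.50 and $500.
    wage = hourly_wage(worker)
    return wage is not None and 2.50 <= wage <= 500


def is_full_time(worker):
    # Restriction 2: in the wage sample, usually works over 30 hours a week,
    # and worked at least 30 weeks in the previous year.
    return in_wage_sample(worker) and worker["hours"] > 30 and worker["weeks"] >= 30


# Hypothetical toy workers.
toy_workers = [
    {"id": "W1", "earnings": 30_000, "weeks": 50, "hours": 40},  # $15/hour, full-time
    {"id": "W2", "earnings": 8_000, "weeks": 40, "hours": 20},   # $10/hour, short hours
    {"id": "W3", "earnings": 2_000, "weeks": 50, "hours": 40},   # $1/hour, below the floor
    {"id": "W4", "earnings": 12_000, "weeks": 20, "hours": 40},  # $15/hour, too few weeks
]
wage_sample = [w["id"] for w in toy_workers if in_wage_sample(w)]
full_time = [w["id"] for w in toy_workers if is_full_time(w)]
```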
Turning first to the means, the numbers in Table 5 point to some differences between the workers represented in the NWECD and those in the entire SDF. Columns (1) and (2) show that NWECD workers are slightly less likely to be black and more likely to be married than workers in the SDF. In addition, NWECD workers are more likely to be laborers and to work in manufacturing and services. NWECD workers are also slightly older, and are more likely to have a high school degree but less likely to have no high school education or to have a bachelor's or advanced degree. NWECD workers also tend to work more weeks in the previous year, but have slightly lower earnings and hourly wages. Finally, the numbers in columns (3)-(6) show that these basic findings change very little when we impose two standard types of exclusion restrictions on the data.
Table 6 presents the results from regressions of (log) worker wages on a standard set of worker characteristics. Column (1) presents results based on a 10-percent random sample of workers in the SDF, while column (2) presents results from the same regression, adding a control for whether the worker is matched to an establishment. Column (3) presents the results for the identical regression in column (2) just using the data from the NWECD. The coefficient on the match variable in column (2) shows that, controlling for the standard set of worker characteristics, matched workers earn 2.6 percent lower wages. However, the coefficients on the characteristics across the three regressions are quite similar. In all three regressions, female workers earn 10-11 percent lower wages, black male workers earn six percent lower wages, Hispanic male workers earn wages that are lower by 7.2 to 8.3 percent, and married male workers earn 19-20 percent higher wages.(15) The relationship between education and wages is also similar across all three columns. The estimated coefficients of the interactions of female with black and Hispanic differ a bit; in each case the female-male differences in the race differentials are smaller in the NWECD sample.
Table 7 presents the results from regressions of (log) average annual earnings in an establishment on various establishment characteristics. Column (1) reports results based on all establishments in the SSEL, while column (2) adds a control for whether a worker is matched to the establishment. Columns (3) and (4) present similar regressions based on all unique establishments. Finally, column (5) presents results based on all establishments in the NWECD. The coefficient on the match variable in column (2) indicates that matched establishments pay 7.2 percent lower wages than the typical SSEL establishments, while the coefficient on the match variable in column (4) shows that matched establishments pay 4.5 percent higher wages than the typical unique plant. However, comparing the coefficients on the other variables across the five columns indicates that the relationships between these characteristics and log average wages in the plant are quite similar. In all three samples larger establishments, establishments located in a place, establishments that are a part of multi-unit firms, older establishments, and establishments located in the northeast, pay higher wages.
The analysis of representativeness of the NWECD suggests that the data set is not a representative sample of the underlying population of establishments or workers. However, it appears that the non-representativeness is unlikely to introduce much bias into estimates of the types of relationships we estimate in subsequent sections of this paper; regression estimates of equations for worker and establishment earnings (except for the intercepts) are very similar for the matched and full samples of workers and establishments.
Summary
In this section, we presented a detailed description of the procedures used to construct the matched employer-employee data set that we call the NWECD. We presented evidence that we successfully match workers to plants, and that regression estimates using this matched sample are unlikely to be biased from sample selection associated with the matching, even though the data set is non-representative. These results, coupled with the fact that the NWECD is the largest employer-employee matched data set currently in existence in the U.S., suggest that it will be a valuable tool for analyzing a variety of labor market issues. In the remainder of the paper, we turn to evidence on the role of segregation in generating racial and ethnic wage gaps that differ by sex, an empirical application for which these data are uniquely well-suited.
III. Methods
Our decompositions of racial and ethnic wage gaps are based on estimates of log wage regressions of the following form:
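$$
\begin{aligned}
\ln(w) ={}& \alpha + \beta_B B + \beta_H H + \gamma_B\,\mathrm{OCC\%B} + \delta_B\,\mathrm{IND\%B} + \theta_B\,\mathrm{EST\%B} + \lambda_B\,\mathrm{JOB\%B} \\
&+ \gamma_H\,\mathrm{OCC\%H} + \delta_H\,\mathrm{IND\%H} + \theta_H\,\mathrm{EST\%H} + \lambda_H\,\mathrm{JOB\%H} + X\pi + \varepsilon, \tag{1}
\end{aligned}
$$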
where w is the hourly wage, B is a dummy variable equal to one if the individual is black, and H is a dummy variable equal to one if the individual is Hispanic. The variables OCC%B and OCC%H are the percentages black and Hispanic in the individual's occupation (expressed as proportions), and similarly the variables IND%B, IND%H, EST%B, EST%H, JOB%B, and JOB%H are the percentages black and Hispanic in the individual's industry, establishment, and job cell. A vector of control variables is represented by X. These regressions are estimated separately by sex, as are the various percentages black and Hispanic. Note that we allow the effects of segregation to differ by race and Hispanic ethnicity.
With the estimated coefficients of equation (1) in hand, we decompose the difference in average log wages between blacks and whites (denoted wB' and wW') as follows:
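$$
\begin{aligned}
w_B' - w_W' ={}& \beta_B' + \gamma_B'(\mathrm{OCC\%B}_B - \mathrm{OCC\%B}_W) + \delta_B'(\mathrm{IND\%B}_B - \mathrm{IND\%B}_W) \\
&+ \theta_B'(\mathrm{EST\%B}_B - \mathrm{EST\%B}_W) + \lambda_B'(\mathrm{JOB\%B}_B - \mathrm{JOB\%B}_W) \\
&+ \gamma_H'(\mathrm{OCC\%H}_B - \mathrm{OCC\%H}_W) + \delta_H'(\mathrm{IND\%H}_B - \mathrm{IND\%H}_W) \\
&+ \theta_H'(\mathrm{EST\%H}_B - \mathrm{EST\%H}_W) + \lambda_H'(\mathrm{JOB\%H}_B - \mathrm{JOB\%H}_W) + (X_B - X_W)\pi', \tag{2}
\end{aligned}
$$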
where primes on the coefficients indicate estimates, and B and W subscripts on the variables indicate means for blacks and whites, respectively. In this decomposition, $\beta_B'$ (the estimated coefficient on the black dummy variable) measures the black-white difference that remains after controlling for the variables in X, and for segregation by occupation, industry, establishment, and job cell. Since the inclusion of these segregation measures should account for the relationship between race and any excluded variables related to the job cell, $\beta_B'$ is often referred to as the "within-job-cell" race difference in wages. The term $\gamma_B'(\mathrm{OCC\%B}_B - \mathrm{OCC\%B}_W)$ measures the extent to which the wages of black and white workers differ because of occupational segregation by race (with segregation of blacks into lower-wage occupations, as it turns out). Similarly, the terms involving $\delta_B'$, $\theta_B'$, and $\lambda_B'$ (the estimated coefficients on IND%B, EST%B, and JOB%B) capture wage differences due to industry, establishment, and job-cell segregation. The second set of terms, beginning with $\gamma_H'(\mathrm{OCC\%H}_B - \mathrm{OCC\%H}_W)$ and ending with $\lambda_H'(\mathrm{JOB\%H}_B - \mathrm{JOB\%H}_W)$, captures black-white wage differences attributable to the differential segregation of blacks and whites into occupations, industries, establishments, and job cells with different percentages Hispanic. To the extent that blacks and whites are in occupations, industries, etc., with similar percentages Hispanic, as turns out to be the case, these effects will be rather small.(16)
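The mechanics of this decomposition can be illustrated with a minimal Python sketch on simulated data. It is deliberately simplified and is not the estimation code used for this paper: it keeps only a black dummy, a single segregation measure standing in for JOB%B, and one control, and every name (`decompose_gap`, `seg`, `x`) and every simulated parameter value is an assumption made for illustration.

```python
import numpy as np

def decompose_gap(log_w, black, seg, x):
    """OLS of log wages on an intercept, a black dummy, one segregation
    share, and one control; returns the pieces of the black-white gap."""
    design = np.column_stack([np.ones_like(log_w), black, seg, x])
    coef, *_ = np.linalg.lstsq(design, log_w, rcond=None)
    _, beta_b, gamma, pi = coef
    b, w = black == 1, black == 0
    return {
        "raw_gap": log_w[b].mean() - log_w[w].mean(),
        "within": beta_b,  # within-cell race difference
        "segregation": gamma * (seg[b].mean() - seg[w].mean()),
        "controls": pi * (x[b].mean() - x[w].mean()),
    }

# Simulated data (all parameter values are hypothetical).
rng = np.random.default_rng(0)
n = 5000
black = (rng.random(n) < 0.2).astype(float)
seg = np.clip(0.1 + 0.4 * black + rng.normal(0, 0.1, n), 0, 1)
x = rng.normal(0, 1, n)
log_w = 2.0 - 0.05 * black - 0.15 * seg + 0.10 * x + rng.normal(0, 0.1, n)

parts = decompose_gap(log_w, black, seg, x)
explained = parts["within"] + parts["segregation"] + parts["controls"]
```

Because the regression includes an intercept and the group dummy, the OLS residuals average zero within each group, so the three pieces sum to the raw mean gap exactly (up to floating-point error).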
We also construct a similar decomposition for Hispanic-white wage differences. With the estimated coefficients of equation (1) in hand, we decompose the difference in average log wages between Hispanics and whites (denoted wH' and wW') as follows:
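$$
\begin{aligned}
w_H' - w_W' ={}& \beta_H' + \gamma_B'(\mathrm{OCC\%B}_H - \mathrm{OCC\%B}_W) + \delta_B'(\mathrm{IND\%B}_H - \mathrm{IND\%B}_W) \\
&+ \theta_B'(\mathrm{EST\%B}_H - \mathrm{EST\%B}_W) + \lambda_B'(\mathrm{JOB\%B}_H - \mathrm{JOB\%B}_W) \\
&+ \gamma_H'(\mathrm{OCC\%H}_H - \mathrm{OCC\%H}_W) + \delta_H'(\mathrm{IND\%H}_H - \mathrm{IND\%H}_W) \\
&+ \theta_H'(\mathrm{EST\%H}_H - \mathrm{EST\%H}_W) + \lambda_H'(\mathrm{JOB\%H}_H - \mathrm{JOB\%H}_W) + (X_H - X_W)\pi'. \tag{3}
\end{aligned}
$$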
In this case the second set of terms, beginning with $\gamma_H'(\mathrm{OCC\%H}_H - \mathrm{OCC\%H}_W)$, captures the effects on the Hispanic-white wage differential of segregation of Hispanics into occupations, industries, establishments, and job cells with other Hispanics, and the first set of terms (involving $\gamma_B'$, $\delta_B'$, $\theta_B'$, and $\lambda_B'$, the estimated coefficients on the percent-black variables) captures the effects of segregation of Hispanics and whites into occupations, industries, etc., with different percentages black. $\beta_H'$, the estimated coefficient on the Hispanic dummy variable, measures the within-job-cell wage differential between Hispanics and whites.
The percent black and Hispanic variables in equation (1) are all estimated directly from the data. The percentages black and Hispanic in the occupation and industry are estimated from the full SDF sample, so measurement error is likely to be minimal. However, the percentages black and Hispanic in the plant and job cell are estimated by necessity from the matched data in the NWECD. On average 19.18 workers are matched to a plant, so job-cell estimates, in particular, are often based on a small number of observations. Measurement error in these estimates therefore could be sizable, biasing the estimated coefficients on the establishment and job-cell percentages toward zero (and presumably biasing the other coefficient estimates as well, although a priori the direction of bias is unclear). One motivation for restricting attention to larger establishments (those with 25 or more workers) is to avoid very small cells.
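The attenuation argument can be made concrete with a small simulation. The sketch below is illustrative only and rests on stated assumptions: the true minority share of each cell is drawn uniformly on [0, 1], the observed share is computed from a random sample of `workers_per_cell` workers, and the true wage effect of the share is -0.10. The function names and all parameter values are hypothetical.

```python
import random

def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def simulate_slope(workers_per_cell, n_cells=20000, true_effect=-0.10, seed=0):
    """Regress cell wages on a share estimated from a small worker sample."""
    rng = random.Random(seed)
    est_share, wage = [], []
    for _ in range(n_cells):
        p = rng.random()  # true minority share in the cell
        hits = sum(rng.random() < p for _ in range(workers_per_cell))
        est_share.append(hits / workers_per_cell)  # noisy estimate of p
        wage.append(true_effect * p + rng.gauss(0.0, 0.05))
    return ols_slope(est_share, wage)

slope_small = simulate_slope(workers_per_cell=5)   # few matched workers
slope_large = simulate_slope(workers_per_cell=50)  # many matched workers
```

Under these assumptions the slope estimated from five workers per cell is pulled noticeably toward zero, while the slope from fifty workers per cell lies close to the true effect, which is the logic behind restricting attention to larger establishments.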
While establishments are well-defined, industries and occupations can be defined at a variety of levels of disaggregation. Since a question of primary concern is within- vs. across-job wage differences, we are interested in trying to use relatively narrow occupational classifications. Because we also look at establishment-occupation cells (i.e., job cells), however, and because we are looking at rather narrow racial and ethnic groups, if we use highly disaggregated occupations we can end up with very few observations in some job cells, particularly since we only have a sample of workers in each plant and consequently in each job cell. Therefore, we report evidence from specifications using two alternative levels of occupational disaggregation, beginning first with 13 Census occupations, and then using a considerably greater level of disaggregation involving 72 Census occupations.(17) Because all workers in an establishment work in the same industry, and because the percent black and percent Hispanic in an industry are estimated using the full SDF, we face no constraint in disaggregating industries finely, and hence we always use the most-detailed four-digit SIC codes. To preview the results, we find that the qualitative conclusions are not affected by the level of occupational detail.
We also report results in which we estimate the coefficients on the black and Hispanic dummy variables ($\beta_B$ and $\beta_H$) controlling for fixed occupation, industry, establishment, and job cell effects, rather than controlling for the percent black and Hispanic in each of these categories; this amounts, of course, to putting in job cell dummy variables, since these absorb occupation, industry, and establishment effects.(18) In the absence of measurement error, assuming that we have specified the wage regression correctly, we would not expect estimates of $\beta_B$ and $\beta_H$ obtained using these fixed effects to differ much from estimates using the percent black and Hispanic variables, since the correlation of B and H with occupation, industry, establishment, and job-cell characteristics should be captured by the percent-black and percent-Hispanic variables.(19) However, using job cell dummy variables avoids the measurement error inherent in the percent-black and percent-Hispanic variables, and therefore should provide more reliable estimates of the within-job cell racial and ethnic differences in wages ($\beta_B$ and $\beta_H$), even when cell sizes are small; because the sample is one of individuals, job cells with more observations implicitly receive more weight. This specification is also useful because it accounts for the correlation between observations in the same establishment (and job cell). In contrast, when we run OLS for the specifications using the percent-black and percent-Hispanic variables, the standard errors could be downward biased because of within-establishment or within-job-cell correlations in the error.
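Including a full set of job-cell dummy variables is numerically equivalent to demeaning wages and the race dummy within each job cell before running the regression. The following sketch shows that equivalence on a tiny made-up data set with a single black dummy and no other controls; the function names and the data are hypothetical and are not drawn from the NWECD.

```python
from collections import defaultdict

def within_cell_gap(rows):
    """Within-job-cell wage gap: regress cell-demeaned log wages on the
    cell-demeaned black dummy. rows holds (cell_id, black, log_wage)."""
    by_cell = defaultdict(list)
    for cell, b, w in rows:
        by_cell[cell].append((b, w))
    sxy = sxx = 0.0
    for members in by_cell.values():
        mb = sum(b for b, _ in members) / len(members)
        mw = sum(w for _, w in members) / len(members)
        for b, w in members:
            sxy += (b - mb) * (w - mw)
            sxx += (b - mb) ** 2
    return sxy / sxx

def pooled_gap(rows):
    """Raw difference in mean log wages, ignoring job cells."""
    black = [w for _, b, w in rows if b == 1]
    white = [w for _, b, w in rows if b == 0]
    return sum(black) / len(black) - sum(white) / len(white)

# Two job cells: cell "A" pays more and is mostly white, cell "B" pays
# less and is mostly black; the within-cell gap is -0.05 by construction.
rows = (
    [("A", 0, 3.00)] * 3 + [("A", 1, 2.95)]
    + [("B", 0, 2.00)] + [("B", 1, 1.95)] * 3
)
fe_gap = within_cell_gap(rows)
raw_gap = pooled_gap(rows)
```

In this example the pooled gap is -0.55 while the within-cell gap is -0.05: almost all of the raw difference reflects sorting across cells. Cells containing workers of only one group contribute nothing to the within-cell estimate.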
IV. Results
Descriptive Statistics
Table 8 reports descriptive statistics for black, Hispanic, and white male and female workers in the NWECD. The average log hourly earnings data reflect the stylized fact with which this paper began; racial and ethnic differences are considerably larger for men (-0.23 for blacks and -0.24 for Hispanics) than for women (-0.16 for blacks and -0.13 for Hispanics). Not surprisingly, whites are more likely to be married, to have fewer children, and to have higher educational degrees, all of which are associated with higher wages. Furthermore, the education differences are a bit sharper among men than among women (for education, look at the proportions with no high school degree, a Bachelor's degree, or an Advanced degree), which may partly explain the larger raw racial and ethnic wage differences among men compared with women. In general, Hispanics work in smaller establishments than do whites, while blacks work in larger establishments.
Table 9 reports some baseline OLS regressions describing the multivariate relationships between the variables listed in Table 8. Columns (1) and (5) report the raw racial and ethnic differences, from regressions with no controls. In columns (2) and (6) we add individual-level controls for age, children, marital status, and education, as well as region of the country and residence in an MSA. For women, the black-white wage differential falls to -0.02, while the Hispanic-white differential falls by nearly two-thirds. For men, both differentials fall by about half. In columns (3) and (7) we add controls for English language fluency and citizenship. Trejo (1997) finds that English language deficiencies are an important source of lower earnings for Mexican Americans. We include these controls because they are likely to reflect human capital differences (or more generally to be related to productivity), although we recognize the possibility that there is discrimination based on differences in language or citizenship. The estimated Hispanic-white wage differential falls by about one-half for both women (to -0.023) and men (to -0.068). Still, the gap remains considerably larger for men. In columns (4) and (8) we find that adding controls for establishment size and industry has relatively small effects on the estimated racial and ethnic wage differences, with the exception of the Hispanic-white differential for males, which falls to -0.051. These establishment-level controls may to some extent be related to unobserved human capital, calling for their inclusion along with the other human capital controls. On the other hand, to the extent that these solely reflect establishment-level characteristics, they may "over-control" for establishment-level differences, because they may capture dimensions of racial and ethnic segregation. As a consequence, we omit them in the decompositions that follow. 
Regardless, we see that upon inclusion of either set of control variables, racial and ethnic wage differences remain considerably larger for men than for women.
Next, we turn to estimates incorporating information on racial and ethnic segregation by occupation, industry, establishment, and job cell, both to better understand the sources of racial and ethnic differences in wages, and in particular to see whether greater segregation, or greater effects of segregation, explain the sharper black-white and Hispanic-white wage differentials among men.
Estimates of the Effects of Racial and Ethnic Segregation and Decompositions of Wage Differentials
Table 10 reports results of wage regression estimations using the relatively more-aggregated 13 occupation categories. The first five columns report results for women, and the second five report results for men. In column (1), we report estimates from a specification that adds the percent-black and percent-Hispanic variables to the individual-level controls included in column (3) of Table 9. Similarly, in column (6) we report estimates from a specification that adds the percent-black and percent-Hispanic variables to the individual-level controls included in column (7) of Table 9. We report the estimated coefficients of the black and Hispanic dummy variables, as well as each of the percent-black and percent-Hispanic variables. Turning first to the within-job-cell racial and ethnic wage gaps, we see that the black-white wage gap for women - which was small (-0.023) to begin with in Table 9 - becomes slightly smaller (-0.012) once we control for the effects of segregation. The Hispanic-white wage gap for women shrinks by a similar amount, from -0.023 to -0.016. In contrast, the black-white wage gap for men shrinks from -0.122 to -0.073, while the Hispanic-white wage gap for men shrinks from -0.068 to -0.029. Therefore, the sex difference in the Hispanic-white wage gap is largely eliminated once we account for segregation; the difference between the Hispanic-white differential for men and women falls from 0.045 to 0.013. In contrast, although segregation explains part of the larger black-white wage differential for men, the black-white differential still remains substantially larger for men; the sex gap in this differential is 0.099 (0.122 - 0.023) in Table 9, and 0.061 (0.073 - 0.012) in Table 10.
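The arithmetic behind these comparisons can be laid out in a few lines of Python. The inputs are the absolute values of the coefficients quoted above from Tables 9 and 10; the helper name `share_explained` is introduced here purely for illustration.

```python
def share_explained(men_before, women_before, men_after, women_after):
    """Sex difference in a wage gap before and after segregation controls,
    and the share of that difference accounted for by the controls."""
    before = men_before - women_before
    after = men_after - women_after
    return before, after, (before - after) / before

# Absolute values of the estimated gaps: Table 9 (before), Table 10 (after).
black = share_explained(0.122, 0.023, 0.073, 0.012)
hispanic = share_explained(0.068, 0.023, 0.029, 0.016)
```

With these inputs the sex difference in the black-white gap falls from 0.099 to 0.061, so segregation accounts for about 38 percent of it, while the sex difference in the Hispanic-white gap falls from 0.045 to 0.013, or about 71 percent.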
Looking at the effects of racial and ethnic segregation for women, we see why the within-job-cell racial and ethnic wage gaps for women shrink a bit once we account for segregation. The estimates in column (1) indicate that working in an occupation, industry (for Hispanics), or job cell with a higher percent black or Hispanic is associated with significantly lower wages. In contrast, though, working in an establishment with a higher percent black or Hispanic is associated with significantly higher wages (or equivalently, wages are higher in establishments with higher percentages black and Hispanic).
The estimated negative effects for women of a high percentage black or Hispanic appear particularly large for occupation. There is considerably less segregation along these lines than along the lines of establishment or job cell, however. This is apparent from columns (2) and (4), which report the mean differences in the percentage black or Hispanic between black and white workers (in column (2)), and between Hispanic and white workers (in column (4)). For example, the entries in rows three through six of column (2) report the mean differences in the percentage black between black and white workers. These differences are small for occupation (0.018) and industry (0.015), reflecting the fact that there is relatively little occupation or industry segregation by race.(20) However, segregation by establishment (0.292) and job cell (0.443) is much more severe. Thus, the coefficient estimate on the percent black in the occupation (-1.012), for example, seems rather large, but does not contribute that much to the lower wages of black women. In contrast, the smaller coefficient estimate on percent black in the job cell (-0.114), for example, is applied to a much larger difference. The findings for Hispanic women, in rows eight through 11 of column (4), suggest similar patterns of segregation to those by race, with rather severe segregation by establishment and job cell, but not industry and occupation. In contrast, the numbers at the bottom of column (2) and the top of column (4) indicate that blacks are not particularly concentrated in occupations, industries, establishments, or job cells with high percentages Hispanic, or vice versa.
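To see why a large coefficient need not imply a large contribution, each decomposition term can be computed as the coefficient times the corresponding mean difference. The short sketch below uses only the two pairs of figures quoted above for women's percent black (occupation and job cell); the variable names are illustrative.

```python
# (coefficient on percent black, black-white mean difference in percent black)
occupation = (-1.012, 0.018)
job_cell = (-0.114, 0.443)

def contribution(coef, mean_diff):
    """Log-wage contribution of one segregation term."""
    return coef * mean_diff

occ_effect = contribution(*occupation)  # about -0.018
job_effect = contribution(*job_cell)    # about -0.051
```

Although the occupation coefficient is roughly nine times larger in absolute value, the job-cell term contributes nearly three times as much to the wage difference because the mean difference it multiplies is so much larger.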
The mean differences in columns (2) and (4) are used along with the estimates in column (1) to decompose the wage differentials, as reported in columns (3) and (5). In these columns, we also report the combined effects of segregation by race and by ethnicity. The numbers reveal that segregation of black women by race lowers the wages of black women by 0.4 percent, and segregation of Hispanic women by ethnicity lowers the wages of Hispanic women by 1.3 percent. For both groups, the negative segregation effect stems primarily from job-cell segregation, i.e., from the segregation of black or Hispanic women into particular jobs within establishments.
Columns (6)-(10) report results of similar estimations and computations for men. Looking at the effects of segregation shows, correspondingly, that segregation by ethnicity reduces wages of Hispanic males by more than segregation by race reduces wages of black males. Moreover, segregation by race and ethnicity lowers wages of black and Hispanic men by considerably more than those of black and Hispanic women. Overall, the stronger negative effect of segregation for Hispanic men is summarized in the entry labeled "Segregation by Hispanic ethnicity," which indicates that such segregation lowers wages of Hispanic males by 6.1 percent. In contrast, racial segregation lowers wages of black men by 4.7 percent, while the corresponding numbers for black and Hispanic women are only 0.4 percent and 1.3 percent, respectively.
Note that for both men and women ethnic segregation appears somewhat more severe than racial segregation, while among the four groups, segregation is most severe among Hispanic men. For example, for men the mean difference between Hispanics and whites in the proportion Hispanic in the job cell is 0.526, vs. a mean difference between blacks and whites in the proportion black of 0.429. The corresponding numbers for women are 0.459 and 0.443. Turning instead to the estimated coefficients of the segregation variables, the negative effect of industry segregation is stronger for Hispanic males than for the other three groups, while for black men the negative effect of occupational segregation is particularly strong. Also, Hispanic males work in higher-paying establishments to a lesser extent than black and Hispanic females and black males. For Hispanic males, ethnic segregation by establishment raises wages by only 2.0 percent, while establishment segregation along race or ethnicity lines raises wages by 6.3, 5.3, and 3.5 percent for these other three groups. Thus, the larger role of ethnic segregation in lower wages of Hispanic men stems from both more severe segregation and from stronger deleterious (or weaker beneficial) effects of segregation, while the larger role of segregation in lower wages for black men stems mainly from the stronger effects of segregation.
To explore the sensitivity of these results to the level of occupational aggregation, Table 11 reports results from a parallel analysis using 72 Census occupations instead of 13. Looking first at the regression estimates of the within-job-cell racial and ethnic wage differences, we see, again, that accounting for segregation makes the black-white wage gap for women very small (-0.009), and similarly for the Hispanic-white wage gap for women (-0.010, and insignificant). We also see, again, that segregation accounts for a sizable portion of the greater black-white wage gap among men than among women, with the estimated black-white wage gap for males falling from -0.122 in Table 9 to -0.064 in Table 11. On the other hand, even more so than in Table 10, accounting for ethnic segregation lowers the Hispanic-white wage gap among men (-0.014), almost to the same magnitude as for women (-0.010). As we would expect, this is reflected in an even more-pronounced difference in the extent to which ethnic segregation lowers wages of Hispanic men. As indicated in the second-to-last row of the table, the combined effect of ethnic segregation on Hispanic women is to lower wages by 1.5 percent, compared with a much larger 7.0 percent figure for men. By way of contrast, the negative effect of racial segregation is only a bit larger for black men than for black women (3.3 vs. 1.9 percent).
Thus, we have a relatively robust finding from two quite different levels of occupational aggregation indicating that the consequences of segregation are particularly severe for Hispanic men, and account for much or most of the larger Hispanic-white wage gap for men than for women. In contrast, segregation explains about one-third to nearly one-half of the larger black-white wage gap for men. It appears, therefore, that although segregation is important, some other source aside from segregation accounts for a sizable fraction of the black-white wage gap for men. The evidence in this paper does not speak directly to whether this might be wage discrimination or unobservable productivity differences.
Estimates Using Fixed Effects
Finally, we turn to the fixed-effects analysis mentioned earlier. In part to examine the effects of measurement error in the estimated percentages black and Hispanic by establishment and job cell, we compare results for within-job-cell racial and ethnic wage gaps from the regressions reported in Tables 10 and 11 with those that we obtain using fixed occupation, industry, establishment, and job-cell effects. We would think that, in the absence of this measurement error, the two procedures would yield similar results. Although the job-cell dummy variables capture the effects of unobservable variables as well as the effects of the percentages black and Hispanic, the correlation between these unobservables and the black and Hispanic dummy variables should be accounted for by including the segregation variables, so that once these segregation variables are included, unobservable characteristics of the job cell should be uncorrelated with race and ethnicity.
In Panels A and B of Table 12, the first three rows summarize the earlier results. We first report the estimated coefficients of the black and Hispanic dummy variables from the basic wage regressions without segregation controls (corresponding to columns (3) and (7) of Table 9). We then report these estimates once the segregation controls are included, followed by the combined segregation effects (corresponding to the rows labeled "Segregation by race" and "Segregation by Hispanic ethnicity" in Tables 10 and 11). Finally, we report the estimated coefficients of the black and Hispanic dummy variables when we instead include the fixed effects.
In both Panels A and B we see that the sex differences in the black-white and Hispanic-white wage gaps are a bit smaller when we use job-cell dummy variables, as opposed to segregation controls.(21) However, the results are qualitatively similar; in Panel A we see that the sex difference in the wage gap for blacks is 0.052 when using job-cell dummy variables and 0.061 when using segregation controls, whereas for Hispanics the sex differences are 0.003 and 0.013. Thus, we again find that controlling for segregation accounts for most or all of the sex difference in the Hispanic-white wage gap, and nearly one-half of the sex difference in the black-white wage gap.
The estimates in Panel B are very similar to those in Panel A, again confirming that the level of occupational aggregation has little influence on the results. Most importantly, more of the sex difference in the black-white wage gap remains (0.043) compared with the sex difference in the Hispanic-white wage gap (which actually changes sign, to -0.003, but is indistinguishable from zero).
V. Conclusions
The goal of this paper is to assemble general evidence on the effects on wages of racial and ethnic segregation along the lines of occupations, industries, establishments, and job cells (i.e., the same jobs within establishments). More specifically, we ask whether larger racial and ethnic wage differences for men than for women are attributable to more severe segregation among men or to more severe effects of this segregation. To generate this evidence, we use a data set we have constructed called the New Worker-Establishment Characteristics Database, or NWECD, which is based on a match of employees to their establishments of employment.
In standard log wage regressions with individual-level controls, black-white and Hispanic-white differentials among women are around two percent, while among men the black-white differential is 12 percent and the Hispanic-white differential seven percent. Our evidence indicates that greater segregation between Hispanic and white men than between Hispanic and white women explains essentially all of the higher Hispanic-white wage gap for men. Similarly, our estimates indicate that greater segregation between black and white men than between black and white women explains a large share (one-third to one-half) of the higher black-white wage gap for men, although the black-white wage gap for men remains sizable (about six to seven percent) after controlling for segregation.
Overall, our results imply that segregation is an important contributor to the lower wages paid to black and Hispanic men than to white men with similar individual characteristics. They further suggest that equal pay types of laws may offer some scope for reducing the black-white wage differential for men, but little scope for reducing the Hispanic-white wage differential for men. Rather, policies intended to reduce the latter should target segregation into lower-paying jobs.
References
Altonji, Joseph G., and Rebecca M. Blank. 1998. "Race and Gender in the Labor Market." Forthcoming in Orley Ashenfelter and David Card, editors, Handbook of Labor Economics, Vol. 3 (Amsterdam: North-Holland).
Bayard, Kimberly, Judith Hellerstein, David Neumark, and Kenneth Troske. 1998. "New Evidence on Sex Segregation and Sex Differences in Wages from Matched Employee-Employer Data." Mimeograph.
Blau, Francine D. 1977. Equal Pay in the Office (Lexington, MA: D.C. Heath and Company).
Cain, Glen G. 1986. "The Economic Analysis of Labor Market Discrimination: A Survey." In Orley Ashenfelter and Richard Layard, editors, Handbook of Labor Economics, Vol. 1 (Amsterdam: North-Holland).
Calabria, Mark A. 1998. "The Census of Construction Industries Database." Center for Economic Studies Working Paper No. 98-11 (August).
Carrington, William J., and Kenneth R. Troske. 1998a. "Sex Segregation in U.S. Manufacturing." Industrial and Labor Relations Review, Vol. 51, No. 3, April, pp. 445-64.
Carrington, William J., and Kenneth R. Troske. 1998b. "Interfirm Segregation and the Black-White Wage Gap." Journal of Labor Economics, Vol. 16, No. 2, April, pp. 231-60.
Darity, William A., Jr., and Patrick L. Mason. 1998. "Evidence on Discrimination in Employment: Codes of Color, Codes of Gender." Journal of Economic Perspectives, Vol. 12, No. 2, Spring, pp. 63-92.
Groshen, Erica L. 1991. "The Structure of the Female-Male Wage Differential: Is it Who You Are, What You Do, or Where You Work?" Journal of Human Resources, Vol. 26, No. 3, Summer, pp. 457-72.
Johnson, George, and Gary Solon. 1986. "Estimates of the Direct Effects of Comparable Worth Policy." American Economic Review, Vol. 76, No. 5, December, pp. 1117-25.
King, Mary C. 1992. "Occupational Segregation by Race and Sex, 1940-88." Monthly Labor Review, April, pp. 30-7.
Macpherson, David A., and Barry T. Hirsch. 1995. "Wages and Gender Composition: Why Do Women's Jobs Pay Less?" Journal of Labor Economics, Vol. 13, No. 3, July, pp. 426-71.
Neal, Derek A., and William R. Johnson. 1996. "The Role of Premarket Factors in Black-White Wage Differences." Journal of Political Economy, Vol. 104, No. 5, October, pp. 869-95.
Oaxaca, Ronald. 1973. "Male-Female Wage Differentials in Urban Labor Markets." International Economic Review, Vol. 14, No. 3, October, pp. 693-709.
Sorensen, Elaine. 1990. "The Crowding Hypothesis and Comparable Worth." Journal of Human Resources, Vol. 25, No. 1, Winter, pp. 55-89.
Sorensen, Elaine. 1989. "Measuring the Pay Disparity Between Typically Female Occupations and Other Jobs: A Bivariate Selectivity Approach." Industrial and Labor Relations Review, Vol. 42, No. 4, July, pp. 624-39.
Trejo, Stephen J. 1997. "Why Do Mexican Americans Earn Low Wages?" Journal of Political Economy, Vol. 105, No. 6, December, pp. 1235-68.
Watts, Martin J. 1995. "Trends in Occupational Segregation by Race and Gender in the U.S.A., 1983-92: A Multidimensional Approach." Review of Radical Political Economics, Vol. 27, No. 4, pp. 1-36.
| | All matched workers and establishments (1) | Only establishments with less than 25 employees (2) | Only establishments with 25 or more employees (3) | Only establishments with 25 or more employees and more than 5% of the workforce matched (4) |
| SSEL worker earnings (1) | 17,907.45 (38.084) | 16,623.47 (49.678) | 20,666.42 (52.499) | 20,890.92 (63.426) |
| SDF worker earnings (2) | 19,399.10 (48.575) | 18,648.26 (64.315) | 21,012.46 (64.944) | 20,923.20 (63.200) |
| Log difference (across establishments) (3) | 0.027 (0.002) | 0.042 (0.003) | -0.006 (0.003) | 0.015 (0.003) |
| Correlation (SSEL worker earnings, SDF worker earnings) (4) | 0.246 (0.0001) | 0.196 (0.0001) | 0.436 (0.0001) | 0.536 (0.0001) |
| Mean total employment in establishments (5) | 72.395 (5.645) | 7.760 (0.018) | 211.279 (17.757) | 179.412 (2.654) |
| Mean proportion of workers matched to the establishment (6) | 0.303 (0.001) | 0.393 (0.001) | 0.111 (0.001) | 0.150 (0.001) |
| Number of establishments (7) | 153,291 | 104,608 | 48,683 | 33,257 |
| Number of Workers (8) | 1,056,635 | 205,506 | 851,129 | 777,515 |
Note: The numbers in parentheses are standard errors of means except for row (4) where they are p-values.
Table 2: Comparing Matched Establishment and Worker Data by Size, Industry and Location For All Workers
| | SSEL worker earnings (1) | SDF worker earnings (2) | Log difference (3) | Correlation (SSEL earnings, SDF earnings) (4) | Proportion matched (5) | Number of establishments (6) |
| A. Location | ||||||
| MSA | 19,291.25 (52.934) | 21,125.38 (68.647) | 0.033 (0.003) | 0.234 (0.0001) | 0.287 (0.001) | 92,701 |
| Non-MSA | 15,790.28 (51.014) | 16,754.94 (62.304) | 0.018 (0.004) | 0.244 (0.0001) | 0.328 (0.001) | 60,590 |
| B. Establishment Size (total employment) | ||||||
| 1-9 | 15,895.80 (63.780) | 18,182.98 (79.468) | 0.064 (0.004) | 0.173 (0.0001) | 0.486 (0.001) | 72,123 |
| 10-24 | 18,239.03 (73.646) | 19,681.29 (108.243) | -0.006 (0.005) | 0.264 (0.0001) | 0.185 (0.001) | 32,485 |
| 25-49 | 20,074.26 (101.084) | 20,842.45 (132.199) | -0.016 (0.006) | 0.341 (0.0001) | 0.128 (0.001) | 16,465 |
| 50-99 | 20,017.53 (100.619) | 20,238.06 (121.329) | -0.015 (0.005) | 0.472 (0.0001) | 0.110 (0.001) | 12,814 |
| 100-249 | 20,232.25 (95.700) | 20,428.47 (123.345) | 0.001 (0.005) | 0.489 (0.0001) | 0.100 (0.001) | 11,435 |
| 250-499 | 22,445.05 (146.306) | 22,138.75 (157.877) | -0.004 (0.006) | 0.577 (0.0001) | 0.098 (0.001) | 4,293 |
| 500 + | 24,854.06 (175.640) | 24,974.70 (153.385) | 0.037 (0.007) | 0.725 (0.0001) | 0.085 (0.001) | 3,676 |
| C. Industry | ||||||
| Agriculture | 14,676.60 (200.409) | 16,770.58 (321.435) | 0.030 (0.017) | 0.105 (0.0001) | 0.408 (0.005) | 4,471 |
| Mining | 27,763.07 (458.391) | 26,314.81 (517.500) | -0.010 (0.022) | 0.161 (0.0001) | 0.310 (0.008) | 1,556 |
| Construction | 19,034.37 (1,144.77) | 20,459.74 (1,118.01) | 0.152 (0.070) | 0.125 (0.125) | 0.459 (0.027) | 151 |
| Manufacturing | 21,580.73 (70.690) | 22,305.45 (87.278) | -0.002 (0.004) | 0.216 (0.0001) | 0.215 (0.001) | 40,305 |
| Transportation | 24,340.41 (134.033) | 23,907.14 (143.026) | 0.015 (0.007) | 0.301 (0.0001) | 0.298 (0.002) | 14,529 |
| Wholesale | 22,306.55 (165.696) | 23,127.56 (207.965) | -0.027 (0.008) | 0.167 (0.0001) | 0.318 (0.003) | 13,370 |
| Retail | 12,377.28 (52.089) | 14,758.91 (97.263) | 0.023 (0.006) | 0.231 (0.0001) | 0.340 (0.002) | 30,127 |
| FIRE | 19,505.58 (219.599) | 21,590.08 (279.916) | 0.029 (0.012) | 0.191 (0.0001) | 0.359 (0.004) | 5,327 |
| Services | 14,609.83 (69.356) | 17,016.58 (90.963) | 0.077 (0.005) | 0.194 (0.0001) | 0.338 (0.001) | 43,455 |
Note: The numbers in parentheses are standard errors of means except for column (4) where they are p-values.
Table 3: Number, Proportion and Average Total Employment of SSEL, Unique, and Matched Establishments by Employment Size, Industry, and MSA Status
| | SSEL estab. (1) | Unique estab. (2) | NWECD estab. (3) | Proportion unique (4) | Proportion matched (5) | SSEL estab. empl. (6) | Unique estab. empl. (7) | NWECD estab. empl. (8) |
| A. Total | ||||||||
| | 5,587,650 | 385,135 | 153,291 | 0.069 | 0.027 | 20.11 | 35.82 | 72.39 |
| B. Location | ||||||||
| MSA | 4,492,867 | 239,020 | 92,701 | 0.053 | 0.021 | 21.35 | 40.56 | 83.74 |
| Non-MSA | 1,091,700 | 146,085 | 60,590 | 0.134 | 0.051 | 14.94 | 28.06 | 55.03 |
| C. Establishment Size (Total Employment) | ||||||||
| 1-9 | 3,955,604 | 255,041 | 72,123 | 0.064 | 0.018 | 3.57 | 3.62 | 4.34 |
| 10-24 | 943,383 | 64,210 | 32,485 | 0.068 | 0.034 | 14.91 | 15.01 | 15.35 |
| 25-49 | 351,123 | 25,806 | 16,465 | 0.073 | 0.047 | 34.28 | 34.55 | 34.85 |
| 50-99 | 182,558 | 17,366 | 12,814 | 0.095 | 0.070 | 68.72 | 70.18 | 70.62 |
| 100-249 | 106,274 | 13,794 | 11,435 | 0.130 | 0.108 | 150.35 | 152.98 | 154.42 |
| 250-499 | 28,807 | 4,887 | 4,293 | 0.170 | 0.149 | 342.67 | 348.53 | 350.03 |
| 500+ | 19,901 | 4,031 | 3,676 | 0.203 | 0.185 | 1,696.61 | 1,484.68 | 1,506.65 |
| D. Industry | ||||||||
| Agriculture | 84,084 | 13,227 | 4,471 | 0.157 | 0.053 | 39.326 | 9.40 | 15.61 |
| Mining | 26,923 | 3,507 | 1,556 | 0.130 | 0.058 | 27.07 | 23.09 | 39.29 |
| Construction | 460,300 | 648 | 151 | 0.001 | 0.0003 | 11.14 | 6.11 | 12.03 |
| Manufacturing | 339,039 | 77,456 | 40,305 | 0.228 | 0.119 | 56.98 | 70.63 | 115.42 |
| Transportation | 206,078 | 28,839 | 14,529 | 0.140 | 0.071 | 34.28 | 60.58 | 107.35 |
| Wholesale | 427,506 | 41,098 | 13,370 | 0.096 | 0.031 | 15.92 | 11.40 | 18.20 |
| Retail | 1,329,908 | 83,592 | 30,127 | 0.063 | 0.023 | 14.76 | 11.04 | 18.83 |
| FIRE | 484,119 | 14,471 | 5,327 | 0.030 | 0.011 | 19.54 | 11.35 | 17.95 |
| Services | 1,779,285 | 122,297 | 43,455 | 0.069 | 0.024 | 19.63 | 39.35 | 88.53 |
Note: There are 450,408 SSEL establishments that have missing or non-classifiable SIC codes. There are 3,086 establishments in the SSEL and 30 establishments in unique industry place cells that we could not assign to an MSA because of incomplete geographic information in the SSEL.
Table 4: Number and Mean Earnings of SDF and NWECD Workers By Industry and Location
| | Number of SDF workers (1) | Number of NWECD workers (2) | Proportion matched (3) | Mean earnings SDF workers (4) | Mean earnings NWECD workers (5) | Log difference (6) |
| A. Total | ||||||
| | 14,264,082 | 1,056,635 | 0.074 | 23,147.38 (7.749) | 22,438.53 (22.031) | 0.031 |
| B. Location | ||||||
| MSA | 10,751,733 | 616,994 | 0.057 | 24,932.18 (9.633) | 24,692.94 (31.789) | 0.010 |
| Non-MSA | 3,512,349 | 439,641 | 0.125 | 17,683.87 (10.461) | 19,274.68 (27.826) | -0.086 |
| C. Industry | ||||||
| Agriculture | 333,628 | 12,002 | 0.036 | 16,069.73 (42.599) | 16,966.45 (211.042) | -0.054 |
| Mining | 114,367 | 7,374 | 0.064 | 32,137.77 (86.359) | 29,991.61 (260.715) | 0.069 |
| Construction | 879,065 | 477 | 0.001 | 25,102.05 (29.596) | 19,711.87 (1,003.29) | 0.242 |
| Manufacturing | 2,933,974 | 441,810 | 0.151 | 26,730.98 (15.880) | 25,468.95 (34.020) | 0.048 |
| Transportation | 1,095,901 | 71,909 | 0.066 | 28,508.97 (22.747) | 28,564.70 (82.533) | -0.002 |
| Wholesale | 668,366 | 30,721 | 0.046 | 28,277.24 (41.729) | 23,718.03 (160.149) | 0.176 |
| Retail | 2,471,348 | 88,067 | 0.036 | 14,837.84 (13.580) | 14,205.60 (67.367) | 0.044 |
| FIRE | 1,001,985 | 14,491 | 0.014 | 29,094.73 (43.285) | 20,979.85 (204.333) | 0.327 |
| Services | 4,765,448 | 389,784 | 0.082 | 21,966.55 (13.814) | 19,714.97 (34.966) | 0.108 |
Note: The numbers in parentheses are standard errors of means.
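As a rough check on how column (6) of Table 4 relates to columns (4) and (5), the sketch below recomputes the log difference as ln(mean SDF earnings) minus ln(mean NWECD earnings). This definition is an assumption inferred from the reported figures, not one stated in the table; the function and dictionary names are illustrative only.

```python
import math

# Mean earnings (SDF, NWECD) copied from Table 4, columns (4) and (5).
MEAN_EARNINGS = {
    "Total": (23147.38, 22438.53),
    "Non-MSA": (17683.87, 19274.68),
    "Construction": (25102.05, 19711.87),
    "FIRE": (29094.73, 20979.85),
}


def log_difference(sdf_mean, nwecd_mean):
    """Assumed definition: ln(SDF mean) - ln(NWECD mean)."""
    return math.log(sdf_mean) - math.log(nwecd_mean)


# Log differences rounded to three decimals, as in the table.
LOG_DIFFS = {
    group: round(log_difference(sdf, nwecd), 3)
    for group, (sdf, nwecd) in MEAN_EARNINGS.items()
}
```

Under this reading, a positive entry means that workers in the full SDF earn more on average than matched NWECD workers in the same group, and a negative entry (as for Non-MSA) means the reverse.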
Table 5: Comparing the Characteristics of SDF and NWECD Workers
| | All workers | | Workers earning between $2.50 and $500/hr. | | Full-time workers earning between $2.50 and $500/hr. | |
| | SDF (1) | NWECD (2) | SDF (3) | NWECD (4) | SDF (5) | NWECD (6) |
| Female | 0.465 | 0.487 | 0.462 | 0.484 | 0.428 | 0.448 |
| Non-Hispanic white | 0.861 | 0.895 | 0.862 | 0.896 | 0.863 | 0.895 |
| Black | 0.077 | 0.066 | 0.076 | 0.066 | 0.077 | 0.067 |
| Hispanic | 0.064 | 0.038 | 0.063 | 0.038 | 0.062 | 0.037 |
| Ever married | 0.761 | 0.813 | 0.768 | 0.819 | 0.803 | 0.848 |
| Full-time workers | 0.772 | 0.809 | 0.784 | 0.818 | 1.000 | 1.000 |
| Occupation | ||||||
| Manager | 0.259 | 0.257 | 0.264 | 0.260 | 0.281 | 0.263 |
| Support | 0.302 | 0.242 | 0.303 | 0.240 | 0.293 | 0.223 |
| Service | 0.120 | 0.096 | 0.114 | 0.094 | 0.088 | 0.079 |
| Farming | 0.021 | 0.009 | 0.019 | 0.009 | 0.016 | 0.007 |
| Production | 0.118 | 0.127 | 0.119 | 0.129 | 0.135 | 0.146 |
| Laborer | 0.167 | 0.239 | 0.167 | 0.239 | 0.173 | 0.253 |
| Industry | ||||||
| Agriculture | 0.023 | 0.011 | 0.022 | 0.011 | 0.018 | 0.009 |
| Mining | 0.008 | 0.007 | 0.008 | 0.007 | 0.009 | 0.008 |
| Construction | 0.062 | 0.0005 | 0.062 | 0.0004 | 0.067 | 0.00040 |
| Manufacturing | 0.206 | 0.418 | 0.209 | 0.423 | 0.239 | 0.470 |
| Transportation | 0.077 | 0.068 | 0.078 | 0.069 | 0.085 | 0.075 |
| Wholesale | 0.047 | 0.029 | 0.048 | 0.029 | 0.052 | 0.030 |
| Retail | 0.173 | 0.083 | 0.169 | 0.081 | 0.139 | 0.065 |
| FIRE | 0.070 | 0.014 | 0.071 | 0.014 | 0.076 | 0.014 |
| Services | 0.334 | 0.369 | 0.333 | 0.366 | 0.314 | 0.329 |
| Region | ||||||
| Northeast | 0.209 | 0.187 | 0.211 | 0.188 | 0.210 | 0.186 |
| Midwest | 0.290 | 0.383 | 0.289 | 0.383 | 0.287 | 0.382 |
| South | 0.315 | 0.306 | 0.314 | 0.304 | 0.319 | 0.312 |
| West | 0.186 | 0.124 | 0.186 | 0.124 | 0.184 | 0.121 |
Table 5: Continued
| | All workers | | Workers earning between $2.50 and $500/hr. | | Full-time workers earning between $2.50 and $500/hr. | |
| | SDF (1) | NWECD (2) | SDF (3) | NWECD (4) | SDF (5) | NWECD (6) |
| Education | ||||||
| No high school | 0.041 | 0.037 | 0.040 | 0.036 | 0.037 | 0.035 |
| Some high school | 0.125 | 0.115 | 0.120 | 0.112 | 0.099 | 0.102 |
| High school degree | 0.316 | 0.355 | 0.317 | 0.356 | 0.325 | 0.370 |
| Some college | 0.219 | 0.205 | 0.219 | 0.204 | 0.216 | 0.197 |
| Associate's degree | 0.074 | 0.090 | 0.075 | 0.090 | 0.078 | 0.090 |
| Bachelor's degree | 0.147 | 0.125 | 0.150 | 0.127 | 0.159 | 0.128 |
| Advanced degree | 0.077 | 0.073 | 0.079 | 0.074 | 0.085 | 0.077 |
| Speaks English: | ||||||
| Very well | 0.955 | 0.974 | 0.955 | 0.974 | 0.956 | 0.975 |
| Well | 0.025 | 0.016 | 0.025 | 0.016 | 0.024 | 0.015 |
| Not well | 0.016 | 0.008 | 0.015 | 0.008 | 0.015 | 0.008 |
| Not at all | 0.005 | 0.002 | 0.004 | 0.002 | 0.004 | 0.002 |
| Citizenship status: | ||||||
| Citizen by birth in U.S. | 0.921 | 0.955 | 0.921 | 0.954 | 0.921 | 0.955 |
| Citizen by birth in U.S. territory | 0.004 | 0.002 | 0.004 | 0.002 | 0.004 | 0.002 |
| Citizen by naturalization | 0.033 | 0.021 | 0.033 | 0.021 | 0.034 | 0.021 |
| Not a citizen | 0.043 | 0.022 | 0.043 | 0.022 | 0.042 | 0.021 |
| Mean age | 38.016 (0.003) | 38.887 (0.012) | 38.166 (0.003) | 39.026 (0.012) | 38.440 (0.003) | 39.360 (0.012) |
| Mean number of weeks worked | 46.359 (0.003) | 47.286 (0.010) | 46.650 (0.003) | 47.521 (0.010) | 49.988 (0.001) | 50.271 (0.005) |
| Mean usual hours worked per week | 39.586 (0.003) | 39.624 (0.011) | 39.638 (0.003) | 39.680 (0.010) | 42.261 (0.002) | 41.945 (0.006) |
| Mean wage or salary income | 23,147.38 (7.749) | 22,438.53 (22.031) | 23,764.47 (7.790) | 22,875.89 (22.000) | 27,259.79 (8.896) | 25,611.50 (23.884) |
| Mean hourly wage | 12.617 (0.023) | 12.012 (0.029) | 12.545 (0.004) | 11.983 (0.013) | 12.655 (0.004) | 11.977 (0.010) |
| Number of workers | 14,264,082 | 1,056,635 | 13,817,006 | 1,032,462 | 10,830,247 | 845,020 |
Note: The numbers in parentheses are the standard errors of means. The reference period for number of weeks worked, usual hours worked per week, wage or salary income, and hourly wage is the previous year (1989). Hourly wage is estimated as: (wage or salary income/number of weeks worked)/usual hours worked per week. Citizen by birth in U.S. also includes individuals born outside of the U.S. to American parents.
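The hourly wage construction described in the note to Table 5 can be sketched as follows. The function names, the argument names, and the choice to treat the $2.50 and $500 bounds as inclusive are assumptions made for illustration; the note specifies only the formula and the wage band.

```python
def hourly_wage(annual_income, weeks_worked, usual_hours_per_week):
    """Estimated hourly wage, following the Table 5 note:
    (wage or salary income / weeks worked) / usual hours per week."""
    if weeks_worked <= 0 or usual_hours_per_week <= 0:
        raise ValueError("weeks worked and usual hours must be positive")
    return (annual_income / weeks_worked) / usual_hours_per_week


def in_wage_band(wage, low=2.50, high=500.0):
    """Sample restriction of columns (3)-(6): wage between $2.50 and $500/hr.
    Inclusive bounds are an assumption, not stated in the source."""
    return low <= wage <= high
```

For example, a worker reporting $20,000 in wage or salary income, 50 weeks worked, and 40 usual hours per week has an estimated hourly wage of $10 and falls inside the band.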
Table 6: Regressions of Worker Wages for SDF and NWECD Workers
| | SDF workers | | NWECD workers |
| | (1) | (2) | (3) |
| Female | -0.112 (0.002) | -0.112 (0.002) | -0.096 (0.002) |
| Age | 0.046 (0.0003) | 0.046 (0.0003) | 0.044 (0.0003) |
| Age² × 100 | -0.045 (0.0003) | -0.045 (0.0003) | -0.042 (0.0003) |
| Ever married | 0.190 (0.002) | 0.191 (0.002) | 0.198 (0.002) |
| Black | -0.072 (0.003) | -0.073 (0.003) | -0.067 (0.003) |
| Hispanic | -0.083 (0.003) | -0.083 (0.003) | -0.072 (0.004) |
| High school degree | 0.102 (0.002) | 0.103 (0.002) | 0.103 (0.002) |
| Some college | 0.169 (0.002) | 0.169 (0.002) | 0.159 (0.002) |
| Associate's degree | 0.213 (0.002) | 0.213 (0.002) | 0.240 (0.002) |
| Bachelor's degree | 0.372 (0.002) | 0.371 (0.002) | 0.359 (0.002) |
| Advanced degree | 0.546 (0.003) | 0.545 (0.003) | 0.534 (0.003) |
| Black × female | 0.101 (0.004) | 0.101 (0.004) | 0.063 (0.004) |
| Hispanic × female | 0.077 (0.004) | 0.077 (0.004) | 0.062 (0.004) |
| Ever married × female | -0.213 (0.002) | -0.213 (0.002) | -0.209 (0.003) |
| Match | ... | -0.028 (0.002) | ... |
| R² | 0.350 | 0.350 | 0.389 |
| Number obs. | 1,426,748 | 1,426,748 | 1,056,635 |
Note: The regressions in columns (1) and (2) are based on a 10-percent random sample from the SDF. These regressions also include controls for region, occupation and Census industry.
| | SSEL establishments | | Unique establishments | | NWECD |
| | (1) | (2) | (3) | (4) | (5) |
| Log employment | 0.054 (0.0003) | 0.055 (0.0003) | 0.059 (0.001) | 0.054 (0.001) | 0.065 (0.001) |
| Place | 0.086 (0.001) | 0.086 (0.001) | 0.129 (0.003) | 0.128 (0.003) | 0.107 (0.004) |
| Multi-unit | 0.285 (0.001) | 0.284 (0.001) | 0.306 (0.003) | 0.304 (0.003) | 0.249 (0.004) |
| Match | ... | -0.072 (0.002) | ... | 0.045 (0.003) | ... |
| Establishment age | |||||
| 0-4 | -0.185 (0.001) | -0.185 (0.001) | -0.184 (0.004) | -0.182 (0.004) | -0.191 (0.005) |
| 5-9 | -0.083 (0.001) | -0.083 (0.001) | -0.083 (0.003) | -0.082 (0.003) | -0.104 (0.005) |
| 10-14 | -0.038 (0.001) | -0.038 (0.001) | -0.048 (0.004) | -0.048 (0.004) | -0.075 (0.005) |
| Region | |||||
| Northeast | 0.055 (0.001) | 0.056 (0.001) | 0.097 (0.004) | 0.101 (0.004) | 0.063 (0.005) |
| Midwest | -0.119 (0.001) | -0.117 (0.001) | -0.079 (0.004) | -0.080 (0.004) | -0.065 (0.005) |
| South | -0.110 (0.001) | -0.117 (0.001) | -0.110 (0.004) | -0.111 (0.004) | -0.101 (0.005) |
| R² | 0.295 | 0.295 | 0.324 | 0.325 | 0.401 |
| Number | 5,584,172 | 5,584,172 | 385,134 | 385,134 | 153,291 |
Note: All regressions include controls for four-digit industry. The omitted establishment age category is 15 or more years, and the omitted region is West.
| | Females | | | Males | | |
| | Black (1) | Hispanic (2) | White (3) | Black (4) | Hispanic (5) | White (6) |
| Log hourly wages | 2.010 (0.504) | 2.038 (0.499) | 2.167 (0.489) | 2.315 (0.529) | 2.306 (0.533) | 2.547 (0.509) |
| Age | 38.711 (10.513) | 37.446 (10.727) | 39.557 (11.029) | 38.908 (10.740) | 37.183 (11.141) | 39.892 (10.963) |
| Number of children | 2.265 (1.943) | 2.104 (1.854) | 1.793 (1.574) | ... | ... | ... |
| Ever married | 0.749 | 0.818 | 0.860 | 0.789 | 0.817 | 0.867 |
| No high school degree | 0.218 | 0.312 | 0.108 | 0.248 | 0.404 | 0.127 |
| High school degree | 0.388 | 0.294 | 0.358 | 0.398 | 0.273 | 0.381 |
| Some college | 0.202 | 0.183 | 0.189 | 0.196 | 0.164 | 0.192 |
| Associate's degree | 0.077 | 0.090 | 0.126 | 0.054 | 0.054 | 0.073 |
| Bachelor's degree | 0.073 | 0.078 | 0.136 | 0.065 | 0.060 | 0.134 |
| Advanced degree | 0.042 | 0.043 | 0.082 | 0.038 | 0.046 | 0.093 |
| Speaks English: | ||||||
| Very well | 0.988 | 0.726 | 0.987 | 0.988 | 0.646 | 0.988 |
| Well | 0.007 | 0.153 | 0.009 | 0.008 | 0.196 | 0.008 |
| Not well | 0.004 | 0.089 | 0.004 | 0.004 | 0.123 | 0.003 |
| Not at all | 0.0004 | 0.032 | 0.0003 | 0.0004 | 0.034 | 0.0002 |
| Citizenship status: | ||||||
| By birth in U.S. | 0.969 | 0.626 | 0.969 | 0.968 | 0.537 | 0.973 |
| By birth in U.S. territory | 0.001 | 0.056 | 0.0002 | 0.001 | 0.064 | 0.0003 |
| By naturalization | 0.014 | 0.139 | 0.018 | 0.012 | 0.137 | 0.016 |
| Not a citizen | 0.016 | 0.179 | 0.012 | 0.019 | 0.261 | 0.011 |
| MSA | 0.536 | 0.713 | 0.544 | 0.601 | 0.741 | 0.591 |
| Establishment size: | ||||||
| 25-49 | 0.051 | 0.085 | 0.067 | 0.079 | 0.132 | 0.097 |
| 50-99 | 0.098 | 0.118 | 0.110 | 0.105 | 0.143 | 0.116 |
| 100-249 | 0.195 | 0.228 | 0.217 | 0.183 | 0.245 | 0.192 |
| 250-499 | 0.199 | 0.197 | 0.197 | 0.168 | 0.137 | 0.165 |
| 500-999 | 0.196 | 0.165 | 0.175 | 0.168 | 0.119 | 0.156 |
| 1000+ | 0.260 | 0.208 | 0.233 | 0.298 | 0.223 | 0.274 |
| N | 24,525 | 9,105 | 265,047 | 19,927 | 12,380 | 306,734 |
Note: Standard deviations are reported in parentheses for continuous variables. Number of children refers to number of children ever born; this is asked of women in the Census, and is set to zero for men. "White" refers to non-black and non-Hispanic.
| | Females | | | | Males | | | |
| | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) |
| Black | -0.157 (0.003) | -0.022 (0.003) | -0.023 (0.003) | -0.037 (0.003) | -0.232 (0.004) | -0.121 (0.003) | -0.122 (0.003) | -0.114 (0.003) |
| Hispanic | -0.129 (0.005) | -0.045 (0.004) | -0.023 (0.005) | -0.025 (0.004) | -0.241 (0.005) | -0.115 (0.004) | -0.068 (0.004) | -0.051 (0.004) |
| Age | ... | 0.048 (0.001) | 0.048 (0.001) | 0.041 (0.0005) | ... | 0.065 (0.0005) | 0.065 (0.0005) | 0.053 (0.0005) |
| Age²/100 | ... | -0.048 (0.001) | -0.048 (0.001) | -0.039 (0.001) | ... | -0.063 (0.001) | -0.063 (0.001) | -0.049 (0.001) |
| Number of children | ... | -0.028 (0.002) | -0.028 (0.002) | -0.015 (0.002) | ... | ... | ... | ... |
| (Age/10) × number of children | ... | 0.0002 (0.0005) | 0.0001 (0.0005) | -0.001 (0.0005) | ... | ... | ... | ... |
| Ever married | ... | 0.063 (0.002) | 0.062 (0.002) | 0.050 (0.002) | ... | 0.196 (0.002) | 0.195 (0.002) | 0.157 (0.002) |
| High school degree | ... | 0.115 (0.002) | 0.110 (0.002) | 0.088 (0.002) | ... | 0.180 (0.002) | 0.172 (0.002) | 0.131 (0.002) |
| Some college | ... | 0.233 (0.003) | 0.228 (0.003) | 0.193 (0.003) | ... | 0.254 (0.003) | 0.246 (0.003) | 0.207 (0.002) |
| Associate's degree | ... | 0.451 (0.003) | 0.446 (0.003) | 0.408 (0.003) | ... | 0.297 (0.003) | 0.289 (0.003) | 0.255 (0.003) |
| Bachelor's degree | ... | 0.583 (0.003) | 0.578 (0.003) | 0.552 (0.003) | ... | 0.481 (0.003) | 0.473 (0.003) | 0.470 (0.003) |
| Advanced degree | ... | 0.732 (0.003) | 0.727 (0.003) | 0.740 (0.004) | ... | 0.593 (0.003) | 0.585 (0.003) | 0.709 (0.003) |
| Speaks English very well | ... | ... | 0.189 (0.021) | 0.166 (0.020) | ... | ... | 0.302 (0.019) | 0.251 (0.018) |
| Speaks English well | ... | ... | 0.136 (0.022) | 0.119 (0.021) | ... | ... | 0.218 (0.020) | 0.185 (0.018) |
| Speaks English not well | ... | ... | 0.088 (0.023) | 0.078 (0.021) | ... | ... | 0.129 (0.020) | 0.113 (0.019) |
| Citizen by birth in U.S. | ... | ... | 0.041 (0.006) | 0.027 (0.006) | ... | ... | 0.041 (0.006) | 0.020 (0.005) |
| Citizen by birth in U.S. territory | ... | ... | 0.045 (0.017) | 0.031 (0.016) | ... | ... | 0.019 (0.015) | -0.009 (0.014) |
| Citizen by naturalization | ... | ... | 0.084 (0.008) | 0.061 (0.007) | ... | ... | 0.113 (0.007) | 0.076 (0.007) |
| MSA | ... | 0.135 (0.002) | 0.135 (0.002) | 0.093 (0.002) | ... | 0.133 (0.002) | 0.133 (0.002) | 0.087 (0.002) |
| Nine region controls included | No | Yes | Yes | Yes | No | Yes | Yes | Yes |
| Size controls included | No | No | No | Yes | No | No | No | Yes |
| 4-digit industry controls included | No | No | No | Yes | No | No | No | Yes |
| N | 298,677 | 298,677 | 298,677 | 298,677 | 339,041 | 339,041 | 339,041 | 339,041 |
| R² | 0.009 | 0.352 | 0.353 | 0.441 | 0.018 | 0.351 | 0.353 | 0.458 |
Note: Standard errors of regression estimates are reported in parentheses. The omitted category for English fluency is "does not speak English," and the omitted category for citizenship is "not a citizen."
| | Females | | | | | Males | | | | |
| | Regression estimates (1) | Mean difference, black - white (2) | Contribution to black - white wage gap (3) | Mean difference, Hispanic - white (4) | Contribution to Hispanic - white wage gap (5) | Regression estimates (6) | Mean difference, black - white (7) | Contribution to black - white wage gap (8) | Mean difference, Hispanic - white (9) | Contribution to Hispanic - white wage gap (10) |
| Black | -0.012 (0.004) | 1.000 | -0.012 | 0.000 | 0.000 | -0.073 (0.004) | 1.000 | -0.073 | 0.000 | 0.000 |
| Hispanic | -0.016 (0.006) | 0.000 | 0.000 | 1.000 | -0.016 | -0.029 (0.006) | 0.000 | 0.000 | 1.000 | -0.028 |
| % black in occupation | -1.012 (0.049) | 0.018 | -0.018 | 0.009 | -0.009 | -2.422 (0.041) | 0.015 | -0.036 | 0.010 | -0.024 |
| % black in industry | 0.092 (0.022) | 0.015 | 0.001 | 0.001 | 0.000 | -0.505 (0.023) | 0.013 | -0.006 | 0.004 | -0.002 |
| % black in establishment | 0.214 (0.009) | 0.292 | 0.063 | -0.000 | -0.000 | 0.154 (0.010) | 0.224 | 0.035 | -0.002 | -0.000 |
| % black in job cell | -0.114 (0.008) | 0.443 | -0.050 | 0.008 | -0.001 | -0.093 (0.008) | 0.429 | -0.040 | 0.004 | -0.000 |
| Segregation by race | ... | ... | -0.004 | ... | -0.010 | ... | ... | -0.047 | ... | -0.026 |
| % Hispanic in occupation | -0.851 (0.071) | 0.012 | -0.011 | 0.007 | -0.006 | -0.928 (0.045) | 0.014 | -0.013 | 0.011 | -0.010 |
| % Hispanic in industry | -0.421 (0.033) | 0.004 | -0.002 | 0.007 | -0.003 | -1.887 (0.025) | 0.003 | -0.006 | 0.012 | -0.023 |
| % Hispanic in establishment | 0.169 (0.014) | 0.001 | 0.000 | 0.316 | 0.053 | 0.061 (0.011) | 0.003 | 0.000 | 0.328 | 0.020 |
| % Hispanic in job cell | -0.124 (0.012) | 0.004 | -0.000 | 0.459 | -0.057 | -0.091 (0.010) | 0.006 | -0.001 | 0.526 | -0.048 |
| Segregation by Hispanic ethnicity | ... | ... | -0.013 | ... | -0.013 | ... | ... | -0.020 | ... | -0.061 |
| R² | 0.363 | ... | ... | ... | ... | 0.394 | ... | ... | ... | ... |
Note: Standard errors of regression estimates are reported in parentheses. The control variables included correspond to those in columns (3) and (7) in Table 9. See notes to Tables 8 and 9. The percentages black and Hispanic are computed for men and women separately.
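The arithmetic behind the "Contribution" columns can be illustrated with a minimal sketch. It assumes that each contribution equals the regression coefficient times the corresponding mean difference, and that the "Segregation by race" row sums the four percent-black rows; this reading is inferred from the reported numbers and from the restricted-coefficient decomposition described in note 16, and the names below are illustrative. Because the published coefficients and mean differences are rounded to three decimals, recomputed values can differ from the reported ones in the third decimal.

```python
# Female columns (1) and (2) of the table above:
# (regression coefficient, mean difference black - white).
FEMALE_BLACK_SEGREGATION = {
    "% black in occupation": (-1.012, 0.018),
    "% black in industry": (0.092, 0.015),
    "% black in establishment": (0.214, 0.292),
    "% black in job cell": (-0.114, 0.443),
}


def contributions(rows):
    """Contribution of each variable: coefficient times mean difference."""
    return {name: coef * diff for name, (coef, diff) in rows.items()}


def combined_effect(rows):
    """Assumed 'Segregation by race' entry: sum of the row contributions."""
    return sum(contributions(rows).values())


FEMALE_COMBINED = combined_effect(FEMALE_BLACK_SEGREGATION)
```

With the rounded inputs, the combined effect for women comes out close to the reported -0.004, with the positive establishment term largely offsetting the negative job-cell term.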
| | Females | | | | | Males | | | | |
| | Regression estimates (1) | Mean difference, black - white (2) | Contribution to black - white wage gap (3) | Mean difference, Hispanic - white (4) | Contribution to Hispanic - white wage gap (5) | Regression estimates (6) | Mean difference, black - white (7) | Contribution to black - white wage gap (8) | Mean difference, Hispanic - white (9) | Contribution to Hispanic - white wage gap (10) |
| Black | -0.009 (0.004) | 1.000 | -0.009 | 0.000 | 0.000 | -0.064 (0.004) | 1.000 | -0.064 | 0.000 | 0.000 |
| Hispanic | -0.010 (0.007) | 0.000 | 0.000 | 1.000 | -0.010 | -0.014 (0.006) | 0.000 | 0.000 | 1.000 | -0.014 |
| % black in occupation | -1.323 (0.023) | 0.027 | -0.036 | 0.014 | -0.019 | -2.513 (0.029) | 0.020 | -0.049 | 0.013 | -0.032 |
| % black in industry | 0.385 (0.022) | 0.015 | 0.006 | 0.001 | 0.000 | -0.140 (0.023) | 0.013 | -0.002 | 0.004 | -0.001 |
| % black in establishment | 0.140 (0.008) | 0.292 | 0.041 | -0.000 | -0.000 | 0.120 (0.009) | 0.224 | 0.027 | -0.002 | -0.000 |
| % black in job cell | -0.041 (0.007) | 0.548 | -0.022 | 0.005 | -0.000 | -0.069 (0.007) | 0.549 | -0.038 | 0.003 | -0.000 |
| Segregation by race | ... | ... | -0.011 | ... | -0.019 | ... | ... | -0.062 | ... | -0.033 |
| % Hispanic in occupation | -0.562 (0.041) | 0.015 | -0.008 | 0.011 | -0.006 | -0.448 (0.033) | 0.016 | -0.007 | 0.015 | -0.007 |
| % Hispanic in industry | -0.434 (0.033) | 0.004 | -0.002 | 0.007 | -0.003 | -1.860 (0.025) | 0.003 | -0.006 | 0.012 | -0.023 |
| % Hispanic in establishment | 0.103 (0.011) | 0.001 | 0.000 | 0.316 | 0.033 | 0.037 (0.010) | 0.003 | 0.000 | 0.328 | 0.012 |
| % Hispanic in job cell | -0.068 (0.010) | 0.003 | -0.000 | 0.578 | -0.039 | -0.081 (0.009) | 0.005 | -0.000 | 0.636 | -0.052 |
| Segregation by Hispanic ethnicity | ... | ... | -0.010 | ... | -0.015 | ... | ... | -0.013 | ... | -0.070 |
| R² | 0.371 | ... | ... | ... | ... | 0.398 | ... | ... | ... | ... |
Note: See notes to Tables 8-10.
| | Females | | Males | | Sex difference in wage gap | |
| | Black (1) | Hispanic (2) | Black (3) | Hispanic (4) | Black (5) | Hispanic (6) |
| A. 13 occupations: | ||||||
| Without segregation controls, coefficients | -0.023 (0.003) | -0.023 (0.005) | -0.122 (0.003) | -0.068 (0.004) | 0.099 (0.004) | 0.045 (0.006) |
| With segregation controls, coefficients | -0.012 (0.004) | -0.016 (0.006) | -0.073 (0.004) | -0.029 (0.006) | 0.061 (0.006) | 0.013 (0.008) |
| Combined segregation effect: | ||||||
| Race | -0.004 | -0.010 | -0.047 | -0.026 | ... | ... |
| Hispanic ethnicity | -0.013 | -0.013 | -0.020 | -0.061 | ... | ... |
| With job-cell dummy variables, coefficients | -0.022 (0.003) | -0.035 (0.005) | -0.074 (0.003) | -0.038 (0.005) | 0.052 (0.004) | 0.003 (0.007) |
| N | 298,677 | | 339,041 | | | |
| B. 72 occupations: | ||||||
| Without segregation controls, coefficients | -0.023 (0.003) | -0.023 (0.005) | -0.122 (0.003) | -0.068 (0.004) | 0.099 (0.004) | 0.045 (0.006) |
| With segregation controls, coefficients | -0.009 (0.004) | -0.010 (0.007) | -0.064 (0.004) | -0.014 (0.006) | 0.055 (0.006) | 0.004 (0.009) |
| Combined segregation effect: | ||||||
| Race | -0.011 | -0.019 | -0.062 | -0.033 | ... | ... |
| Hispanic ethnicity | -0.010 | -0.015 | -0.013 | -0.070 | ... | ... |
| With job-cell dummy variables, coefficients | -0.020 (0.003) | -0.026 (0.006) | -0.063 (0.003) | -0.023 (0.005) | 0.043 (0.004) | -0.003 (0.008) |
| N | 298,677 | | 339,041 | | | |
Note: The other control variables included correspond to those in columns (3) and (7) in Table 9. The standard errors in columns (5) and (6) are calculated assuming independent samples.
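The entries in columns (5) and (6) can be reproduced with a short sketch. It takes the sex difference as the female coefficient minus the male coefficient (an inference from the reported values) and, following the note, computes the standard error under independence as the square root of the sum of the two squared standard errors. The function name is illustrative.

```python
import math


def sex_difference(female_coef, female_se, male_coef, male_se):
    """Female-minus-male difference in the estimated wage gap and its
    standard error, assuming the two samples are independent."""
    diff = female_coef - male_coef
    se = math.sqrt(female_se ** 2 + male_se ** 2)
    return diff, se


# Panel A, without segregation controls, black-white gap (columns 1, 3, 5).
black_diff, black_se = sex_difference(-0.023, 0.003, -0.122, 0.003)
# Panel A, without segregation controls, Hispanic-white gap (columns 2, 4, 6).
hisp_diff, hisp_se = sex_difference(-0.023, 0.005, -0.068, 0.004)
```

Rounded to three decimals, these inputs give 0.099 (0.004) and 0.045 (0.006), matching the first row of panel A.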
1. In more recent data for 1995 (reported in Altonji and Blank, 1998, Table 1) the qualitative pattern of larger racial and ethnic gaps for men than for women is similar, although for women the racial and ethnic gaps have grown, and among men the Hispanic-white earnings ratio has fallen below the black-white earnings ratio; specifically, for full-time, year-round workers, black-white earnings ratios are 0.69 for men and 0.83 for women, while Hispanic-white earnings ratios are 0.58 for men and 0.75 for women.
2. In the same specification, they report that the Hispanic-white wage difference for men falls to essentially zero, while Hispanic women earn 14.5 percent more than white women. While these estimates also preserve the large sex difference in the Hispanic-white difference, we are skeptical of the reliability of these estimates for Hispanics.
3. This might be viewed as a specific formulation of the hypothesis that each group of minorities or women suffers from discrimination relative to white males, while the differences in the effects of discrimination among these minorities or women are relatively minor; in particular, we look at the effects of segregation, which might well arise from discrimination.
4. The differences concern the relative importance of each of these dimensions of segregation, and the role of the individual's sex after accounting for segregation (effectively, the within-job-cell sex difference in wages). The results in Groshen and in Bayard, et al., are directly comparable, and differ in that Groshen attributes a large portion of the sex gap in wages to occupational segregation and none to within-job cell sex differences, whereas Bayard, et al., find a smaller role for occupational segregation, and a larger role for within-job-cell sex differences.
Carrington and Troske (1998a) also document the concentration of women in low-wage plants in U.S. manufacturing.
5. Carrington and Troske (1998b) use the WECD, a version of the NWECD that covers manufacturing only, to look at the role of racial segregation across establishments. They find that in establishments in which blacks are concentrated, wages of white workers are relatively high, but also that the wage gap between black and white workers is relatively larger in these establishments. Thus, the overall impact of segregation by establishment on the black-white wage gap is unclear.
6. See Bayard, et al. (1998) for a thorough discussion of this issue in the context of sex differences in wages.
7. In some geographic areas, the Census Bureau uses Block Numbering Areas (BNAs) instead of tracts. For our purposes, a BNA is equivalent to a tract. The Census Bureau assigns tracts and blocks in tandem, so whenever an establishment is assigned a tract code, it is also always assigned a block code.
8. For some establishments in the SSEL the only address information may be the mailing address of the business and not the physical address. In addition, this mailing address may be a P.O. box, which cannot be assigned a tract or block code. In 1992 the Census Bureau assigned tract and block codes to 45 percent of the records in the SSEL.
9. For the most part, CIC codes correspond to three-digit SIC codes. One exception is the construction industry, where one CIC code corresponds to three two-digit SIC codes. In addition, a few SIC industries correspond to more than one CIC code. We omitted the few establishments in these industries.
10. For example, if a worker's block-code is imputed and we use the block-code to match the worker with the employer, then we eliminate the match. However, if a worker's block-code is imputed, but his place-code is not, and we only use the place-code to match the worker to the employer, then we keep the match.
11. In addition, we have eliminated workers with missing or zero reported earnings or who report working outside the U.S.
12. This positive relationship between establishment size and the cross-establishment correlation of the two earnings estimates may arise because the 1990 Decennial Census long form is sent only to a random sample of the population. Because the long form was sent to one in six households, large establishments will, on average, contain more workers who received the form. We will therefore have a more accurate estimate of the "true" average earnings of workers in the plant, which in turn will result in a higher correlation between the SSEL and SDF earnings estimates. To examine this hypothesis, for establishments with 25 or more employees we construct an estimate of SDF worker earnings using a random sample of workers equal in size to the average number of workers matched to establishments with 10-24 employees (3 workers). Using this estimate, we still find a positive relationship between the cross-establishment correlation of the SSEL and SDF earnings estimates and size.
13. There is some question as to what is the location of a construction establishment. See Calabria (1998) for a further discussion of this issue.
14. In this and subsequent comparisons, we keep only workers in the SDF who meet the same criteria as workers who appear in the NWECD: positive earnings and a reported place of work in the U.S.
15. To code race and ethnicity, we began with the question from the Decennial Census that asks "Is this person of Spanish/Hispanic origin?" and then asks respondents to indicate specific ethnicity (e.g., Mexican, Cuban, other). We code the individual as Hispanic if the answer to the "Spanish/Hispanic" question is yes and the person is not black. Additionally, we code the worker as Hispanic if he or she lists a Latin American race code under the separate "Race" question (and also is not black). The "Race" question asks respondents to indicate whether they self-identify with one of seven race groups: white, black, Indian, Eskimo, Aleut, Asian or Pacific Islander (several choices), or Other (in which case they are then asked to indicate the race with which they identify). We code workers as black if they meet one of two conditions: they pick "black" on the race question; or they pick "other" and indicate a race that falls into a "black" category (e.g., African American, Afro-American). Workers cannot be coded as both black and Hispanic in our sample. For example, if a worker answers the "Race" question as "other, Cuban" then the worker is coded as Hispanic. But if the worker answers the "Race" question as "black" but indicates Hispanic-Cuban ethnicity, then the worker is coded as black.
16. This decomposition can be thought of as the traditional decomposition of Oaxaca (1973), imposing the restriction that the coefficients are the same for racial and ethnic groups.
17. These 13 occupational categories, the corresponding Census codes, and the number of subcategories making up the 72 disaggregated occupations, are as follows:
(1) Managerial and Professional Specialty Occupations - Executive, Administrative, and Managerial Occupations, codes 3-37, 2 subcategories
(2) Managerial and Professional Specialty Occupations - Professional Specialty Occupations, codes 43-199, 9 subcategories
(3) Technical, Sales, and Administrative Support Occupations - Technicians and Related Support Occupations, codes 203-235, 3 subcategories
(4) Technical, Sales, and Administrative Support Occupations - Sales Occupations, codes 243-285, 3 subcategories
(5) Technical, Sales, and Administrative Support Occupations - Administrative Support Occupations, codes 303-389, 10 subcategories
(6) Service Occupations - Private Household Occupations, codes 403-407, 1 subcategory
(7) Service Occupations - Protective Service Occupations, codes 413-427, 3 subcategories
(8) Service Occupations - Service Occupations, Except Protective and Household, codes 433-469, 7 subcategories
(9) Farming, Forestry, and Fishing Occupations, codes 473-499, 4 subcategories
(10) Precision Production, Craft, and Repair Occupations, codes 503-699, 14 subcategories
(11) Operators, Fabricators, and Laborers - Machine Operators, codes 703-799, 7 subcategories
(12) Operators, Fabricators, and Laborers - Transportation and Material Moving Occupations, codes 803-859, 5 subcategories
(13) Operators, Fabricators, and Laborers - Handlers, Equipment Cleaners, Helpers, and Laborers, codes 864-889, 4 subcategories
18. They also absorb the controls for region and MSA.
19. In fact, this is the main motivation for constructing the percent-black and percent-Hispanic variables by sex. When we defined the percent-black and percent-Hispanic variables over men and women together, the results for Hispanics were very similar to those reported below, while there was less evidence that inclusion of the segregation variables accounted for much of the black-white wage gap among men.
20. In contrast, in Bayard, et al. (1998), we report much sharper sex segregation, with, for example, a mean difference by occupation of 0.17.
21. The estimated within-job-cell wage gaps are similar and if anything somewhat larger (in absolute value) using the fixed effects, implying that these estimates were not biased away from zero owing to measurement error in the segregation variables.
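The race and ethnicity coding rules in note 15 can be sketched as a small function. This is an illustration under stated assumptions, not the authors' actual code: the function name, the argument names, and the two write-in sets are hypothetical (the note gives only a few example write-ins, and the Census race codes themselves are not reproduced here). Following the note to the table of sample means, "white" is used for anyone coded neither black nor Hispanic.

```python
# Example write-ins taken from note 15; the real lists of Census race codes
# are longer and are not given in the text (hypothetical stand-ins).
BLACK_WRITE_INS = {"african american", "afro-american"}
LATIN_AMERICAN_WRITE_INS = {"cuban", "mexican"}


def code_race_ethnicity(race, hispanic_origin, other_write_in=None):
    """Return 'black', 'Hispanic', or 'white' following the rules in note 15.

    race: answer to the "Race" question (e.g. "white", "black", "other").
    hispanic_origin: True if the "Spanish/Hispanic origin" answer is yes.
    other_write_in: race written in when the "Race" answer is "other".
    """
    race = race.strip().lower()
    write_in = (other_write_in or "").strip().lower()

    # Black takes precedence, so no worker is coded as both black and Hispanic.
    if race == "black" or (race == "other" and write_in in BLACK_WRITE_INS):
        return "black"
    # Hispanic: yes on the origin question, or a Latin American race write-in.
    if hispanic_origin or (race == "other" and write_in in LATIN_AMERICAN_WRITE_INS):
        return "Hispanic"
    return "white"
```

The two examples in the note map onto this sketch directly: a "Race" answer of "other, Cuban" is coded Hispanic, while a "Race" answer of "black" with Hispanic-Cuban ethnicity is coded black.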