%Paper: ewp-em/9506004
%From: Eric Rasmusen <erasmuse@rasmusen.bus.indiana.edu>
%Date: Wed, 14 Jun 95 13:37:42 -0500
%Date (revised): Fri, 16 Jun 95 16:35:51 -0500

  %   December 5, 1992 

 % MInor style changes, may 13 1993, juyly 8
           \documentstyle[12pt,epsf] {article}
\parskip 10pt

         \begin{document} 

 
\parindent 24pt 

 
         \titlepage 

 
	      \vspace*{12pt}
 

         \begin{center} 

\begin{large} 

         {\bf Observed Choice and Optimism in Estimating the Effects 

of  Government Policies    }\\ 

  \end{large} 

         \vskip 20pt 

        August 4, 1994  \\ 

        \bigskip 

 Eric Rasmusen\\ 

        \vskip .7in 

        {\it Abstract} 

        \end{center} 

 
 A policy will be used more heavily in a particular time and place
where its marginal cost is lower. The analyst who treats times and
places as identical will overestimate the policy's net benefit,
especially for policy intensities greater than exist in his sample.
In regression analysis,  the problem can be solved by instrumental
variables and a correction for heteroskedasticity. In an example using
state-level data,  the technique substantially increases the
estimated responsiveness of  the  illegitimacy rate  to  transfer
payments.

 
            \vskip .3in
\begin{small}
          \noindent 

\hspace*{20pt}	  	  Indiana University
School of Business, Rm. 456, 

  10th Street  and Fee Lane,
  Bloomington, Indiana, 47405-1701.
  Office: (812) 855-9219.   Fax: 812-855-3354. Internet: 
Erasmuse@indiana.edu.\\ 

 JEL numbers and Keywords: C1, C3, C5, H3,  I3.  Estimation bias. 
Poverty. Political economy. Instrumental variables. 

 	%   Draft: 7.4 (Draft 1.1, May 1990). 

 % \footnote{xxx Footnotes starting with xxx are the author's notes to himself.}

 
I would like to thank Robert Barsky, Trudy Cameron,  John Garen, 
Hashem Pesaran, Simon Potter,  Sunil Sharma, Hal Varian,  and seminar 
participants at Indiana University, the University of Michigan, the 
University of Rochester, and the Wharton School  for   comments. 
Much of this work was completed while the author was an Olin   Fellow 
at the Center for the Study of the Economy and the State, University 
of Chicago, and on the faculty of UCLA's Anderson Graduate School of 
Management. 

   \end{small} 

 
  %---------------------------------------------------------------% 
%--------------------------------------------------------------- 

        \newpage 

 \begin{center} 

{\bf 1. Introduction.} 

 \end{center} 

 
 An important problem is how to judge the effect of a government 
policy by 

looking at data on its   use and  impact in various 

times and places.  The task might be to estimate the effect of 

government transfers on poverty, of unemployment insurance 

on unemployment, or  of  the tax rate on tax revenue.   Let the 
hypothesized relationship be $Impact = \beta Policy$, or 

  \begin{equation} \label{e1}
 y =   \beta x.
 \end{equation}

 
 The  observed-choice problem,  the
subject of this article, is that very commonly  $x = x(\beta)$,
because   the observed policies are not random. They are chosen in recognition
of their costs and benefits in particular times and places, so $x$
depends on $\beta$, which differs across
observations. If policies are used more where they are more effective
on the margin, then both  casual empiricism  and   estimates using
ordinary least squares are biased towards
optimism about the effect of the policies. This is not like typical
sources of bias  such as omission of relevant variables,     which
can cause bias in either direction; rather, it is
like measurement error with one regressor, which generates bias in
a predictable direction.

 
     The mathematics of the observed-choice problem are relatively 
simple, relying on the theories of instrumental variables and random 
coefficients that are by now well-established in the econometrics 
literature, though perhaps not in exactly this combination. Nor is 
the idea that individuals make decisions based on costs and benefits 
new; this is the heart of economics. What this paper will contribute 
is the observation that 

if  decisions are made by rational  actors, then   cross-section 
estimation of the effects of government policies will be biased, and 
biased systematically in favor of government activism. 

 
 Section 2 will set up the  estimation problem and the bias that 
results (subsection 2.1),  show the sign of the bias (2.2), devise a 
consistent estimator (2.3), and discuss a different approach 
suggested by Garen (2.4).  Section 3 will explain the problem more 
intuitively (3.1), distinguish it from other econometric problems 
(3.2), discuss related examples with discrete variables or 
nonlinearities (3.3),  and compare the policymaking problem with the 
prediction problem (3.4).  Section 4 will apply the analysis in a 
particular context, the effect  of government transfer payments on 
illegitimacy.   Section 5 concludes. 

 
%--------------------------------------------------------------- 

 
\bigskip 

\begin{center} 

 {\bf 2. The Observed-Choice Problem} 

 \end{center} 

 
\noindent 

 {\bf 2.1. The Model} 

 
\noindent 

 The analyst is trying to estimate relationship (\ref{e2}): 

 \begin{equation} \label{e2}
 y =  \beta x \; .
 \end{equation}

  Each of his $n$ observations consists of an impact level $y$ and a 

policy level $x$ for a particular time and place, subscripted $i$. 

The standard approach is to regress $y$ on $x$ in the belief that the 

true specification is 

   \begin{equation} \label{e3}
 y_i =    \beta x_i + \epsilon_i, \;
 \end{equation}

  where $\epsilon \sim (0, \sigma_\epsilon^2)$.    As always in 
estimation, the analyst does not believe equation 

(\ref{e3}) to be more than an approximation. The true relationship is 

unlikely to be precisely linear, for example, but linearity is a good 

approximation when one does not know whether a convex, concave, or 

wavy function would be appropriate. Similarly, each time and 

place  does not have exactly  the same true coefficient, and a more 

accurate  specification would be equation (\ref{e4}), in which the 

effect of the policy is different for each observation: 

 \begin{equation} \label{e4}
 y_i =  \beta_i x_i + \epsilon_i \; .
 \end{equation}

  Equation (\ref{e4}), however, is impossible to estimate, since it 

has $n$ parameters and there are only $n$ observations.  Moreover, 

using   approximation (\ref{e3}) might not be  misleading, since, in 
the absence 

of other considerations, the regression of $y$ on $x$ does give an 

unbiased estimate of the average $\beta$.  To see this, suppose that 

the true specification for $\beta_i$ in equation (\ref{e4}) is 

 \begin{equation} \label{e5}
 \beta_i = \overline{\beta} + v_i,
 \end{equation}

 where $v \sim (0, \sigma_v^2)$ and is independent of $\epsilon$.
Using (\ref{e5}), equation (\ref{e4}) becomes
 \begin{equation} \label{e6}
 y_i =   \overline{\beta} x_i + x_i v_i + \epsilon_i \; .
 \end{equation}

 The ordinary least squares (OLS) estimate of $\overline{\beta}$ is 

 \begin{equation} \label{e7}
 \widehat{\beta}_{OLS} =  \frac{\sum x_i y_i }{\sum x_i^2},
 \end{equation}

 where $\sum$ will denote $\sum_{i=1}^n$ throughout the paper. 

 If $v_i$ and $x_i$ are independent, the OLS estimate of
$\overline{\beta}$ is unbiased,   because
   % \footnote{xxx Unbiased, not just consistent. See Maddala p. 151.}
the expected value of  expression (\ref{e7}) is
 \begin{equation} \label{e8}
E \left( \frac{\sum x_i ( \overline{\beta} x_i + v_i x_i +
\epsilon_i) }{\sum x_i^2} \right)\;,
 \end{equation}

 which equals 

 \begin{equation} \label{e9}
 E \left( \overline{\beta} \frac{\sum x_i^2}{\sum x_i^2} \right) +
E \left( \frac{\sum x_i^2 v_i}{\sum x_i^2} \right) + E \left(
\frac{\sum x_i \epsilon_i }{\sum x_i^2} \right)\;.
 \end{equation}

 The first and last terms of (\ref{e9}) equal $ \overline{\beta}$ and 

0, and the   middle term equals 0 if $E (x_i^2 v_i) = 0$. Thus,  if 
$x_i$ and 

$v_i$ are independent, OLS is unbiased. 

 
 Despite the unbiasedness of $\widehat{\beta}_{OLS}$,
heteroskedasticity  does make OLS inefficient and
biases the estimated standard errors. The variance of the error term
for observation $i$ is $x_i^2 \sigma_v^2 + \sigma_\epsilon^2$, from
equation   (\ref{e6}), which varies depending on $x_i$.
  Although $ E (x_i v_i)=0$,     the variance of observation $i$'s disturbance
depends on the size of $x_i$. When $x_i$ is large, so is that variance, and
observation $i$ ought to be weighted less heavily in the estimate.
This ``varying-parameters'' heteroskedasticity is a well-known
problem, and the estimate can be improved by weighted least
squares as described below in Section 2.3.\footnote{For textbook
discussions of varying-parameter models, see pp.  75-89 of Kennedy
(1985) and pp.  390-393 of Maddala (1977).}

 
  A greater difficulty is that $v_i$ and $x_i$ are unlikely to be 

independent. After all, why is $x_i$ different from $x_j$?  Policies 

are chosen for many different reasons, but benefits are always 

weighed against costs, and the variable $y$ that the econometrician 

is examining is very likely to be part of either the benefit or the 

cost. Suppose, for example, that $x$ is the level of cigarette 

taxation and $y$ is the amount of deadweight loss.  Deadweight loss 

is a cost, and states where taxes create more deadweight loss will 

choose lower levels of taxation. Or suppose that $x$ is the level of 

cigarette taxation, and $y$, the amount of revenue raised, which 

depends on the potential for smuggling.  Revenue is a benefit, and 

states where cigarette taxes raise more revenue  may choose higher 

levels of taxation. 

 
  The relevance of costs and benefits is robust to the details of why 

the policies are chosen.  If the legislators aim to maximize social 

welfare, it is obvious that they will weigh costs and benefits. But 

even if their primary concern is to please special interest groups 

such as cigarette companies or the beneficiaries of state spending, 

the legislators will still consider the public costs and benefits if 

the general public has any political influence whatsoever (as 

Peltzman [1976] points out).  It may well be that lobbying by 

cigarette companies makes every state set the tax too low from the 

viewpoint of social welfare, but states where the cost of the tax is 

low and the benefit is high will nonetheless have the highest taxes, 

because lobbyists would have to spend more there to obtain a given 

tax reduction. 

 
This logic says that $x_i$ depends on $\beta_i$, and  on other 
factors 

which will be incorporated as an exogenous variable $w$, so a third 

equation, equation (\ref{e21}),  is required to describe the complete 

system: 

     \begin{equation} \label{e20}
 y_i =   \beta_i x_i + \epsilon_i \; ,
 \end{equation}
  \begin{equation} \label{e22}
 \beta_i = \overline{\beta}  +  v_i \; ,
 \end{equation}
 and
 \begin{equation} \label{e21}
 x_i=  \gamma_1 + \gamma_2 \beta_i + \gamma_3 w_i + u_i\;,
 \end{equation}

 
\noindent 

 where it will be assumed that: (i) $\gamma_1 +
\gamma_2\overline{\beta} + \gamma_3 \sum w_i/n >0$, (ii)
$\overline{\beta}>0$,
(iii) $w$ and $\overline{\beta}$ are nonstochastic, (iv) $\epsilon,
u$ and $v$ are independent stochastic disturbances with mean zero and
finite variance, and (v) $v$ has a symmetric distribution.

 
 Assumptions (i) and (ii) are  normalizations. 

Assumption (i) says that the average value of $x$  is positive. 
Assumption (ii) says that the policy has 

a positive effect on the impact, whether the impact be desirable or 
not. 

Assumptions (iii) and (iv) establish what is exogenous. 

Assumption (v) says that the true coefficients are symmetrically 

distributed around their average of $\overline{\beta}$.\footnote{This 
assumption is used just following equation (\ref{e30}) below.  The 
bias will exist  regardless of whether there is skewness or not, but 
if $E v_i^3 \neq 0$,   analysis of the sign of the bias becomes more 
complicated.} 

 
 The system  of equations (\ref{e20}) to (\ref{e21}) violates the
assumptions of the  OLS model in two ways, each
harmless by itself:  random parameters and stochastic regressors.

 The simpler system consisting of (\ref{e20}) and (\ref{e22}) has 

random parameters, but OLS is still unbiased as an estimate of the 

expected value of the parameter. The simpler system consisting of 

(\ref{e20}) and (\ref{e21}) (in which case $\beta_i 
=\overline{\beta}$)  has 

stochastic regressors, but OLS is also   unbiased in that system. 
Like binary nerve 

gas, the two problems are harmless individually,  but dangerous in 

combination. 

 
To see that the  OLS estimate of $\overline{\beta}$ is biased, 
combine equations (\ref{e22}) and (\ref{e21}) to obtain 

 \begin{equation} \label{e25}
 x_i= \gamma_1 +  \gamma_2 \overline{\beta} + \gamma_2 v_i + \gamma_3
w_i + u_i \; .
 \end{equation}

 The critical middle term in the $\widehat{\beta}_{OLS}$ equation, 

(\ref{e9}), which for unbiasedness must equal zero in expectation, is 

  \begin{equation} \label{e26}
  \frac{\sum x_i^2 v_i}{\sum x_i^2}
 \end{equation}
 or, using  (\ref{e25}),
  \begin{equation} \label{e27}
   \frac{\sum (\gamma_1 + \gamma_2 \overline{\beta} + \gamma_2
v_i + \gamma_3 w_i + u_i)^2 v_i}{\sum x_i^2}.
 \end{equation}

  The summed quantity in the numerator can be written as 

  \begin{equation} \label{e29}
   ([\gamma_1  + \gamma_2 \overline{\beta} + \gamma_3 w_i + u_i] +
\gamma_2 v_i)^2 v_i \; ,
 \end{equation}
 which equals
  \begin{equation} \label{e30}
   [\gamma_1 +\gamma_2 \overline{\beta} + \gamma_3 w_i + u_i]^2 v_i +
2[\gamma_1 + \gamma_2 \overline{\beta} + \gamma_3 w_i + u_i]\gamma_2
v_i^2 + \gamma_2^2 v_i^3,
  \end{equation}
  the expectation of which equals
  \begin{equation} \label{e31}
   2\gamma_2[\gamma_1 + \gamma_2 \overline{\beta} + \gamma_3 w_i]
\sigma^2_v,
  \end{equation}
  since $E (v^3)=0$ by assumption
(v), and $u$ and $v$ are independent with mean zero.
 

   Expression (\ref{e31}) has the same sign as $\gamma_2[\gamma_1 + 

\gamma_2 \overline{\beta} + \gamma_3 w_i]$.  Summed across the $n$ 

observations, this takes the same sign as $\gamma_2$, since the term 

in square brackets is positive by assumption (i). 
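
 As a rough illustration of the magnitude involved, consider a sketch under
simplifying assumptions that are not part of the model above: $\gamma_3 = 0$,
so that $m \equiv \gamma_1 + \gamma_2 \overline{\beta}$ is the same for every
observation, and independent, identically distributed draws of $u$, $v$, and
$\epsilon$. Dividing the numerator and denominator of (\ref{e26}) by $n$ and
applying the law of large numbers gives
$$
 plim \; (\widehat{\beta}_{OLS}) - \overline{\beta} = \frac{E(x_i^2 v_i)}{E(x_i^2)}
 = \frac{2 \gamma_2 m \sigma_v^2}{m^2 + \gamma_2^2 \sigma_v^2 + \sigma_u^2}\; .
$$
 With the purely hypothetical values $m=2$, $\gamma_2 = 1$, and $\sigma_v^2 =
\sigma_u^2 = 1$, the right-hand side is $4/6 = 2/3$, so a true average effect
of $\overline{\beta} = 1$ would be estimated as roughly 1.67 in a large sample.
Holding $m$ fixed, reversing the sign of $\gamma_2$ reverses the sign of the
bias.
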

 
  The parameter $\gamma_2$ represents how the marginal impact of the 

policy affects the policy level chosen.  If the policy is used more 

where it is more effective, then $\gamma_2 >0$ if $y$ is a desirable 

impact and $\gamma_2 <0$ if $y$ is undesirable. Expression 

(\ref{e31}) takes the same sign as $\gamma_2$, so the conclusion 

would be that $\beta$ is overestimated if $y$ is desirable and 

underestimated if $y$ is undesirable. Whether $\gamma_2$ takes those 

signs is not obvious, however,  and   Section 2.2 is devoted to 
investigating it. 

 
%--------------------------------------------------------------- 

 
 \bigskip 

\noindent 

 {\bf 2.2. The Sign of $\gamma_2$: Is a Policy Used More Where it 

is More Effective?} 

 
Section 2.1 showed that the sign of the bias depends on the sign of 

$\gamma_2$ in equation (\ref{e21}), which is repeated here: 

$$
 x_i=  \gamma_1 + \gamma_2 \beta_i + \gamma_3 w_i + u_i\;.
 $$

 What can be said about $\gamma_2$ in general, without knowing the 

particular application? Is the policy used more where it is more 

effective, so that $\gamma_2$ is positive where the impact is 

desirable and negative where it is undesirable? 

 
  Let us use a   general optimization problem to address the
question.  Consider one time and place $i$ (so we can drop the
subscript $i$) where the policy $x$ has an impact $\beta_b x$ which
produces a utility benefit of $B(\beta_b x)$, with $B'>0, B''\leq 0$;
and an impact $\beta_c x$ which produces a utility cost of $C(\beta_c
x)$, with $C'>0, C'' \geq 0$ (and either $C''>0$ or $B''<0$, to give
the problem an interior solution).    Assume the benefit and the cost
to be separable, so the policymaker's problem is
  \begin{equation} \label{e1000}
 \stackrel{Max}{x} M(x) =  B(\beta_b x)- C(\beta_c x).
  \end{equation}

 
 The first order condition is 

  \begin{equation} \label{e1010}
 \frac{ \partial M}{\partial x} = \beta_b B' - \beta_c C'=0,
  \end{equation}
 and the second order condition is
  \begin{equation} \label{e1020}
 \frac{\partial^2M}{\partial x^2} = \beta_b^2 B'' - \beta_c^2 C'' < 0.
  \end{equation}

 
  \noindent 

   The cross-partials are 

  \begin{equation} \label{e1030}
 \frac{\partial^2M}{\partial x \partial \beta_b} = B' + \beta_b x
B'' \;\;
  \end{equation}
 and
  \begin{equation} \label{e1040}
 \frac{ \partial^2M}{\partial x \partial  \beta_c} = -C' - \beta_c x
C'' <0.
  \end{equation}

 
 Because 

  \begin{equation} \label{e1050}
   \begin{array}{lll}
\frac{ d x}{d  \beta_b} = -  \frac{ \frac{ \partial^2M}{\partial x
\partial  \beta_b}}{\frac{ \partial^2M}{\partial x^2}   }  & &
\frac{ d x}{d  \beta_b} = (-) \frac{ (?)    }{(-)}\\
 \end{array}
  \end{equation}
 and
  \begin{equation} \label{e1060}
  \begin{array}{lll}
  \frac{ d x}{d \beta_c} = -  \frac{ \frac{
\partial^2M}{\partial x \partial \beta_c}}{ \frac{
\partial^2M}{\partial x^2}   }   & &
\frac{ d x}{d  \beta_c} = (-) \frac{ (-)    }{(-)}\\
  \end{array}
  \end{equation}

 we can conclude that $ \frac{ d x}{d \beta_c}$ is always negative, but
$\frac{ d x}{d  \beta_b}$ might be positive or negative.  A less intense value of the policy is chosen
when the cost parameter is big, but not necessarily when the benefit
parameter is small.   There are two  implications for the bias of the
OLS estimates in Section 2.1:\footnote{ It is interesting to note
that  the result on costs, and sometimes on benefits,  leads to the
same conclusion as the folk wisdom that estimation problems usually
lead to coefficients that are too small.}

 
(a) If $y$ is undesirable, a cost of the policy, then  $\gamma_2 <0$ 
in  equation (\ref{e21}).   A bigger $\beta_c$ leads to 

a smaller $x$. Hence, in the original estimation problem, OLS 

underestimates $\overline{\beta}$ when the impact is undesirable. 

 
(b) If $y$ is desirable, a benefit of the policy, then  $\gamma_2$
might be either positive or negative.
   If $B(\cdot)$ is close to linear, then $B''$ is small in absolute value, expression
(\ref{e1030}) is positive, and
$\gamma_2 >0$: a bigger $\beta_b$ leads to a bigger $x$.  If
$B(\cdot)$ is heavily concave (i.e., the benefit  $y$ has sharply
diminishing marginal utility), then $B''$ is large in absolute value,
expression (\ref{e1030}) is negative, and $\gamma_2 <0$.
The more intuitive sign is $\gamma_2 >0$, which says that the  policy
is used more intensively where it is more effective, in which case
OLS overestimates $\overline{\beta}$, the positive marginal impact. It is
also possible, however, that  the  policy is used more intensively
where it is less effective (the policymaker may wish to attain  a
threshold benefit, for example, which requires greater use of the
policy if it is less effective).


\bigskip 

 
 It may be helpful to 

think of the policy $x$ as an expenditure, $PQ^d$, and   the impact 

$\beta_b x$ as the quantity demanded, $Q^d$.  Then $\frac{ 

x}{\beta_b x} = \frac{1}{\beta_b}$ is like the price of the good--- 
it is the expenditure divided by the quantity. 

When $P$ falls, $Q^d$ always rises. But for some goods, demand is
elastic, and when $P$ falls, $PQ^d$ rises. For other goods, demand is
inelastic, and $PQ^d$ falls. For goods with elastic demand, $\gamma_2
>0$, and for goods with inelastic demand, $\gamma_2 <0$.  The
direction of the bias of OLS thus depends on the elasticity of
demand for the policy's benefits.

In the original estimation problem, 

OLS will overestimate $\overline{\beta}$ if demand for the impact is 

elastic, and underestimate it if demand is inelastic. 
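
 A simple parametric case, offered only as an illustration (the functional
forms are assumptions, not part of the general argument), makes the link to
elasticity explicit. Let $B(z) = z^{1-\rho}/(1-\rho)$ with $\rho>0$ and $\rho
\neq 1$, and let the cost be linear with $\beta_c = 1$, so that $C(x) = x$.
The first order condition (\ref{e1010}) becomes
$$
 \beta_b (\beta_b x)^{-\rho} = 1, \;\;\; {\rm so \; that} \;\;\; x = \beta_b^{(1-\rho)/\rho}.
$$
 The cross-partial (\ref{e1030}) is $(1-\rho)(\beta_b x)^{-\rho}$, which is
positive if $\rho<1$ and negative if $\rho >1$, so the policy level rises
with $\beta_b$ in the first case and falls in the second. In terms of the
demand analogy, the price is $P = 1/\beta_b$ and the quantity is $\beta_b x =
P^{-1/\rho}$, a demand curve with constant elasticity $1/\rho$. Demand is
elastic exactly when $\rho <1$.
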


 Yet another way to understand this is by realizing that the same 
problem comes up in trying to predict how factor choice  changes with 
technical change. If the cost of labor goes up, one can confidently 
predict that a factory's use of labor will fall.  If the 
effectiveness of labor goes up, one cannot predict whether  the 
factory will use more labor or less. Ordinarily, we think it will use 
more, but that need not be the case. 

 
\bigskip
\noindent 

 {\bf 2.3. A Consistent Estimator for the Observed-Choice Problem} 

 
 The observed-choice problem can be solved by using instrumental
variables, even though it is not a conventional simultaneity problem.
Begin with the system above: equations (\ref{e20}), (\ref{e22}), and
(\ref{e21}). Equations (\ref{e22}) and  (\ref{e21})  were combined to
give  (\ref{e25}),
 $$
 x_i= \gamma_1 +  \gamma_2 \overline{\beta}  +  \gamma_2 v_i +
\gamma_3 w_i + u_i \; ,
 $$

  which can itself be rewritten as 

 \begin{equation} \label{e31a}
   x_i= (\gamma_1 + \gamma_3 \overline{w} +  \gamma_2
\overline{\beta})  + \gamma_2 v_i + \gamma_3
(w_i - \overline{w}) + u_i \; ,
 \end{equation}
  where $\overline{w}$ is the sample  mean of $w$.
  Using $(w_i - \overline{w})$
as an instrument for $x_i$, the instrumental variables estimator is
 \begin{equation} \label{e32}
 \widehat{\beta}_{IV} = \frac{\sum (w_i-\overline{w}) y_i}{\sum
(w_i-\overline{w}) x_i}.
 \end{equation}

 Combining equations (\ref{e20}) and  (\ref{e22}) yields $ y_i = 
\overline{\beta} x_i   +  v_i  x_i + \epsilon_i$, which can be 
substituted into  (\ref{e32}) to yield 

   \begin{equation} \label{e34}
 \begin{array}{ll}
 plim\; (\widehat{\beta}_{IV}) & = plim \; \left(\frac{\sum
(w_i-\overline{w})
(\overline{\beta} x_i + v_i x_i + \epsilon_i)}{\sum (w_i -
\overline{w}) x_i}
\right)\\
 & \\
 & = \overline{\beta} + plim \left(\frac{\sum (w_i-\overline{w}) v_i
x_i}{\sum (w_i - \overline{w}) x_i} \right) +
 plim \left(\frac{\sum (w_i-\overline{w}) \epsilon_i}{\sum (w_i -
\overline{w}) x_i}\right).\\
 \end{array}
 \end{equation}

  Substituting for $x_i$ from equation (\ref{e31a}) gives 

  \begin{equation} \label{e34a}
 \begin{array}{ll}
plim\; (\widehat{\beta}_{IV})    & = \overline{\beta} +
  plim \left(\frac{\sum (w_i-\overline{w}) v_i(\gamma_1 + \gamma_3
\overline{w} +  \gamma_2 \overline{\beta})}{\sum (w_i - \overline{w})
x_i}
\right) +
 plim \left(\frac{\sum (w_i-\overline{w}) v_i^2 \gamma_2  }{\sum (w_i
- \overline{w}) x_i}
\right) +
 plim \left(\frac{\sum (w_i-\overline{w})^2 v_i \gamma_3 }{\sum (w_i
- \overline{w}) x_i}
\right) + \\
 &
plim \left(\frac{\sum (w_i-\overline{w}) v_i u_i }{\sum (w_i -
\overline{w}) x_i}
\right) +
 plim \left(\frac{\sum (w_i-\overline{w}) \epsilon_i}{\sum (w_i -
\overline{w}) x_i}\right)\\
  & \\
 & = \overline{\beta}.
 \end{array}
 \end{equation}
 Thus, a consistent estimator can be obtained for $\overline{\beta}$ 

if an instrument, $(w - \overline{w})$, is available for 
$x$.\footnote{The constant is another suitable instrument for $x$ 
here, since $v$ has mean zero. If a constant is used as an 
instrument, then $w$ itself can be used, instead of $w- 
\overline{w}$. This problem differs from the  standard instrumental 
variables problem, in which  the difficulty  is   that $x$ is 
correlated with the disturbance $\epsilon$,  so, since  $\epsilon$ 
has mean zero, the instrument does not itself need to have mean zero. 
The special difficulty here is   the $wv^2 \gamma_2$ term.  Since $E 
v^2 \neq 0$, the instrument must have mean zero or the set of 
instruments must include a constant.} 
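
 To see why the instrument must have mean zero, suppose (as a hypothetical
misstep, not a procedure proposed here) that $w_i$ itself is used as the only
instrument, with no constant. Under the assumptions of Section 2.1, and
provided the sample moments of $w$ converge, the same steps give
$$
 plim \left( \frac{\sum w_i y_i}{\sum w_i x_i} \right) = \overline{\beta} +
 \frac{\gamma_2 \sigma_v^2 \, \overline{w}}{(\gamma_1 + \gamma_2
 \overline{\beta}) \overline{w} + \gamma_3 \overline{w^2}} \; ,
$$
 where $\overline{w^2}$ denotes the mean of $w_i^2$ (notation introduced only
for this illustration). The extra term vanishes only if $\gamma_2 = 0$,
$\sigma_v^2 = 0$, or $\overline{w} = 0$, and demeaning the instrument imposes
the last of these conditions.
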

 
 Heteroskedasticity is also a problem, because the error in equation
(\ref{e6})  is $ v_i x_i + \epsilon_i$, the   variance of which,
$x_i^2\sigma^2_v + \sigma^2_\epsilon$, is different for each
observation. Weighted instrumental variables is appropriate, with
weights $1/\sqrt{(x_i^2\sigma^2_v + \sigma_\epsilon^2)}$, which
requires estimates of $\sigma^2_v$ and $\sigma^2_\epsilon$. One
procedure to generate   estimates of $\sigma^2_v$ and
$\sigma^2_\epsilon$ is:

  \begin{enumerate} 

 \item[(a)] 

  Regress $x$ on $w$ and a constant to get fitted values 
$\widehat{x}$. 

 \item[(b)] 

 Regress $y$ on $\widehat{x}$  to get the estimated coefficient 

$\widehat{\beta}$. 

 \item[(c)] 

 Construct estimated errors $e_i = y_i - \widehat{\beta} x_i$. 

 \item[(d)] 

 Regress $e^2$ on $x^2$ and a constant. Let 
$\widehat{\sigma^2_\epsilon}$ 

be the estimate of the constant and $\widehat{\sigma^2_v}$ be the 

estimate of the coefficient. 

 \end{enumerate} 
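
 Step (d) rests on the variance expression given just before the list. As a
sketch, with $\eta_i$ a hypothetical mean-zero regression error introduced
only for this illustration, the auxiliary regression is
$$
 e_i^2 = \sigma^2_\epsilon + \sigma^2_v x_i^2 + \eta_i \; ,
$$
 and the estimated weight for observation $i$ is then $1/\sqrt{(x_i^2
\widehat{\sigma^2_v} + \widehat{\sigma^2_\epsilon})}$. Nothing in this
regression forces the two estimates to be positive, so in a small sample the
quantity under the square root may need to be checked before the weights are
formed.
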


\bigskip 

\noindent 

 {\bf 2.4. The Garen Technique} 

 
Garen (1984) solves a problem similar to the present one 

without using instrumental variables, though his  procedure is 
equivalent to 

2SLS in some   examples (see Garen [1987]). 

Let us assume that $w$ is not a determinant of $x$, so no instrument 

is available. The system to be estimated is then: 

 \begin{equation} \label{e42a}
 y_i =  \overline{\beta} x_i + v_i x_i + \epsilon_i \; ,
 \end{equation}
 and
 \begin{equation} \label{e42b}
 x_i= \gamma_1  + \gamma_2 \overline{\beta} + \gamma_2 v_i + u_i \; .
 \end{equation}
  Let us also assume that $u \equiv 0$, which will replace
identification-by-instrument.

 
 The reason that OLS is biased in equation (\ref{e42a}) is that if
$y$ is regressed on $x$, the regressor $x$ is correlated with the
error term $vx$. This can be viewed as an omitted-variable problem,
and including a consistent estimate of $vx$ as a separate regressor
would eliminate the bias asymptotically. The analyst can estimate $v_i$ by
$\widehat{v_i} = x_i - \overline{x} = \gamma_2 v_i$.
 This is biased unless $\gamma_2=1$, but that is unimportant, since the
coefficient on $v_i x_i$ in equation (\ref{e42a}) is known to be
unity and its regression estimate will be ignored anyway.  The
analyst can therefore regress $y$ on $x$ and $\widehat{v}x$ to obtain a
consistent estimate of $\overline{\beta}$.
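
 The substitution can be written out explicitly. Assuming $\gamma_2 \neq 0$
and ignoring the sampling error in $\overline{x}$, so that $\widehat{v_i} =
\gamma_2 v_i$, equation (\ref{e42a}) becomes
$$
 y_i = \overline{\beta} x_i + \frac{1}{\gamma_2} \widehat{v_i} x_i + \epsilon_i \; .
$$
 With $u \equiv 0$, both regressors are functions of $v_i$ alone and so are
independent of $\epsilon_i$. The coefficient on $x_i$ estimates
$\overline{\beta}$, while the coefficient on $\widehat{v_i} x_i$ estimates
$1/\gamma_2$ rather than unity and is a nuisance parameter for the present
purpose.
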

 
 This procedure cannot be used when $u$ does not equal zero---that 

is, when the policy is partly determined by factors unobserved by the 

analyst. In that case, $\widehat{v_i} = x_i - \overline{x} = \gamma_2 

v_i+u_i$, which is correlated with $x_i$ because $x_i$ and $u_i$ are 

correlated. Because of the correlation with $x_i$, $\widehat{v_i} 
x_i$ is 

not a consistent estimator even of $\gamma_2 v_i x_i$, and a 

regression of $y$ on $x$ and $\widehat{v_i} x_i$ would not produce a 

consistent estimate of $\overline{\beta}$.  Equation (\ref{e42a}) can 

be rewritten as 

 \begin{equation} \label{e42c}
 \begin{array}{ll}
 y_i &= \overline{\beta} x_i + (\gamma_2 v_ix_i + u_ix_i) +
([1-\gamma_2] v_ix_i - u_ix_i) + \epsilon_i \;\\
 & \\
  &= \overline{\beta} x_i + \widehat{v_i} x_i + ([1-\gamma_2] v_ix_i
- u_ix_i) + \epsilon_i \;.\\
 \end{array}
 \end{equation}

   Thus, if $y$ were regressed on $x$ and $\widehat{v_i} x_i$, the 

regressor $x$ would be correlated with $u_i x_i$ in the error term, 

and the estimate of $\overline{\beta}$ would be biased.  The bias 

disappears only if $u \equiv 0$.  Hence the Garen technique, although 

it does not require an instrument for $x$, does require the analyst 

to have precise knowledge of the variables that determine $x$. 

 
 \bigskip 

\begin{center} 

 {\bf 3. Explanation,  Examples, and Prediction} 

 \end{center}


\noindent
 {\bf  3.1. An Intuitive Explanation of the Observed-Choice Problem}
 

 The algebraic development of Section 2 makes it clear that OLS is 
biased when the observed-choice problem is present, but yields very 
little intuition as to why. Diagrams can make it considerably 
clearer, and can show why the sign of the bias is unambiguous when 
the impact is a cost but ambiguous when it is a benefit. 

 
 Each of the diagrams in Figures 1 and 2 shows  two localities, each 
with its own 

relationship between $x$ and $y$.  These relationships, and the 

average of the two, are shown as rays through the origin.  Localities 

1 and 2 have slopes $\beta_1$ and $\beta_2$, and the average has 

slope $\overline{\beta} = (\beta_1+\beta_2)/2$.  Policymakers 1 and 2 

choose points on their respective rays. If they choose $x$ ignoring 

local conditions, $x_1$ and $x_2$ have the same expected value, and 

the expected average of the two observations is on the middle ray. 

This corresponds to OLS being  unbiased. 

 
 In Figure 1,   $y$ is a benefit of $x$.   In Figure 1a, the more 
effective a policy is in a locality, the {\it more} intensely it is 
used. $\gamma_2$ is positive, and a steeper slope makes a 

policymaker choose a higher level of $x$. Indiana, with a 

greater marginal benefit, chooses a higher policy level than 

Michigan, and $x_1> x_2$. If the econometrician draws a line 

through the origin to lie between the two observations and minimize 

the squared deviations, that line will have a slope {\it greater} 

than $\overline{\beta}$. OLS overestimates the marginal benefit. 


 In Figure 1b, the more effective a policy is in a locality, the {\it
less} intensely it is used. $\gamma_2$ is negative, and a steeper
slope makes a policymaker choose a lower level of $x$. Ohio, with a
greater marginal benefit, chooses a lower policy level than
Nevada, and $x_1< x_2$. (Note, however, that $y_1 > y_2$; Ohio still
ends up with a greater benefit than Nevada.) If the econometrician
draws a line through the origin to lie between the two observations and minimize
the squared deviations, that line will have a {\it negative} slope,
contradicting theory.   OLS underestimates the marginal benefit, and
in fact gives an impossible result.
\epsfysize=5in 

 
\epsffile{/Users/erasmuse/@Papers/Choice/Choice1.eps} 

 
In Figure 2,   $y$ is a cost of $x$, and a steeper 

slope makes a policymaker choose a {\it lower} level of $x$: 

$\gamma_2$ is negative. Iowa, with a greater marginal cost, 

chooses a lower level than Wisconsin: $x_1< x_2$. If the 

econometrician draws a line through the origin to lie between the two 

observations and minimize the squared deviations, that line will have 

a slope {\it less} than $\overline{\beta}$. 

 OLS underestimates the marginal cost. 
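
 A two-observation numerical example (the numbers are hypothetical and chosen
only for arithmetic convenience) shows the same thing using the OLS formula
(\ref{e7}). Let $\beta_1 = 3$ and $\beta_2 = 1$, so that $\overline{\beta} =
2$, and set $\epsilon_i = 0$. If both localities choose $x=1$, then
$\widehat{\beta}_{OLS} = (3+1)/(1+1) = 2$. If the locality with the steeper
ray chooses more, $x_1 = 2$ and $x_2 = 1$, as in Figure 1a, then $y_1 = 6$,
$y_2 = 1$, and
$$
 \widehat{\beta}_{OLS} = \frac{2 \cdot 6 + 1 \cdot 1}{2^2 + 1^2} =
 \frac{13}{5} = 2.6 > \overline{\beta}.
$$
 If instead it chooses less, $x_1 = 1$ and $x_2 = 2$, as in Figure 2, then
$y_1 = 3$, $y_2 = 2$, and
$$
 \widehat{\beta}_{OLS} = \frac{1 \cdot 3 + 2 \cdot 2}{1^2 + 2^2} =
 \frac{7}{5} = 1.4 < \overline{\beta}.
$$
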

 
  \epsfysize=3in 

 
\epsffile{/Users/erasmuse/@Papers/Choice/Choice2.eps} 

 
\noindent
 {\bf  3.2. Other Problems, to be Distinguished from the
Observed-Choice Problem}
 

 The observed-choice problem is easily confused with other problems 

in estimation such as the mutual-cause problem, simultaneity, and the 

Lucas critique. It    may be useful to distinguish it from them at 
this point. 

 
The {\it mutual cause problem} is present when variables $x$ and $y$
do not really have a causal relationship but are both caused by a
third variable $w$ such that $x=x(w)$ and $y=y(w)$. If richer cities
have better roads and fewer high-school dropouts, the correlation between
  good roads ($x$) and  fewer dropouts  ($y$) is positive
because of  income ($w$). The quality of roads may be a good
predictor of the dropout rate in equilibrium, but if the quality were
changed arbitrarily the relationship would disappear.  The result is
an overestimate of the impact, whether it be a benefit or a cost.

 
 {\it Simultaneity} is present when not only does $y$ depend on $x$, 

but $x$ depends on $y$: $ y=y(x)$ and $x=x(y)$. Adding hospitals to a 

city reduces mortality, but a city with less mortality needs 

fewer hospitals.  Simultaneity is not special to policy, and the bias 

can be either overestimation or underestimation, depending on the 
relationships between $x$ and 

$y$. 

 
 The {\it Lucas critique} applies  when the relation between $x$ 

and $y$ only lasts until the government tries to take advantage of 

it, because if $x$ changes, so does $\beta$: $\beta = \beta(x)$. 

Aggregate output only rises with the money supply if money supply 

growth is low, so any attempt to increase output by increasing the 

money supply fails. This problem, which is equivalent to nonlinearity 

in the relationship between $x$ and $y$, is special to policy, and it 

can cause either overestimation or underestimation,  depending on how 
$\beta$ changes in response to $x$. 

 
 The observed-choice problem is not the mutual-cause problem, because 

$y$ does  indeed depend directly on $x$.  It is not simultaneity, 
because 

$x$ does not depend on $y$.  And it is not the Lucas critique, 

because $\beta$ does not depend on $x$. It is most closely related to 

the ``selection bias'' or ``self-selection'' problems  found in 

binary-choice models. 


 The most obvious form of self-selection occurs 

when some individuals take actions that prevent them from appearing 

in the observed sample, but even if the individuals in the data set 

are chosen randomly and selection {\it per se} is not a problem, the 

values of independent variables that result from individual decisions 

might depend on unobserved heterogeneity (see Mundlak [1961], Heckman 

[1976, 1979], and Lee [1978]). The name ``selection bias'' is, in 
fact, misleading, since the problem exists even if the sample is the 
entire population. 

The observed-choice problem can be considered a form of the selection 
bias problem, because in both problems the level of the policy 
depends on other variables or disturbances in the model and OLS is 
biased. The standard selection  bias model, however, does not involve 
varying coefficients or policy choices based on the effectiveness of 
the policy being analyzed. Example 4 in the next subsection may  help 
to show the similarities and differences between the two 
problems.\footnote{ Garen (1984) has extended the standard 

selection-bias techniques to a context in which observations are 

sampled randomly from the population but the values of the 

regressors depend on the heterogeneity.  He specifies the value of 

the regressors as being chosen from a large set of discrete choices, 

and the bias arises from an error term with a nonzero expectation 

that interacts with individual characteristics and the particular 

choice made by the individual.  The solution he proposes is to 

estimate the nonzero expectation consistently and include it in the 

regression.  The present paper approaches the problem more simply, 

using a regression model in which coefficients vary across 

observations and the policies depend on the coefficients, but a 

version of the Garen technique  was discussed in Section 2.4.}

  \bigskip
 \noindent
 {\bf 3.3 Examples with Discrete Choice, Nonlinearities, and 
Selection Bias.} 

 
     A  selection of verbal examples may help to further reveal the 
intuition behind the observed-choice problem, to extend the 
implications to nonlinear estimation, and to distinguish it from the 
standard selection bias problem. 

 In the following four  examples, the policy takes 

just two levels, adoption or rejection. 

 
{\it Example 1: Hotel tax revenue, a desirable impact.} A state 

either has a low or a high hotel tax, trading off the increase in 

revenue against the harm to the hotel industry.  In 25 states, the 

high hotel tax would raise \$100 in revenue per capita more than the 

low tax, and those states adopt the tax. In the other 25 states, the 

higher tax would so discourage business that the change in tax 

revenue per capita would be \$0.  The analyst notices that the 25 

states with the high tax have \$100 higher revenue per capita, a 

difference that is 

statistically significant.  He therefore advises  all states 

to impose  high taxes, even though, in truth, the added benefit is 
zero. 

He has overestimated the benefit of increasing a policy's intensity.
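
 The arithmetic of the example can be laid out explicitly. Write $\Delta$ for the revenue gain per capita from moving to the high tax (a symbol introduced only for this illustration), and assume, as the example implicitly does, that the two groups of states are otherwise alike. The cross-state comparison recovers the gain in the adopting states, while the advice concerns the states that have not adopted:
$$
E(\Delta \mid \mbox{high tax}) = 100, \;\;\;\;\;
E(\Delta \mid \mbox{low tax}) = 0, \;\;\;\;\;
E(\Delta) = \frac{25(100) + 25(0)}{50} = 50 .
$$
The analyst's figure of 100 overstates both the average gain of 50 and, more to the point, the gain of 0 in the states he is advising.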

%\footnote{xxx With just two choices, I think I can predict that 
% benefits will be overestimated.} 

 
{\it Example 2: Welfare mothers, an undesirable 
impact.}\footnote{Section 4 contains an  empirical version of Example 
2.}  Transfer 

payments to unwed mothers can be set at amount 2 or amount 3. In 25 

states, the illegitimacy rate will be 200 or 300, depending on the 

transfer level, and those states set transfers equal to 2 (see 

Table 1). In 25 other states, the illegitimacy rate will be 200 

regardless of the transfer level, and those states set transfers 

equal to 3.  The analyst sees 25 states with transfers of 2 and 

illegitimacy of 200  and 25 with transfers of 3 and illegitimacy of 

200. He concludes that transfers have no effect on illegitimacy, and 

he suggests that the low-transfer states can increase their 

transfers to 3 without any adverse effects. But doing so would in 

fact increase illegitimacy considerably, and the true average effect 
is an increase of 50  (= [25(100) + 25(0)]/50) in illegitimacy going 
from transfers of 2 to 3.  He has underestimated the 

cost of increasing the intensity of a policy. 

 
\begin{center} 
 \begin{tabular}{c|ccc|c} 
   \multicolumn{5}{c}{ TABLE 1}\\
  \multicolumn{5}{c}{  }\\
  \multicolumn{5}{c}{ EXAMPLES  2 and 3}\\ 
   \multicolumn{5}{c}{  }\\
\hline
\hline
  \multicolumn{5}{c}{ } \\ 
 \multicolumn{2}{c}{\underline{HIGH RESPONSE STATE}} & & 
\multicolumn{2}{c}{\underline{LOW RESPONSE STATE}}\\ 
 Transfer & Illegitimacy & &  Transfer & Illegitimacy\\ 
 \hline 
 & &  & \\
  {\bf  2}  & {\bf 200} & &2 & 200\\ 
  3  &  300 & &{\bf 3} & {\bf 200}\\ 
  4  &  600 & &4 & 600\\ 
 & &  & \\
  \hline 
 \end{tabular}\\ 
\end{center} 

   \bigskip 


 {\it Example 3: The potential for bias is especially strong for 
policy intensities 

outside the sample range.} Add another transfer level  to Example 

2: 

amount 4, which would result in illegitimacy of 600.  The 
low-transfer states keep their 

transfers at 2, and the high-transfer states stay at 3.  The naive 

analyst advises that transfer levels can be increased to 4 in every 

state without any effect on illegitimacy. He is wrong; 

illegitimacy will rise everywhere. The value of policy is especially 

overestimated for intensities greater than exist in the sample. 

 
 This last effect is not just the usual hazard of forecasting out of 

the observed sample range. The naive analyst may well admit that his 

predictions for transfers of 4 are outside of his sample range and 

less trustworthy because of possible nonlinearity in the effect of 

transfers on illegitimacy. But he will add that although possible 

nonlinearity reduces the reliability of the prediction, it could 

result with equal likelihood in either an overestimate or an 

underestimate of the effect.  That is wrong. The very reason why the 

transfer level of 4 is not in his sample is that the effect is 

nonlinear in the particular direction unfavorable to the active 

policy. 

 
Nonlinearities outside the observed sample range could lead to either 

overestimation or underestimation. It could be that the policy is 

much {\it more} effective than we estimate in the range {\it lower} 

than we observe. Table 1 and Figure 3 illustrate the problems with 

extrapolation in either direction. Although the data in Figure 3 may 
represent the entire population of policy choices, they are not random; 
they are purposively chosen to be on the middle part of the benefit 
curve. 


 \vspace*{24pt} 

 \epsfysize=3in 

 
\epsffile{/Users/erasmuse/@Papers/Choice/Choice3.eps} 

 
 Example 3  has some similarity to the Lucas Critique problem, 

because the marginal effectiveness of the policy depends on the 

policy level chosen. This dependence, however, would exist even if 

the policy levels were chosen randomly.   What the observed-choice 

problem adds is the idea that  the policies will be chosen so as to 

make the Lucas critique especially applicable.  The Lucas critique 

says that {\it if} the variation in the data is too small, 
nonlinearities in 

the function being estimated are a big problem, where ``too small'' 

depends on the context.   The observed-choice 

problem explains {\it why} the variation  will be too small. 

 
\bigskip 

 
\noindent 

{\it Example 4: Job training and selection bias.} 

    The effect of job training 

programs is the paradigmatic problem for  which  economists have 
worried about selection 

bias (see  Heckman \& Robb [1985], Heckman, Hotz \& Dabos [1987], or 
Lalonde [1986]). Suppose 

 half of  a group of unemployed people  had wages of 100   in their 
previous jobs 

and half had wages of  120.  They are offered training, 

but only the people with past wages of 120 accept the training, for 
some exogenous reason. The 

training makes no difference in productivity for either group. 

Afterwards, however, the trained people get jobs with wages of 120, 

and the untrained get wages of 100. If the naive analyst ignores the 

previous wages, he concludes that  the training raised wages by 20 
percent. 

Just as easily, the problem could have been that only the 100-wage 

people accepted training, in which case the bias would have been 

pessimistic rather than optimistic. In either case, techniques are 

available for correcting the problem.\footnote{An early article on 

this problem is Mundlak (1961), which notes that if good farm 

management, which is unobserved, has a positive additive effect on 
output and 

is correlated with use of some input, then the analyst will 

overestimate the effect of the input on output. For a simple 
exposition of this story, see pp. 204-207 of Varian (1992).  } 

 
The observed-choice problem is different, because it arises out of 

heterogeneous effects of the training rather than heterogeneous 

initial wages.  Suppose that all the unemployed had previous wages of 

100, but half of them would get a benefit of 0 from the training, and 

half would get a benefit of 20. Those that would benefit from the 

training accept it. Afterwards, the trained workers have wages of 120 
and the 

untrained workers have wages of 100. The inference that the training 

raised wages by 20 is correct, but the inference that the average 

effect of training across the entire population is 20 is incorrect; 

it is 10. In the observed-choice 

problem, unlike in the   problem of 

heterogeneous initial wages,  economics provides   prior information 
on the direction of the 

bias.\footnote{More generally,  in a continuous-variable version of 
this story, it could be that the workers with {\it lower} marginal 
benefit decide to get more training, because they need more hours of 
training  to get the same improvement.  Then the estimation  bias 
goes in the opposite direction. But the direction of bias is 
unambiguous in a 0/1 model of training/no training. }
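
 In symbols, let $\Delta_i$ denote worker $i$'s wage gain from training (notation used only for this illustration). The comparison of trained with untrained workers measures the gain among those who chose training, $120 - 100 = 20 = E(\Delta_i \mid \mbox{trained})$, whereas
$$
E(\Delta_i) = \frac{1}{2}(20) + \frac{1}{2}(0) = 10, \;\;\;\;\;
E(\Delta_i \mid \mbox{untrained}) = 0 .
$$
Under the assumptions of the example, extending training to those who declined it would raise their wages by nothing, and the observed difference of 20 overstates the population average of 10.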


%\footnote{xxx Wildasin 9?) suggested another twist on the selection 
% model: there is queueing up for admission to the training program, 
% and the administrators let in only the people who would benefit the 
% most--- or, those who would leave with the highest salaries.} 

 
%--------------------------------------------------------------- 

 
\bigskip 

\noindent 

 {\bf 3.4 Prediction without Policymaking} 

 
 The most important implication of the observed-choice problem is 

that OLS or the equivalent informal reasoning will lead the analyst 

to be too optimistic in recommending changes in policy because he 

will overestimate benefits and underestimate costs. Making 

predictions for policy recommendations, however,  is  different  from 

making predictions in general, as has long been known.\footnote{See 

Haavelmo (1943), p. 278 of Hurwicz (1950) and p. 56 of Mundlak 

(1961).} Policy recommendations implicitly contain a kind of 

prediction answering the question: ``What will happen to $y_i$ if 
$x_i$ is changed by forces 

outside the model?''  A purer form of prediction asks: ``What will 
happen 

to $y_i$ if $x_i$ changes?''   These are two different things: 
``What will happen after  I change the 

policy'' might be different from ``What will   happen  after the 
policy 

changes?'' 

 
 Recall the mutual-cause example in Section 3.2   in which 

high-school dropouts and road quality are inversely correlated across 

cities. An OLS regression would mislead in making the policy 

recommendation that the roads  be improved  to reduce 

the dropout rate. But the OLS regression would correctly predict that 

a city with good roads is likely to have a low dropout rate. 

Likewise, simultaneity is a less dangerous problem for prediction 

than for policymaking. If a city has a large police force, then using 
the 

correlation between police and crime to predict a large amount of 

crime may be correct even though the causal link is that more police 

reduces crime.  If the analyst wants reliable policy implications, he 

needs a theory of causation; if he just wants to predict, he can use 

correlation. 

 
 Prediction given the observed-choice problem is more tortuous.  OLS 

will underestimate the average impact on $y_i$ of a recommended 

increase in $x_i$ if $y$ is an undesirable impact, and 

instrumental variables  estimates that impact correctly. But 

what if $x_i$ takes a large value for reasons internal to the model? 

 
  If the analyst is asked to predict $y_i$ for a new observation $i$ 

that has a policy level of $x_i$, his answer should not be 

$\widehat{y} = \widehat{\beta}_{IV}x_i$, even though 
$\widehat{\beta}_{IV}$ is a 

consistent estimator of $\overline{\beta}$ and the true specification 

is $ y_i = \overline{\beta} x_i + x_i v_i + \epsilon_i$.  A large 

value of $x_i$ is produced by a small value of $\beta_i = 
\overline{\beta} + v_i$ and therefore by a negative value of $v_i$. 
The IV estimator will overpredict $y_i$, because $E(y|x) \neq 
\overline{\beta} x$; instead, $E(y|x)= \overline{\beta} x + E(xv|x)$. 

The bias in prediction is the {\it opposite} of the bias in policy 

recommendation.  But whether the bias for observation $i$ is positive 

or negative depends on the value of $x_i$.  Although the bias is 

downwards when $x$ is large, it is {\it upwards} when $x$ is small. 

When $x_i$ is small, the marginal effect of policy is great, and 

$y_i$ is greater than predicted by the IV estimate. 

  %\footnote{xxx On average will the prediction be correct (not 

%conditioning on x?} 

  One could use Bayes' Rule to estimate $E (\beta_i|x_i)= \int \beta \frac{ 
f(x_i|\beta) f(\beta)}{f(x_i)}d\beta$, but this requires knowledge of the 
functional form of the distribution of $v$, since $\beta_i 
=\overline{\beta} + v_i$. 
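
 As a sketch of how such a calculation works, take the two-type setting of Example 1 and assume that the revenue effect $\beta$ equals 100 or 0 with equal probability and that a state adopts the high tax if and only if $\beta = 100$. The integral then reduces to a sum:
$$
E(\beta \mid \mbox{high tax}) =
\frac{100 \cdot 1 \cdot \frac{1}{2} + 0 \cdot 0 \cdot \frac{1}{2}}
{1 \cdot \frac{1}{2} + 0 \cdot \frac{1}{2}} = 100,
$$
compared with the unconditional mean $\overline{\beta} = 50$. With a continuous distribution of $v$ the same calculation requires the density of $v$, which is the difficulty noted above.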

 
\begin{center} 
\begin{tabular}{l|cccc} 
 \multicolumn{5}{c}{  TABLE 2}\\
 \multicolumn{5}{c}{ }\\ 
 \multicolumn{5}{c}{ PREDICTION: HOTEL TAX REDUCTION}\\ 
 \multicolumn{5}{c}{ }\\ 
 \hline 
\hline
 Tax of new & True effect of & True revenue& Naive & Sophisticated\\ 
 state     & a high tax &         & Prediction & Prediction\\ 
\hline 
   &   &   &   &  \\
 High & 100 & 100 & 100 & 50\\ 
 Low & 0 & 0 & 0 & 0\\ 
   &   &   &   &  \\
 \hline 
\end{tabular}\\ 
\bigskip 
\end{center} 

 
 Return to Example 1, the hotel tax. The naive analyst predicts that 

a state with a high hotel tax will have \$100 more in revenue, 

whereas the analyst who corrects for the observed-choice problem 

predicts \$50.  The sophisticated analyst will do better in 

predicting the effect of increasing the tax in a state that currently 

has a low tax; he will predict \$50, the naive analyst will predict 

\$100, and the true increase will be \$0.  For high-tax states, the 

sophisticated analyst predicts a \$50 loss from lowering the tax, the 

naive analyst \$100, and the truth is \$100, but over both kinds of 

states the sophisticated analyst will have lower mean squared error, 

as well as an unbiased estimate. 
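
 These claims can be checked directly. Measure each prediction as the effect of the high tax relative to the low tax, and weight the two kinds of states equally (25 each). The prediction errors are $50-0=50$ and $50-100=-50$ for the sophisticated analyst, and $100-0=100$ and $100-100=0$ for the naive analyst, so that
$$
\begin{array}{lll}
\mbox{Sophisticated:} & \mbox{mean error } \frac{1}{2}(50)+\frac{1}{2}(-50)=0, & \mbox{mean squared error } \frac{1}{2}(50)^2+\frac{1}{2}(-50)^2=2500,\\
\mbox{Naive:} & \mbox{mean error } \frac{1}{2}(100)+\frac{1}{2}(0)=50, & \mbox{mean squared error } \frac{1}{2}(100)^2+\frac{1}{2}(0)^2=5000.
\end{array}
$$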

 
  In pure prediction, however, the naive analyst does better. Suppose 

that the problem is to predict the hotel tax revenue in a state 

outside the original sample, knowing only that the state has a high 

hotel tax.  The naive prediction is that the new state's revenue will 

be \$100 higher than in low-tax states, and the ``sophisticated'' 

prediction is \$50. Since the reason the new state imposed a high tax 

was because it would raise revenue there, the true value is \$100, 

and the naive analysis yields the correct answer.  The same would be 

true of a new state with a low hotel tax; the naive prediction that 

its revenue is \$100 below that of states with high taxes is correct, 

and the sophisticated prediction of \$50 is incorrect. 

 
The analyst must decide which kind of question he 

is answering.  Instrumental variables is appropriate for answering 

questions about exogenous changes in policies, but not for answering 

questions about endogenous changes or for out-of-sample predictions. 

 
%--------------------------------------------------------------- 

 
   \begin{center} 

 {\bf 4. An Empirical Example: Illegitimacy and  Aid to Families with 
Dependent Children  } 

 \end{center} 

 
  As an empirical example, let us consider the problem of estimating 

the effect of welfare on illegitimacy. Simple economic rationality 

suggests that if transfer payments are made to individuals contingent 

on their being single mothers, the number of single mothers will 

increase. The  only  question is how much.  A 

survey by Elwood \& Crane (1990) on the state of the black family 

suggests that the answer is very little.  As Table 3 shows, the 
levels of 

transfer payments do not show any clear relation to the percentage of 

black children living with only a single parent, and we have no 

 reason to believe that  black women are less sensitive to monetary 
incentives than white women.\footnote{In fact,   some evidence exists 
that black women are more sensitive to monetary incentives, not less. 
Kneisner, McElroy \& Wilcox (1988a) find a greater correlation 
between poverty and illegitimacy for blacks than for whites, 
suggesting that a given  monetary incentive  might be more powerful 
for blacks  simply because it is a larger proportion of total income. 
See also   Kneisner, McElroy
\& Wilcox (1988b), discussed below.    }    Since Aid to Families with Dependent 
Children (AFDC) levels 

vary across states, cross-section estimates have also been made, both 

reduced-form and structural, but Elwood \& Bane tell us that ``In 

general, both methods reveal only weak to moderate effects of 

welfare'' (Elwood \& Bane, 1990, p. 74). A 1990 study by Darity 

\& Myers, for example, finds, using CPS data on individuals in 

different states, that the elasticity of female headship of black 

families with respect to welfare levels was just .075.   This is a 
general finding from time-series and cross-sectional studies; in his 
{\it Journal of Economic Literature } survey,   Moffit (1992, p. 31) 
says, ``The failure to find strong benefit effects is the most 
notable characteristic of this literature.''   At the same time,  one 
longitudinal study, that of  Kneisner, McElroy \& Wilcox (1988b), 
does find a significant effect of monetary incentives on 
illegitimacy: greater AFDC payments increase the  number of    women 
who become single mothers, especially for black women,  although the 
size of the payment does not seem to affect how long they stay on 
AFDC. Thus, the general conclusion seems to be that the AFDC level in 
a state does not much affect  the number of illegitimate births in 
that state, but  apparently at the level of the individual,  the AFDC 
level does affect the decision to become a welfare mother. 


\begin{center} 
\begin{tabular}{l|cccc} 
 \multicolumn{5}{c}{TABLE 3}\\
 \multicolumn{5}{c}{ }\\ 
\multicolumn{5}{c}{ TRANSFER PAYMENTS OVER TIME}\footnote{Table 3 is 
somewhat misleading. Housing and medical benefits are not included, 
and they increased substantially during the 1980's.}\\ 
 \multicolumn{5}{c}{ }\\ 
 \hline 
  \hline 
    & 1960 & 1970 & 1980 & 1988 \\ 
     \hline 
  AFDC and food stamp payment level &  \$7,324 & \$9,900  & \$8,325 
& \$7,741\\ 
  \hspace*{6pt}  (family of 4 with no income-- & & & &\\ 
   \hspace*{6pt}1988 dollars CPI-U adjusted)  & & & &\\ 
 Percent of black children not & 33.0 & 41.5 & 57.8 & 61.4 \\ 
  \hspace*{6pt}living with two parents& & & &\\ 
   Estimated percent of black & 10.4 & 33.6 & 34.9 & 30.1 \\ 
  \hspace*{6pt}  children collecting AFDC& & & &\\ 
 \hline 
\multicolumn{5}{l}{  Source: Table 3 of  Elwood \& Bane (1990).}\\ 
 \hline 
   \end{tabular} 
\bigskip 
\end{center} 

 
 The observed-choice problem applies to this situation and may help 
explain the discrepancy between the aggregate and the individual 
estimates. The observed-choice problem applies if the explanatory 
variable is a policy and the dependent variable is  a cost. 

  Although economists, with their occupational interests, tend to 
think of the disincentive AFDC provides to supplying labor,  in the 
minds of the public,  illegitimacy is viewed as one of the chief 
costs of AFDC, and it is reasonable to suppose that the marginal 
effect of AFDC differs across states for a variety of cultural and 
economic reasons that are difficult to pick up in aggregate 
regressions.   One explanation for the time series evidence is that 
the social breakdown occurring in the 1960s and 1970s   increased the 
marginal impact of AFDC on illegitimacy  for any level of AFDC, 
shifting up the entire curve, so the government reduced the size of 
AFDC payments. Theory cannot predict whether the final effect of an 
increase in  the marginal impact would be an increase or decrease in 
illegitimacy; here, it seems to have increased despite the cuts in 
AFDC.   Similarly, the cross-sectional evidence might be the result 
of states in which AFDC would have a bigger effect on illegitimacy 
choosing lower levels of AFDC.  It might be, for example, that the 
number of women of each age in a state is important to the effect of 
AFDC, and this is difficult to put into a state-by-state regression, 
with its limited degrees of freedom. In longitudinal studies, on the 
other hand, more individual variables can be taken into account,  and 
the observed-choice problem is diminished, which might explain the 
greater size and significance of the estimated 
coefficients.\footnote{Longitudinal studies are not immune from  the 
observed-choice problem, but it is less likely to be  severe. 
Suppose  that individual  Vermont women of given race, age, income, 
etc. respond more  to AFDC than do Maine women.  The  Vermont 
legislature will    choose a lower level of AFDC, other things equal, 
and the observed-choice problem is present. The advantage of 
individual data is that the analyst can at least adjust for race, 
age, and income, so  if there exists a  missing variable causing the 
problem, it  must be something special to Vermonters   {\it qua } 
Vermonters, not to Vermonters {\it qua} white,   young, poor people. 
} 

 
 In this section I  will use  state-level cross-sectional data to 
illustrate  how instrumental variables with a heteroskedasticity 
correction   might be used by researchers more familiar with welfare 
policy than I am to improve estimates of the effect of transfer 
payments on illegitimacy.\footnote{A more thorough analysis would 
use data on counties or individuals,   assemble price indices for 
each location,  try nonlinear specifications, use more instruments, 
test overidentifying restrictions, test for whether the model should 
be fully simultaneous,  etc.   Most importantly, it would aggregate 
all the benefits of poverty, including medical benefits, housing 
benefits, and illegal income.  In fact,  Orr (1992)  suggests an 
alternative explanation for the small effects of AFDC that have been 
discovered: that overall transfer payments show much less variance 
across states than do AFDC payments.    Since the  objective here is 
just to  illustrate the observed-choice problem, such refinements are 
ignored.  Certain of the  model's simplifications   would generally 
tend towards obtaining insignificant results and a small coefficient 
for AFDC. Adjusting for local prices, for example, would increase the 
real AFDC levels in the Southern states, which   have high 
illegitimacy rates.   Also,    Nelson \& Startz (1990) find  that 
when one variable is being instrumented 

 using one instrument, the IV estimator has a central tendency in 
small samples that is biased in the direction of the OLS 
estimator---towards too small a coefficient, here. Nonetheless, given 
the small  number of observations, the main substantive  contribution 
of this analysis is simply  to cast doubt on existing estimates that 
ignore the observed-choice problem.} 

 Table 4 shows the complete  dataset. 

  The  1989 {\it  Annual Statistical Supplement} of the {\it Social 
Security Bulletin} provides data on average monthly payments per 
recipient from the  Aid to Families with Dependent Children (AFDC) 
program  for each state plus the District of Columbia (a sample size 
of 51).\footnote{\label{f8}``AFDC'' is ``Aid to Families with 
Dependent Children, Amount of Payments, Monthly  average per 
Recipient,'' for 1987, p. 342,  p. xiii, 1990 {\it   Statistical 
Abstract of the United States.}}       This varies from state to 
state because the federal government does not pay for the entire 
amount, and gives states some flexibility in eligibility 
requirements, or even in whether they wish to participate at 
all.\footnote{ For details of the state and federal responsibilities 
in funding and eligibility criteria, see the {\it 1993 Green Book}, 
the annual report  on entitlement programs of   the Committee on Ways 
and Means, U.S. House of Representatives, which   contains additional 
data on maximum possible benefits per family, state shares of the 
payments, payments over time, etc. }  The 1990 {\it   Statistical 
Abstract of the United States} provides data on the illegitimacy 
rate, as well as on     the average disposable personal income per 
capita in the state, the percentage of urbanization, and the 
percentage of the population that is black. It will be assumed that 
these are the  relevant exogenous variables.\footnote{\label{f9} 

    ``Illegitimacy'' is ``1987 births to unmarried women, percent,'' 
p. xiii, 1990 {\it   Statistical Abstract of the United States}. 
``Income'' is ``Disposable personal income per capita, 1988,'' p. 
xviii.  ``Urbanization'' is ``Resident population in metro areas, 
1988, percent,'' p. xii. ``Dukakis vote'' is calculated from ``1988 
percent for leading party,'' p. 246. ``South'' takes the value of 1 
if the state is southern under the {\it Statistical Abstract's} 
definition, and 0 otherwise. ``Black'' is the 1990 percentage, p. 26. 
Estimation was done using the matrix operations in {\it Mathematica} 
(Champaign, Illinois: Wolfram Research).} 


     A simple regression of illegitimacy on AFDC and a constant 
yields the following relationship: 

     \begin{equation} \label{e100} 
  \begin{array}{lll } 
  Illegitimacy &= 26.91  &{\bf -0.034* AFDC},  \\ 
    &  (3.05) & {\bf (0.026)  } 
      \end{array} 
 \end{equation} 

(standard errors in parentheses) with  $R^2=.03$.    Equation 
(\ref{e100}) implies  that high AFDC payments reduce the illegitimacy 
rate, but this is, of course, 

  misleading because the simple regression leaves out important 
variables. Regression (\ref{e101}) more appropriately  controls for a 
variety of things which might affect the illegitimacy rate: 

 \begin{equation} \label{e101} 
  \begin{array}{lll ll} 
  Illegitimacy &= 15.74  &+ {\bf 0.016* AFDC} & -0.00011* Income &+0.024* Urbanization \\ 
  &  (3.65) & {\bf (0.021)} & (0.00042) & (0.033)\\ 
   & &&&\\ 
    & - 1.60* South & + 0.56*Black, & &\\ 
     & (1.71) & (0.06) & &\\ 
   \end{array} 
 \end{equation} 

  with $R^2=0.79$. Equation (\ref{e101}) would leave us with the 
conclusion that AFDC payments have almost no effect on  the 
illegitimacy rate. Nor, surprisingly,  do any of the other variables 
except   race  have large or significant coefficients. The 
coefficients are small enough, in fact, that 

one might doubt whether increasing the size of the dataset would 
change the conclusions: the variables are insignificant not because 
of large standard errors, but because of small coefficients. 

 
  If the theory of 

this paper is correct,  the problem  with equation (\ref{e101}) is 
not lack of data, but that the coefficient on AFDC, $\beta_{AFDC}$, 
is properly a cause of the level of AFDC.  For purposes of 
estimation, some identifying instrument is needed to replace AFDC, 
although  fortunately  a complete model of political decisionmaking 
is not required. The instrument used here is the percentage of the 
state's vote in the 1988 presidential election that went to Michael 
Dukakis, which  is correlated with a state's liberalism and hence 
with its tendency to prefer higher levels of AFDC.  This is a 
suitable instrument if  (i) liberals tend to value the net benefits 
of AFDC more highly than do conservatives,  (ii) the presence of 
Dukakis voters, conditioning on the other variables in the model,  is 
not a direct cause of illegitimacy, and (iii) the presence of Dukakis 
voters is not a direct result of the current rate of illegitimacy. 

 The decisionmaking model does need to be  separable in 
$\beta_{AFDC}$ and the instrument: 

 \begin{equation} \label{e103} 
AFDC = \gamma_1 f(\beta_{AFDC}) + \gamma_2 g(Dukakis \; vote)   + u. 
  \end{equation} 

  Equation (\ref{e103}) is the  equivalent of  equation (\ref{e21}) 
in the theoretical part of the article.   Even if the functions $f$ 
and $g$ were known, equation (\ref{e103}) could not be estimated, 
since $\beta_{AFDC}$ is unknown. But equation (\ref{e103}) does not 
have to be estimated to use instrumental variables. Instead, the 
other exogenous variables in (\ref{e102}) plus the vote for Dukakis 
can be used as instruments for $AFDC$. 

  If  $Z$ is the 51-by-6 matrix 

 $$ 
   Z= (Constant, Dukakis \;Vote, Income, Urbanization, South, Black), 
  $$ 
   and 
   $$ 
   X= (Constant,    AFDC, Income, Urbanization, South, Black), 
  $$ 

  then using the instrumental variables estimator $(Z'X)^{-1}Z'y$, 
the estimates become 

    \begin{equation} \label{e102} 
  \begin{array}{lll ll} 
  Illegitimacy &= 18.43& + {\bf 0.19 * AFDC} & - 0.0023* Income & + 0.091*Urbanization \\ 
           &   (6.03)& {\bf (0.096) }        & (0.0013)   &(0.064)\\ 
	   & & &  & \\ 
	     &   + 3.32* South & + 0.65*Black.& &\\ 
    & (3.72)       & (.11)   & &\\ 
    \end{array} 
 \end{equation} 

 In regression (\ref{e102}), the signs on the variables match 
intuition and theory. AFDC causes more illegitimacy, and higher 
incomes reduce it. Not all variables are  statistically significant, 
but, except for South, the standard errors are at least smaller than the coefficients. 
From this regression, one might hope  that a larger sample size would 
bring all the variables into significance.\footnote{The biggest 
outlier for three variables--- the illegitimacy rate, percentage of 
blacks, and vote for Dukakis--- is   the District of Columbia. When 
D.C. is  excluded, the coefficient on AFDC in equation  (\ref{e102}) 
is .18 instead of .19 (with standard error .11 instead of .096).} 
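
 To put the difference between regressions (\ref{e101}) and (\ref{e102}) in perspective, compare the ratios of the AFDC coefficient to its standard error and the implied effect of a hypothetical \$10 increase in the average monthly payment per recipient (an illustrative magnitude, not one taken from the data):
$$
\begin{array}{lll}
\mbox{OLS (\ref{e101}):} & 0.016/0.021 \approx 0.76, & 10 \times 0.016 = 0.16,\\
\mbox{IV (\ref{e102}):} & 0.19/0.096 \approx 1.98, & 10 \times 0.19 = 1.9,
\end{array}
$$
where the last column is in percentage points of the illegitimacy rate. The IV point estimate is roughly twelve times the OLS estimate, although its standard error is also much larger.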

 
    The theory of this paper instructs us to take an additional step: 
heteroskedasticity is still present, so weighted least squares should 
be used. Following the procedure suggested in Section 2.3 generates 
the following equation, where $\widehat{s^2}$ is the variance of the 
residuals from regression (\ref{e104}):\footnote{The standard errors 
are not presented here to test whether the regression coefficients 
are different from zero.  The theory says that $\sigma_\epsilon^2$ 
and $\sigma^2_v$ are positive; the only question is how to best 
estimate the  magnitudes.} 

\begin{equation} \label{e104a}
   \begin{array}{lll }
  \widehat{s^2} = &20.43& + .00065*AFDC^2 \\
  &  (11.27) &(.00065)\\
     \end{array}
 \end{equation}

 From  equation (\ref{e104a}), $\widehat{\sigma^2_\epsilon} = 20.43$ 
and $\widehat{\sigma^2_v} =.00065$. This suggests that if the 
$\beta_{AFDC}$ coefficients are normally distributed, about 
two-thirds of the states' coefficients lie within an interval of 
length $.51 (=2\cdot \sqrt{.00065}$).\footnote{The fact that the 
rounded standard error equals the rounded coefficient is coincidence. 
Since the estimated average coefficient is on the order of .21, the 
size of the interval implies that assuming normality and  ignoring 
our prior beliefs that the coefficient on $AFDC^2$ is positive  is 
probably a mistake.  The need to incorporate prior information is 
even clearer if the regression is run without the outliers of DC and 
Utah, in which case the estimated coefficient on $AFDC^2$ is 
negative:  $-.00011$, with a     standard error of .00030.} 

 
 Having estimated an equation determining the size of the error for
an individual observation, it is now possible to use weighted least
squares or GLS  to re-estimate the main equation.  If we define
$\Omega$ to be a diagonal matrix with $20.43 + .00065*AFDC_i^2$ as
its $i$th diagonal element, then the GLS-IV estimator is $\widehat{\beta} =
[X'Z(Z'\Omega Z)^{-1} Z'X]^{-1}X'Z(Z'\Omega Z)^{-1}Z'y$, with standard
errors being the square roots of the diagonal elements of  $[(y-X
\widehat{\beta})'\Omega^{-1}(y-X
\widehat{\beta})/(51-6)][X'Z(Z'\Omega Z)^{-1} Z'X]^{-1}$,   which
yields

 \begin{equation} \label{e105}
   \begin{array}{lll ll}
  Illegitimacy = &18.62&{\bf  + 0.21* AFDC}& -0.0024* Income & +0.094*Urbanization\\
  &  (6.44) &{\bf (0.10)} & (0.0014) & (0.070)\\
   & & & & \\
    &   + 3.19* South & + 0.64*Black.& &\\
    & (3.68) & (0.11)& &\\
     \end{array}
 \end{equation}

 The estimates stay roughly the same as with unweighted instrumental
variables, though AFDC now passes the boundary of
significance at the 5 percent level: its coefficient is 2.1 standard
errors from zero, and the two-tailed critical value is
$t(45)=2.014$.\footnote{The small size of the heteroskedasticity may
make one wonder whether the observed-choice problem is the true
problem in this example.   The observed-choice problem  grows worse
with increasing heterogeneity in the $\beta_i$, and so does the
amount of heteroskedasticity. It does not follow, however, that the
observed-choice problem is trivial if heteroskedasticity is   small,
because the observed-choice problem also depends on how the states
react to the differences in the $\beta_i$. If $\beta_i$ is almost the
same in every state, but states react very strongly to $\beta_i$ in
choosing their levels of $x_i$, then the observed-choice problem can
still be severe.}
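
 To make the diagonal elements of $\Omega$ concrete, the following sketch evaluates the fitted variance from equation (\ref{e104a}) at three AFDC levels taken from Table 4. The choice of these three observations is purely illustrative and is not part of the original estimation, and the figures inherit the imprecision of the point estimates in (\ref{e104a}):
$$
\begin{array}{llll}
\mbox{Alabama:} & AFDC_i = 39, & \widehat{\Omega}_{ii} = 20.43 + .00065 \cdot 39^2 & \approx 21.42 \\
\mbox{U.S. average:} & AFDC_i = 124, & \widehat{\Omega}_{ii} = 20.43 + .00065 \cdot 124^2 & \approx 30.42 \\
\mbox{Alaska:} & AFDC_i = 226, & \widehat{\Omega}_{ii} = 20.43 + .00065 \cdot 226^2 & \approx 53.63 \\
\end{array}
$$
 On these figures, the fitted error variance at the highest AFDC level in the sample is about two and a half times that at the lowest.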
 

 Equation (\ref{e105}) says that
if the average monthly AFDC payment in a randomly chosen state
rises by 10 dollars, our best estimate of the increase in the
illegitimacy rate is 2.1 percentage points. For the average state,
this would be  an increase in the AFDC payment of 8.1 percent
producing an 8.6 percent increase in the illegitimacy percentage, an
elasticity of 1.06. The coefficient on AFDC is  both economically and
statistically significant.\footnote{Do recall the caveat earlier:
this analysis ignores other welfare benefits such as food stamps,
Medicaid, and housing subsidies. If they are positively correlated state by
state with AFDC, then what looks like the impact of a 10-dollar, 8.1
percent increase in AFDC is actually the effect of a
more-than-ten-dollars, 8.1 percent increase in total welfare
benefits. Thus, increasing AFDC by itself might not have such a large
impact, though welfare policy as a whole would. If, on the other
hand, AFDC and other benefits are negatively correlated, the method
here underestimates the effect of additional welfare income.  }
Regarding the other variables: if the state's per-capita income rises
1000 dollars, illegitimacy falls 2.4 percentage points;  if urbanization
increases 10 percentage points, illegitimacy rises .94 percentage points; if the state is
in the South, the illegitimacy rate is 3.19 percentage points  higher; and if
the black percentage rises 10 percentage points, the illegitimacy rate rises
6.4 percentage points.
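
 The elasticity can be checked with a short calculation, assuming (as the figures suggest, though the text does not say so explicitly) that the ``average state'' is the United States row of Table 4, with $AFDC = 124$ and $Illegitimacy = 24.5$:
$$
\begin{array}{lll}
\Delta Illegitimacy & = 0.21 \cdot 10 & = 2.1, \\
\Delta AFDC / AFDC & = 10/124 & \approx 0.081, \\
\Delta Illegitimacy / Illegitimacy & = 2.1/24.5 & \approx 0.086, \\
\mbox{elasticity} & \approx 0.086/0.081 & \approx 1.06. \\
\end{array}
$$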

 
  Notice the contrast with the initial multiple regression using OLS, 
equation (\ref{e101}).  The sign has changed on $South$,  all 
coefficients except the constant and $Black$ have at least doubled, 
those  on  $Urbanization$ and $South$ have more than doubled, and 
those  on $AFDC$ and $Income$ are more than ten times their initial 
size. The estimated elasticity of illegitimacy with respect to AFDC 
for the average state rises from .08 to 1.06.   Adjusting for the 
observed-choice problem clearly has a large effect. 

 
 As a final, perhaps tangential, point, the
regression of $AFDC$ on the other  variables may  be of interest.
Since the theory assumes that $\beta_{AFDC}$ is one of the variables
that explains $AFDC$, and  since illegitimacy is endogenous, this
regression is misspecified and biased, but it gives some idea of the
important correlations:

  \begin{equation} \label{e104}
  \begin{array}{lll ll}
  AFDC &= -63.08& -0.50*Illegitimacy  & +0.012* Income & -0.37*Urbanization \\
           &   (32.40)& (1.13)         & (0.002)   &(0.22)\\
	   & & &  & \\
   &  -19.21 * South & -0.71*Black & + {\bf 1.51*Dukakis\;Vote},&\\
    & (11.37)       & (0.69)   & {\bf (0.63)} &\\
    \end{array}
 \end{equation}
  with $R^2=0.69$.  The  high $R^2$  indicates that the fit of the
instrumental variables is quite good (and it only  falls to   .68 if
$Illegitimacy$ is dropped).  The coefficient on $Dukakis\;Vote$ is
large and  positive with a small standard error, indicating that
there is a strong correlation between AFDC and the vote for Dukakis
even after conditioning on the other variables.

 The negative sign and high standard error on $Illegitimacy$ give
some comfort that the instrumental variables regression results
arise from a negative correlation between $AFDC$ and $\beta_{AFDC}$,
and not merely from instrumenting for an endogenous $AFDC$. One might
wonder whether the political strength of parents would cause a
positive link between illegitimacy and AFDC, but, in fact, the
conditional correlation is negative. If it had been positive, and the
most important effect were that illegitimacy causes AFDC, then the
instrumental variables estimator here would still be consistent, but
one would expect that instrumental variables would produce {\it smaller}
coefficients than OLS, not larger, because under OLS some of the
apparent effect of AFDC on illegitimacy would really be due to the
positive correlation between the political power of the parents and
AFDC.

\pagebreak 

 
  \vspace*{-1in} 

 \thispagestyle{empty} 

 \begin{footnotesize} 

 \begin{tabular}{|l|lll lll r|} 

 \hline 

  \hline 

  State & Illegitimacy & AFDC & Income & Urban- &  Black & Dukakis 
&Unexplained Illeg.\\ 

   &    &          &        &      ization  & & vote & (from 
(\ref{e105})) \\ 

   & (\%) & (\$/month) & (\$/year) & (\%) & (\%) & (\%) & (\%)\\ 

  \hline 

     Maine    &      19.8    &      125    &      12,955    & 
36.1    &       0.3 

    &      44.7    &       2.8      \\ 

    New       Hampshire    &      14.7    &      140    &      17,049 
&      56.3    &      0.6 

    &      37.6    &       2.3      \\ 

    Vermont    &      18.0       &      159    &      12,941    & 
23.2    &      0.4    &      48.9 

    &       -4.9      \\ 

     Massachusetts    &      20.9    &      187    &      17,456    & 

    90.6    &      4.8    &      53.2    &       -6.2      \\ 

    Rhode       Island    &      21.8    &      156    &      14,636 
&      92.6    &      3.8    & 

    55.6    &       -5.2      \\         Connecticut    &      23.5 
&      166    &   \framebox{19,096 } 

    &      92.6    &      8.2    &      48.0       &       2.3 
\\ 

     \hline 

    New       York    &      29.7    &      166    &      16,036    & 
91.2    &      16.1    & 

    51.6    &       -3.8      \\ 

   New       Jersey    &      23.5    &      119    &      18,615 

    &    \framebox{100}       &      14.4    &      43.8    & 
6.2      \\ 

    Pennsylvania    &      25.3    &      111    &      14,072    & 
84.8    &      9.4    & 

    50.7    &       3.4      \\ 

     \hline 

             Ohio    &      24.9    &      102    &      13,326    & 

    78.9    &      11.0       &      45.0       &       2.6      \\ 

    Indiana    &      22.0       &      84    &      12,834    & 
68.1    &      8.4    &      40.2 

    &       4.9      \\         Illinois    &      28.1    &      101 
&      15,150    &      82.5 

    &      16.1    &      49.3    &       6.7      \\ 

    Michigan    &      20.4    &      156    &      14,094    & 
79.9    &      14.6    & 

    46.4    &      \framebox{-14.0 }        \\         Wisconsin    & 
20.7    &      160    &      13,296 

    &      66.5    &      4.8    &      51.4    &       -8.5      \\ 

     \hline 

    Minnesota    &      17.1    &      171    &      14,037    & 
66.6    &      1.6    & 

    52.9    &      \framebox{-11.0 }        \\ 

        Iowa    &      16.2    &      124    &      12,475    & 

    43.4    &      1.9    &      54.7    &       -3.5      \\ 

    Missouri    &      23.7    &      87    &      13,340    & 
66.0       &      10.8    & 

    48.2    &       5.9      \\         North       Dakota    & 
13.9    &      125    &      11,388 

    &      38.4    &      0.5    &      44.0       &       -7.2 
\\ 

  South  Dakota    &      19.4    &      94    &      11,611    & 
29.1    &       0.3    & 

    47.2    &       6.2      \\ 

       Nebraska    &      16.8    &      108    &      12,773    & 

    47.6    &      3.4    &      39.8    &       -0.2      \\ 

    Kansas    &      17.2    &      110    &      13,235    & 
53.4    &      5.8    &      44.2 

    &       -1.2      \\ 

     \hline 

            Delaware    &      27.7    &      99    &      14,654 
&      65.9 

    &      18.9    &      44.1    &       2.1      \\ 

    Maryland    &      31.5    &      115    &      16,397    & 
92.9    &      26.1    & 

    48.9    &       -0.4      \\         DC    &     \framebox{59.7 } 
&      124    &      17,464    & 

   \framebox{100}    &  \framebox{68.6}   &      \framebox{82.6}    & 
0.5      \\ 

    Virginia    &      22.8    &      97    &      15,050    & 
72.2    &      19.0       & 

    40.3    &       -2.1      \\         West       Virginia    & 
21.1    &      80    &      10,306 

    &      36.5    &      2.9    &      52.2    &       2.1      \\ 

    North       Carolina    &      24.9    &      92    &      12,259 
&      55.4    &      22.1 

    &      42.0       &       -6.0         \\ 

    South       Carolina    &      29.0       &      66    & 
11,102    &      60.5    &      30.1 

    &      38.5    &       -5.0         \\ 

    Georgia    &      28.0       &      83    &      12,886    & 
64.8    &      26.9    &      40.2 

    &       -3.5      \\         Florida    &      27.5    &      84 
&      14,338    &      90.8 

    &      14.2    &      39.1    &       5.0         \\ 

     \hline 

    Kentucky    &      20.7    &      72    &      11,081    & 
46.1    &      7.5    & 

    44.5    &       1.4      \\         Tennessee    &      26.3    & 
54    &      12,212    & 

    67.1    &      16.3    &      42.1    &       5.7      \\ 

    Alabama    &      26.8    &      \framebox{39}    &      11,040 
&      67.5    &      25.6    & 

    40.8    &       0.5      \\         Mississippi    &      35.1 
&      \framebox{39}    &      \framebox{9,612}

    &      30.5    &      35.6    &      40.1    &       2.4      \\ 

     \hline 

    Arkansas    &      24.6    &      63    &      10,670    & 
39.7    &      15.9    & 

    43.6    &       1.3      \\         Louisiana    &      31.9    & 
55    &      10,890    & 

    69.2    &      30.6    &      45.7    &       -1.4      \\ 

    Oklahoma    &      20.7    &      96    &      10,875    & 
58.8    &      6.8    & 

    42.1    &       -4.8      \\         Texas    &      19.0       & 
56    &      12,777    & 

    81.3    &      11.9    &      44.0       &       0.9      \\ 

     \hline 

    Montana    &      19.4    &      120    &      11,264    & 
24.2    &      \framebox{0.2}   & 

    47.9    &       0.5      \\         Idaho    &      13.0       & 
95    &      11,190    & 

    \framebox{20.0}       &      0.4    &      37.9    &       -0.6 
\\ 

    Wyoming    &      15.8    &      117    &      11,667    & 
29.2    &      0.8    & 

    39.5    &       -2.3      \\ 

    Colorado    &      18.9    &      109    &      14,110 

    &      81.7    &      3.9    &      46.9    &       1.3      \\ 

    New       Mexico    &      29.6    &      82    &      10,752 
&      48.9    &      1.7    & 

    48.1    &       \framebox{14.0}         \\         Arizona    & 
27.2    &      92    &      13,017    & 

    76.4    &      2.7    &      40.0       &       \framebox{12.0} 
\\ 

    Utah   &      \framebox{11.1}    &      116    &      10,564    & 
77.4    &      0.7    &      \framebox{33.8} 

    &       \framebox{-14.0}         \\         Nevada    &      16.4 
&      86    &      14,799    &      82.6 

    &      6.9    &      41.1    &       3.2      \\ 

     \hline 

    Washington    &      20.8    &      157    &      14,508    & 
81.6    &      2.4    & 

    50.0       &       -4.8      \\         Oregon    &      22.4 
&      123    &      12,776    & 

    67.7    &      1.6    &      51.3    &       1.5      \\ 

    California    &      27.2    &      191    &      16,035    & 
95.7    &      8.2    & 

    48.9    &       -6.8      \\         Alaska   &      22.0       & 
\framebox{226}    &      16,357    & 

    41.7    &      3.4    &      40.4    &       \framebox{-10.0} 
\\ 

    Hawaii    &      21.3    &      134    &      14,374    & 
76.3    &      1.8    &      54.3 

    &       1.1      \\ 

             \hline 

   United States    & 24.5 & 124 &   14,107 & 77.1 &  12.4 & 46.6 &
0\\

     \hline 

      \hline 

\multicolumn{8}{c}{    }\\ 

\multicolumn{8}{c}{    }\\ 

\multicolumn{8}{c}{\bf Table 4: The Data and the Regression 
Residuals}\\ 

 \multicolumn{8}{c}{(Extreme values are boxed. Sources and 
definitions are in footnotes  \ref{f8} and \ref{f9}.)}\\ 

       \end{tabular} 

       \end{footnotesize} 

 %--------------------------------------------------------------- 

\bigskip 

\begin{center} 

 {\bf 5. Concluding Remarks} 

 \end{center} 

 
 When the independent variable in an econometric problem is the
result of a policy decision and the dependent variable is a cost or
benefit of that decision, the OLS estimate will have a tendency to
overestimate the net benefit of the policy.  This will happen if two
conditions hold: the decisionmakers are rational (even if the
dependent variable is not their main concern), and the coefficients
vary across observations. The two conditions are harmless separately
but dangerous when present in combination.

 
 The observed-choice problem applies to a variety of policies.
Whether the analyst wishes to estimate the effects of unemployment
insurance, transfer payments, police protection, or speed limits, he
should worry about the source of the variation in policies across
space and time. If the variation arises from factors unrelated to the
main effect being analyzed, OLS is unbiased, but if it arises from
differences in the marginal cost or benefit of the policy, bias is
introduced.  If every decisionmaker is optimizing, then in
equilibrium there is no net benefit from changing any policy, but an
outside observer, seeing differences in policies correlated with
differences in total benefits, may be fooled into thinking that
change would help.

 
 Even if the variation in policies does not arise from differences in
the coefficient, there may still be an observed-choice problem for
any extrapolation beyond the observed data.  If the coefficient
changes with the level of policy (that is, if the policy has a
nonlinear effect), then policymakers will avoid policy ranges for
which the marginal costs are high or the marginal benefits low. The
absence of a policy from the data provides information about its
effect.

 
 The observed-choice problem provides a reason why social
experiments are useful. In one experiment, described by Woodbury \&
Spiegelman (1987), unemployed people in Illinois were selected randomly and
offered a \$500 bonus if they accepted a job within 11 weeks and held
it for at least 4 months. The most obvious reason for such an
experiment is that existing variation in policies was insufficient:
no state offered such a policy, so its effect could not be measured.
A second reason is that the experiment controlled for state-specific
effects.  A third reason is the observed-choice problem: if
Illinois had adopted such bonuses as a general policy,  instead of being
chosen for an experiment, one might conclude that Illinois adopted
the policy because it was especially effective there.  Experiments
that assign policies randomly eliminate this problem. They are, on
the other hand, costly and full of practical difficulties, as Heckman et
al. (1987) point out, so the clever econometrician may,
in the end, still be more cost-effective than the clever
experimentalist.

 
When policies differ, one should ask why. For the economist, as for
the Freudian, nothing happens by accident.
If policies depend on their potential impacts,
then naive estimates of those impacts are biased. This  will
ordinarily be the case, since
costs and benefits, not random whims,  are the motivations behind
policy. Therefore, not only must one construct a model of how $x$
determines $y$; one  must think about
whether $\beta_i$ determines $x_i$.  If it does, then the
uncorrected estimates should only be used as    upper bounds
on policy effectiveness,  or  instrumental variables should be used
to correct the estimates.    This can make an important  difference
in problems such as estimating the effect of AFDC on illegitimacy.

 
%--------------------------------------------------------------- 

 
\newpage 

\bigskip 

\begin{center} 

 {\bf References} 

 \end{center}

Committee on Ways and Means, U.S. House of Representatives (1993) 
{\it Overview of Entitlement Programs: 1993 Green Book}. Washington: 
U.S. Government Printing Office, 1993. 

 
Darity, William and Samuel Myers (1990) ``Impacts of Violent Crime on
Black Family Structure,'' {\it Contemporary Policy Issues} (October
1990), 8: 15-29.

 
 Department of Commerce (1990) {\it Statistical Abstract of the 
United States},  Washington: Superintendent of Documents, U.S. 
Government Printing Office. 

 
 Ellwood, David \& Jonathan Crane (1990) ``Family Change Among Black 
Americans: What Do We Know?'' {\it Journal of Economic Perspectives}, 
4: 65-84 (Fall 1990). 

 
 Garen, John (1984) ``The Returns to Schooling: A Selectivity Bias
Approach with a Continuous Choice Variable,'' {\it Econometrica}, 52
(September 1984) pp. 1199-1218.

 Garen, John (1987) ``Relationships among Estimators of Triangular
Econometric Models,'' {\it Economics Letters},  25, 39-41.

 Haavelmo, Trygve (1943) ``The Statistical Implications of a System
of Simultaneous Equations,'' {\it Econometrica}, 11: 1-12. January
1943.

Heckman, James (1976) ``The Common Structure of Statistical Models of
Truncation, Sample Selection and Limited Dependent Variables and a
Simple Estimator for Such Models,'' {\it Annals of Economic and
Social Measurement}, 5: 475-492.

  Heckman, James (1979) ``Sample Selection Bias as a Specification
Error,'' {\it Econometrica}, 47: 153-161, January 1979.

 Heckman, James, V. Joseph Hotz \& Marcelo Dabos (1987), ``Do We Need
Experimental Data to Evaluate the Impact of Manpower Training on
Earnings?'' {\it Evaluation Review}, 11: 395-427.


 Heckman, James,  \&   Richard Robb (1985), ``Alternative Methods for 
Evaluating the Impact of Interventions: An Overview,'' {\it  Journal 
of Econometrics}, 30: 239-267. 


 Hurwicz, Leonid (1950) ``Prediction and Least Squares,'' in Tjalling
Koopmans, ed., {\it Statistical Inference in Dynamic Economic
Models}, New York: John Wiley and Sons, 1950.

 Kennedy, Peter (1985) {\it A Guide to Econometrics}, Second Edition,
Oxford: Basil Blackwell Ltd, 1985.


Kniesner, Thomas, Marjorie McElroy, and Steven Wilcox (1988a)
``Getting into Poverty Without a Husband, and Getting Out, With or
Without,'' {\it AEA Papers and Proceedings}, May 1988, 78: 86-90.


Kniesner, Thomas, Marjorie McElroy, and Steven Wilcox (1988b)
``Individuals and Families in Transition: Understanding Change 
Through Longitudinal Data,'' Papers Presented at the Social Science 
Research Council in Annapolis, Maryland, March 16-18, 1988, U.S. 
Dept. of Commerce, Bureau of the Census. 


LaLonde, Robert (1986) ``Evaluating the Econometric Evaluations of
Training Programs with Experimental Data,'' {\it American Economic
Review}, 76: 604-620.

Lee, Lung-Fei (1978) ``Unionism and Wage Rates: A Simultaneous
Equation Model  with Qualitative and Limited Dependent Variables,''
{\it International Economic Review}, 19: 415-433.

 Lucas, Robert E. (1976) ``Econometric Policy Evaluation: A
Critique,'' {\it Journal of Monetary Economics}, 1976 Special
Supplement on the Phillips Curve, pp. 19-46.

Maddala, G. (1977) {\it Econometrics}, New York: McGraw-Hill, Inc.,
1977.


Moffitt, Robert (1992) ``Incentive Effects of the U.S. Welfare
System: A Review,'' {\it Journal of Economic Literature}, March 1992,
30: 1-61.

Mundlak, Y. (1961) ``Empirical Production Functions Free of
Management Bias,'' {\it Journal of Farm Economics} 43 (February
1961), pp. 44-56.

Nelson, Charles and  Richard Startz (1990), ``Some Further Results on
the Exact Small Sample Properties of the Instrumental Variables
Estimator,'' {\it Econometrica}, 58: 967-976, July 1990.


Orr, Lloyd (1992) ``Cross-Section Multiple Program Variance in 
Welfare Benefits,'' working paper, Indiana University Department of 
Economics, June 1992. 

 
Peltzman, Sam (1976) ``Toward a More General Theory of Regulation,''
{\it Journal of Law and Economics} 19 (August 1976), pp. 211-40.


Social Security Administration (1989) {\it Annual Statistical 
Supplement} to the {\it Social Security Bulletin}, Washington: 
Superintendent of Documents, U.S. Government Printing Office. 


 Varian, Hal (1992) {\it Microeconomic Analysis, Third Edition}.  New 
York: W.W. Norton \& Company. 

 
 Woodbury, Stephen \& Robert Spiegelman (1987) ``Bonuses to Workers
and Employers to Reduce Unemployment: Randomized Trials in
Illinois,'' {\it American Economic Review}, September 1987, pp.
513-530.

 
%--------------------------------------------------------------- 

 
  \end{document}