%Paper: ewp-game/9503001
%From: m.agastya@ucl.ac.uk (Murali Agastya)
%Date: Thu, 09 Mar 1995 21:04:29 +0000

\documentstyle[12pt,emlines,bezier,a4]{article}
\input tcilatex 
\newtheorem{definition}{Definition} 
\newtheorem{corollary}{Corollary}[section]
\newtheorem{assumption}{Assumption}
\newtheorem{theorem}{Theorem}
\newtheorem{proposition}{Proposition}
\newtheorem{example}{Example}
\newtheorem{remark}{Remark}
\newtheorem{lemma}{Lemma}[section]
\newtheorem{fact}{Fact}
\parskip=.06in 

 \begin{document} 
\baselineskip=18pt
\begin{titlepage} \title{An Evolutionary Bargaining Model} 
\author{Murali Agastya\thanks{ I thank Phil Reny, Arthur Robson and Myrna 
Wooders for helpful comments. Of course, the usual disclaimer applies.} \\ 
Center for Rationality and \\ \hspace{.4 in} Interactive Decision Theory \\ 
Feldman Building \\ Hebrew Univ., Givat Ram \\ Jerusalem 91904 \\ ISRAEL \\ 
\vspace{.12 in} Email: murali@math.huji.ac.il}

\date{May 20, 1994}
 \maketitle

\begin{abstract} 
Varying quantities of a single good can be produced using at 
least two and at most $n$ factors of production. 
The problem of allocating the surplus among the factors is studied in a 
dynamic model with adaptive behavior. Representatives for the factors (called 
players) make wage demands based on precedent, ignorant of each other's 
utilities for this good. Necessary and sufficient conditions are provided 
under which the long-run equilibria coincide with the core allocations.  
A global convergence result is proved to show that players do learn to reach a
core allocation in the long run. 
Moreover, allowing for the possibility of mistakes by the players, all the 
{\em stochastically stable outcomes} are characterized.  The main result 
shows that in the limit, these stable allocations for a particular set of 
players converge to the allocation that maximizes the product of all the 
players' utilities over all core allocations. 
\end{abstract} 
\thispagestyle{empty} 
\end{titlepage} 

\section{Introduction}


The problem of allocating the surplus among various factors of production, 
or equivalently characteristic function bargaining, has been 
studied in a variety of non-cooperative and/or axiomatic models. 
While there can hardly be anything indeterminate in the predictions of 
axiomatic 
models, the story is quite different for non-cooperative models with three 
or more players. Many natural adaptations of Rubinstein (1982) result in 
a variety of folk theorems\footnote{
See Sutton (1985) for a survey. 
See also Perry and Reny (1991), Serrano and Vohra (1993), Chatterjee {\em et 
al.} (1993), 
Winter (1992).}. A problem of considerable interest, then, is which of 
the many equilibria is most likely to be played. 

Analysis of strategic interaction places great demands on the knowledge and 
rationality of the players. These demands are magnified when there are 
several possible equilibria to choose from. Although this is true of more 
general games, the problem is especially severe in bargaining situations 
where there are several Pareto-optimal equilibria\footnote{ For example, in 
Perry and Reny 
(1991), every core allocation is a subgame-perfect equilibrium of the 
negotiation process described therein.}. For, there is a paucity of 
eductive explanations to select among several Pareto-optimal equilibria. 
Consequently, and only recently, the study of explicit dynamic learning 
processes through which agents learn how to play a game has commanded 
greater attention. The idea is to see which equilibrium, if any, the players 
will learn to play under simple rules of behavior and minimal assumptions on 
their knowledge. 



 In this paper, I study the allocation problem in the 
context of a simple dynamic learning model. At each date a single good can be 
produced. 
The production possibilities in a period 
are described by a characteristic function $f$ defined on the class of all 
subsets of a finite set of factors. Here, $f(S)$ is the quantity of a single 
output that can be produced using the resources of the subset  $S$ alone.  At 
each date, one representative for each factor, henceforth referred to as a 
player, demands a certain amount of the surplus as a wage for his services.  
Demands are made simultaneously  and players are committed to these demands 
for the period. There is no assumption of common knowledge of players' 
utilities. Players use statistical data from a random sample of recent 
history to demand a wage for the current period. 
Demands are made only in integral multiples of $\delta$, a smallest money 
unit. Moreover, players are myopic and play a best response by maximizing 
one-period expected utility. The best-response rules of the above dynamic 
process induce a stationary Markov process, referred to as the Evolutionary 
Bargaining Process (EBP). This dynamic process itself is due to Young (1993a) 
and is a variation on fictitious play. 



There is a one-to-one correspondence between the set of absorbing states of 
the EBP, referred to as conventions, and the strict Nash equilibria of a
simultaneous move demand commitment game. In this game, 
players simultaneously demand a part of the 
surplus and are committed to their demands. This is of course the original 
Nash demand commitment game. 
The difference and the difficulty 
arise from the fact that, unlike in Nash's original demand game where a pie is 
to be split between several players, here players make demands over a general 
characteristic function. Consequently a given set of demands can perhaps be met 
in several coalitions. This uncertainty regarding coalition formation coupled 
with the fact that we are dealing with {\em individual demands } and {\em not 
proposals directed towards coalitions} makes the problem interesting and 
somewhat more complex. In section~\ref{demandgame}, the game is analyzed. It 
is shown that if the technology displays increasing returns to scale in the 
{\em number of factors}\footnote{Notice that this does not necessarily require 
increasing returns in the extent of use of particular factor(s).}, 
then core allocations coincide with the (strict) Nash equilibria. Needless to 
say, certain assumptions regarding coalition formation are also required. 

Section~\ref{dynamicprocess} studies the dynamics of the EBP. 
Theorem 1 presents sufficient conditions under which the only persistent 
states are absorbing. In other words, under the hypotheses of Theorem 1, 
players learn to reach a core allocation regardless of the starting point. 





Section~\ref{ssb} then studies the dynamics of the EBP by allowing for the
possibility of the players making {\em mistakes}. Players play a best
response with a very high probability but with a small but positive
probability, they commit errors. Consequently there are no longer any
absorbing states. It is possible to move from one convention to another.
However, some conventions are harder to displace than others. When the
probability of making errors is small, the conventions that are
hardest to displace will be observed most often. These conventions are
referred to as {\em Stochastically Stable Conventions}, hereafter SSC, and
constitute the main notion of refinement of equilibria.

Theorem 3 characterizes the set of all SSCs.  There may be several such 
allocations. However (Theorem 4), when the 
size of the smallest money unit $\delta$ shrinks to zero, the allocations in 
the SSCs, for a particular subset of players, converge to the allocation that 
maximizes the product of all the players' utilities over core allocations, a 
seeming generalization of the Nash Bargaining solution.  For the three-player 
case, however, one has uniqueness. 

Young (1993b) is the closest relative of this paper. Young considers a very
special technology and restricts attention to the case of two players. With
three or more players, the possibility of cycles in the dynamic process
makes analysis here much more complex. To obtain a tractable proof, I assume 
that players only make ``small mistakes''. But this in turn creates a different
technical problem. The perturbed EBP is no longer strongly ergodic. Since
the refinements in models that study naive behavior\footnote{
See, e.g., Foster and Young (1990), Kandori {\em et al.} (1993), Young
(1993a and 1993b).} are based on the convergence properties of the invariant
distributions of the perturbed process, their techniques are not immediately
applicable.


Binmore (1987) and Carlsson (1991) analyze perturbations of the original
Nash demand game and study the properties of non-cooperative equilibria of
the perturbed game as the perturbations become very small. They show that
the Pareto optimal Nash equilibria of the perturbed game converge to the
original Nash Bargaining solution. The model presented below (like Young
(1993b)) differs from the above models in that they presuppose equilibrium 
play of both the original and the perturbed game. In this
model on the other hand, players reach  long-run equilibria as the
likelihood of mistakes becomes very small, without any knowledge of the
utilities of other players.

The paper is organized as follows. Section 2 presents the one shot demand
game and characterizes the set of equilibria. Section 3 studies the EBP.  
Section 4 concludes with a few examples.  Most formal proofs appear in 
Appendix A and Appendix B. 

\section{The Demand Game}\label{demandgame}
Let $N=\{1,2,\ldots n\}$ denote the set of factors of production. I will 
assume $n \geq 3$. The
technological possibilities are described by a non-negative function $f$
defined on the class of all coalitions of $N$. By way of interpretation, 
$f(S)$ is the total quantity of a single good that can be produced using the
services of the factors in $S$ alone. I will assume that the function is
normalized so that $f(i) = 0$ for all $i \in N$. 

Let $\Delta$ be the half-open interval $(0,f(N)]$. One representative for each
factor of production demands a wage from $\Delta$.  We will 
refer to these representatives as players. Let $\omega_i$ denote a typical 
wage demand of player $i$. Given a vector of wage demands $W$, let $W(S)$ 
denote the total wages demanded by the coalition $S$. The demands of $S$ are 
said to be {\em \underline{compatible}} at $W$, if $W(S) \leq f(S)$. The 
demands are {\em \underline{strictly} \underline{compatible}} if the 
inequality is strict.  Let $\beta(W)$ ($\hat{\beta}(W)$) denote the set of 
all coalitions whose demands are compatible (strictly compatible). 
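For concreteness, the compatibility sets $\beta(W)$ and $\hat{\beta}(W)$ can be computed by brute-force enumeration. The sketch below is illustrative only; the dictionary encoding of $f$ and $W$ is mine and not part of the formal model:

```python
from itertools import combinations

def coalitions(players):
    """All non-empty subsets of the player set."""
    return [frozenset(c) for r in range(1, len(players) + 1)
            for c in combinations(players, r)]

def beta(W, f, strict=False):
    """Coalitions whose total demands are (strictly) compatible.

    W: dict mapping player -> wage demand.
    f: dict mapping frozenset (coalition) -> producible output.
    """
    out = []
    for S in coalitions(list(W)):
        total = sum(W[i] for i in S)
        if (total < f[S]) if strict else (total <= f[S]):
            out.append(S)
    return out
```

With the data of Example 1 below ($f(ij)=1$, $f(123)=2$), the demands $(\frac12,\frac12,\frac12)$ are compatible in all three two-player coalitions and in the grand coalition, but strictly compatible only in the grand coalition.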

Players are assumed to make their demands simultaneously and are committed to 
their demands.  A 
player's demand is not met unless his services are used.  By the assumption of 
commitment, a player cannot have his demand met unless there is a coalition 
$S$ to which he belongs and the demands of the coalition as a whole are 
compatible. However, a particular  $W$ may be compatible in  several
coalitions. 
Thus 
even under the reasonable hypothesis that only compatible coalition(s) will 
eventually form, there is still some uncertainty whether a player's demand 
$\omega_i$ will be met. To capture this uncertainty, let 
$p_i(\omega_i|W_{-i})$ denote the probability belief that player i's demand 
$\omega_i$ is met if others demand $W_{-i}$.  This probability may be thought 
of as the result of an exogenous matching process. By allowing for $p$ to 
depend on $i$ and making assumptions directly on the $p_i$, we allow for the 
possibility of a certain amount of subjective uncertainty on the part of the 
players regarding this matching process. The following will be assumed of the 
matching process (or beliefs) at the very outset.           


\begin{assumption}~\label{as1} 
 Fix $W$. Then for each $i \in N$, 
\begin{enumerate}
\item If $i$ is not a member of \underline{any} coalition whose demands 
are compatible, then $p_i(\omega_i|W_{-i}) =0$. 
\item For all $\omega > \hat{\omega}$, $p_i(\hat{\omega}|W_{-i}) \geq
p_i(\omega|W_{-i})$. 
\end{enumerate}
\end{assumption} 

Assumption~\ref{as1} is natural  enough and requires no  further comment.
Now, assuming that a player obtains zero if his demand is not met, the 
expected utility of player $i$ if he demands $\omega_i$ when others have
demanded 
$W_{-i}$ is 
\begin{equation} \label{nasheqblm}
U_i(\omega_i|W_{-i}) = p_i(\omega_i|W_{-i})v_i(\omega_i) + [1-p_i(\omega_i|W_{-i})]v_i(0) 
\end{equation} 
where the 
function 
$v_i$ determines the utility derived from consuming $\omega_i$. I will assume
that 
$v_i$ is a strictly increasing, continuously differentiable and concave 
function. Finally, assuming that players maximize expected utility  and 
normalizing $v_i(0) = 0$, we have the following usual definition of a strict 
Nash equilibrium: 

\begin{definition}
A vector of wage demands $W^*$ is a strict Nash equilibrium iff
$$
U_i(\omega_i^*|W_{-i}^*) > p_i(\omega_i|W_{-i}^*)v_i(\omega_i) 
$$
for all $\omega_i \neq \omega_i^*$ and all $i \in N$.
\end{definition}

In this paper, Nash equilibrium will always be taken to mean the equilibrium
defined above. 

\begin{definition}
A vector of wage demands $W$ belongs to the core of the technology $f$ iff 
\begin{enumerate}
\item  $f(N) = W(N)$. ({\em Feasibility}) 
\item  For all $S \subseteq N$, $W(S) \geq f(S)$. ({\em No blocking 
sub-coalitions.})
\end{enumerate}
Let ${\cal C}(f)$ denote the set of all core allocations. 
\end{definition}
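The two core conditions are likewise mechanical to check. The following is an illustrative sketch (the encoding and the tolerance parameter are mine, not part of the model):

```python
from itertools import combinations

def in_core(W, f, tol=1e-9):
    """Check the two core conditions for a demand vector W.

    W: dict player -> wage; f: dict frozenset -> output.
    """
    players = list(W)
    grand = frozenset(players)
    if abs(sum(W.values()) - f[grand]) > tol:       # feasibility: W(N) = f(N)
        return False
    for r in range(1, len(players)):                # proper coalitions S
        for c in combinations(players, r):
            if sum(W[i] for i in c) < f[frozenset(c)] - tol:
                return False                        # S could block: W(S) < f(S)
    return True
```

For instance, with the data of Example 2 below ($f(12)=f(13)=300$, $f(123)=302$), the demands $(300,1,1)$ pass both tests, while $(296,3,3)$ is blocked by $\{1,2\}$.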

At a first glance it seems somewhat intuitive that the core and the set of
Nash equilibria are related. However, recall that in this model, players
make demands and not proposals directed towards a particular coalition. When
a player makes a proposal, not only does he seek a payoff for himself, but
also specifies a payoff for a subset of other players as well. Hence,
implicitly he also suggests which coalition is to form. In the present
model, however, players independently and simultaneously make their demands.
Hence it is possible that individual players may make demands that are not 
Pareto-optimal. At this point it may not pay for any of them to deviate by 
responding with higher demands, as this will, by Assumption 1, reduce the 
likelihood with which a demand is met. Similarly, it may also be the case that the 
players may demand wages much higher than what the grand coalition can afford 
but expect to obtain these in smaller coalitions. 

\begin{example}
{}~\label{eg1a}{\em Let $N = \{1,2,3\}$, $f (ij)=1$ for all $i,j$ and $f
(123)=2$.  Let $v_i=v$ for all $i$. Assume that the matching process is 
such that for a given $W$, all coalitions in $\beta(W)$ are equally likely. 
It may be checked that this matching process is consistent with 
Assumption 1. Now consider 
$W^*=(\frac{1}{2},\frac{1}{2},\frac{1}{2})$.  Since the demands of all 
three two-player coalitions are compatible, as are those of the grand 
coalition, the payoff of player $i$ on demanding $\frac{1}{2}$ at $W^*$ is 
\begin{equation}
U_i(\frac{1}{2}|W^*_{-i}) = \frac{3}{4}v(\frac{1}{2}) 
\end{equation}}

{\em Now consider a deviation $\hat{\omega}_1$ by player $1$. Clearly, since 
the set of 
compatible coalitions does not change, a demand strictly less than $1/2$ is 
strictly dominated by $1/2$. Furthermore, any demand $\hat{\omega} > 1/2$ 
is strictly dominated by a demand of $1$, unless $\hat{\omega}=1$. 
Indeed, if he responds with $1$, the new set of demands is compatible in the 
$\{2,3\}$ coalition and in $N$. Since each is equally likely to form, the payoff 
is 
\begin{equation} 
U_1(1|W^*_{-1}) = \frac{1}{2}v(1)
\end{equation}} 

{\em For an appropriately concave $v$, the demand $1$ can be seen to be 
strictly dominated by $1/2$. Since all players are symmetric, we have shown 
that, on the one hand, $(1/2,1/2,1/2)$, which is not in the core, is a Nash 
equilibrium while, on the other hand, $(1,1/2,1/2)$, an element of the core, 
is not a Nash equilibrium. } \end{example} 
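The arithmetic of the example can be replicated numerically under the equal-likelihood matching rule; $v(x)=\sqrt{x}$ is one ``appropriately concave'' choice. The code is an illustration only, with names of my own:

```python
import math
from itertools import combinations

def expected_payoff(i, w_i, others, f, v):
    """U_i when all compatible coalitions of size >= 2 are equally
    likely; singletons are ignored since f(i) = 0 < any demand."""
    W = dict(others); W[i] = w_i
    players = list(W)
    compat = [frozenset(c) for r in range(2, len(players) + 1)
              for c in combinations(players, r)
              if sum(W[j] for j in c) <= f[frozenset(c)]]
    p = sum(1 for S in compat if i in S) / len(compat) if compat else 0.0
    return p * v(w_i)

f = {frozenset(s): 1 for s in [(1, 2), (1, 3), (2, 3)]}
f[frozenset((1, 2, 3))] = 2
others = {2: 0.5, 3: 0.5}
u_half = expected_payoff(1, 0.5, others, f, math.sqrt)  # (3/4) v(1/2)
u_one = expected_payoff(1, 1.0, others, f, math.sqrt)   # (1/2) v(1)
```

Here `u_half` is approximately $0.530$ and exceeds `u_one` $= 0.5$, so with this $v$ the demand $1$ is indeed strictly dominated by $1/2$.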

The above example demonstrates two things. First, there
is no immediate relation between core allocations and Nash equilibria.
Second, Assumption 1 is not sufficient to rule out rather unintuitive
outcomes. Note that in the above example, every feasible coalition is assumed 
to form with a strictly positive probability. Thus, the problem is not one of 
beliefs which assign probability zero to coalitions that can form. In the 
example, players believe that the two-player coalitions form although no 
player is ``hurt'' by the formation of the grand coalition. Since the players 
are assumed to be committed to their demands, it does not seem unreasonable to
assume that ties will be broken in favor of the larger coalition. Assumption 2
below is a weaker statement of this intuition. 

\begin{assumption}
\label{as2} Fix $W$. Suppose that there is a set $S^*$ such that
\begin{itemize}
\item The demands of $S^*$ are compatible, 
\item For every coalition $T$ whose demands are compatible, the demands of 
$T \cup S^*$ are also compatible. 
\end{itemize}
Then, $p_i(\omega_i|W_{-i}) =1$ for all $i \in S^*$. 
\end{assumption}

The demands of $S^*$ are compatible. However, the demands of a different 
coalition $T$ may also be compatible and this might in fact require the 
services of a subset of players in $S^*$. 
Assumption~\ref{as2} then requires that if it never ``hurts'' anyone to 
accommodate 
the demands of the remaining players in $S^*$, then all the players in $S^*$ 
will be matched. 


\begin{proposition}
Under Assumption~\ref{as1} and Assumption~\ref{as2}, every element of the core
is a Nash equilibrium. 
\end{proposition} 

{\bf Proof: } Let $W^*$ be an element of the core. Hence $W^*(N) = f(N)$.
Take $S^* = N$ in Assumption 2. It then follows that 
$p_i(\omega_i^*|W_{-i}^*) = 1$ for all $i \in N$. A deviation by any player 
$i$ to a demand $\omega_i > \omega^*_i$ implies that the demands are not 
compatible in any coalition containing him. Consequently, 
$p_i(\omega_i|W_{-i}^*) = 0$. Hence, it is not profitable to ask for more 
than $\omega^*_i$.

The only reason that a player may ask for less than $\omega^*_i$ is 
if it can be met with a higher probability than $\omega^*_i$. But since 
$p_i(\omega^*_i|W^*_{-i}) = 1$, $\omega^*_i$ is the unique best response.
$\Box$

The following example shows that the converse of Proposition 1 does not hold 
without further restrictions.

\begin{example}
\label{eg1}
{\em Let $f (12)=f(13)=300, \ f (123)=302$.  Assume that $v_i(\omega) =\omega$.
 Under the 
hypotheses of Proposition 1, all core allocations are Nash equilibria.  In any
core allocation, neither player $2$ nor $3$ gets more than $2$. 

Now note that for all $\omega > 4$, all demands of the form $(300-\omega,
\omega,\omega)$ 
are compatible in exactly the $\{1,2\}$ and $\{1,3\}$ coalitions. Assume that 
each of these is equally likely to form.  Again, this is consistent 
with Assumptions 1 and 2. It is straightforward to check that the above 
demands constitute a 
Nash equilibrium.  Indeed, any deviation by player 2 (or 3) yields a payoff 
strictly smaller than his expected payoff of $\omega/2$ under the current 
demands; in particular, any demand that is met with certainty pays at most 
$2$. } 
\end{example} 
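The claims of this example can be checked directly. Below is a hypothetical numerical sketch taking $\omega = 5$; any $\omega > 4$ behaves the same way:

```python
# Data of Example 2: f(12) = f(13) = 300, f(123) = 302, f(23) = 0.
f12 = f13 = 300.0
f123 = 302.0
f23 = 0.0

w = 5.0                          # any demand w > 4
W = {1: f12 - w, 2: w, 3: w}

# Exactly {1,2} and {1,3} are compatible at W.
assert W[1] + W[2] <= f12 and W[1] + W[3] <= f13
assert W[1] + W[2] + W[3] > f123     # grand coalition cannot pay
assert W[2] + W[3] > f23             # {2,3} cannot pay either

# Each compatible coalition forms with probability 1/2, so player 2's
# expected payoff is w/2; any demand met with certainty pays at most 2.
current = 0.5 * w
assert current > 2.0
```

With $\omega=5$ the expected payoff of player 2 is $2.5$, strictly above anything a deviation can secure.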



\begin{proposition}
Under Assumptions~\ref{as1} and~\ref{as2}, if the technology is convex, i.e., 
$$
f(S) + f(T) \leq f(S \cup T) + f(S \cap T) \mbox{\hspace{0.5 in} for all $S,T
\subseteq N$.}
$$
then a Nash equilibrium demand vector $W^*$ is in the core of the technology.
\end{proposition}
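The convexity condition is easy to verify mechanically; the following illustrative checker (the frozenset encoding of $f$ is mine) simply enumerates all pairs of coalitions:

```python
from itertools import combinations

def is_convex(f, players):
    """Check f(S) + f(T) <= f(S | T) + f(S & T) for all S, T.

    f maps every frozenset subset of players (incl. the empty set)
    to its output.
    """
    subsets = [frozenset(c) for r in range(len(players) + 1)
               for c in combinations(players, r)]
    return all(f[S] + f[T] <= f[S | T] + f[S & T]
               for S in subsets for T in subsets)
```

For instance, the technology of Example 2 fails this test with $S=\{1,2\}$, $T=\{1,3\}$, since $300+300 > 302+0$, which is consistent with the non-core Nash equilibrium exhibited there.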


\proof The proof involves showing that if the demands of a  certain coalition 
are strictly compatible at $W$, then it is possible to find a coalition 
$S^*$ such that whenever $T$ is compatible, $S^* \cup T$ is strictly 
compatible.  Thus, a player in this coalition can increase his demand by a 
positive amount and yet ensure that it is met with probability one, by virtue 
of Assumption~\ref{as2}. 
Hence, at a Nash equilibrium, no coalition's demands are strictly 
compatible. However, for each player, there must be at least one coalition in 
which his demands are compatible. This is sufficient to now conclude that the 
demands must be feasible in the grand coalition.  Details appear in Appendix 
A. 

\section{Dynamic Adjustment Process: Adaptive Play}\label{dynamicprocess}

In this section I describe a dynamic
process. I consider the case where the
demand game of section 2 is played repeatedly at discrete dates. Players are
assumed to be myopic, have a bounded memory and make demands based on
precedent.

More precisely, time is denoted by, $t = 1,2, \ldots $. At each date $t$,
players simultaneously demand a wage. To steer clear of the complications
that arise from infinite dimensional strategy spaces, the strategy spaces
are discretized. Towards this end, I first assume that each $f(S)$ is a
rational 
number, say $p_S/q_S$. Let $M = \prod_{S \subseteq N} q_S$.  For a given 
positive integer $p$, the number let $\delta = 1/ (10^pM)$ is the 
precision\footnote{ The dependence of $\delta$ on $p$ is suppressed for 
conserving on notation.} 
of the model. Players will be assumed to demand in integral multiples of 
$\delta$.   Let $\Delta_{\delta}$ denote the subset of $\Delta$ consisting of 
integral multiples of $\delta$.  A central concern of the paper is the behavior
of 
model as $\delta \rightarrow 0$ (or equivalently large values of $p 
\rightarrow \infty$). 
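The role of $M$ is just to place every $f(S)$ on the demand grid: each $f(S)$ becomes an integral multiple of $\delta$. A quick sketch with exact rationals (illustrative names of my own):

```python
from fractions import Fraction
from math import prod

def precision(f_values, p):
    """delta = 1 / (10**p * M), where M is the product of the
    denominators of the rational values f(S) = p_S / q_S."""
    M = prod(Fraction(v).denominator for v in f_values)
    return Fraction(1, 10 ** p * M)

values = [Fraction(1, 2), Fraction(3, 4), Fraction(2)]
delta = precision(values, 1)          # here M = 2 * 4 * 1 = 8
```

Every value in `values` then divides exactly by `delta`, i.e. lies on the grid $\Delta_\delta$.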

Let ${\cal C}_{\delta}(f)$ denote the set 
of core allocations with each $\omega_i \in \Delta_{\delta}$. It is important 
to note that with the above method of discretization, ${\cal 
C}_{\delta}(f)$ can be defined exactly like ${\cal C}(f)$. That is, we do not 
encounter problems of demands in ${\cal C}_{\delta}(f)$ being strictly 
compatible in some coalition merely because players are allowed to demand 
from $\Delta_{\delta}$ only. 

In the sequel, $\omega_i \in \Delta_{\delta}$ unless specified otherwise. 

Let $\omega_{it}$ and $W^t$ denote a typical demand of a player $i$ and the
vector of wage demands at date $t$ respectively. The complete {\em history}
up to and including period $t$ is a sequence of demands $W^1, W^2,\ldots W^t$.
Consider a typical player $i$ who has to make a demand at date $t$. Players
have no knowledge of, or prior regarding, the utilities of the other players. 
To determine an optimal response, they have to rely on the historical records.

Fix $\alpha_1 \leq \alpha_2 \leq \dots \leq \alpha_n$ where each $\alpha_i$ 
is a rational number. An integer $m$ is said to be {\em admissible} if 
$\alpha_i m$ is an integer for all $i$. Let $k_i = \alpha_i m$. 

Fix an admissible $m$. Player $i$ samples at random a fraction of at most
$\alpha_i$ of the last $m$ records, 
{\bf s}$= (W^{t-m+1},W^{t-m+2},\ldots W^t)$. It is important
for this model that every subsample of size at most $\alpha_i m$ is sampled 
with a positive probability. However, one need make no assumptions on the 
relative probabilities with which different parts of the history are sampled. 
The variable $\alpha_i$ is a measure of player $i$'s {\em information}. 


Recall that the history up to date $t$ consists only of past demands. In
particular, the history is silent as to which coalition formed when
those demands were in place. Consider a player who has picked the sample
$\sigma_i = (W^1,W^2,\ldots
W^{k_i})$. Player $i$ believes that his demand $\omega$ is met with probability
$F_i(\omega|\sigma_i)$ where 
$$
F_i (\omega|\sigma_i)=\frac{1}{k_i}[p_i (\omega|W^1_{- i})+p_i
(\omega|W^2_{-i})+ \dots
+p_i (\omega|W_{-i}^{k_i})]. 
$$

The expected payoff of player $i$ upon demanding $\omega$ following the sample
$\sigma_i$ is 
\begin{equation}
\label{convention}F_i (\omega|\sigma_i)v_i(\omega)
\end{equation}

Hence if the state at time $t$ is {\bf s}, the wage demand at $t+1$ must
satisfy 
\begin{equation}
\label{br}\omega_{i(t+1)} = \arg\max F (\omega|\sigma_{it})v_i(\omega)
\end{equation}
for some sample $\sigma_{it}$ of size $k_i$ from the state {\bf s}. If there
are several values of $\omega$ that solve Eq.~(\ref{br}) above, then each of
them is played with a strictly positive probability.
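One period of adaptive play can be sketched as follows. This is illustrative only: the equal-likelihood matching rule of Example 1 stands in for a general $p_i$, and all names are my own:

```python
import random
from itertools import combinations

def p_match(i, w_i, others, f):
    """p_i(w_i | W_{-i}) with all compatible coalitions (size >= 2,
    since f(i) = 0) equally likely."""
    W = dict(others); W[i] = w_i
    n = list(W)
    compat = [frozenset(c) for r in range(2, len(n) + 1)
              for c in combinations(n, r)
              if sum(W[j] for j in c) <= f[frozenset(c)]]
    return sum(1 for S in compat if i in S) / len(compat) if compat else 0.0

def adaptive_demand(i, history, k_i, grid, f, v):
    """Sample k_i records from the length-m history and maximize
    F_i(w | sample) * v_i(w) over the demand grid."""
    sample = random.sample(history, k_i)
    def F(w):   # empirical probability that the demand w is met
        return sum(p_match(i, w, {j: W[j] for j in W if j != i}, f)
                   for W in sample) / k_i
    return max(grid, key=lambda w: F(w) * v(w))
```

At a convention every sampled record is identical, so the maximizer reduces to the one-shot best response against the standing demands.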

The above model is similar to fictitious play in the sense that players make
their demands naively based on empirical distributions. Unlike in fictitious
play, where a player samples the entire history, in the above process, a
player samples only a fraction of the most recent history. This process has
been termed {\em adaptive play} by Young (1993a). Since players are
sophisticated enough to actually play a best response at each date, it seems
that the above behavioral process, like fictitious play, makes sense only
if one assumes that players do not know each other's utility functions.

The response rules of the players (determined by Eq.~\ref{br}) determine a
stationary Markov chain. The state space $\Omega$ consists of
all sequences {\bf s} of length $m$. Each entry of {\bf s}, is a vector of
wage demands by the agents. 

Let $p^*_i (\omega_i|\mbox{{\bf s}})$ denote the probability with which player 
$i$ demands $\omega_i$ in the state {\bf s}. For each $i$, $p^*_i$ is a 
best-response distribution, i.e., $p^*_i (\omega_i|\mbox{{\bf s}})>0$ iff 
$\omega_i$ solves equation~\ref{br} for some sample $\sigma_i$ in {\bf s}. 

For a state {\bf s}$= (W^1, W^2, \ldots W^m)$, a state {\bf s$^{^{\prime}}$}$
= (W^2,W^3,\ldots,W^m,W)$ is said to be its successor. The transition 
probabilities are

$$
P^0_{\mbox{{\bf ss$^{^{\prime}}$}}}= \left\{
                           \begin{array}{ll}
                            \prod_{i \in N}p_i^* (\omega_i|\mbox{{\bf s}}) &  
                           \mbox{if {\bf s$^{'}$} is a successor of {\bf s}} \\
                           0 & \mbox{otherwise.}
                           \end{array}
                           \right.
$$


Let $P^0$ denote the matrix of the above transition probabilities. As in
Young (1993b), the above Markov process, with the state space $\Omega$, and
transition probability matrix $P^0$ is said to be an {\em evolutionary
bargaining process} (EBP) with memory $m$, precision $\delta$,
information parameters $\{\alpha_i\}$ and best reply distributions
$\{p^*_i\}$.

\subsection{ Conventions}

\begin{definition}
{\em (Convention)} A state {\bf s} is said to be a convention iff
it is an absorbing state of the evolutionary bargaining process, {\em i.e.,} 
$P^0_{\mbox{{\bf ss}}} = 1$.
\end{definition}

From a given state, there is a positive probability of reaching only an
immediate successor. Thus, if {\bf s } is a convention and is the state at
time $t$, then {\bf s } is the state at time $t+1$ as well. For this to be
true, it is clear that {\bf s } must be a sequence of $m$ identical demands.
Now, let {\bf w} denote the state in which each entry is the vector of
demands $W$. Suppose that the process is in state {\bf w} at time $t$. Then
regardless of which sample player $i$ picks, the probability with which his
demand will be met is given by 
\begin{equation}
F_i (\omega|\sigma)=p_i (\omega|W_{-i}) 
\end{equation}

An easy application of Eqs.~(\ref{nasheqblm}), (\ref{convention}) and 
(\ref{br}) yields Proposition 3 below.

\begin{proposition}
A state {\bf w} is a convention iff  $W$ is a Nash equilibrium of
the demand game.
\end{proposition}

Proposition 3 identifies the isomorphism between the conventions and Nash
equilibria of the one shot demand game. Once the players reach a convention,
they are locked into playing the game in a particular way. Thus, by
establishing the global convergence from an arbitrary initial state to a
convention, we describe a particular way in which players learn to play the
Nash equilibrium of the demand game. This is the object of the next
subsection.

\subsection{Convergence of the EBP}

In a Markov process, states are either transient or
persistent. If all the persistent states are absorbing, then the EBP
converges from an arbitrary state to some absorbing state with probability 
one. Theorem~\ref{thm1} below presents sufficient conditions under which the 
only persistent states are conventions. 

\begin{theorem} 
\label{thm1} Suppose that $\alpha_i \leq \frac{2}{(n^2-n+4)}$ for all $i$. 
Then the EBP converges with probability one to a convention.
\end{theorem}

It is useful to point out that the bound in the Theorem is only a sufficient 
condition that is independent of $f$. For particular cases it is possible to 
give a much sharper bound. For example, for the standard Nash Bargaining game 
where $f(N) = 1$ and $f(S) = 0$ if $S \neq N$, $\frac{k_i}{m}\leq \frac{1}{3}$ 
is sufficient. The same bound is also sufficient for the three-player case with 
an arbitrary $f$. This is proved below. However, for all subsequent results 
I assume that  the hypothesis of Theorem 1 is met.  The proof of the general 
case appears in Appendix B. 


\proof Assume, for ease of exposition, that $\alpha_i=\alpha \leq 1/3$ for all
$i$. Let
the process be in state {\bf s} at date $t$. Let $\sigma$ denote the last 
$k$ elements of this state. Let $W^1$ denote a best-response vector to the
sample $\sigma$. Since every sample of size $k$ has a positive probability
of being sampled, there is a positive probability of observing $W^1$ at date 
$t+1$. In fact, there is a positive probability of observing a run of $W^1$
between $t+1$ and $t+k$. If {\bf w$^1$} is a convention, we are done. For,
between $t+1$ and $t+2k$ all the players will sample the records containing 
$W^1$ alone and respond with $\omega^1_i$. Hence there is a positive 
probability $p$ of reaching a convention in $m+2k$ periods. So the 
probability of not reaching a convention in $r (m+2k)$ periods is at most 
$(1-p)^r$, which goes to
zero as $r \rightarrow\infty$. Suppose that $W^1$ is not in the core but is
compatible in at least one of the three two-player coalitions, say $\{1,2\}$.
Then, between $t+k+1$ and $t+2k$, there is a positive probability that
players 1 and 2 will continue to sample $\sigma$, while player 3 will
sample the records consisting of $W^1$ alone. His best response\footnote{
It may be verified that since the technology is convex, 
$f (123)-\omega_1-\omega_2$ is not
feasible in either of the smaller coalitions that contain player 3.}, by
virtue of Assumption~\ref{as2}, is $f (123) - \omega_1 - \omega_2$. Hence,
between
period $t+k+1$ and $t+2k$, there is a positive probability of seeing a run
of $W^2= (\omega_1,\omega_2,f (123)-\omega_1-\omega_2)$.

It is useful to note that until now we needed a history of length at most 
$3k$. From now on we will make use of samples of size $k$ that appear from
dates $t+k+1$ onwards.

Now between $t+2k+1$ and $t+3k$, there is a positive probability that all
three players will sample the demand $W^2$. The best response of player $3$
continues to be $f (123)-\omega_1-\omega_2$ while player $1$ and player $2$
must
demand $f (12) - \omega_2$ and $f (12)- \omega_1$ respectively. Hence we will
see a
run of $W^3= (f (12)- \omega_2,f (12)-\omega_1, f (123)-\omega_1-\omega_2)$ for
$k$ periods with
a positive probability.

Between $t+3k+1$ and $t+4k$, there is a positive probability of players 2
and 3 continuing to sample demands of $W^2$ alone while player 1 samples 
$W^3$. The best responses of players 2 and 3 remain unchanged while that of
player 1 is now $\omega_1$. Hence, there is a positive probability of seeing a
run of $W^4= (\omega_1,f (12)-\omega_1, f (123)-\omega_1-\omega_2)$.

Finally, there is a positive probability of all three players sampling the
most recent $k$ records consisting of $W^4$ alone. The best responses of
players 1 and 2 continue to be $\omega_1$ and $f (12)-\omega_1$ respectively, 
while player 3 must now demand $f (123)-f (12)$. It may be verified that the
demand vector $W^5= (\omega_1,f (12)-\omega_1,f (123)- f (12))$ is an element 
of the core.
Now we repeat arguments similar to those found in the first paragraph of
this proof to conclude that the EBP converges with probability one to a
convention if the information is less than or equal to $1/3$.

The only other case to consider is when $W^1$ is feasible and is strictly
compatible in the grand coalition but is not compatible in any of the
smaller coalitions. In this case, it is clear that $W^2$ above is in the
core. $\Box$



\subsection{ Stochastically Stable Conventions}

\label{ssb}

This section studies the same dynamic process as in the previous section but
allows for the possibility of players making mistakes, much in the spirit of
the models in Kandori et al. (1993) and Young (1993a, 1993b). The aim of
this section is to introduce the notion of stochastic stability and to
provide a characterization of such states. I start with a few definitions.
Fix the sample sizes $k_i$ and the memory $m$.

\begin{definition}
{\em (Mistake)} Suppose that the EBP is in state {\bf s}$=
(W^{t-m+1},W^{t-m+2},\ldots,W^{t})$ at time $t$ and moves to {\bf
s$^{\prime}$}$= (W^{t-m+2},W^{t-m+3}, \ldots, W^t,W)$. The transition from
{\bf s} to {\bf s$^{\prime}$} is said to involve exactly one mistake if
there is exactly one player, say $i$, for whom there is no sample in {\bf s}
of size $k_i$ for which $\omega_i$ is a best response, i.e.,
$p_i^* (\omega_i|\mbox{{\bf s}})=0$.
\end{definition}

Clearly  the number of mistakes involved in a transition from a state to its
successor 
is between zero and $n$, depending on the number of players that have made 
errors.

Suppose that the probability with which player $i$ makes a mistake is given
by $\epsilon \lambda_i >0$. Conditional on player $i$ having made a
mistake, let $q_i (\omega_i|\mbox{{\bf s}})$ be the probability with which he
demands the wage $\omega_i$ in state {\bf s}. Clearly, $q_i$ is different
from $p^*_i$. The parameter $\epsilon$ is the absolute probability with which
players make mistakes and $\lambda_i/\lambda_j$ is the relative probability
of players $i$ and $j$ making mistakes. The event that player $i$ makes a
mistake is assumed to be independent of the event that player $j$ makes a
mistake.

Now suppose that the process is in state {\bf s} at time $t$. The
probability that exactly the members of the coalition $S$ make mistakes is
$\epsilon^{|S|} (\prod_{i\in S }\lambda_i) (\prod_{i \not\in S}
(1-\epsilon\lambda_i))$. Conditional on this event, the transition
probability of moving from a state {\bf s} to a state {\bf s$^{\prime}$}
is

$$
Q_{ss^{^{\prime }}}^S=\left\{ 
\begin{array}{ll}
\prod_{i\in S}q_i(\omega_i|\mbox{{\bf s}})\prod_{i\not \in S}p_i^{*}(\omega_i|
\mbox{
{\bf s}}) & \mbox{ if {\bf s$^{^{\prime }}$} is a successor of {\bf s}} \\  
& \mbox{and the demands to the far right are $W$} \\ 
  &  \\
0 & \mbox{otherwise.} 
\end{array}
\right. 
$$

If none of the players makes a mistake, then the transition probability of
moving from state {\bf s} to a state {\bf s$^{\prime}$} is given by the
earlier transition probability $P^0_{ss^{\prime}}$. This event has
probability $\prod_{i \in N} (1-\epsilon\lambda_i)$.
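Since each player errs independently with probability $\epsilon\lambda_i$, the events ``exactly the members of $S$ err,'' taken over all $S \subseteq N$ (the $S=\emptyset$ term being the no-mistake event), partition the sample space. A short numeric check of this bookkeeping, with hypothetical values for $\lambda_i$ and $\epsilon$:

```python
from itertools import combinations

def coalition_error_prob(S, lam, eps):
    """Probability that exactly the players in S make a mistake, when
    player i errs independently with probability eps * lam[i]."""
    p = eps ** len(S)
    for i, l in enumerate(lam):
        p *= l if i in S else (1 - eps * l)
    return p

lam = [0.5, 1.0, 1.5]   # hypothetical relative mistake rates lambda_i
eps = 0.01              # absolute mistake probability epsilon
players = range(len(lam))
# Summing over every coalition S (including the empty set) must give
# total probability one, since the events partition the sample space.
total = sum(coalition_error_prob(set(S), lam, eps)
            for k in range(len(lam) + 1)
            for S in combinations(players, k))
```

The $S=\emptyset$ term of this sum is exactly the no-mistake probability $\prod_i(1-\epsilon\lambda_i)$ multiplying $P^0$ above.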

Allowing for the possibility of mistakes, we now obtain a new Markov process
with the same state space $\Omega$ as before but with the transition
function: 
$$
P^{\epsilon}_{ss^{\prime}}= \left (\prod_{i\in N} (1-
\epsilon\lambda_i)\right)P^0_{ss^{\prime}} + \sum_{\emptyset \neq S \subseteq
N}\epsilon^{|S|} (\prod_{i\in S }\lambda_i) (\prod_{i \not\in S}
(1-\epsilon\lambda_i))Q_{ss^{\prime}}^S. 
$$

Let $P^{\epsilon}$ denote the above matrix of transition probabilities. In
most models similar to the one presented here, including Kandori et al.
(1993) and Young (1993a, 1993b), it is assumed that when players
make mistakes, every feasible strategy is played with a strictly positive
probability. Mistakes then constantly perturb the process away from a
convention, and there are no longer any absorbing states. However, since the
transition probabilities of the perturbed process converge to those of the
unperturbed process as $\epsilon$ converges to zero, for small values of
$\epsilon$ the perturbed process continues to be attracted to conventions,
without actually settling down. Which convention the process stays at for
the most part depends on the number of mistakes required to move it far
enough to a state from which it would gravitate toward a different
convention. Hence, in the long run, when the probability of mistakes is
very small, the convention that is observed most of the time will be the one
that requires the largest number of mistakes to displace.

The asymptotic (or long-run) behavior of a Markov process is captured
completely by its invariant distributions. When one assumes that there is a
positive probability of {\em every} strategy being played, the perturbed
process is irreducible. It is easy to show that each of the states is
aperiodic as well. Hence there is a unique invariant distribution
$\mu^{\epsilon}$ for the perturbed process, for each $\epsilon > 0$. For a
state {\bf s}, $\mu^{\epsilon}_s$ is the limit of the relative frequency
with which it is observed in the first $t$ periods as $t \rightarrow \infty$.
Since the invariant distributions (perhaps along a subsequence) converge to
an invariant distribution of the unperturbed process, the conventions that
are observed most often, when the probability of mistakes is small, are
those in the support of the limit of the invariant distributions of the
perturbed process (which is of course an invariant distribution of the
unperturbed process).
This motivates the following refinement of the set of conventions, first
introduced by Foster and Young (1990).

\begin{definition}
{\em (Stochastically Stable Convention)} A convention {\bf s} is {\em
stochastically stable} if $\lim_{\epsilon \rightarrow
0}\mu^{\epsilon}_{s}$ exists and is positive. A state is {\em strongly
stable} if $\lim_{\epsilon \rightarrow 0}\mu^{\epsilon}_{s} =1$.
\end{definition}

Identification of the SSC(s) is considerably simplified by a certain
equivalence theorem initially due to Freidlin and Wentzell (1984) and
adapted to finite processes by Young (1993a). In order to introduce this,
certain definitions are required.

\begin{definition} {\em ({\bf w}-tree) } Fix a convention {\bf w}. A {\bf
w}-tree is a directed graph with the set of conventions as its vertices
such that from each convention {\bf w$^{\prime}$} $\neq$ {\bf w} there is a
unique directed path to {\bf w}, and there are no cycles.

\end{definition}

\begin{definition}
{\em (Resistance)} Let {\bf s$^{\prime}$} be a successor of {\bf s}.
The resistance between these two states, denoted by {\bf
r(s,s$^{\prime}$)}, is the minimum number of mistakes required in the
one-period transition {\bf s $\longrightarrow$ s$^{\prime}$}.
Similarly, for any two states {\bf s$^1$} and {\bf s$^2$}, {\bf
r(s$^1$,s$^{2}$)} is the minimum number of mistakes required to reach {\bf
s$^2$} from {\bf s$^1$} through a sequence of one-period transitions.
\end{definition}

The resistance of a {\bf w}-tree is naturally defined as the sum of the
resistances of its edges. Let ${\cal T}_w$ denote the set of all {\bf
w}-trees.

\begin{definition} {\em (Stochastic Potential)}  The  stochastic potential of 
a convention {\bf w } is the least resistance among all {\bf w}-trees:
$$
\gamma(\mbox{{\bf w }}) = \min_{T \in {\cal T}_w} 
\sum_{\mbox{{\bf(w$^1$,w$^2$)}} \in T}   \mbox{{\bf r(w$^1$,w$^2$)}}.
$$
\end{definition}


\noindent {\bf Theorem A} (Young (1993a)) {\em The sequence of stationary
distributions
$\mu^{\epsilon}$ converges to a stationary distribution $\mu^0$ of $P^0$ as
$\epsilon \rightarrow 0$. Moreover, a state {\bf s} is stochastically stable
iff {\bf s = w} is a convention with the minimum stochastic potential
amongst all conventions. } \\ 
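Theorem A suggests a direct, if brute-force, computation: given the matrix of pairwise resistances between conventions, each convention's stochastic potential is the least total resistance over the trees directed into it. A minimal sketch, in which the $3\times 3$ resistance matrix is hypothetical and not derived from the model:

```python
from itertools import product

def stochastic_potential(r, root):
    """Minimum total resistance over all trees directed into `root`.

    r[i][j] is the resistance of moving from convention i to convention j.
    A root-directed tree gives every other vertex exactly one outgoing
    edge, so that following the edges from any vertex leads to `root`.
    """
    n = len(r)
    others = [v for v in range(n) if v != root]
    best = float("inf")
    for succ in product(range(n), repeat=len(others)):
        parent = dict(zip(others, succ))
        # Reject choices with a self-loop or a cycle: every vertex must
        # reach `root` in fewer than n steps.
        ok = True
        for v in others:
            u, steps = v, 0
            while u != root and steps < n:
                if u == parent[u]:
                    break
                u, steps = parent[u], steps + 1
            if u != root:
                ok = False
                break
        if ok:
            best = min(best, sum(r[v][parent[v]] for v in others))
    return best

# Hypothetical resistances among three conventions:
r = [[0, 1, 3],
     [2, 0, 1],
     [2, 2, 0]]
potentials = [stochastic_potential(r, w) for w in range(3)]
stable = [w for w in range(3) if potentials[w] == min(potentials)]
```

Enumeration over all successor assignments is exponential in the number of conventions, which is why the proofs below instead construct a minimum-resistance tree explicitly.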


In order to allow for a parsimonious construction of the tree of minimum 
stochastic potential, I make the following assumption.

\begin{assumption}\label{as3} 
Players only make mistakes that are within a distance $\delta$ of a best
response. That is, if {\bf s} is the state at time $t$, then for every $i$,
$q_i(\omega_{i,t+1}|\mbox{{\bf s}}) > 0$ iff there is a $\hat{\omega}$ such
that
\begin{itemize}
\item $p_i^*(\hat{\omega}|\mbox{{\bf s}}) >0$;
\item $|\hat{\omega} -\omega_i| \leq \delta$. 
\end{itemize}
\end{assumption}

Young (1993b) constructs a tree of minimum stochastic potential without
resorting to any assumptions on players' mistakes in the two-player case.
With three or more players, however, the allocation space is at least two
dimensional. Consequently, one has to contend with the possibility of cycles
that do not arise in Young's compact, one-dimensional space.

Two further remarks are in order. First, note that Assumption~\ref{as3} is
made on the players' strategies rather than on payoffs. However, in a
convention, say {\bf w$^*$}, the unique best response is $\omega_i^*$. Under
the above assumption, player $i$ demands $\omega^*_i + \delta$ and
$\omega^*_i-\delta$, as well as $\omega_i^*$, with positive probability. In
terms of payoffs, $\omega^*_i -\delta$ constitutes a ``small'' mistake, as
it will be met for sure, while $\omega^*_i+\delta$, which will be rejected
for sure, is a ``large'' mistake. So, in a sense, Assumption~\ref{as3}
allows only for extremes in terms of payoffs. Of course, nothing is implied
in states that are not conventions, as the payoff functions in this model
are not continuous in strategies.


Second, recall that the definition of stochastic stability was based on the
assumption that the perturbed transition probability matrix has a unique
invariant distribution. With the above assumption, it is no longer clear
that $P^{\epsilon}$ is irreducible. Consequently, it is now not immediate
that a unique invariant distribution exists, and hence stochastic stability
may be an ill-defined concept. Theorem 2 below, which uses a special bound
on the extent of players' information, establishes the validity of the
solution concept.



\begin{theorem}~\label{thm2} Under Assumption~\ref{as3},  $P^{\epsilon}$ admits
a unique invariant distribution  for every $\epsilon >0$ if $\alpha_i \leq 1/4$
for 
all $i$. Moreover, the 
support of this invariant distribution contains the set of all conventions. 
\end{theorem} 

\proof See Appendix B. \\ 



The following notation is useful:
$$
g(W) = \sum_{i \in N} \alpha_i\log(v_i(\omega_i)), \qquad
g_i(\omega) = \frac{\partial\, \alpha_i \log(v_i(\omega))}{\partial \omega},
$$
$$
r_i(\omega,\delta) = \alpha_i\left[1-
\frac{v_i(\omega -\delta)}{v_i(\omega)}\right], \qquad
r(W,\delta) = \min_{i \in N} \ r_i(\omega_i,\delta).
$$

\begin{theorem}\label{thm3} 
There exists a level of precision $\delta^*$
such that for all $0 < \delta \leq \delta^*$, a convention {\bf
w}$^{\delta}$ is stochastically stable for every
admissible $m$ iff $W^{\delta}$ maximizes the function $r(\cdot,\delta)$
over all $W \in {\cal C}_{\delta}(f)$. 
\end{theorem}

\proof (Sketch) Starting from a convention {\bf w}, the function
$[mr(W,\delta)]$ is the minimum number of mistakes required before
some player has a best response that differs from that in the convention
(Corollary B.1). Hence {\bf r(w,w$^{\prime}$)} cannot fall below
$[mr(W,\delta)]$. The proof then involves showing that one can construct a
{\bf w}$^{\delta}$-tree that involves exactly $[mr(W,\delta)]$ mistakes at
each edge $\mbox{{\bf w}} \rightarrow \mbox{{\bf w}}^{\prime}$. The
resistance of any other {\bf w}-tree must then exceed that of this
particular {\bf w}$^{\delta}$-tree by at least $[mr(W^{\delta},\delta)] -
[mr(W,\delta)]$. For large enough $m$, if $W$ is not another maximum of
$r(\cdot,\delta)$, then by Theorem A, {\bf w} cannot be stochastically
stable. Details appear in Appendix B.


Theorem~\ref{thm3} extends the results obtained by Young (1993b). The
result here is weaker than in Young (1993b) in two ways. First, an upper
bound on $\delta$ is required. Second, there can be several stochastically
stable conventions, leading to an indeterminacy in the allocation for a
subset of players even when $\delta$ converges to zero.

\begin{example}{\em Let $N = \{1,2,3,4\}$, $f(12)= 100, f(N) =101$ and $f(S) =
0$ otherwise.  
In every core allocation, $\omega_1 + \omega_2 \geq 100$. Hence, 
$r(W,\delta) = \min \{ r_1(\omega_1,\delta),r_2(\omega_2,\delta)\}$ for a
core allocation $W$, and it attains a maximum when $\omega_1 = \omega_2 =
50$. By Theorem~\ref{thm3}, every convention {\bf w} with $W=
(50,50,\omega,1-\omega)$, where $ \delta \leq \omega \leq 1 - \delta$, is
stochastically stable. }
\end{example}
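Under a linear-utility reading of this example ($v_i(x)=x$ with equal $\alpha_i$, an assumption not stated in the example itself), $r_i(\omega,\delta)=\alpha_i\delta/\omega$, so $\alpha_i$ and $\delta$ cancel from the argmax and the maximization along the binding constraint $\omega_1+\omega_2=100$ can be checked by a direct grid search:

```python
# Grid search over integer splits of f(12) = 100 between players 1 and 2;
# with linear utilities r_i is proportional to 1/omega_i, so maximizing
# min{r_1, r_2} amounts to minimizing max{omega_1, omega_2}.
best_r, best_split = -1.0, None
for w1 in range(1, 100):
    w2 = 100 - w1
    r = min(1.0 / w1, 1.0 / w2)
    if r > best_r:
        best_r, best_split = r, (w1, w2)
```

The split $(50,50)$ is the unique maximizer on this grid, while the remaining unit of surplus can fall anywhere between players 3 and 4, which is exactly the indeterminacy noted above.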



\begin{theorem}
 Let $W^*$  maximize  $g(.)$ over ${\cal C}(f)$.  Let $K \subset N$ be 
the set of players such that $i \in K$ implies $g_i(\omega_i^*) \leq 
g_j(\omega_j^*)$ for all $j \in N$. For every $\epsilon > 0$, there is
a $\delta^* > 0$ such that for all $0 < \delta < \delta^*$, if {\bf 
w}$^{\delta}$ is stochastically stable, then  
$$
\mid \omega^{\delta}_i - \omega^*_i \mid < \epsilon,
$$
for all $i \in K$. 
 \end{theorem} 

\proof See Appendix B.





 





\section{Some examples}


\subsection{Non-Convex Technology}

Even when the technology is not convex, it is still relatively easy to
compute the minimum number of mistakes required before some player
has a best response different from the conventional one. These turn out to
be functions that look like $r(\cdot,\delta)$. But now, this minimum number
of mistakes is no longer sufficient to lead the process out of the domain of
attraction of the convention, as the following example demonstrates.

Note that the following 
example  may appear somewhat terse for the reader who has skipped Appendix B. 


\begin{example}
\label{eg2} {\em In Example~\ref{eg1}, set $\alpha_i = 1/4$ for all $i$. The
minimum number of mistakes $\hat{r}_{\delta} (\mbox{{\bf w}})$ before
some player has a best response different from the conventional demand in a
convention {\bf w} is
$$
\frac{4}{\delta}\hat{r} (W,\delta) =  \left\{ 
\begin{array}{ll}
\min_{i=1,2,3} \frac{1}{\omega_i} & \mbox{ if  $W$ is in the core} \\  &  \\ 
\min \{ \frac{2}{300-\omega},\frac{1}{\omega}\} & 
\mbox{for a convention such as $W=  (300-\omega,\omega,\omega)$ } 
\end{array}
\right.
$$
\noindent where $\omega > 4$. This bound may be obtained by
constructing arguments similar to those found in Appendix B. For elements in 
the core, $\hat{r}(.,{\delta})$ coincides with $r(.,\delta)$ that appears in 
Theorem 3. Among the set of conventions that are not in the core, the fewest 
number of mistakes required before player 1 demands a $\delta$ less is 
$2/\omega_1$, whereas for players 2 and 3 the minimum number of mistakes 
continues to be $1/\omega_i$. This is because for player 1 to demand $\delta$ 
less as a best response in a convention such as $(300-\omega,\omega,\omega)$ ,
we require both player 2 and player 3 to make the mistake of asking for  
$\omega +\delta$. Hence, the correction of 2 for player 1.  

The least player 1 obtains in the core is in the allocation $(298,2,2)$. 
In the convention involving these demands, player 1 requires the minimum 
number of mistakes to be the first player to have a best response different 
from 298. This corresponds to the case where both players 2 and 3 have 
demanded $298 -\delta$ for some $L^*$ periods, determined by 
$\hat{r}_{\delta}$. To this, the best response of player 1 is to ask for 
$298+\delta$. Suppose now that from this point on, say $T$, no further
mistakes occur. Assume, without loss of generality, that all mistakes
actually occurred in the last $T-L^*$ dates. 

It is easy to see that, from this point onwards, the demands of player 1
will not be anything other than $298$ or $298-\delta$, while those of
players 2 and 3 will not be anything other than $2$ or $2+\delta$.

Now, consider a sample of size $m/4$ in which every demand of player 1 is
$298-\delta$, while that of player 3 is $2$ in $m/4-1$ entries and
$2+\delta$ in the remaining one. Against this sample, a demand of $2$ by
player 2 will be a unique best response if

\begin{eqnarray*}
2 & > & \frac{ (m/4 - 1)}{m/4} (2+\delta) +
\frac{1}{m/4} (2+\delta)/2 \mbox{\hspace{  .2in} or}  \\ 
2 & > &  (1 - 2/m) (2+\delta) 
\end{eqnarray*}



The last inequality holds for an appropriate $\delta$. Thus, the only sample
for which $2+\delta$ is a best response is one in which player 3 has always
asked for $2$ and player 1 has always asked for $298-\delta$. Since
$(298-\delta, 2+\delta,2)$ is not a convention, the process eventually
returns to $(298,2,2)$ in the absence of further mistakes. }
\end{example}
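The last displayed inequality, $2 > (1-2/m)(2+\delta)$, rearranges to $\delta < 4/(m-2)$, so for any memory $m$ an ``appropriate'' $\delta$ indeed exists. A quick check with a hypothetical memory size:

```python
# 2 > (1 - 2/m)(2 + delta)  <=>  2m > (m - 2)(2 + delta)
#                           <=>  delta < 4/(m - 2).
def inequality_holds(m, delta):
    return 2 > (1 - 2 / m) * (2 + delta)

m = 100                  # hypothetical memory size
threshold = 4 / (m - 2)  # delta must lie strictly below this bound
```

Note the tension this creates: the admissible $\delta$ shrinks as $m$ grows, which is one symptom of the lack of a bound independent of $\delta$ and $m$ discussed next.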

The problem in this example arises because the set of
conventions is not connected. When $\delta$ is small, the minimum number of
mistakes required to change some best response is not sufficient to lead the
process away to a different convention. Indeed, the number of mistakes
required to reach a convention involving demands of the form
$(300-\omega,\omega,\omega)$, starting from a demand vector in the core,
turns out to be very large. This is because we require a series of mistakes
on the part of players 2 and 3, each of them demanding a $\delta$ higher at
each instant. On the other hand, the number of mistakes required to reach an
element of the core from the convention with demands $(296,4,4)$ is at least
$298m/[4 (298+\delta)]$. For a fixed $\delta$, this is a large number.
Hence, one cannot obtain a bound independent of $\delta$ and $m$. Questions
of existence and characterization of stochastically stable conventions for
examples such as these are left open as possibilities for future research.

\subsection{Competition and Evolution}

The competitive allocations of an economy continue to be the benchmark for
comparing those obtained under other mechanisms. A frequent criticism of the
Walrasian mechanism is the absence of a dynamic story of
adjustment.\footnote{See, however, Gale (1987a, 1987b).} The function $f$
can be thought of as the game generated by an endowment economy in which
agents have quasilinear preferences for money. The EBP can then be thought
of as a process of adjustment to equilibrium in such an economy. Hence it is
of interest to compare the competitive outcomes with the stochastically
stable conventions.

\begin{example}
\label{eg3}{\em Let $N = \{1,2,3\}$, $\alpha_i=1/4$, $v_i(x)=x$, 
$f(12)=f(13)= f(123)=300$ and $f(S) = 0$, otherwise. The technology 
corresponds to the well-known representation of a game with two sellers and
one buyer. When one does not allow for zero demands, the core of this
technology is empty. But the set of absorbing states corresponds to $m$
repetitions of demands of the form $(300-\omega,\omega,\omega)$, where
$\delta \leq \omega \leq 300-\delta$.
Player 1 obtains $300-\omega$ in one of the two smaller coalitions while
players 2 and 3 obtain $\omega$ with some probability. It can also be shown
that, with the same bound as in Theorem~\ref{thm1}, all other states are
transient. Of course, it is being assumed that the two two-player coalitions
are equally likely.

As mentioned before, the problem in Example~\ref{eg2} appears because
the set of absorbing states is not connected. However, when the set of
absorbing states is connected, as it is here, the techniques in the
proof of Theorem~\ref{thm3} can still be applied. A stochastically stable
convention maximizes the function $\hat{r}_{\delta}$ over $\delta \leq
\omega \leq 300 - \delta$, where
$$
\hat{r}_{\delta} (\mbox{{\bf w }}) = \frac{\delta}{4}\min \left\{ \frac{2}{
300-\omega},\frac{1}{\omega}\right\}.
$$


The competitive outcome, on the other hand, is one in which player 1
yields the least amount to either player 2 or player 3. In the present case
this corresponds to $(300- \delta,\delta,\delta)$. 

When $\delta \rightarrow 0$, the stochastically stable outcomes
converge to $(200,100,100)$, whereas the competitive outcome is the one in
which player 1 gets the whole surplus, i.e., $(300,0,0)$. In expected terms,
players 2 and 3 each obtain $50$ in the bargained outcome while the
competitive outcome gives them $0$. }
\end{example}
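The maximizer of $\hat{r}_{\delta}$ in this example can be checked directly: the $\delta/4$ factor does not affect the argmax, and the increasing branch $2/(300-\omega)$ crosses the decreasing branch $1/\omega$ where $2\omega = 300-\omega$, i.e., at $\omega=100$. A sketch on an integer grid:

```python
# Maximize min{2/(300 - w), 1/w} over the interior grid; the increasing
# branch 2/(300 - w) and the decreasing branch 1/w cross at w = 100.
argmax_w = max(range(1, 300), key=lambda w: min(2 / (300 - w), 1 / w))
stable_outcome = (300 - argmax_w, argmax_w, argmax_w)
```

This reproduces the limit allocation $(200,100,100)$ quoted in the example.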

\subsection{Does the Utility function matter?}

As in Young (1993b), let $(v_i,\alpha_i)$ denote the type of a player. An
interesting result of Young (1993b) is the emergence of a fifty-fifty split
as the SSC regardless of the types of the players. In the model above, the
result continues to hold in the sense that a particular division will be
observed regardless of the utilities. However, if the technology is not
convex, as in the earlier example, this is not the case.

\begin{example}
\label{eg4}{\em In Example~\ref{eg3}, suppose that the utility that player
$i$ derives from consuming $\omega$ is given by $v_i(\omega) =
1-e^{-\omega}$. Conducting the entire analysis as before, one deduces that
the minimum number of mistakes required before some player has a best
response different from the conventional one is given by }

{\em 
$$
\hat{r}_{\delta} (\mbox{{\bf w }}) = \frac{e^{\delta}-1}{4}\min \left\{ 
\frac{2}{e^{300-\omega}-1},\frac{1}{e^\omega-1}\right\}.
$$}

{\em The stochastically stable outcome is the convention that maximizes the
above function. When $\delta \rightarrow 0$, the stochastically stable
outcomes converge to $(300-\omega^*,\omega^*,\omega^*)$, where $\omega^*$
solves
$$
2 (e^{\omega^*}-1) = (e^{300-\omega^*}-1).
$$
It may be checked that $\omega^* > 100$. }

{\em The competitive outcome, on the other hand, converges to $(300,0,0)$.
Relative to the competitive outcome, player 1 continues to fare worse in the
stochastically stable outcome. In fact, he does worse in the stochastically
stable outcomes with constant absolute risk aversion than he did under
similar outcomes when all the players had linear utilities in Example~\ref
{eg3}. } \end{example}
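The defining equation for $\omega^*$ can be solved numerically: $2(e^{\omega}-1)$ is increasing and $e^{300-\omega}-1$ is decreasing in $\omega$, so bisection applies, and since both exponentials dwarf the additive constants the equation is approximately $2e^{\omega}=e^{300-\omega}$, giving $\omega^* \approx 150 - \tfrac{1}{2}\ln 2 \approx 149.65 > 100$, consistent with the claim above:

```python
import math

def gap(w):
    """2(e^w - 1) - (e^(300-w) - 1): strictly increasing in w, so the
    equation gap(w) = 0 has a unique root on (0, 300)."""
    return 2 * (math.exp(w) - 1) - (math.exp(300 - w) - 1)

lo, hi = 0.0, 300.0       # gap(0) < 0 < gap(300)
for _ in range(200):
    mid = (lo + hi) / 2
    if gap(mid) < 0:
        lo = mid
    else:
        hi = mid
w_star = (lo + hi) / 2    # essentially 150 - log(2)/2
```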

\begin{thebibliography}{99}
\bibitem{}  BINMORE, K.G. (1987): "Nash Bargaining Theory II, and Nash
Bargaining Theory and Incomplete Information," in {\em The Economics of
Bargaining} (K.G. Binmore and P. Dasgupta, Eds.), Basil Blackwell, Oxford.

\bibitem{}  CARLSSON, H. (1991): "A Bargaining Model Where Parties Make
Errors," {\em Econometrica}, 59, 1487-1496.

\bibitem{}  CHATTERJEE, K., B. DUTTA, D. RAY, AND K. SENGUPTA (1993): "A
Non-Cooperative Theory of Coalitional Bargaining," {\em Review of Economic
Studies}, 60, 463-477.

\bibitem{}  FELLER, W. (1968): {\em An Introduction to Probability Theory
and Its Applications}, Vol. 1, 3rd Ed., John Wiley, New York.

\bibitem{}  FOSTER, D. AND H.P. YOUNG (1990): "Stochastic Evolutionary Game
Dynamics," {\em Theoretical Population Biology}, 38, 219-232.

\bibitem{}  FREIDLIN, M. AND A. WENTZELL (1984): {\em Random Perturbations
of Dynamical Systems}, Springer-Verlag, New York.

\bibitem{}  KANDORI, M., G. MAILATH, AND R. ROB (1993): "Learning, Mutation
and Long-Run Equilibrium in Games," {\em Econometrica}, 61, 29-56.

\bibitem{}  PERRY, M. AND P. RENY (1991): "A Non-Cooperative View of
Coalition Formation," Mimeo, University of Western Ontario, London, Canada.

\bibitem{}  RUBINSTEIN, A. (1982): "Perfect Equilibrium in a Bargaining
Model," {\em Econometrica}, 50, 97-109.

\bibitem{}  SERRANO, R. AND R. VOHRA (1993): "Non-Cooperative
Implementation of the Core," Working Paper No. 93-41, Brown University,
Providence, Rhode Island.

\bibitem{}  STAHL, I. (1972): {\em Bargaining Theory}, Economics Research
Institute, Stockholm.

\bibitem{}  YOUNG, H.P. (1993a): "The Evolution of Conventions," {\em
Econometrica}, 61, 57-84.

\bibitem{}  YOUNG, H.P. (1993b): "An Evolutionary Model of Bargaining,"
{\em Journal of Economic Theory}, 59, 145-168.

\bibitem{}  WINTER, E. (1992): "Non-Cooperative Bargaining in Natural
Monopolies," Discussion Paper, Center for Rationality and Interactive
Decision Theory, Hebrew University, Jerusalem, Israel.

\end{thebibliography}



\appendix
\section{Appendix}

For the sake of clarity, the proof of Proposition 2 will be preceded by a
couple of lemmas. Recall that $\beta(W)$ is the set of all coalitions in
which the demands are compatible (at $W$) and $\hat{\beta}(W)$ the set
of all coalitions in which the demands are strictly compatible:
$$ 
\beta(W) = \{S \subseteq N:W(S) \leq f(S)\},
$$
$$
\hat{\beta}(W) = \{S \subseteq N: W(S) < f(S)\}.
$$


\begin{lemma}~\label{l1}
Fix $W$. If the demands of $S$ and $T$  are compatible, then 
$$
W(S \cap T) - f(S \cap T)  \leq f(S \cup T)  - W(S \cup T).
$$
Moreover, the above inequality is strict if the demands of either $S$ or $T$
are strictly 
compatible. 
\end{lemma}

\proof Immediate from convexity and the fact that for every $W$, 
$$
W(S \cup T) + W(S \cap T) \equiv W(S) + W(T).
$$ 
$\Box$
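The inequality of Lemma~\ref{l1} can be illustrated numerically on a small convex game; the characteristic function below is hypothetical, chosen only for this check, and is not the one appearing in the paper's examples:

```python
# A hypothetical convex characteristic function on N = {1, 2, 3} and a
# demand vector whose demands are compatible in both S and T.
f = {frozenset(): 0, frozenset({1}): 0, frozenset({2}): 0,
     frozenset({3}): 0, frozenset({1, 2}): 4, frozenset({1, 3}): 4,
     frozenset({2, 3}): 2, frozenset({1, 2, 3}): 9}
W = {1: 3, 2: 1, 3: 1}

def tot(S):
    """W(S): the total demand of coalition S."""
    return sum(W[i] for i in S)

S, T = frozenset({1, 2}), frozenset({1, 3})
assert tot(S) <= f[S] and tot(T) <= f[T]   # both coalitions compatible
lhs = tot(S & T) - f[S & T]                # W(S∩T) - f(S∩T)
rhs = f[S | T] - tot(S | T)                # f(S∪T) - W(S∪T)
```

Here `lhs` is 3 and `rhs` is 4, so the inequality holds; it holds with equality neither way because the demands of $S \cup T$ have positive slack.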

\begin{corollary}~\label{cr1}
Suppose that the demands of $S$ and $T$ are compatible but there is no 
coalition whose demands are strictly compatible. Then the demands of $S \cap 
T$ and $S \cup T$ are also compatible. 
\end{corollary} 


\begin{corollary}\label{l2} 
If there are no coalitions whose demands are strictly compatible at $W$, then 
there can be at most one largest coalition whose demands are compatible. 
\end{corollary}

\proof Immediate from Corollary~\ref{cr1}. $\Box$

\begin{lemma}~\label{l3}
Fix $W$ and let $W(S) < f(S)$.   Then, there is an $S^* \subseteq S$ such that
\begin{enumerate}
\item  The demands of $S^*$ are strictly compatible.
\item For every compatible coalition $T$ that does not contain $S^*$,
  $T \cup S^*$ is strictly compatible. 
\end{enumerate}
\end{lemma} 

\proof Let $S_1 \equiv S$. 
If the Lemma holds with $S^* = S_1$, we are done. Otherwise there is a
compatible coalition $T$ that does not contain $S_1$ such that $W(S_1 \cup
T) \geq f(S_1 \cup T)$. Then $S_2 = S_1 \cap T$ is a strict, non-empty
subset of $S_1$. Furthermore, a straightforward application of
Lemma~\ref{l1} reveals that $W(S_2) < f(S_2)$. If the Lemma holds with $S^*
= S_2$, we are done. If not, we can repeat the above process finitely many
times to obtain an $S_k = \{i\}$ such that $\omega_i < f(i) = 0$. This
contradicts the fact that $\omega_i >0$. $\Box$

\begin{corollary}\label{c1}
Under the hypotheses of  Lemma~\ref{l3},  for all $i \in S^*$, 
\begin{enumerate}
\item $\hat{\omega}_i > \omega_i$, where \( \hat{\omega}_i = \arg\max
p_i(\omega|W_{-i})v_i(\omega)\).
\item $\hat{\beta}(\hat{W})$ is a strict subset of $\hat{\beta}(W)$, where 
$\hat{W}_{-i} = W_{-i}$ and $\hat{\omega}_i$ is as defined above. 
\end{enumerate}
\end{corollary}

\proof Let $i \in S^*$. By
virtue of Part 2 of Lemma~\ref{l3}, player $i$ can increase his demand by a
positive amount, say to $\hat{\omega}$, and still ensure that if a coalition
$T$ continues to be compatible, then $T \cup S^*$ is also compatible. Hence,
by Assumption~\ref{as2}, $p_i(\hat{\omega}|W_{-i}) =1$. This proves Part 1.
Part 2 must hold since player $i$ must increase his demand until one of the
coalitions in which the demands were strictly compatible is just compatible. 
$\Box$



\begin{corollary}~\label{c2}
Let  $W^{*}$ be a Nash equilibrium. Then, $\hat{\beta}(W^*) = \phi$. 
\end{corollary}
\proof Immediate from the definition of a Nash equilibrium and Part 
1 of Corollary~\ref{c1}. $\Box$

\vspace{.25 in} \noindent \proof (Proposition 2) Let $W^*$ be a Nash 
equilibrium. Given Corollary~\ref{c2}, we need only show that $W^*(N) = 
f(N)$. It is clear that for $W^*$ to be a Nash equilibrium,
for every player $i$ there is at least one $S \in \beta(W^*)$ that contains 
$i$. Now use Corollary~\ref{l2} to conclude that $W^*(N) = f(N)$. $\Box$ 

\section{Appendix B} 


In this Appendix, players' demands are in $\Delta_{\delta}$, unless stated
otherwise. All the results from Appendix A continue to hold with this
restriction on players' demands because of our assumption that each $f(S)$
is a rational number and the particular method of discretization employed.
  
Given $W, W^{'}$, we will write $W \stackrel{S}{\rightarrow}W^{'}$ if 
$$
\omega^{'}_i= \left\{
                 \begin{array}{ll}
                      \arg\max p_i(\omega|W_{-i})v_i(\omega)& \mbox{if $i\in
S$}\\  
        \omega_i & \mbox{if $i \not\in S$} 
      \end{array}
           \right. 
$$
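One reading of this operator can be sketched in code, under the assumption (in the spirit of Assumption~\ref{as2}) that a demand is met for sure whenever it is compatible in some coalition containing $i$, so that $\arg\max p_i(\omega|W_{-i})v_i(\omega)$ is the largest demand still compatible in some such coalition, i.e.\ $\max_{S \ni i}\, f(S) - W(S\setminus\{i\})$. The characteristic function in the usage check is hypothetical:

```python
from itertools import combinations

def best_response(i, W, f, players):
    """Sketch of player i's myopic best response to the others' demands:
    the largest demand still compatible in some coalition containing i,
    i.e. the maximum over S containing i of f(S) - W(S \ {i})."""
    others = [j for j in players if j != i]
    best = 0.0
    for k in range(len(others) + 1):
        for rest in combinations(others, k):
            S = frozenset(rest) | {i}
            # f defaults to 0 for coalitions that produce nothing.
            best = max(best, f.get(S, 0.0) - sum(W[j] for j in rest))
    return best

# Hypothetical technology: f(12) = 100, f(N) = 101, all other f(S) = 0.
f = {frozenset({1, 2}): 100, frozenset({1, 2, 3}): 101}
W = {1: 60, 2: 40, 3: 1}
# Player 1: max{0, f(12) - 40, 0 - 1, f(123) - 41} = 60.
```

This is only the deterministic-sample case; in the EBP proper the maximand averages $p_i$ over the sampled records.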


\begin{lemma}~\label{bl1} 
Suppose $\beta(W^0) \neq \phi$. If there is a set $T$ such that whenever 
$W(S) < f(S)$, $S $ contains a player not in $T$, then there is a sequence 
$$
W^0 \stackrel{i_1}{\rightarrow} W^1 \stackrel{i_2}{\rightarrow} \dots 
\stackrel{i_{L-1}}{\rightarrow} W^{L-1} \stackrel{i_L}{\rightarrow}W^L
$$
such that 
\begin{enumerate} 
\item The demands of no coalition are \underline{strictly} compatible at 
$W^L$ but there is a unique largest coalition whose demands are 
compatible.
\item Furthermore,  the sequence of players above can be  chosen so that $i_l 
\not\in T$ for all $l = 1, 2, \dots L$.
\end{enumerate} 
\end{lemma} 

\proof Assume that $\hat{\beta}(W^0) \neq \phi$ (otherwise there is nothing
to prove). We can find an $S^*$ as in Lemma~\ref{l3}. By hypothesis, $S^*$
contains a player not in $T$, say $i_1$. $W^1$ is obtained by $W^0
\stackrel{i_1}{\rightarrow} W^1$. By Corollary~\ref{c1}, $\hat{\beta}(W^1)$
is a strict subset of $\hat{\beta}(W^0)$. Clearly $\beta(W^1) \neq \phi$
because there must be at least one coalition in which player $i_1$'s demand
is compatible. Now repeat the above procedure with $W^1$ to obtain
$i_2$ and $W^2$, and so on. In finitely many steps, $\hat{\beta} (W^L) =
\phi$. Now use Corollary~\ref{l2} to conclude that there is a unique largest
coalition at $W^L$ in which the demands are compatible. 
$\Box$



\begin{lemma}\label{bl2}
Suppose $\beta(W^0) \neq \phi$. Then, there is a 
sequence $\{W^{l},S_l\}$, $l = 1,2,\dots, L_1, L_1 + 1, \dots, L_2, L_2+1,
\dots, L_{Q-1}+ 1, \dots,L_Q$, such that:
\begin{enumerate} 
\item $W^l\stackrel{S_{l+1}}{\rightarrow}W^{l+1}$ for all $0 \leq l 
\leq L_{Q-1}$.
\item $S_{L_k+1} = N$ for all $k = 1,2,\dots, Q-1$. 
\item The demand vector $W^{L_Q}$ is in the core. 
\end{enumerate}
\end{lemma}

\proof Set $T = \phi$ in Lemma~\ref{bl1} and obtain the sequence up to 
$W^{L_1}$. Let $T_1$ be the unique largest coalition in which the demands are 
compatible.  If $T_1 = N$, we are done. Otherwise,
consider $W^{L_1} \stackrel{N}{\rightarrow} W^{L_1+1}$. Since the 
demands of no coalition are strictly compatible at $W^{L_1}$, no player will 
increase his demand. Moreover, by Assumption~\ref{as2}, 
$p_i(\omega^{L_1}_i|W^{L_1}_{-i}) = 1$ for all  $i \in  T_1$. Hence, 
$\omega^{L_1+1}_i =\omega^{L_1}_i$, 
for all $i \in T_1$. Furthermore, a player  not in  $T_1$ 
will reduce his demand so that at $W^{L_1+1}$, 
every player will have his demand met in at least one  coalition.  If now 
$\hat{\beta}(W^{L_1+1}) = \phi$, we are done as $W^{L_1 + 1}$ is in the core. 

Suppose $\hat{\beta}(W^{L_1 +1}) \neq \phi$. 
Set $T_1\equiv T$, $W^0 \equiv W^{L_1+1}$ in Lemma~\ref{bl1} 
to obtain the sequence leading to $W^{L_2}$. Again we have a unique largest 
coalition $T_2$ whose demands are compatible but the demands of no coalition 
are strictly compatible. Since the demands of the players in $T_1$ have not 
changed, and at least one more player's demand is now being met, it follows 
that $T_2$ is a strict superset of $T_1$. If $W^{L_2}$ is in the core, we 
are done. Otherwise we need to repeat the above process only finitely many
times to reach a situation where $T_Q =N$. $\Box$ \\ 



\proof (Theorem 1) Again, assume, for the sake of exposition, that $\alpha_i 
= \alpha_j$. Let {\bf s} be the state at time $t$ and let $\sigma$ 
denote the sample consisting of the $k$ most recent records. Let $W^0$ be the 
demand vector resulting from every player sampling $\sigma$.  Assume that 
$\beta(W^0)$ is not empty. The other case will be dealt with shortly. 
Now, let $\{W^l,S_l\}$ be the sequence as in Lemma~\ref{bl2}. 

Let {\bf  A} be the following event:
\begin{enumerate}
\item Between $t$ and $t+k$, all players sample $\sigma$. 
\item Between $t+lk+1$ and $t+(l+1)k$, all players except those in $S_l$ 
continue to sample what they were sampling between $t+(l-1)k+1$ and $t+lk$. 
Players in  $S_l$ sample the $k$ most recent demands, $1 \leq l \leq L_{Q-1}$.
\end{enumerate} 


It is easy to verify that if the event {\bf A} were to occur, it would lead to $k$ 
repetitions of $W^0$ between $t+1$ and $t+k$, followed successively by $k$ 
repetitions of $W^1$ between $t+k+1$ and $t+2k$ and so on, finally ending with 
$k$ repetitions of $W^{L_Q}$ between $t+{L_{Q-1}}k +1$ and $t+L_Qk$. Indeed, 
given our assumption that each player draws any given $k$-sample with a positive 
probability, the event {\bf A} would occur with a positive probability if the 
initial sample $\sigma$ were available in its entirety until date 
$t+k(L_1+1)$ and if, since $S_{L_Q} = N$, $W^{L_Q}$ were available for the next 
$k(L_Q+1)$ periods.  This would be possible if $m \geq \max_q\{(L_q+2)k\}$. 
Note that the $L_q$'s depend only on the number of coalitions in which the demands 
are strictly compatible. Hence, $L_q \leq \frac{n(n-1)}{2}$, which is the 
number of two-player coalitions with $n$ players. 

Now, assuming that the bound on $k/m$ is met, there is a positive 
probability of observing $k$ repetitions of $W^{L_Q}$, which is a core 
allocation, in finitely many stages. Furthermore, there is then a positive 
probability that all the players sample this $k$-run of $W^{L_Q}$ to achieve 
another $k$-run of $W^{L_Q}$, and so on, to finally obtain the convention {\bf 
w$^{L_Q}$} in some $M$ periods altogether. 

Thus, there is a positive probability $p$, bounded away from zero by our 
assumption of stationarity, and an integer $M$, such that a convention is 
established in $M$ periods. Hence the probability of not reaching a convention
in $rM$ periods is $(1-p)^r$, which goes to zero as $r \rightarrow \infty$. 

Now suppose $\hat{\beta}(W^0) = \phi$. If $W^0$ is in the core, we are 
done. If the demands of no coalition are compatible at $W^0$, then consider 
$W^0 \stackrel{N}{\rightarrow} \hat{W}^0$ and conduct the above analysis 
starting from the appropriate point in the sequence obtained in 
Corollary~\ref{bl1}. $\Box$

\begin{lemma}~\label{nooftrembles} Fix a convention  {\bf w}$^{*}$. Starting 
from this convention, the minimum number of mistakes required for player $i_0$
to be the \underline{first player} to have a best response 
different from $\omega^*_{i_0}$ for the \underline{first time} is given by 

 $$ 
\min\left\{\frac{k_{i_0}v_{i_0}(\omega_{i_0}^{*})}{v_{i_0}(\omega_{i_0}^{*}+(n-1)\delta )},k_{i_0}\left[1-\frac{ v_{i_0}(\omega_{i_0}^{*}-\delta 
)}{v_{i_0}(\omega_{i_0}^{*})}\right]\right\} $$ \end{lemma} 

\noindent \proof Starting from a convention {\bf w}$^*$, let {\bf s} be the
first state in which player $i_0$ is the first player to have a best response
different from the conventional demand. For concreteness, set $i_0=1$. Note 
that by the definition of {\bf s}, every $W_{-1}$ in {\bf s} that is not 
$W^*_{-1}$ involves at least one mistake.  Let there be $L$ such demands that 
differ from $W^*_{-1}$. In fact, without loss of generality, assume that these
are the $L$ most recent demands. Let $\hat{\omega}$ denote the best 
response of player 1. 
 
{\em \underline{Case 1:}} ($\hat{\omega} < \omega^*_1$) Now consider a sample
of size 
$k_1$ from {\bf s} that includes the $L$ demands.  Thus $(k_1 -L)$  of the 
demands in this sample are $W^*_{-1}$ while the last $L$ demands are $W^l_{-1}$, $l = 1,2, \dots, L$. By playing $\omega^*_1$, player $1$'s expected utility
is 
\begin{equation} \label{e1} 
(1-\frac{L}{k_1})v_1(\omega^*_1) + \sum_{l =
k_1-L+1}^{k_1}p_1(\omega^*_1|W_{-1}^l)v_1(\omega^*_1), 
\end{equation}
which is no larger than \begin{equation}
 \label{e2}(1-\frac{L}{k_1})v_1(\hat{\omega}) + \sum_{l =
k_1-L+1}^{k_1}p_1(\hat{\omega} 
|W_{-1}^l)v_1(\hat{\omega}), \end{equation} the payoff if he plays
$\hat{\omega}$. 

Define 
\begin{equation}~\label{eqS1}
S_1  = \bigcap_{S \in \beta(W^*), 1 \in S} S.
\end{equation}

Since there are no strictly compatible coalitions at $W^*$, 
Corollary~\ref{cr1} applies and $W^*(S_1) = f(S_1)$.
Without loss of generality, assume that $2 \in S_1$. Consider a new demand 
vector $\overline{W}$ that differs from $W^*$ only in the demand of player 2. 
Player 2 demands $\omega^*_2+(\omega^*_1-\hat{\omega})$. 

Now consider the new sample constructed from the original sample with the last
$L$ entries as $\overline{W}$. Note well that this new 
sample has exactly $L$ mistakes, no more than those in the original sample. 
These are of course the $L$ demands of player 2 that differ from $\omega^*_2$. 
I will now show that  given $\hat{\omega}$ is a best-response to the original 
sample, $\hat{\omega}$ is a best-response to this new sample. 

Note that by our choice of player 2,  player 1's demand $\omega^*_1$ is not 
compatible in any coalition when 2 demands $\omega^*_2
+(\omega^*_1-\hat{\omega})$.  
The payoff of player 1 on playing $\omega^*_1$ is then given by 
$$
(1-\frac{L}{k_1})v_1(\omega^*_1). 
$$

However, on demanding $\hat{\omega}$, he obtains it for sure and hence achieves
a payoff of $v_1(\hat{\omega})$. Since the expression in Eq.~(\ref{e2}) is at 
least as large as that in Eq.~(\ref{e1}), 
it is easy to see that 
$$
v_1(\hat{\omega})  \geq  (1-\frac{L}{k_1})v_1(\omega^*_1). 
$$
Hence, if $\hat{\omega}$ was a best response to the initial sample, it 
continues to be a best response to the new sample. Thus, 
\begin{eqnarray}
   L  &   \geq &  k_1[1-\frac{v_1(\hat{\omega})}{v_1(\omega^*_1)}] \nonumber \\
        &  \geq  & m r_1(\omega^*_1, \delta).
\end{eqnarray}
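The first inequality above is simply a rearrangement of 
$v_1(\hat{\omega}) \geq (1-\frac{L}{k_1})v_1(\omega^*_1)$: dividing through by 
$v_1(\omega^*_1)$, which is positive, preserves the inequality, and so 
\begin{eqnarray*}
\frac{v_1(\hat{\omega})}{v_1(\omega^*_1)} \ \geq \ 1 - \frac{L}{k_1}
& \Longleftrightarrow & \frac{L}{k_1} \ \geq \ 1 - \frac{v_1(\hat{\omega})}{v_1(\omega^*_1)} \\
& \Longleftrightarrow & L \ \geq \ k_1\left[1-\frac{v_1(\hat{\omega})}{v_1(\omega^*_1)}\right].
\end{eqnarray*}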

\noindent {\em \underline{Case 2:}} ($\hat{\omega} > \omega^*_1$) As before
consider a
sample of size $k_1$ from the state {\bf s } for which $\hat{\omega}$ is a
best
response. Now, as in the earlier case, the payoff on demanding $\omega^*_1$ is
given by Eq.~(\ref{e1}). However, on demanding $\hat{\omega}$, the payoff is 
\begin{equation}
\label{e3} \sum_{l=k_1-L+1}^{k_1}p_1(\hat{\omega}|W_{-1}^l)v_1(\hat{\omega}). 
\end{equation}
Since $\hat{\omega}$ is a best response, Eq.~(\ref{e3}) is at least as large
as Eq.~(\ref{e1}). 

As in the earlier case, construct a new sample from the initial sample 
by replacing the last $L$ demands of every player $j \neq 2$ with $\omega^*_j$. 
Replace those of player 2 with $\overline{\omega}_2 = \omega^*_2 -
(\hat{\omega} - \omega^*_1)$. 
In this new sample, there are exactly $L$ mistakes, no more than those in 
the initial sample. 

Now, it remains to show that $\hat{\omega}$ continues to be a best response for
player $1$ in this new sample. In the new sample, $\hat{\omega}$ is met precisely 
in the $L$ altered records, while $\omega^*_1$ is met in every record; so we 
need to show that 
 $$ \frac{L}{k_1} v_1(\hat{\omega}) \geq v_1(\omega^*_1). $$ 
The above follows from the following sequence of inequalities.
\begin{eqnarray}
\frac{L}{k_1}v_1(\hat{\omega}) - v_1(\omega^*_1) & = &
\frac{L}{k_1}[v_1(\hat{\omega})-v_1(\omega^*_1)] -
(1-\frac{L}{k_1})v_1(\omega^*_1)  \nonumber \\
& \geq & \sum_{l = k_1-L+1}^{k_1}
p_1(\hat{\omega}|W^l_{-1})[v_1(\hat{\omega}) -v_1(\omega^*_1)]  -
(1-\frac{L}{k_1})v_1(\omega^*_1) 
\nonumber \\                    
& \geq & \sum_{l = k_1-L+1}^{k_1}p_1(\hat{\omega}|W^l_{-1})v_1(\hat{\omega})- \nonumber \\
\label{uas1} &  &  \;\; \sum_{l=k_1-L+1}^{k_1}p_1(\omega^*_1|W^l_{-1})v_1(\omega^*_1) -
(1-\frac{L}{k_1})v_1(\omega^*_1) \\  
& \geq & 0. \nonumber  
\end{eqnarray} 


Eq.~(\ref{uas1}) follows from Assumption~\ref{as1}, and the non-negativity 
follows from the fact that the expression in Eq.~(\ref{e3}) is at least as 
large as that in Eq.~(\ref{e1}). 
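The first equality in the chain above is an algebraic identity; expanding the 
right-hand side term by term,
\begin{eqnarray*}
\frac{L}{k_1}[v_1(\hat{\omega})-v_1(\omega^*_1)] - (1-\frac{L}{k_1})v_1(\omega^*_1)
& = & \frac{L}{k_1}v_1(\hat{\omega}) - \frac{L}{k_1}v_1(\omega^*_1) -
v_1(\omega^*_1) + \frac{L}{k_1}v_1(\omega^*_1) \\
& = & \frac{L}{k_1}v_1(\hat{\omega}) - v_1(\omega^*_1).
\end{eqnarray*}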
Hence, the number of mistakes satisfies
$$ 
L \geq k_1v_1(\omega^*_1)/v_1(\hat{\omega}).
$$

The minimum number of mistakes occurs when $\hat{\omega}$ takes its maximum 
value. But since we allow only for small mistakes, the largest value of 
$\hat{\omega}$ cannot be more than $\omega^*_1 + (n-1)\delta$.  So, the number 
of mistakes cannot fall below         
$$ 
k_1\frac{v_1(\omega^*_1)}{v_1(\omega^*_1+(n-1)\delta)} .
$$

The proof is complete on taking the minimum of the two lower bounds obtained 
in the two different cases above. $\Box$ 

\begin{lemma}\label{min}
Let $0 < \delta < \alpha_1f(N)/(\alpha_nn^2)$. If  {\bf w} is a  
convention, then 
$$ 
\alpha_i\frac{v_i(\omega_i)}{v_i(\omega_i+(n-1)\delta)} \ \geq \ r(W,\delta).
$$ 
\end{lemma} 

 \proof Using the concavity of $v_i$ and the fact that $\omega_i \geq \delta$, we 
have 
\begin{eqnarray*} 
\alpha_i\frac{v_i(\omega_i)}{v_i(\omega_i+(n-1)\delta)} & \geq &
\alpha_i\frac{\omega_i}{\omega_i+(n-1)\delta} \\ 
& \geq &\frac{\alpha_1}{n}. 
\end{eqnarray*} 

Again, by concavity of $v_i$ and the fact that there is at least one $\omega_i 
\geq f(N)/n$, we have 
\begin{eqnarray*}
r(W,\delta)  & \leq &  r_i(\omega_i,\delta) \\ 
              & \leq & \frac{\alpha_i\delta}{\omega_i} \\
               & \leq & \frac{\delta n\alpha_n}{f(N)}.
\end{eqnarray*} 

The rest of the proof is immediate upon comparing the above two inequalities. 
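Explicitly, with the ordering $\alpha_1 \leq \cdots \leq \alpha_n$ that is 
implicit in the statement of the lemma, the two chains of inequalities combine 
to give
$$
\alpha_i\frac{v_i(\omega_i)}{v_i(\omega_i+(n-1)\delta)} \ \geq \ \frac{\alpha_1}{n} 
\ > \ \frac{\delta n\alpha_n}{f(N)} \ \geq \ r(W,\delta),
$$
where the strict middle inequality is simply a rearrangement of the hypothesis 
$\delta < \alpha_1f(N)/(\alpha_nn^2)$.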
$\Box$

\begin{corollary}~\label{b1}
Let  $0 < \delta \leq \alpha_1f(N)/(\alpha_nn^2)$. At a convention {\bf w} 
and for any state {\bf s}, 
$$
\mbox{{\bf r(w,s)}} \geq [mr(W,\delta)]
$$ 
 \end{corollary} 

\proof  By virtue of the previous two lemmas, $[mr_i(\omega_i,\delta)]$ is the
minimum number of mistakes for player $i$ to be the first player to have a 
best response different from $\omega_i$  in the convention {\bf w}. Hence, 
{\bf r(w,s)} cannot fall below $[mr(W,\delta)]$, the minimum number of 
mistakes required before {\em some} player has a different best response for 
the first time.  $\Box$



 
Now, let $e_{ij}$ denote the vector in $R^n$ whose $i$th component is $
-\delta$, whose $j$th component is $\delta$, and every other component is zero.
Given a convention {\bf w},   the convention {\bf w}$+e_{ij}$ (if it is a 
convention) involves a transfer of $\delta$ from player $i$ to player $j$. 


\begin{lemma}~\label{resistance} Let  $\delta \leq \alpha_1f(N)/(\alpha_n 
n^2)$ and  $W, \  W+e_{ij} \in  {\cal C}_{\delta}(f)$. Suppose that
$r(W,\delta) 
= r_{i_0}(\omega_{i_0},\delta).$ Then 
\begin{enumerate} 
\item $\mbox{{\bf r(w,w$+e_{ij}$)}} = [mr(W,\delta)] = 
[mr_{i_0}(\omega_{i_0},\delta)]$, if $j \neq i_0$ and 
\item $\mbox{{\bf 
r(w,w$+e_{ij}$)}} \leq  \min_{i \neq i_0}[mr_i(\omega_i,\delta)]$, if $j = i_0$.
\end{enumerate} 
\end{lemma} 

\proof  I will prove only Part 1 of the Lemma. Part 2 can be proved by 
arguments almost identical to those provided for Part 1 below.

For concreteness, take  $i_0=1$ and pick  $2 \in S_1$, where $S_1$ is obtained 
as in Eq.~\ref{eqS1}. Now consider the following two demand vectors: 

\begin{enumerate}
\item  $W^{1}$ differs from $W^*$ only in player 1's demand: $\omega^{1}_1 =
\omega^*_1 - \delta$.
\item  $W^{j}$ differs from $W^*$ only in player $j$'s demand: $\omega^{j}_j =
\omega^*_j+\delta$.
\end{enumerate}

By Corollary~\ref{b1}, it suffices to exhibit a  path between {\bf w} 
and {\bf w$+e_{ij}$} that involves exactly $[mr(W,\delta)]$ mistakes. 

Suppose the EBP is in state {\bf w} at time $t$. Now, suppose that between
time $t+1$ and $t^*=t+[mr(W,\delta)]$, player
2 plays $\omega^*_2+\delta$ instead of $\omega^*_2$. Each of these demands is a
mistake. 
Let $\sigma$ denote the sample consisting of the $k = \max_{i\in N}k_i$  most 
recent demands. By our choice of player 1, $\sigma$ contains a subsample for 
which the best response of every player other than 1 continues to be
$\omega^*_i$. 
For player 1, given our choice of player 2, the best response is $\omega^*_1-
\delta$. 

Now there is a positive probability that between  $t^*+1$ and $t^*+k$ 
all the players will sample from $\sigma$, leading to a run of $k$ repetitions 
of $W^{1}$. It is useful to remind the reader that the entire sample $\sigma$
will be available until date $t^*+3k$ by our assumption that $k \leq m/4$. 

Following this run of $W^{1}$, there is a positive probability that between $
t^*+ k +1$ and $t^*+2k$ all the players except $1$ and $j$ will continue to
sample $\sigma$ while $1$ and $j$  sample from the $k$ most recent
records consisting of $W^{1}$. It is straightforward to check that this
should lead to a run of $k$ demands of $W^{j}$.

Between $t^*+2k+1$ and $t^*+3k$, there is a positive probability that all the
players except players $1, i$ and $j$ will sample $\sigma$; players $1$ and $j$
continue to sample $W^{1}$ as before and player $i$ samples the $k$
most recent records consisting of $W^{j}$. It may be checked that this
will lead to a run of $W+e_{ij}$.

Finally, from $t^*+3k$ onwards, there is a positive probability that all the
players will sample the $k$ most recent records consisting of $W+e_{ij}$,
thereby establishing the convention {\bf w$+e_{ij}$} 
in due course. Since the above
path involved a best response by the players at every stage following the
first $t^*-t$ mistakes by player $2$, the proof is complete. $\Box$

\proof (Theorem~\ref{thm2})\footnote{In this proof, I use, without providing
formal definitions and proofs, certain terminology and assertions that are 
rather standard in the theory of finite Markov processes. See, e.g., Feller (1957).}
Consider an aperiodic Markov Chain with state space $\Omega$. A state can 
either be persistent or transient. Now, we know that the persistent 
states of a finite Markov chain can be uniquely decomposed into disjoint 
closed sets, $C_1, C_2, \dots C_K$ such that  {\em from any state of a given 
set $C_k$,  
all states of this set and no other can be reached}.  In particular, if we 
show that the decomposition must involve a unique closed set,  say $\Omega^*$,
it then follows that all the states in $\Omega \setminus \Omega^*$ are transient 
and that the states in $\Omega^*$ form an irreducible sub-chain. This  
sub-chain admits a unique stationary distribution. Since a 
transient state cannot be in the support of a stationary distribution of the 
original Markov chain, it then follows that there is a unique stationary 
distribution of the original chain; this is of course the invariant 
distribution corresponding to the states in $\Omega^*$, extended with zeros
corresponding to the states in $\Omega \setminus \Omega^*$. 


That the perturbed EBP is aperiodic is obvious; from an arbitrary state, a 
convention can be reached, following which a return to the original state, if 
possible, can be achieved in either an even number or an odd number of 
periods.  I now claim that there is a unique closed set of recurrent states, 
say  $\Omega^*$, for the perturbed EBP and that this set contains all conventions.  
Indeed, let $C_1, C_2, \dots C_K$ be the decomposition. Let {\bf s} be a state in
$C_k$. By Theorem 1, there is a positive probability of reaching a convention 
in finitely many steps. Hence, $C_k$ must contain at least one convention, say
{\bf w}.  Moreover, by virtue of Lemma~\ref{resistance}, from a given 
convention {\bf w}, there is a positive probability of reaching every 
convention of the form {\bf w}$+e_{ij}$. Since the core is a convex set, it 
follows that any convention {\bf w$^{ '}$} can be reached from an arbitrary 
convention {\bf w}.   Hence $C_k$ must in fact contain all the conventions. 
Since $C_k$ was arbitrarily chosen, it follows that the decomposition must 
involve a unique closed set of persistent states.  The proof is now complete 
by the arguments of the preceding paragraph. $\Box$ 

                                                                             
\begin{lemma}~\label{minimizer}
Suppose that $W^{\delta}$ maximizes $r(.,\delta)$ over ${\cal C}_{\delta}(f)$.
Let $W \in {\cal C}_{\delta}(f)$ be such that $\omega_i < \omega^{\delta}_i$ 
for some player $i$. Then 
$r(W,\delta) < r_i(\omega_i,\delta)$. 
\end{lemma} 

\proof Suppose, by way of contradiction, that $r(W,\delta) = 
r_i(\omega_i,\delta)$. Since  $r_i(.,\delta)$ is a strictly decreasing 
function and $\omega_i < \omega^{\delta}_i$, $r(W,\delta) > 
r_i(\omega^{\delta}_i,\delta) \geq 
r(W^{\delta},\delta)$, thereby contradicting the fact that $W^{\delta}$ 
maximizes $r(.,\delta)$. $\Box$





\proof (Theorem 3) 
Let $W^{\delta}$ be a maximizer of $r(.,\delta)$.  Now construct a {\bf 
w$^{\delta}$}-tree by placing a directed edge from each {\bf w} ($\neq$ {\bf 
w$^{\delta}$}) to a convention {\bf w}$+e_{ij}$, where $\omega_i > 
\omega^{\delta}_i$ and $\omega_j < \omega^{\delta}_j$.  Intuitively, this 
procedure involves transferring $\delta$ at each stage from a player $i$ who is 
demanding ``too much'' to a player $j$ who is demanding ``too little''.  Since the
core is a convex set, this ``equalizing'' principle can be followed to obtain a 
{\bf w$^{\delta}$}-tree.  Let $T^*$ denote this tree. 

At an edge {\bf w}$\rightarrow$ {\bf w}$+e_{ij}$, $\omega_j < 
\omega^{\delta}_j$. Hence, by Lemma~\ref{minimizer}, $r(W,\delta) < 
r_j(\omega_j,\delta)$. Hence, by Part 1 of Lemma~\ref{resistance},  the resistance of
each edge in the tree $T^*$ is $$ \mbox{{\bf r(w,w$+ e_{ij}$)}} = [mr(W,\delta)].
$$ So, the resistance of the tree  $T^*$ is 
\begin{equation}~\label{rofT} 
 \sum_{W \neq W^{\delta}} [mr(W,\delta)]. 
\end{equation}
For any other convention, say $\hat{\mbox{{\bf w}}}$, the resistance of a  
$\hat{\mbox{{\bf w}}}$-tree, say $T$ is 
\begin{eqnarray*} 
 \sum_{\mbox{{\bf (w$^1$,w$^2$)}}\in T} \mbox{{\bf r(w$^1$,w$^2$)}} & \geq &  
\sum_{ W \neq \hat{W }} [mr(W,\delta)] \\ 
 & \geq & \sum_{ W \neq W^{\delta}} [mr(W,\delta)].
\end{eqnarray*} 
The first inequality follows from Corollary~\ref{b1} and the second 
inequality is strict if $\hat{W}$ does not maximize $r(.,\delta)$.  

Hence, {\bf w$^{\delta}$} has the least stochastic potential, with the
resistance given by Eq.~(\ref{rofT}). By Theorem A, it is then a stochastically 
stable state. 

If there are several maxima of $r(.,\delta)$, each of them is a stochastically 
stable convention. $\Box$ 


\proof (Theorem 4) Let $W^* \in {\cal C}(f)$ be the maximizer of $g$ and let 
$K$ be as in the statement of the Theorem. 
For a player $i$, let $S_i$ denote the smallest coalition $S$ that contains 
$i$ and satisfies 
$W^*(S) = f(S)$. Such an $S_i$ is well defined by Corollary~\ref{cr1}. 

For concreteness, suppose that $1 \in K$. 
Since $g_1(\omega^*_1)  \leq g_j(\omega^*_j)$ for all $j \in 
N$, it is easy to show\footnote{Using continuity and the fact that the $g_i$'s  
are strictly decreasing.} that  for every $\epsilon_1 > 0$, there exists an
$\epsilon_2$ in the open interval $ (0,\epsilon_1)$ such that 
\begin{equation}~\label{minimum}
g_1(\omega_1^* + \epsilon_1)  <  g_j(\omega_j^* + \epsilon_2).
\end{equation} 


Now, suppose that there is a player $j \neq 1$ such that $1 \in S_j$.  
 (The case when no such $j$ exists will be analyzed shortly.)  At $W^*$ then, 
we can reduce the demand of $j$ by an arbitrarily small 
$\epsilon > 0$ and correspondingly increase the demand of $1$ and yet remain 
in the core. Hence for all $0 < \epsilon_2 < \epsilon_1 < 
\epsilon/(n+1)$, there exists a $W \in {\cal C}(f)$ such that 
$\omega_1^*+\epsilon_1 < \omega_1 < 
\omega_1^*+\epsilon/(n+1)$ and $\omega_j < \omega^*_j + \epsilon_2$, for all $j
\neq 1$.  
Note that with our method of discretization, $ {\cal C}_{\delta}(f) \subset 
{\cal C}_{\delta^{'}}(f)$ when $\delta < \delta^{'}$ and $\cup_{\delta} {\cal 
C}_{\delta}(f) = {\cal C}(f)$. Hence, we can find  a $\delta^* > 0$ and a $W 
\in {\cal C}_{\delta^*}(f)$ so that for all $\delta < \delta^*$, 
$\omega_1^*+ \epsilon_1 \leq \omega_1 - \delta < \omega_1 
\leq \omega^*_1 + \epsilon/(n+1)$  and  for $j (\neq 1) \in N$, $\omega_j \leq
\omega^*_j + \epsilon_2$. In fact, pick $\epsilon_1$ and $\epsilon_2$ such 
that Eq.~\ref{minimum} is satisfied. Then, 
\begin{eqnarray} 
r_1(\omega_1,\delta) & \leq & g_1(\omega_1^*+\epsilon_1) \nonumber \\ 
                     & < & g_j(\omega^*_j+\epsilon_2)  \nonumber\\ 
                     & \leq & r_j(\omega_j,\delta),       
\end{eqnarray}
for all $j \in N$. Hence,  $r(W,\delta) = r_1(\omega_1,\delta)$.  Now apply 
Lemma~\ref{minimizer} to conclude that if 
$W^{\delta}$ maximizes $r(.,\delta)$ over ${\cal C}_{\delta}(f)$, then 
\begin{equation}~\label{upperinq} 
\omega^{\delta}_1  \leq  \omega^*_1 + \frac{\epsilon}{n+1}.
\end{equation}

Now if $1 \not\in S_j$ for every $j \neq 1$, then $ S_j \subseteq 
N \backslash 1$.  Apply Corollary~\ref{cr1} to conclude 
that $ W^*(N \setminus 1) =f(N \setminus 1) $ 
and hence $\omega^*_1 = f(N) - f(N \setminus 1)$, the largest payoff that 
player $1 $ can obtain in the core. Hence, Eq.~\ref{upperinq} follows 
immediately.





To obtain a lower bound on $\omega^{\delta}_1$, first note that there is no 
loss of generality in assuming that Eq.~\ref{upperinq} holds for all players 
in $K$ for the same $\epsilon$. Second, I claim that $W^*(K) = f(K)$. 
To see this, consider a player $j \in S_1$. If  $g_1(\omega_1^*) < 
g_j(\omega_j^*)$, we can increase the demand of player $j$ by a small amount 
while decreasing the demand of player $1$, while remaining in the core. 
By\footnote{A formal argument involves a standard application of Taylor's 
theorem.} doing this, we increase the value 
of $g$, and this contradicts the fact that $g$ is maximized at $W^*$.  
Hence, $g_j(\omega_j^*) \leq g_1(\omega^*_1)$. But this implies $S_1 \subseteq
K$. Hence $\cup_{i \in K} S_i = K$ and, using Corollary~\ref{cr1}, $W^*(K) = 
f(K)$.  

It is now easy to verify that if $\omega^{\delta}_1 < \omega^*_1 - 
\frac{n}{n+1} \epsilon$,  then there must be a player $j \in K$ such 
that $\omega_j^{\delta} >  \omega^*_j + \frac{\epsilon}{n+1}$, thereby
violating 
Eq.~\ref{upperinq}. 
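To spell out this counting argument, note that, as $W^{\delta}$ is a core 
allocation, $W^{\delta}(K) \geq f(K)$, while $W^*(K) = f(K)$ was established 
above. Hence, if $\omega^{\delta}_1 < \omega^*_1 - \frac{n}{n+1}\epsilon$, then
$$
\sum_{j \in K \setminus \{1\}}(\omega^{\delta}_j - \omega^*_j) \ = \ 
W^{\delta}(K) - W^*(K) - (\omega^{\delta}_1 - \omega^*_1) \ > \ 
\frac{n}{n+1}\epsilon,
$$
and since $K \setminus \{1\}$ has at most $n-1$ members, at least one $j \in K$ 
must satisfy $\omega^{\delta}_j - \omega^*_j > \frac{n\epsilon}{(n+1)(n-1)} \geq 
\frac{\epsilon}{n+1}$.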

Hence, for all $i \in K$ and for all $0 < \delta \leq \delta^*$,
$$ \mid \omega^{\delta}_i - \omega^*_i \mid <  \epsilon. 
$$ 
$\Box$


\end{document}

