%% This document created by Scientific Word (R) Version 2.5


\documentclass[12pt,thmsa,a4paper]{article}
\usepackage{amsfonts}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\usepackage{sw20jart}


\input{tcilatex}
\begin{document}

\title{A General Class of Adaptive Strategies\thanks{%
Previous version: March 1999. We thank the referees and editors for their
useful comments. Research partially supported by grants of: the Israel
Academy of Sciences and Humanities, the Spanish Ministry of Education, the
Generalitat de Catalunya, CREI, and the EU-TMR Research Network.}}
\author{Sergiu Hart\thanks{%
Center for Rationality and Interactive Decision Theory; Department of
Mathematics; and Department of Economics; The Hebrew University of
Jerusalem, 91904 Jerusalem, Israel. $\quad $ \emph{E-mail}:
hart@math.huji.ac.il $\quad $ \emph{URL}: http://www.ma.huji.ac.il/\symbol{%
126}hart} \and Andreu Mas-Colell\thanks{%
Department of Economics and Business; and CREI; Universitat Pompeu Fabra,
Ramon Trias Fargas, 25-27, 08005 Barcelona, Spain. \emph{E-mail:}
mcolell@upf.es}}
\maketitle

\begin{abstract}
We exhibit and characterize an entire class of simple adaptive strategies,
in the repeated play of a game, having the Hannan-consistency property: In
the long run, the player is guaranteed an average payoff as large as the
best-reply payoff to the empirical distribution of play of the other
players; i.e., there is no ``regret.'' Smooth fictitious play (Fudenberg and
Levine [1995]) and regret-matching (Hart and Mas-Colell [2000]) are
particular cases. The motivation and application of the current paper come
from the study of procedures whose empirical distribution of play is, in the
long run, (almost) a correlated equilibrium. For the analysis we first
develop a generalization of Blackwell's [1956a] approachability strategy for
games with vector payoffs.

\emph{Keywords:} adaptive strategies, approachability, correlated
equilibrium, fictitious play, regret.

\emph{Journal of Economic Literature Classification: }C7, D7, C6.
\end{abstract}

\setlength{\baselineskip}{.25in}

\def\@biblabel#1{#1\hfill}
\def\thebibliography#1{\section*{References}\list
{}{
   \labelwidth 0pt
   \leftmargin 1.8em
   \itemindent -1.8em
   \usecounter{enumi}}
\def\newblock{\hskip .11em plus .33em minus .07em}
\sloppy\clubpenalty4000\widowpenalty4000
\sfcode`\.=1000\relax\def\baselinestretch{1}\large \normalsize}
\let\endthebibliography=\endlist

\section{Introduction\label{s-introduction}}

Consider a game repeated through time. We are interested in strategies of
play which, while simple to implement, generate desirable outcomes. Such
strategies, typically consisting of moves in ``improving'' directions, are
usually referred to as adaptive.

In Hart and Mas-Colell [2000] we presented simple adaptive strategies with
the property that, if used by all players, the empirical distribution of
play is, in the long run, (almost) a correlated equilibrium of the game (for
other procedures leading to correlated equilibria, see\footnote{%
For these and the other topics discussed in this paper, the reader is
referred also to the book of Fudenberg and Levine [1998] and the survey of
Foster and Vohra [1999] (as well as to the other papers in the special issue
of \emph{Games and Economic Behavior} 29 [1999]).} Foster and Vohra [1997]
and Fudenberg and Levine [1999]). From this work we are led---for reasons we
will comment upon shortly---to the study of a concept originally introduced
by Hannan [1957]. A strategy of a player is called \emph{Hannan-consistent}
if it guarantees that his long-run average payoff is as large as the highest
payoff that can be obtained (i.e., the one-shot best-reply payoff) against
the empirical distribution of play of the other players. In other words, a
strategy is Hannan-consistent if, given the play of the others, there is no
regret in the long run for not having played (constantly) any particular
action. As a matter of terminology, the \emph{regret} of player $i$ for an
action\footnote{%
Think of this as the ``regret for not having played $k$ in the past.''} $k$
at period $t$ is the difference in his average payoff up to $t$ that results
from replacing his actual past play by the constant play of action $k.$
Hannan-consistency thus means that all regrets become non-positive in the
limit, as $t$ goes to infinity.
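Formally, writing $u^{i}$ for the (one-shot) payoff function of player $i$
and $s_{\tau }=(s_{\tau }^{i},s_{\tau }^{-i})$ for the actions played at
period $\tau ,$ the regret of player $i$ for action $k$ at period $t$ is 
\[
D_{t}(k):=\frac{1}{t}\sum_{\tau \leq t}u^{i}(k,s_{\tau }^{-i})-\frac{1}{t}%
\sum_{\tau \leq t}u^{i}(s_{\tau }), 
\]
and Hannan-consistency requires that, almost surely, $\lim \sup_{t%
\rightarrow \infty }D_{t}(k)\leq 0$ for every action $k$ of player $i.$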

In this paper we concentrate on the notion of Hannan-consistency, rather
than on its stronger conditional version which characterizes convergence to
the set of correlated equilibria (see Hart and Mas-Colell [2000]). This is
just to focus on essentials. The extension to the conditional setup is
straightforward; see Section 5 below.

Hannan-consistent strategies have been obtained by several authors: Hannan
[1957], Blackwell [1956b] (see also Luce and Raiffa [1957, pp. 482-483]),
Foster and Vohra [1993, 1998], Fudenberg and Levine [1995], Freund and
Schapire [1999], and Hart and Mas-Colell [2000, Section 4(c)].\footnote{%
See also Ba\~{n}os [1968] and Megiddo [1980] and, in the computer science
literature, Littlestone and Warmuth [1994], Auer \emph{et al.} [1995] and
the book of Borodin and El-Yaniv [1998].} The strategy of Fudenberg and
Levine [1995] (as well as those of Hannan [1957], Foster and Vohra [1993,
1998] and Freund and Schapire [1999]) is a smoothed out version of
fictitious play. (We note that fictitious play---which may be stated as: at
each period play an action with maximal regret---is by itself not
Hannan-consistent.) In contrast, the strategy of Hart and Mas-Colell [2000],
called ``regret-matching,'' prescribes, at each period, play probabilities
that are proportional to the (positive) regrets. That is, if we write $%
D\left( k\right) $ for the regret of $i$ for action $k$ at time $t$ (as
defined above) and $D_{+}\left( k\right) $ for the \emph{positive regret}
(i.e., $D\left( k\right) $ when $D\left( k\right) >0$ and $0$ when $D\left(
k\right) \leq 0$), then the probability of playing action $k$ at period $t+1$
is simply $D_{+}\left( k\right) /\sum_{k^{\prime }}D_{+}\left( k^{\prime
}\right) $.
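
For concreteness, here is a minimal computational sketch of regret-matching
(in Python; the array layout and function names are ours, for illustration
only, and are not part of the formal development):
\begin{verbatim}
import numpy as np

def regrets(payoff, my_plays, opp_plays):
    # payoff[k, s]: one-shot payoff to player i of action k against s.
    # D(k) = average payoff had k been played throughout the past,
    # minus the actual average payoff.
    actual = np.mean([payoff[a, s] for a, s in zip(my_plays, opp_plays)])
    return np.array([np.mean([payoff[k, s] for s in opp_plays]) - actual
                     for k in range(payoff.shape[0])])

def regret_matching(D):
    # Play each action with probability proportional to its positive
    # regret; with no positive regret any choice is allowed (uniform here).
    D_plus = np.maximum(D, 0.0)
    total = D_plus.sum()
    if total == 0.0:
        return np.full(len(D), 1.0 / len(D))
    return D_plus / total
\end{verbatim}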

Clearly, a general examination is called for. Smooth fictitious play and
regret-matching should be but particular instances of a whole class of
adaptive strategies with the Hannan-consistency property. In this paper we
exhibit and characterize such a class. It turns out to contain, in
particular, a large variety of new simple adaptive strategies.

In Hart and Mas-Colell [2000] we introduced Blackwell's [1956a]
approachability theory for games with vector payoffs as the appropriate
basic tool for the analysis: The vector payoff is simply the vector of
regrets. In this paper, therefore, we proceed in two steps. First, in
Section 2, we generalize Blackwell's result: Given an approachable set (in
vector payoff space), we find the class of (``directional'') strategies that
guarantee that the set is approached. We defer the specifics to that
section. Suffice it to say that Blackwell's strategy emerges as the
particular quadratic case of a continuum of strategies where continuity and,
interestingly, integrability feature decisively.

Second, in Section 3, we apply the general theory to the regret framework
and derive an entire class of Hannan-consistent strategies. A feature common
to them all is that, in the spirit of bounded rationality, they aim at
``better'' rather than ``best'' play. We elaborate on this aspect, and carry
out an explicit discussion of fictitious play in Section 4. Section 5
discusses a number of extensions, including conditional regrets and
correlated equilibria.

\section{The Approachability Problem\label{s-approachability}}

\subsection{Model and Main Theorem\label{ss-appr-main}}

In this section we will consider games where a player's payoffs are \emph{%
vectors} (rather than, as in standard games, scalar real numbers), as
introduced by Blackwell [1956a]. This setting may appear unnatural at first.
However, it has turned out to be quite useful: The coordinates may represent
different commodities; or contingent payoffs in different states of the
world (when there is incomplete information); or, as we will see below (in
Section \ref{s-regret}), regrets in a standard game.

Formally, we are given a game in strategic form played by a player $i$
against an opponent $-i$ (which may be Nature and/or the other players). The 
\emph{action} sets are the finite\footnote{%
See however Remark \ref{rem-infinite-S} below. Also, we always assume that $%
S^{i}$ contains at least two elements.} sets $S^{i}$ for player $i$ and $%
S^{-i}$ for $-i.$ The payoffs are \emph{vectors} in some Euclidean space. We
denote the payoff function by\footnote{$\Bbb{R}$ is the real line, and $\Bbb{%
R}^{m}$ is the $m$-dimensional Euclidean space$.$ For $x=\left( x_{k}\right)
_{k=1}^{m}$ and $y=\left( y_{k}\right) _{k=1}^{m}$ in $\Bbb{R}^{m}$, we
write $x\geq y$ when $x_{k}\geq y_{k}$ for all $k,$ and $x\gg y$ when $%
x_{k}>y_{k}$ for all $k$. The non-negative, non-positive, positive and
negative orthants of $\Bbb{R}^{m}$ are, respectively, $\Bbb{R}%
_{+}^{m}:=\left\{ x\in \Bbb{R}^{m}:x\geq 0\right\} $, $\Bbb{R}%
_{-}^{m}:=\left\{ x\in \Bbb{R}^{m}:x\leq 0\right\} $, $\Bbb{R}%
_{++}^{m}:=\left\{ x\in \Bbb{R}^{m}:x\gg 0\right\} $ and $\Bbb{R}%
_{--}^{m}:=\left\{ x\in \Bbb{R}^{m}:x\ll 0\right\} $.} $A:S\equiv S^{i}\times
S^{-i}\rightarrow \Bbb{R}^{m}$; thus $A(s^{i},s^{-i})\in \Bbb{R}^{m}$ is the
payoff vector when $i$ chooses $s^{i}$ and $-i$ chooses $s^{-i}.$ As usual, $%
A$ is extended bi-linearly to mixed actions, thus\footnote{%
We write $\Delta (Z)$ for the set of probability distributions on $Z,$ i.e.,
the $\left( \left| Z\right| -1\right) $-dimensional unit simplex $\Delta
(Z):=\{p\in \Bbb{R}_{+}^{Z}:\sum_{z\in Z}p(z)=1\}.$} $A:\Delta (S^{i})\times
\Delta (S^{-i})\rightarrow \Bbb{R}^{m}.$
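Explicitly, for mixed actions $\sigma ^{i}\in \Delta (S^{i})$ and $\sigma
^{-i}\in \Delta (S^{-i}),$ 
\[
A(\sigma ^{i},\sigma ^{-i})=\sum_{s^{i}\in S^{i}}\sum_{s^{-i}\in
S^{-i}}\sigma ^{i}(s^{i})\,\sigma ^{-i}(s^{-i})\,A(s^{i},s^{-i}). 
\]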

Let time be discrete: $t=1,2,...$ , and denote by $%
s_{t}=(s_{t}^{i},s_{t}^{-i})\in S^{i}\times S^{-i}$ the actions chosen by $i$
and $-i$, respectively, at time $t$. The payoff vector in period $t$ is $%
a_{t}:=A(s_{t}),$ and $\overline{a}_{t}:=(1/t)\sum_{\tau \leq t}a_{\tau }$
is the average payoff vector up to $t.$ A \emph{strategy}\footnote{%
We use the term ``action'' for a one-period choice, and the term
``strategy'' for a multi-period choice.}\emph{\ for player }$i$ assigns to
every history of play $h_{t-1}=(s_{\tau })_{\tau \leq t-1}\in (S)^{t-1}$ a
(randomized) choice of action $\sigma _{t}^{i}\equiv \sigma
_{t}^{i}(h_{t-1})\in \Delta (S^{i})$ at time $t$, where $[\sigma
_{t}^{i}(h_{t-1})](s^{i})$ is, for each $s^{i}$ in $S^{i}$, the probability
that $i$ plays $s^{i}$ at period $t$ following a history $h_{t-1}$.

Let $\mathcal{C}$ $\subset \Bbb{R}^{m}$ be a convex and closed\footnote{%
A set is approachable if and only if its closure is approachable; we thus
assume without loss of generality that the set $\mathcal{C}$ is closed.}
set. The set $\mathcal{C}$ is \emph{approachable} by player $i$ (cf.
Blackwell [1956a]; see Remark \ref{rem-unif} below) if there is a strategy
of $i$ such that, no matter what $-i$ does,\footnote{%
We emphasize that the strategies of the opponents ($-i$) are not in any way
restricted; in particular, they may randomize and, furthermore, correlate
their actions. All the results in this paper hold against all possible
strategies of $-i$, and thus, \emph{a fortiori}, for any specific class of
strategies (like independent mixed strategies, and so on). Moreover,
requiring independence over $j\neq i$ will not increase the set of
strategies of $i$ that guarantee approachability, since the worst $-i$ can
do may always be taken to be pure (and thus independent).} $\mathrm{dist}(%
\overline{a}_{t},\mathcal{C})\rightarrow 0$ almost surely as $t\rightarrow
\infty $. Blackwell's result can then be stated as follows:\bigskip

\noindent \textbf{Blackwell's Approachability Theorem.}

\emph{(1) A convex and closed set }$\mathcal{C}$\emph{\ is approachable if
and only if every half-space }$\mathcal{H}$\emph{\ containing }$\mathcal{C}$%
\emph{\ is approachable.}

\emph{(2) A half-space }$\mathcal{H}$\emph{\ is approachable if and only if
there exists a mixed action of player }$i$\emph{\ such that the expected
vector payoff is guaranteed to lie in }$\mathcal{H}\emph{;}$\emph{\ i.e.,
there is }$\sigma ^{i}\in \Delta (S^{i})$\emph{\ such that }$A\left( \sigma
^{i},s^{-i}\right) \in \mathcal{H}$\emph{\ for all }$s^{-i}\in
S^{-i}.\bigskip $

The condition for $\mathcal{C}$ to be approachable may be restated as
follows (since, clearly, it suffices to consider in (1) only ``minimal''
half-spaces containing $\mathcal{C}$): For every $\lambda \in \Bbb{R}^{m}$
there exists $\sigma ^{i}\in \Delta (S^{i})$ such that 
\begin{equation}
\lambda \cdot A\left( \sigma ^{i},s^{-i}\right) \leq w(\lambda ):=\sup
\{\lambda \cdot y:y\in \mathcal{C}\}\text{ for all }s^{-i}\in S^{-i}
\label{eq-w(lambda)}
\end{equation}
($w$ is the ``support function'' of $\mathcal{C}$; note that only those $%
\lambda \neq 0$ with $w(\lambda )<\infty $ matter for (\ref{eq-w(lambda)})).
Furthermore, the strategy constructed by Blackwell that yields
approachability uses at each step $t$ where the current average payoff $%
\overline{a}_{t-1}$ is not in $\mathcal{C},$ a mixed choice $\sigma _{t}^{i}$
satisfying (\ref{eq-w(lambda)}) for that vector $\lambda \equiv \lambda (%
\overline{a}_{t-1})$ which goes to $\overline{a}_{t-1}$ from that point $y$
in $\mathcal{C}$ that is closest to $\overline{a}_{t-1}$ (see Figure \ref
{fig-approach}%
\begin{figure}[htb] \centering%
\unitlength 1.00mm

\begin{picture}(140.00,116.00)(10.00,17.00)

\linethickness{1.0pt}
\put(100.00,120.00){\circle*{1.50}}
\put(80.00,80.00){\vector(1,2){19.5}}
\put(140.00,50.00){\line(-2,1){120.00}}
\bezier{736}(10.00,80.00)(90.00,110.00)(130.00,20.00)
\put(102.00,122.50){\makebox(0,0)[lc]{$x=\overline{a}_{t-1}$}}
\put(87.50,102.00){\makebox(0,0)[cc]{$\lambda$}}
\put(78.00,70.00){\makebox(0,0)[cc]{{$\mathcal C$}}}
\put(30.00,98.00){\makebox(0,0)[cc]{{${\mathcal H}$}}}
\put(80.00,80.00){\circle*{1.50}}
\put(79.2,83.5){\makebox(0,0)[cc]{$y$}}
\put(15.00,25.00){\line(1,4){15.25}}
\put(30.00,25.00){\line(1,4){15.55}}
\put(45.00,25.00){\line(1,4){15.30}}
\put(60.00,25.00){\line(1,4){14.45}}
\put(75.00,25.00){\line(1,4){12.75}}
\put(90.00,25.00){\line(1,4){10.15}}
\put(105.00,25.00){\line(1,4){6.85}}
\put(120.00,25.00){\line(1,4){2.5}}
\put(100.00,120.00){\line(1,-4){18.00}}
\put(118.00,48.00){\circle*{1.50}}
\put(120.00,47.00){\makebox(0,0)[lc]{$b=E[a_{t}|h_{t-1}]$}}
\put(103.50,106.00){\circle*{1.50}}
\put(105.00,106.70){\makebox(0,0)[lc]{$E[\overline{a}_{t}|h_{t-1}]$}}

\linethickness{0.4pt}

\put(80.00,124.72){\line(1,0){2.02}}
\multiput(82.02,124.67)(1.01,-0.07){2}{\line(1,0){1.01}}
\multiput(86.04,124.31)(0.66,-0.11){3}{\line(1,0){0.66}}
\multiput(90.02,123.58)(0.39,-0.10){5}{\line(1,0){0.39}}
\multiput(93.91,122.50)(0.32,-0.11){6}{\line(1,0){0.32}}
\multiput(97.69,121.07)(0.23,-0.11){8}{\line(1,0){0.23}}
\multiput(101.32,119.31)(0.19,-0.11){9}{\line(1,0){0.19}}
\multiput(104.78,117.23)(0.17,-0.12){10}{\line(1,0){0.17}}
\multiput(108.04,114.84)(0.14,-0.12){11}{\line(1,0){0.14}}
\multiput(111.07,112.17)(0.12,-0.12){12}{\line(0,-1){0.12}}
\multiput(118.55,102.67)(0.11,-0.20){9}{\line(0,-1){0.20}}
\multiput(120.43,99.10)(0.12,-0.26){7}{\line(0,-1){0.26}}
\multiput(121.99,95.38)(0.11,-0.32){6}{\line(0,-1){0.32}}
\multiput(123.21,91.53)(0.12,-0.49){4}{\line(0,-1){0.49}}
\multiput(124.07,87.58)(0.10,-0.67){3}{\line(0,-1){0.67}}
\put(124.58,83.57){\line(0,-1){2.02}}
\put(124.72,79.54){\line(0,-1){2.02}}
\multiput(124.49,75.50)(-0.08,-0.67){3}{\line(0,-1){0.67}}
\multiput(123.91,71.51)(-0.11,-0.49){4}{\line(0,-1){0.49}}
\multiput(122.96,67.58)(-0.10,-0.32){6}{\line(0,-1){0.32}}
\multiput(121.66,63.75)(-0.11,-0.27){7}{\line(0,-1){0.27}}
\multiput(120.03,60.06)(-0.12,-0.22){8}{\line(0,-1){0.22}}
\multiput(118.07,56.53)(-0.11,-0.17){10}{\line(0,-1){0.17}}
\multiput(115.79,53.19)(-0.11,-0.14){11}{\line(0,-1){0.14}}
\multiput(113.23,50.07)(-0.12,-0.12){12}{\line(0,-1){0.12}}
\multiput(110.39,47.19)(-0.13,-0.11){12}{\line(-1,0){0.13}}
\multiput(107.31,44.59)(-0.16,-0.12){10}{\line(-1,0){0.16}}
\multiput(104.00,42.27)(-0.19,-0.12){9}{\line(-1,0){0.19}}
\multiput(100.50,40.25)(-0.23,-0.11){8}{\line(-1,0){0.23}}
\multiput(96.83,38.57)(-0.31,-0.12){6}{\line(-1,0){0.31}}
\multiput(93.02,37.22)(-0.39,-0.11){5}{\line(-1,0){0.39}}
\multiput(89.11,36.22)(-0.50,-0.09){4}{\line(-1,0){0.50}}
\multiput(85.12,35.57)(-1.01,-0.09){2}{\line(-1,0){1.01}}
\put(81.09,35.29){\line(-1,0){2.02}}
\multiput(77.05,35.38)(-1.01,0.09){2}{\line(-1,0){1.01}}
\multiput(73.04,35.83)(-0.66,0.12){3}{\line(-1,0){0.66}}
\multiput(69.08,36.63)(-0.39,0.11){5}{\line(-1,0){0.39}}
\multiput(65.21,37.80)(-0.32,0.12){6}{\line(-1,0){0.32}}
\multiput(61.46,39.30)(-0.23,0.11){8}{\line(-1,0){0.23}}
\multiput(57.87,41.14)(-0.19,0.12){9}{\line(-1,0){0.19}}
\multiput(54.45,43.30)(-0.16,0.12){10}{\line(-1,0){0.16}}
\multiput(51.24,45.75)(-0.13,0.11){12}{\line(-1,0){0.13}}
\multiput(48.27,48.49)(-0.12,0.12){12}{\line(0,1){0.12}}
\multiput(45.56,51.48)(-0.11,0.14){11}{\line(0,1){0.14}}
\multiput(43.12,54.70)(-0.11,0.17){10}{\line(0,1){0.17}}
\multiput(40.99,58.13)(-0.12,0.22){8}{\line(0,1){0.22}}
\multiput(39.18,61.74)(-0.11,0.27){7}{\line(0,1){0.27}}
\multiput(37.70,65.50)(-0.10,0.32){6}{\line(0,1){0.32}}
\multiput(36.56,69.37)(-0.11,0.49){4}{\line(0,1){0.49}}
\multiput(35.78,73.34)(-0.09,0.67){3}{\line(0,1){0.67}}
\put(35.36,77.35){\line(0,1){2.02}}
\put(35.30,81.39){\line(0,1){2.02}}
\multiput(35.61,85.42)(0.10,0.67){3}{\line(0,1){0.67}}
\multiput(36.28,89.40)(0.12,0.49){4}{\line(0,1){0.49}}
\multiput(37.31,93.31)(0.11,0.32){6}{\line(0,1){0.32}}
\multiput(38.68,97.11)(0.12,0.26){7}{\line(0,1){0.26}}
\multiput(40.39,100.77)(0.11,0.20){9}{\line(0,1){0.20}}
\multiput(42.43,104.26)(0.11,0.17){10}{\line(0,1){0.17}}
\multiput(44.77,107.55)(0.12,0.14){11}{\line(0,1){0.14}}
\multiput(47.40,110.61)(0.12,0.12){12}{\line(0,1){0.12}}
\multiput(50.30,113.43)(0.14,0.12){11}{\line(1,0){0.14}}
\multiput(53.43,115.97)(0.17,0.12){10}{\line(1,0){0.17}}
\multiput(56.79,118.22)(0.19,0.11){9}{\line(1,0){0.19}}
\multiput(60.33,120.16)(0.23,0.11){8}{\line(1,0){0.23}}
\multiput(64.04,121.77)(0.32,0.11){6}{\line(1,0){0.32}}
\multiput(67.87,123.04)(0.39,0.10){5}{\line(1,0){0.39}}
\multiput(71.80,123.96)(0.66,0.11){3}{\line(1,0){0.66}}
\multiput(75.80,124.52)(1.01,0.07){2}{\line(1,0){1.01}}

\end{picture}
\caption{Approaching the set $\mathcal{C}$ by Blackwell's
strategy\label{fig-approach}}%
\end{figure}%
). To get some intuition, note that the next period expected payoff vector $%
b:=E[a_{t}|h_{t-1}]$ lies in the half-space $\mathcal{H}$, and thus
satisfies $\lambda \cdot b\leq w(\lambda )<\lambda \cdot \overline{a}_{t-1},$
which implies that 
\[
\lambda \cdot (E[\overline{a}_{t}|h_{t-1}]-\overline{a}_{t-1})=\lambda \cdot
(\frac{1}{t}b+\frac{t-1}{t}\overline{a}_{t-1}-\overline{a}_{t-1})=\frac{1}{t}%
\;\lambda \cdot (b-\overline{a}_{t-1})<0. 
\]
Therefore the expected average payoff $E[\overline{a}_{t}|h_{t-1}]$ moves
from $\overline{a}_{t-1}$ in the ``general direction'' of $\mathcal{C}$; in
fact, it is closer than $\overline{a}_{t-1}$ to $\mathcal{C}$. Hence $E[%
\overline{a}_{t}|h_{t-1}]$ converges to $\mathcal{C}$, and so does the
average payoff $\overline{a}_{t}$ (by the Law of Large Numbers).
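
As a concrete illustration, take $\mathcal{C}=\Bbb{R}_{-}^{m}$ (the case
relevant for regrets; see Section \ref{s-regret}). Here $w(\lambda )=0$ for $%
\lambda \geq 0$ (and $w(\lambda )=\infty $ otherwise), the point of $%
\mathcal{C}$ closest to $x$ is obtained by replacing the positive
coordinates of $x$ by $0,$ and Blackwell's direction is the vector of
positive parts $\lambda (x)=[x]_{+}.$ A minimal sketch of one step (in
Python; the function name is ours, for illustration only):
\begin{verbatim}
import numpy as np

def blackwell_direction(abar):
    # C = nonpositive orthant: the closest point is y = min(abar, 0),
    # so Blackwell's direction is lambda = abar - y = [abar]_+.
    lam = np.maximum(abar, 0.0)
    return lam if lam.any() else None  # None: abar already lies in C

abar = np.array([0.3, -0.2, 0.1])
lam = blackwell_direction(abar)
# play any sigma with lam . A(sigma, s) <= w(lam) = 0 for all s;
# the average payoff then drifts toward C, as shown above.
\end{verbatim}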

Fix an approachable convex and closed set $\mathcal{C}.$ We will now
consider general strategies of player $i$ which---like Blackwell's strategy
above---are defined in terms of a \emph{directional mapping,} that is, a
function $\Lambda :\Bbb{R}^{m}\backslash \mathcal{C}\rightarrow \Bbb{R}^{m}$
that associates to every $x\notin \mathcal{C}$ a corresponding ``direction'' 
$\Lambda (x).$ Given such a mapping $\Lambda ,$ a strategy of player $i$ is
called a $\Lambda $-\emph{strategy} if, whenever $\overline{a}_{t-1}$ does
not lie in $\mathcal{C},$ it prescribes using at time $t$ a mixed action $%
\sigma _{t}^{i}$ that satisfies 
\begin{equation}
\Lambda (\overline{a}_{t-1})\cdot A\left( \sigma _{t}^{i},s^{-i}\right) \leq
w(\Lambda (\overline{a}_{t-1}))\text{ for all }s^{-i}\in S^{-i}
\label{eq-directional-strategy}
\end{equation}
(see Figure \ref{fig-lambda}%
\begin{figure}[htb] \centering%
 \unitlength 1.00mm

\begin{picture}(150.00,101.00)(10.00,17.00)

\linethickness{1.0pt}
 
\put(80.00,80.00){\vector(1,2){5}}
\put(140.00,50.00){\line(-2,1){120.00}}
\bezier{736}(10.00,80.00)(90.00,110.00)(130.00,20.00)

\linethickness{0.7pt}
  
\put(112.00,101.00){\circle*{1.50}}
\put(112.00,101.00){\line(1,-6){9.00}}
\put(121,47){\circle*{1.5}}
\put(114,89){\circle*{1.5}}

\put(113.00,101){\makebox(0,0)[lc]{$x=\overline{a}_{t-1}$}}
\put(123.00,47.00){\makebox(0,0)[lc]{$b=E[a_{t}|h_{t-1}]$}}
\put(116.00,89){\makebox(0,0)[lc]{$E[\overline{a}_{t}|h_{t-1}]$}}

\put(78,88){\makebox(0,0)[cc]{$\Lambda (x)$}}

\put(78.00,70.00){\makebox(0,0)[cc]{{$\mathcal C$}}}
\put(30.00,98.00){\makebox(0,0)[cc]{{${\mathcal H}$}}}

\put(80.00,80.00){\circle*{1.50}}

\put(15.00,25.00){\line(1,4){15.25}}
\put(30.00,25.00){\line(1,4){15.55}}
\put(45.00,25.00){\line(1,4){15.30}}
\put(60.00,25.00){\line(1,4){14.45}}
\put(75.00,25.00){\line(1,4){12.75}}
\put(90.00,25.00){\line(1,4){10.15}}
\put(105.00,25.00){\line(1,4){6.85}}
\put(120.00,25.00){\line(1,4){2.5}}

\end{picture}
\caption{A $\Lambda$-strategy\label{fig-lambda}}%
\end{figure}%
: a $\Lambda $-strategy guarantees that, when $x=\overline{a}_{t-1}\notin 
\mathcal{C}$, the next period expected payoff vector $b=E[a_{t}|h_{t-1}]$
lies in the smallest half-space $\mathcal{H}$ with normal $\Lambda (x)$ that
contains $\mathcal{C}$); notice that there is no requirement when $\overline{%
a}_{t-1}\in \mathcal{C}.$ We are interested in finding conditions on the
mapping $\Lambda $ such that, if player $i$ uses a $\Lambda $-strategy, then
the set $\mathcal{C}$ is guaranteed to be approached, no matter what $-i$
does.
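
Computationally, a mixed action satisfying (\ref{eq-directional-strategy})
can be found by linear programming: minimize over $\sigma ^{i}\in \Delta
(S^{i})$ the maximum over $s^{-i}\in S^{-i}$ of $\Lambda (\overline{a}%
_{t-1})\cdot A(\sigma ^{i},s^{-i});$ by part (2) of Blackwell's Theorem, the
minimum is at most $w(\Lambda (\overline{a}_{t-1}))$ whenever the
corresponding half-space is approachable. A minimal sketch (in Python, using 
\texttt{scipy}; the function name and array layout are ours, for
illustration only):
\begin{verbatim}
import numpy as np
from scipy.optimize import linprog

def lambda_strategy_action(A, lam):
    # A[k, s] in R^m: vector payoff when i plays k and -i plays s.
    # Minimize max_s lam . A(sigma, s) over sigma in Delta(S^i).
    K, S, m = A.shape
    G = np.tensordot(A, lam, axes=([2], [0]))  # G[k, s] = lam . A[k, s]
    # variables (sigma_1, ..., sigma_K, v); minimize v
    c = np.zeros(K + 1); c[-1] = 1.0
    A_ub = np.hstack([G.T, -np.ones((S, 1))])  # G^T sigma - v <= 0
    b_ub = np.zeros(S)
    A_eq = np.hstack([np.ones((1, K)), np.zeros((1, 1))])  # sum sigma = 1
    b_eq = np.array([1.0])
    bounds = [(0, None)] * K + [(None, None)]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq,
                  b_eq=b_eq, bounds=bounds)
    return res.x[:K]  # a sigma attaining the minmax value res.x[-1]
\end{verbatim}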

We introduce three conditions on a directional mapping $\Lambda ,$ relative
to the given set $\mathcal{C}$.

\begin{description}
\item[(D1)]  $\Lambda $ is continuous.

\item[(D2)]  $\Lambda $ is integrable, namely there exists a Lipschitz
function\footnote{%
Notice that $P$ is defined on the whole space $\Bbb{R}^{m}$.} $P:\Bbb{R}%
^{m}\rightarrow \Bbb{R}$ such that $\nabla P(x)=\phi (x)\Lambda (x)$ for
almost every $x\notin \mathcal{C},$ where $\phi :\Bbb{R}^{m}\backslash 
\mathcal{C}\rightarrow \Bbb{R}_{++}$ is a continuous positive function.

\item[(D3)]  $\Lambda (x)\cdot x>w(\Lambda (x))$ for all $x\notin \mathcal{C}%
.$
\end{description}

See Figure \ref{fig-potl}%
\begin{figure}[htb] \centering%
\unitlength 1.00mm

\begin{picture}(150.00,108.00)(0.00,17.00)

\linethickness{1.0pt}
 
\put(80.00,80.00){\vector(1,2){5}}
\put(140.00,50.00){\line(-2,1){120.00}}
\bezier{736}(10.00,80.00)(90.00,110.00)(130.00,20.00)

\linethickness{0.7pt}

\bezier{552}(141.00,58.00)(129.00,98.33)(37.00,115.00)
\bezier{548}(31.00,108.00)(102.67,97.00)(136.00,42.00)
\put(104.00,95.0){\circle*{1.50}}
\put(104,95){\vector(1,2){7}}
\put(124,85){\line(-2,1){40}}

\put(108.00,96){\makebox(0,0)[lc]{$x=\overline{a}_{t-1}$}}
\put(113,111){\makebox(0,0)[cc]{$\nabla P(x)$}}
\put(35,110){\makebox(0,0)[cc]{$P=c_1$}}
\put(41,117){\makebox(0,0)[cc]{$P=c_2$}}
\put(78,88){\makebox(0,0)[cc]{$\Lambda (x)$}}
\put(78.00,70.00){\makebox(0,0)[cc]{{$\mathcal C$}}}
\put(30.00,98.00){\makebox(0,0)[cc]{{${\mathcal H}$}}}
\put(80.00,80.00){\circle*{1.50}}
\put(15.00,25.00){\line(1,4){15.25}}
\put(30.00,25.00){\line(1,4){15.55}}
\put(45.00,25.00){\line(1,4){15.30}}
\put(60.00,25.00){\line(1,4){14.45}}
\put(75.00,25.00){\line(1,4){12.75}}
\put(90.00,25.00){\line(1,4){10.15}}
\put(105.00,25.00){\line(1,4){6.85}}
\put(120.00,25.00){\line(1,4){2.5}}

\end{picture}
\caption{The directional mapping $\Lambda$ and level sets of the potential
$P$\label{fig-potl}}%
\end{figure}%
. The geometric meaning of (D3) is that the point $x$ is strictly separated
from the set $\mathcal{C}$ by $\Lambda (x).$ Note that (D3) implies that all 
$\lambda $ with $w(\lambda )=\infty ,$ as well as $\lambda =0,$ are not
allowable directions. Also, observe that the combination of (D1) and (D2)
implies that $P$ is continuously differentiable on $\Bbb{R}^{m}\backslash 
\mathcal{C}$ (see Clarke [1983, Corollary to Proposition 2.2.4 and Theorem
2.5.1]). We will refer to the function $P$ as the \emph{potential} of $%
\Lambda $.

The main result of this section is:

\begin{theorem}
\label{th-general}Suppose that player $i$ uses a $\Lambda $-strategy, where $%
\Lambda $ is a directional mapping satisfying (D1), (D2), and (D3) for the
approachable convex and closed set $\mathcal{C}.$ Then the average payoff
vector is guaranteed to approach the set $\mathcal{C};$ that is, $\mathrm{%
dist}(\overline{a}_{t},\mathcal{C})\rightarrow 0$ almost surely as $%
t\rightarrow \infty $, for any strategy of $-i.$
\end{theorem}

Before proving the Theorem (in the next subsection), we state a number of
comments.\bigskip \pagebreak

\textbf{Remarks.}

\begin{enumerate}
\item  \label{rem-universal}The conditions (D1)--(D3) are independent of the
game $A$ (they depend on $\mathcal{C}$ only). That is, given a directional
mapping $\Lambda $ satisfying (D1)--(D3), a $\Lambda $-strategy is
guaranteed to approach $\mathcal{C}$ for \emph{any} game $A$ for which $%
\mathcal{C}$ is approachable (of course, the specific choice of action
depends on $A,$ according to (\ref{eq-directional-strategy})). It is in this
sense that we refer to the $\Lambda $-strategies as ``universal.''

\item  \label{rem-infinite-S}The action sets $S^{i}$ and $S^{-i}$ need not
be finite; as we will see in the proof, it suffices for the range of $A$ to
be bounded.

\item  \label{rem-unif}As in Blackwell's result, our proof below yields 
\emph{uniform} approachability: For every $\varepsilon $ there is $%
t_{0}\equiv t_{0}(\varepsilon )$ such that $E\left[ \mathrm{dist}(\overline{a%
}_{t},\mathcal{C})\right] <\varepsilon $ for all $t>t_{0}$ and all
strategies of $-i$ (i.e., $t_{0}$ is independent of the strategy of $-i).$

\item  The conditions on $P$ are invariant under strictly increasing
monotone transformations (with positive derivative); that is, only the level
sets of $P$ matter.

\item  \label{rem-convex-P}If the potential $P$ is a convex function and $%
\mathcal{C}=\{y:P(y)\leq c\}$ for some constant $c$, then (D3) is
automatically satisfied: $P(x)>P(y)$ implies $\nabla P(x)\cdot x>\nabla
P(x)\cdot y$.

\item  \label{rem-norm}Given a norm $\left\| \cdot \right\| $ on $\Bbb{R}%
^{m},$ consider the resulting ``distance from $\mathcal{C}$'' function $%
P(x):=\min_{y\in \mathcal{C}}\left\| x-y\right\| .$ If $P$ is a smooth
function (which is always the case when either the norm is smooth---i.e.,
the corresponding unit ball has smooth boundary---or when the boundary of $%
\mathcal{C}$ is smooth), then the mapping $\Lambda =\nabla P$ satisfies
(D1)--(D3) (the latter by the previous Remark \ref{rem-convex-P}). In
particular, the $l_{2}$ Euclidean norm yields precisely the Blackwell
strategy, since then $\nabla P(x)$ is proportional to $x-y(x),$ where $%
y(x)\in \mathcal{C}$ is the point in $\mathcal{C}$ closest to $x$. The $%
l_{p} $ norm is smooth for $1<p<\infty ;$ therefore it yields strategies
that guarantee approachability for \emph{any} approachable set $\mathcal{C}$%
. However, if the boundary of $\mathcal{C}$ is not smooth---for instance,
when $\mathcal{C}$ is an orthant, an important case in applications---then
(D1) is \emph{not} satisfied in the extreme cases $p=1$ and $p=\infty $ (see
Figure \ref{fig-lp}%
\begin{figure}[htbp] \centering%
 \unitlength 0.90mm

\begin{picture}(150.00,66.00)(5,2)
\linethickness{1.0pt}
 \put(25.00,60.00){\vector(0,1){0.2}}
\put(25.00,20.00){\line(0,1){40.00}}
  \put(50.00,35.00){\vector(1,0){0.2}}
\put(10.00,35.00){\line(1,0){40.00}}
 
\put(18.00,28.50){\makebox(0,0)[cc]{$\mathcal{C}$}}

  \put(75.00,60.00){\vector(0,1){0.2}}
\put(75.00,20.00){\line(0,1){40.00}}
  \put(125.00,60.00){\vector(0,1){0.2}}
\put(125.00,20.00){\line(0,1){40.00}}
  \put(100.00,35.00){\vector(1,0){0.2}}
\put(60.00,35.00){\line(1,0){40.00}}
  \put(150.00,35.00){\vector(1,0){0.2}}
\put(110.00,35.00){\line(1,0){40.00}}
 \put(68.00,28.50){\makebox(0,0)[cc]{$\mathcal{C}$}}
\put(118.00,28.50){\makebox(0,0)[cc]{$\mathcal{C}$}}

\put(25.00,44.00){\line(1,-1){9.00}}
\put(25.00,53.00){\line(1,-1){18.00}}

\linethickness{0.3pt}
 \put(10.00,44.00){\line(1,0){15.00}}
  \put(34.00,35.00){\line(0,-1){15.00}}
  \put(43.00,20.00){\line(0,1){15.00}}
  \put(25.00,53.00){\line(-1,0){15.00}}

 \put(60.00,44.00){\line(1,0){15.00}}
  \put(110.00,44.00){\line(1,0){15.00}}
  \put(84.00,35.00){\line(0,-1){15.00}}
  \put(134.00,35.00){\line(0,-1){15.00}}
  \put(93.00,20.00){\line(0,1){15.00}}
  \put(143.00,20.00){\line(0,1){15.00}}
  \put(75.00,53.00){\line(-1,0){15.00}}
  \put(125.00,53.00){\line(-1,0){15.00}}
 
\put(125.00,53.00){\line(1,0){18.00}}
\put(143.00,53.00){\line(0,-1){18.00}}
\put(125.00,44.00){\line(1,0){9.00}}
\put(134.00,44.00){\line(0,-1){9.00}}
\put(25.00,8.00){\makebox(0,0)[cc]{$p=1$}}
\put(75.00,8.00){\makebox(0,0)[cc]{$1<p<\infty$}}
\put(125.00,8.00){\makebox(0,0)[cc]{$p=\infty$}}
    \bezier{152}(75.00,53.00)(91.50,51.50)(92.90,35.00)
\bezier{76}(75.00,44.00)(83.25,43.25)(83.90,35.00)

\linethickness{0.3pt}
\put(25,35){\line(-2,-3){10}}
\put(19,35){\line(-2,-3){9}}
\put(13,35){\line(-2,-3){3}}
\put(25,26){\line(-2,-3){4}}

\put(75,35){\line(-2,-3){10}}
\put(69,35){\line(-2,-3){9}}
\put(63,35){\line(-2,-3){3}}
\put(75,26){\line(-2,-3){4}}

\put(125,35){\line(-2,-3){10}}
\put(119,35){\line(-2,-3){9}}
\put(113,35){\line(-2,-3){3}}
\put(125,26){\line(-2,-3){4}}

 \end{picture}
\caption{The $l_p$ potential for an orthant $\mathcal{C}$\label{fig-lp}}%
\end{figure}%
; more on these two cases below).

\item  When $\mathcal{C}=\Bbb{R}_{-}^{m}$ and $P$ is given by (D2),
condition (D3) becomes $\nabla P(x)\cdot x>0$ for every $x\notin \mathcal{C}%
, $ which means that $P$ is increasing along any ray from the origin that
goes outside the negative orthant (see the sketch following these remarks).
\end{enumerate}
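To illustrate the last two remarks for the orthant $\mathcal{C}=\Bbb{R}%
_{-}^{m}$: the $l_{p}$ distance from $\mathcal{C}$ is $P(x)=\left\| [x]_{+}%
\right\| _{p}$ (where $[x]_{+}$ denotes the vector of positive parts $\max
\{x_{k},0\}$), and, up to the positive factor $\phi $ allowed by (D2), the
associated direction is $\Lambda (x)=([x_{k}]_{+}^{p-1})_{k};$ the case $p=2$
recovers Blackwell's direction $[x]_{+}.$ A short numerical sketch (in
Python; for illustration only):
\begin{verbatim}
import numpy as np

def lp_potential(x, p):
    # P(x) = l_p distance from x to the nonpositive orthant
    return np.linalg.norm(np.maximum(x, 0.0), ord=p)

def lp_direction(x, p):
    # Lambda(x), proportional to grad P(x): componentwise ([x_k]_+)^(p-1);
    # p = 2 gives Blackwell's direction; (D1) fails for p = 1 and infinity.
    return np.maximum(x, 0.0) ** (p - 1)

# check (D3) for C = R_-^m at a sample point (here w(Lambda(x)) = 0):
x = np.array([0.5, -1.0, 2.0])
assert lp_direction(x, 3.0).dot(x) > 0
\end{verbatim}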

\subsection{Proof of Theorem \ref{th-general}\label{ss-appr-proof}}

We begin by proving two auxiliary results. The first applies to functions $Q$
that satisfy conditions similar to but stronger than (D1)--(D3); the second
allows us to reduce the general case to such a $Q.$ The set $\mathcal{C},$
the mappings $\Lambda $ and $P,$ and the strategy of $i$ (which is a $%
\Lambda $-strategy) are fixed throughout. Also, let $K$ be a convex and
compact set containing in its interior the range of $A$ (recall that $S$ is
finite).

\begin{lemma}
\label{l-special_case}Let $Q:\Bbb{R}^{m}\rightarrow \Bbb{R}$ be a
continuously differentiable function that satisfies:

\begin{description}
\item[(i)]  $Q(x)\geq 0$ for all $x;$

\item[(ii)]  $Q(x)=0$ for all $x\in \mathcal{C};$

\item[(iii)]  $\nabla Q(x)\cdot x-w(\nabla Q(x))\geq Q(x)$ for all $x\in
K\backslash \mathcal{C}$; and

\item[(iv)]  $\nabla Q(x)$ is non-negatively proportional to $\Lambda (x)$
(i.e., $\nabla Q(x)=\phi (x)\Lambda (x)$ where $\phi (x)\geq 0)$ for all $%
x\notin \mathcal{C}$.
\end{description}

Then $\lim_{t\rightarrow \infty }Q(\overline{a}_{t})=0$ a.s. for any
strategy of $-i.$
\end{lemma}

\proof
We have $\overline{a}_{t}-\overline{a}_{t-1}=(1/t)(a_{t}-\overline{a}%
_{t-1}); $ thus, writing $x$ for $\overline{a}_{t-1},$ 
\begin{equation}
Q(\overline{a}_{t})=Q(x)+\nabla Q(x)\cdot \frac{1}{t}(a_{t}-x)+o\left( \frac{%
1}{t}\right) ,  \label{eq-0}
\end{equation}
since $Q$ is (continuously) differentiable. Moreover, the remainder $o(1/t)$
is uniform, since all relevant points lie in the compact set $K.$ If $%
x\notin \mathcal{C}$ then player $i$ plays at time $t$ so that 
\begin{equation}
\nabla Q(x)\cdot E[a_{t}|h_{t-1}]\leq w(\nabla Q(x))  \label{eq-1}
\end{equation}
(by (\ref{eq-directional-strategy}) and (iv)); if $x\in \mathcal{C}$ then $%
\nabla Q(x)=0$ (by (i) and (ii)), and (\ref{eq-1}) holds too. Taking
conditional expectation in (\ref{eq-0}) and then substituting (\ref{eq-1})
yields 
\begin{eqnarray*}
E[Q(\overline{a}_{t})|h_{t-1}] &\leq &Q(x)+\frac{1}{t}\left( w(\nabla
Q(x))-\nabla Q(x)\cdot x\right) +o\left( \frac{1}{t}\right) \\
&\leq &Q(x)-\frac{1}{t}Q(x)+o\left( \frac{1}{t}\right) ,
\end{eqnarray*}
where we have used (iii) when $x\notin \mathcal{C}$ and (i)--(ii) when $x\in 
\mathcal{C}$. Thus

\[
E[Q(\overline{a}_{t})|h_{t-1}]\leq \frac{t-1}{t}Q(\overline{a}%
_{t-1})+o\left( \frac{1}{t}\right) . 
\]

This may be rewritten as\footnote{%
Recall that the remainder term $o(1/t)$ was uniform; that is, for every $%
\varepsilon >0$ there is $t_{0}\left( \varepsilon \right) $ such that $%
o(1)<\varepsilon $ is guaranteed for all $t>t_{0}(\varepsilon ).$} 
\begin{equation}
E[\zeta _{t}|h_{t-1}]\leq o(1),  \label{eq-2}
\end{equation}
where $\zeta _{t}:=tQ(\overline{a}_{t})-(t-1)Q(\overline{a}_{t-1}).$ Hence $%
\lim \sup_{t\rightarrow \infty }(1/t)\sum_{\tau \leq t}E[\zeta _{\tau
}|h_{\tau -1}]\leq 0.$ The Strong Law of Large Numbers for Dependent Random
Variables (see Lo\`{e}ve [1978, Theorem 32.1.E]) implies that $%
(1/t)\sum_{\tau \leq t}\left( \zeta _{\tau }-E[\zeta _{\tau }|h_{\tau
-1}]\right) \rightarrow 0$ a.s. as $t\rightarrow \infty $ (note that the $%
\zeta _{t}$'s are uniformly bounded, as can be immediately seen from
equation (\ref{eq-0}): $\zeta _{t}=Q(\overline{a}_{t-1})+\nabla Q(\overline{a%
}_{t-1})\cdot (a_{t}-\overline{a}_{t-1})+o(1),$ and from the fact that
everything happens in the compact set $K$). Therefore $\lim
\sup_{t\rightarrow \infty }(1/t)\sum_{\tau \leq t}\zeta _{\tau }\leq 0.$ But 
$0\leq Q(\overline{a}_{t})=(1/t)\sum_{\tau \leq t}\zeta _{\tau },$ so $%
\lim_{t\rightarrow \infty }Q(\overline{a}_{t})=0$.%
\endproof

\begin{lemma}
\label{boundary-C}The function $P$ satisfies:

\begin{description}
\item[(c1)]  If the boundary of $\mathcal{C}$ is connected, then there
exists a constant $c$ such that 
\[
\left\{ 
\begin{tabular}{ll}
$P(x)=c,$ & if $x\in \mathrm{bd}\ \mathcal{C};$ \\ 
$P(x)>c,$ & if $x\notin \mathcal{C}.$%
\end{tabular}
\right. 
\]

\item[(c2)]  If the boundary of $\mathcal{C}$ is not connected, then there
exists a $\lambda \in \Bbb{R}^{m}\backslash \{0\}$ such that\footnote{%
This is a general fact about convex sets: The only case where the boundary
of a convex closed set $\mathcal{C}\subset \Bbb{R}^{m}$ is not
path-connected is when $\mathcal{C}$ is the set of points lying between two
parallel hyperplanes. We prove this in Steps 1--3 below, independently of
the function $P.$} $\mathcal{C}=\{x\in \Bbb{R}^{m}:-w(-\lambda )\leq \lambda
\cdot x\leq w(\lambda )\}$ (where $w(\lambda )<\infty $ and $w(-\lambda
)<\infty ),$ and there are constants $c_{1}$ and $c_{2}$ such that 
\[
\left\{ 
\begin{tabular}{ll}
$P(x)=c_{1},$ & if $x\in \mathrm{bd}\ \mathcal{C}$ and $\lambda \cdot
x=w(\lambda );$ \\ 
$P(x)=c_{2},$ & if $x\in \mathrm{bd}\ \mathcal{C}$ and $(-\lambda )\cdot
x=w(-\lambda );$ \\ 
$P(x)>c_{1},$ & if $x\notin \mathcal{C}$ and $\lambda \cdot x>w(\lambda );$
\\ 
$P(x)>c_{2},$ & if $x\notin \mathcal{C}$ and $(-\lambda )\cdot x>w(-\lambda
).$%
\end{tabular}
\right. 
\]
\end{description}
\end{lemma}

\proof
Let $x_{0},x_{1}\in \mathrm{bd}\ \mathcal{C},$ and denote by $\lambda _{j},$
for $j=0,1,$ an outward unit normal to $\mathcal{C}$ at $x_{j};$ thus $%
\left\| \lambda _{j}\right\| =1$ and $\lambda _{j}\cdot x_{j}=w(\lambda
_{j}).$

\emph{Step 1:} If $\lambda _{1}\neq -\lambda _{0},$ we claim that there is a
path on $\mathrm{bd}\ \mathcal{C}$ connecting $x_{0}$ and $x_{1},$ and
moreover $P(x_{0})=P(x_{1}).$ Indeed, there exists a vector\footnote{%
Take for instance $z=\lambda _{0}+\lambda _{1}.$} $z\in \Bbb{R}^{m}$ such
that $\lambda _{0}\cdot z>0$ and $\lambda _{1}\cdot z>0.$ The straight line
segment connecting $x_{0}$ and $x_{1}$ lies in $\mathcal{C};$ we move it in
the direction $z$ until it reaches the boundary of $\mathcal{C}.$ That is,
for each $\eta \in [0,1],$ let $y(\eta ):=$ $\eta x_{1}+(1-\eta
)x_{0}+\alpha (\eta )z$, where $\alpha (\eta ):=\max \{\beta :\eta
x_{1}+(1-\eta )x_{0}+\beta z\in \mathcal{C}\};$ this maximum exists by the
choice of $z$. Note that $y(\cdot )$ is a path on $\mathrm{bd}\ \mathcal{C}$
connecting $x_{0}$ and $x_{1}.$

It is easy to verify that $\alpha (0)=\alpha (1)=0$ and that $\alpha
:[0,1]\rightarrow \Bbb{R}_{+}$ is a concave function---thus differentiable
a.e$.$ For each $k=1,2,...,$ define $y_{k}(\eta ):=y(\eta )+(1/k)z;$ then $%
y_{k}(\cdot )$ is a path in $\Bbb{R}^{m}\backslash \mathcal{C},$ the region
where $P$ is continuously differentiable. Let $\overline{\eta }\in (0,1)$ be
a point of differentiability of $\alpha (\cdot )$, thus also of $y(\cdot ),$ 
$y_{k}(\cdot )$ and $P(y_{k}(\cdot ));$ we have $dP(y_{k}(\overline{\eta }%
))/d\eta =\nabla P(y_{k}(\overline{\eta }))\cdot y_{k}^{\prime }(\overline{%
\eta })=\nabla P(y_{k}(\overline{\eta }))\cdot y^{\prime }(\overline{\eta }%
). $ By (D3), $\nabla P(y_{k}(\overline{\eta }))\cdot y_{k}(\overline{\eta }%
)>w(\nabla P(y_{k}(\overline{\eta })))\geq \nabla P(y_{k}(\overline{\eta }%
))\cdot y(\eta )$ for any $\eta \in [0,1]$ (the second inequality since $%
y(\eta )\in \mathcal{C}).$ Thus, for any accumulation point $q$ of the
bounded\footnote{%
Recall that $P$ is Lipschitz.} sequence $(\nabla P(y_{k}(\overline{\eta }%
)))_{k=1}^{\infty },$ we get $q\cdot y(\overline{\eta })\geq q\cdot y(\eta )$
for all $\eta \in [0,1].$ Therefore $q\cdot y(\eta )$ is maximized at $\eta =%
\overline{\eta },$ which implies that $q\cdot y^{\prime }(\overline{\eta }%
)=0.$ This holds for \emph{any} accumulation point $q$, hence $%
\lim_{k\rightarrow \infty }dP(y_{k}(\overline{\eta }))/d\eta =0$ for almost
every $\overline{\eta }.$ Therefore 
\begin{eqnarray*}
P(x_{1})-P(x_{0}) &=&P(y(1))-P(y(0))=\lim_{k}[P(y_{k}(1))-P(y_{k}(0))] \\
&=&\lim_{k}\int_{0}^{1}\left( dP(y_{k}(\eta ))/d\eta \right) d\eta
=\int_{0}^{1}\left( \lim_{k}dP(y_{k}(\eta ))/d\eta \right) d\eta =0,
\end{eqnarray*}
(the interchange of limit and integral is justified by dominated
convergence: $P$ is Lipschitz, so the derivatives $dP(y_{k}(\eta ))/d\eta $
are uniformly bounded).

\emph{Step 2:} If $\lambda _{1}=-\lambda _{0}$ and there is another boundary
point $x_{2}$ with outward unit normal $\lambda _{2}$ different from both $%
-\lambda _{0}$ and $-\lambda _{1},$ then we get paths on $\mathrm{bd}\ 
\mathcal{C}$ connecting $x_{0}$ to $x_{2}$ and $x_{1}$ to $x_{2},$ and also $%
P(x_{0})=P(x_{2})$ and $P(x_{1})=P(x_{2})$ ---thus we get the same
conclusion as in Step 1.

\emph{Step 3:} If $\lambda _{1}=-\lambda _{0}$ and no $x_{2}$ and $\lambda
_{2}$ as in Step 2 exist, it follows that the unit normal to every point on
the boundary of $\mathcal{C}$ is either $\lambda _{0}$ or $-\lambda _{0};$
thus $\mathcal{C}$ is the set bounded between the two parallel hyperplanes $%
\lambda _{0}\cdot x=w(\lambda _{0})$ and $-\lambda _{0}\cdot x=w(-\lambda
_{0}).$ In particular, the boundary of $\mathcal{C}$ is not connected, and
we are in case (c2). Note that in this case when $x_{0}$ and $x_{1}$ lie on
the same hyperplane then $P(x_{0})=P(x_{1})$ by Step 1 (since $\lambda _{1}=$
$\lambda _{0}\neq -\lambda _{0}).$

\emph{Step 4:} If it is case (c1)---thus not (c2)---then the situation of
Step 3 is not possible; thus for any two boundary points $x_{0}$ and $x_{1}$
we get $P(x_{0})=P(x_{1})$ by either Step 1 or Step 2.

\emph{Step 5: }Given $x\notin \mathcal{C},$ let $x_{0}\in \mathrm{bd}\ 
\mathcal{C}$ be the point in $\mathcal{C}$ that is closest to $x.$ Then the
line segment from $x$ to $x_{0}$ lies outside $\mathcal{C}$, i.e., $y(\eta
):=\eta x+(1-\eta )x_{0}\notin \mathcal{C}$ for all $\eta \in (0,1].$ By
(D3) and $x_{0}\in \mathcal{C}$, it follows that $\nabla P(y(\eta ))\cdot
y(\eta )>w(\nabla P(y(\eta )))\geq \nabla P(y(\eta ))\cdot x_{0},$ or, after
dividing by $\eta >0,$ that $\nabla P(y(\eta ))\cdot (x-x_{0})>0,$ for all $%
\eta \in (0,1].$ Hence $P(x)-P(x_{0})=\int_{0}^{1}\nabla P(y(\eta ))\cdot
y^{\prime }(\eta )\ d\eta =\int_{0}^{1}\nabla P(y(\eta ))\cdot (x-x_{0})\
d\eta >0,$ showing that $P(x)>c$ in case (c1) and $P(x)>c_{1}$ or $%
P(x)>c_{2} $ in case (c2).%
\endproof

We can now prove the main result of this section.

\textbf{Proof of Theorem \ref{th-general}.} First, use Lemma \ref{boundary-C}
to replace $P$ by $P_{1}$ as follows: When the boundary of $\mathcal{C}$ is
connected (case (c1)), define $P_{1}(x):=(P(x)-c)^{2}$ for $x\notin \mathcal{%
C}$ and $P_{1}(x):=0$ for $x\in \mathcal{C}$; when the boundary of $\mathcal{%
C}$ is not connected (case (c2)), define $P_{1}(x):=(P(x)-c_{1})^{2}$ for $%
x\notin \mathcal{C}$ with $\lambda \cdot x>w(\lambda ),$ $%
P_{1}(x):=(P(x)-c_{2})^{2}$ for $x\notin \mathcal{C}$ with $(-\lambda )\cdot
x>w(-\lambda ),$ and $P_{1}(x):=0$ for $x\in \mathcal{C}.$ It is easy to
verify that: $P_{1}$ is continuously differentiable; $\nabla P_{1}(x)$ is
positively proportional to $\nabla P(x)$ and thus to $\Lambda (x)$ for $%
x\notin \mathcal{C};$ $P_{1}(x)\geq 0$ for all $x;$ and $P_{1}(x)=0$ if and
only if $x\in \mathcal{C}.$

Given $\varepsilon >0,$ let $k\geq 2$ be a large enough integer such that 
\begin{equation}
\frac{\nabla P_{1}(x)\cdot x-w(\nabla P_{1}(x))}{P_{1}(x)}\geq \frac{1}{k}
\label{eq-k}
\end{equation}
for all $x$ in the compact set $K\cap \{x:P_{1}(x)\geq \varepsilon \}$ (the
minimum of the above ratio is attained and it is positive by (D3))$.$ Put%
\footnote{%
We write $[z]_{+}$ for the positive part of $z,$ i.e., $[z]_{+}:=\max
\{z,0\}.$} $Q(x):=\left( [P_{1}(x)-\varepsilon ]_{+}\right) ^{k}.$ Then $Q$
is continuously differentiable (since $k\geq 2)$ and it satisfies all the
conditions of Lemma \ref{l-special_case}. To check (iii): When $Q(x)=0$ we
have $\nabla Q(x)=0,$ and when $Q(x)>0$ we have 
\begin{eqnarray*}
\nabla Q(x)\cdot x-w(\nabla Q(x)) &=&k(P_{1}(x)-\varepsilon )^{k-1}\left[
\nabla P_{1}(x)\cdot x-w(\nabla P_{1}(x))\right] \\
&\geq &(P_{1}(x)-\varepsilon )^{k-1}P_{1}(x) \\
&\geq &Q(x).
\end{eqnarray*}
(the first inequality follows from (\ref{eq-k}))$.$

By Lemma \ref{l-special_case}, it follows that the $\Lambda $-strategy
guarantees a.s. $\lim_{t\rightarrow \infty }Q(\overline{a}_{t})=0$, or $\lim
\sup_{t\rightarrow \infty }P_{1}(\overline{a}_{t})\leq \varepsilon .$ Since $%
\varepsilon >0$ is arbitrary, this yields a.s. $\lim_{t\rightarrow \infty
}P_{1}(\overline{a}_{t})=0,$ or $\overline{a}_{t}\rightarrow \mathcal{C}$.%
\endproof

\textbf{Remark. }$P$ may be viewed (up to a constant, as in the definition
of $P_{1}$ above) as a generalized distance to the set $\mathcal{C}$
(compare with Remark \ref{rem-norm} in Subsection \ref{ss-appr-main}).

\subsection{Counterexamples\label{ss-appr-examples}}

In this subsection, we provide counterexamples showing the indispensability
of the conditions (D1)--(D3) for the validity of Theorem \ref{th-general}.
The first two examples refer to (D1), the third to (D2), and the last one to
(D3).\pagebreak

\begin{example}
\label{ex-d1-infinity}The role of (D1).
\end{example}

Consider the following $2$-dimensional vector payoff matrix $A$%
\[
\begin{tabular}{ccc}
& C1 & C2 \\ \cline{2-3}
R1 & \multicolumn{1}{|c}{$(0,-1)$} & \multicolumn{1}{|c|}{$(0,1)$} \\ 
\cline{2-3}
R2 & \multicolumn{1}{|c}{$(1,0)$} & \multicolumn{1}{|c|}{$(-1,0)$} \\ 
\cline{2-3}
\end{tabular}
\quad . 
\]
Let $i$ be the Row player and $-i$ the Column player. The set $\mathcal{C}:=%
\Bbb{R}_{-}^{2}$ is approachable by the Row player since $w(\lambda )<\infty 
$ whenever $\lambda \geq 0,$ and then the mixed action $\sigma
^{Row}(\lambda ):=\left( \lambda _{1}/(\lambda _{1}+\lambda _{2}),\lambda
_{2}/(\lambda _{1}+\lambda _{2})\right) $ of the Row player yields $\lambda
\cdot A(\sigma ^{Row}(\lambda ),\gamma )=0=w(\lambda )$ for any action $%
\gamma $ of the Column player.

We define a directional mapping $\Lambda _{\infty }$ on $\Bbb{R}%
^{2}\backslash \Bbb{R}_{-}^{2}$: 
\[
\Lambda _{\infty }(x):=\left\{ 
\begin{tabular}{cc}
$(1,0),$ & if $x_{1}>x_{2};$ \\ 
$(0,1),$ & if $x_{1}\leq x_{2}.$%
\end{tabular}
\right. 
\]
Clearly $\Lambda _{\infty }$ is \emph{not continuous}, i.e., it does not
satisfy (D1); it does however satisfy (D3) and (D2) (with $P(x)=\max
\{x_{1},x_{2}\},$ the $l_{\infty }$ potential; see Remark \ref{rem-norm} in
Subsection \ref{ss-appr-main}). Consider a $\Lambda _{\infty }$-strategy for
the Row player that, when $x:=\overline{a}_{t-1}\notin \mathcal{C}$, plays $%
\sigma ^{Row}(\Lambda _{\infty }(x))$ at time $t$; that is, he plays R1 when 
$x_{1}>x_{2},$ and R2 when $x_{1}\leq x_{2}.$ Assume that the Column player
plays\footnote{%
In order to show that the strategy of the Row player does not guarantee
approachability to $\mathcal{C},$ we exhibit one strategy of the Column
player for which $\overline{a}_{t}$ does not converge to $\mathcal{C}.$} C2
when $x_{1}>x_{2},$ and C1 when $x_{1}\leq x_{2}.$ Then, starting with, say, 
$a_{1}=(0,1)\notin \mathcal{C},$ the vector payoff $a_{t}$ will always be
either $(0,1)$ or $(1,0),$ thus on the line $x_{1}+x_{2}=1,$ so the average $%
\overline{a}_{t}$ does not converge to $\mathcal{C}=\Bbb{R}_{-}^{2}.$
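
This failure is easy to reproduce numerically. The following Python sketch
(an illustration of ours, not part of the formal argument; note that the
play path here is deterministic, so no randomization is needed) iterates
the play just described:

\begin{verbatim}
# Illustrative simulation (ours) of the play path above: the average
# vector payoff stays on the line x1 + x2 = 1, away from C = R_-^2.
T = 10000
abar = [0.0, 1.0]                  # a_1 = (0,1)
for t in range(2, T + 1):
    if abar[0] > abar[1]:          # Row plays R1, Column replies C2
        a = [0.0, 1.0]
    else:                          # Row plays R2, Column replies C1
        a = [1.0, 0.0]
    abar = [((t - 1) * abar[k] + a[k]) / t for k in range(2)]
print(abar, abar[0] + abar[1])     # the sum remains equal to 1
\end{verbatim}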

\begin{example}
\label{ex-d1-1}The role of (D1), again.
\end{example}

The same as in Example \ref{ex-d1-infinity}, but now the directional mapping
is $\Lambda _{1},$ defined on $\Bbb{R}^{2}\backslash \Bbb{R}_{-}^{2}$ by 
\[
\Lambda _{1}(x):=\left\{ 
\begin{tabular}{cc}
$(1,1),$ & if $x_{1}>0$ and $x_{2}>0;$ \\ 
$(1,0),$ & if $x_{1}>0$ and $x_{2}\leq 0;$ \\ 
$(0,1),$ & if $x_{1}\leq 0$ and $x_{2}>0.$%
\end{tabular}
\right. 
\]
Again, the mapping $\Lambda _{1}$ is \emph{not continuous}---it does not
satisfy (D1)---but it satisfies (D3) and (D2) (with $%
P(x):=[x_{1}]_{+}+[x_{2}]_{+},$ the $l_{1}$ potential). Consider a $\Lambda
_{1}$-strategy for the Row player where at time $t$ he plays $\sigma
^{Row}(\Lambda _{1}(x))$ when $x:=\overline{a}_{t-1}\notin \mathcal{C}$, and
assume that the Column player plays C1 when $x_{1}\leq 0$ and $x_{2}>0,$ and
plays C2 otherwise. Thus, if $x\notin \mathcal{C}$ then $a_{t}$ is:

\begin{itemize}
\item  $(0,1)$ or $(-1,0)$ with equal probabilities, when $x_{1}>0$ and $%
x_{2}>0;$

\item  $(0,1),$ when $x_{1}>0$ and $x_{2}\leq 0;$

\item  $(1,0),$ when $x_{1}\leq 0$ and $x_{2}>0.$
\end{itemize}

\noindent In all cases the second coordinate of $a_{t}$ is non-negative;
therefore, if we start with, say, $a_{1}=(0,1)\notin \mathcal{C},$ then,
inductively, the second coordinate of $\overline{a}_{t-1}$ will be strictly
positive, so that $\overline{a}_{t-1}\notin \mathcal{C}$ for all $t.$ But
then $E[a_{t}|h_{t-1}]\in \mathcal{D}:=\mathrm{conv}\{(-1/2,1/2),(0,1),(1,0)%
\},$ and $\mathcal{D}$ is disjoint from $\mathcal{C}$ and at a positive
distance from it$.$ Therefore $(1/t)\sum_{\tau \leq t}E[a_{\tau }|h_{\tau
-1}]\in \mathcal{D}$ and so, by the Strong Law of Large Numbers, $\lim 
\overline{a}_{t}=\lim (1/t)\sum_{\tau \leq t}a_{\tau }\in \mathcal{D}$ too
(a.s.), so $\overline{a}_{t}$ does not approach\footnote{%
One way to see this formally is by a separation argument: Let $%
f(x):=x_{1}+3x_{2};$ then $E[f(a_{t})|h_{t-1}]\geq 1,$ so $\lim \inf f(%
\overline{a}_{t})=\lim \inf (1/t)\sum_{\tau \leq t}f(a_{\tau })\geq 1,$ whereas $%
f(x)\leq 0$ for all $x\in \mathcal{C}.$} $\mathcal{C}.$

To get some intuition, consider the deterministic system where $a_{t}$ is
replaced by $E[a_{t}|h_{t-1}].$ Then the point $(0,1/3)$ is a stationary
point for this dynamic. Specifically (see Figure \ref{fig-l1}%
%TCIMACRO{\TeXButton{BeginFigure}{\begin{figure}[htb] \centering}}
%BeginExpansion
\begin{figure}[htb] \centering%
%EndExpansion
 \unitlength 1mm

\begin{picture}(115.00,128.00)(20.00,12.00)
\linethickness{1.0pt}
 \put(130.00,70.00){\vector(1,0){0.2}}
\put(20.00,70.00){\line(1,0){110.00}}
  \put(70.00,130.00){\vector(0,1){0.2}}
\put(70.00,20.00){\line(0,1){110.00}}
 
\linethickness{0.7pt}
\put(110.00,70.00){\circle*{1.50}}
\put(50.00,90.00){\circle*{1.50}}
  \put(50.00,90.00){\line(3,-1){60.00}}
 \put(70.00,83.33){\circle*{1.50}}
 \put(64.00,87.00){\vector(3,-1){1.0}}
\put(58.00,89.00){\line(3,-1){6.00}}
  \put(87.00,79.20){\vector(-3,1){1.0}}
\put(93.00,77.20){\line(-3,1){6.00}}
 \put(110.00,66.00){\makebox(0,0)[cc]{$A(R2,C1)$}}
\put(72.00,85.00){\makebox(0,0)[lb]{$(0, {1 \over 3})$}}
\put(48.00,94.00){\makebox(0,0)[cc]{$(- {1 \over 2} , {1 \over 2})$}}
\put(132.00,70.00){\makebox(0,0)[lc]{$x_1$}}
\put(70.00,132.00){\makebox(0,0)[cb]{$x_2$}}
\put(70.00,110.00){\circle*{1.50}}
\put(30.00,70.00){\circle*{1.50}}
\put(70.00,30.00){\circle*{1.50}}
\put(110.00,73.00){\makebox(0,0)[cb]{$(1,0)$}}
\put(30.00,73.00){\makebox(0,0)[cb]{$(-1,0)$}}
\put(68.00,110.00){\makebox(0,0)[rc]{$(0,1)$}}
\put(68.00,30.00){\makebox(0,0)[rc]{$(0,-1)$}}

\put(72.00,110.00){\makebox(0,0)[lc]{$A(R1,C2)$}}
\put(30.00,66.00){\makebox(0,0)[cc]{$A(R2,C2)$}}
\put(72.00,30.00){\makebox(0,0)[lc]{$A(R1,C1)$}}

\put(61,59){\makebox(0,0)[cc]{$\mathcal{C}$}}
\linethickness{0.4pt}
\put(70,70){\line(-1,-3){12}}
\put(70,40){\line(-1,-3){2}}
\put(60,70){\line(-1,-3){16}}
\put(50,70){\line(-1,-3){16}}
\put(37.2,61.6){\line(-1,-3){13}}
\put(27.2,61.6){\line(-1,-3){6}}

\put(54,22){\line(1,3){1.5}}
\put(64,22){\line(1,3){1.5}}

\end{picture}
\caption{The deterministic dynamic in Example 2.4\label{fig-l1}}%
%TCIMACRO{\TeXButton{EndFigure}{\end{figure}}}
%BeginExpansion
\end{figure}%
%EndExpansion
), if $\overline{a}_{t-1}$ is on the line segment joining $(-1/2,1/2)$ with $%
(1,0),$ then $E[\overline{a}_{t}|h_{t-1}]$ will be there too, moving towards 
$(-1/2,1/2)$ when $\overline{a}_{t-1}$ is in the positive orthant and
towards $(1,0)$ when it is in the second orthant.

\begin{example}
\label{ex-no-d2}The role of (D2).
\end{example}

Consider the following $2$-dimensional vector payoff matrix $A$: 
\[
\begin{tabular}{ccccc}
& C1 & C2 & C3 & C4 \\ \cline{2-5}
R1 & \multicolumn{1}{|c}{$(0,1)$} & \multicolumn{1}{|c}{$(0,0)$} & 
\multicolumn{1}{|c}{$(0,-1)$} & \multicolumn{1}{|c|}{$(0,0)$} \\ \cline{2-5}
R2 & \multicolumn{1}{|c}{$(-1,0)$} & \multicolumn{1}{|c}{$(0,0)$} & 
\multicolumn{1}{|c}{$(1,0)$} & \multicolumn{1}{|c|}{$(0,0)$} \\ \cline{2-5}
R3 & \multicolumn{1}{|c}{$(0,0)$} & \multicolumn{1}{|c}{$(0,-1)$} & 
\multicolumn{1}{|c}{$(0,0)$} & \multicolumn{1}{|c|}{$(0,1)$} \\ \cline{2-5}
R4 & \multicolumn{1}{|c}{$(0,0)$} & \multicolumn{1}{|c}{$(-1,0)$} & 
\multicolumn{1}{|c}{$(0,0)$} & \multicolumn{1}{|c|}{$(1,0)$} \\ \cline{2-5}
\end{tabular}
\quad . 
\]
Again, the Row player is $i$ and the Column player is $-i.$ Let $\mathcal{C}%
:=\{(0,0)\}.$ For every $\lambda \in \Bbb{R}^{2}\backslash \{(0,0)\},$ put $%
\mu _{1}:=\left| \lambda _{1}\right| /(\left| \lambda _{1}\right| +\left|
\lambda _{2}\right| )$ and $\mu _{2}:=\left| \lambda _{2}\right| /(\left|
\lambda _{1}\right| +\left| \lambda _{2}\right| ),$ and define a mixed
action $\sigma ^{Row}(\lambda )$ for the Row player and a pure action $%
c(\lambda )$ for the Column player, as follows:

\begin{itemize}
\item  If $\lambda _{1}\geq 0$ and $\lambda _{2}\geq 0$ then $\sigma
^{Row}(\lambda ):=(\mu _{1},\mu _{2},0,0)$ and $c(\lambda ):=$ C1;

\item  If $\lambda _{1}<0$ and $\lambda _{2}\geq 0$ then $\sigma
^{Row}(\lambda ):=(0,0,\mu _{1},\mu _{2})$ and $c(\lambda ):=$ C2;

\item  If $\lambda _{1}<0$ and $\lambda _{2}<0$ then $\sigma ^{Row}(\lambda
):=(\mu _{1},\mu _{2},0,0)$ and $c(\lambda ):=$ C3; and

\item  If $\lambda _{1}\geq 0$ and $\lambda _{2}<0$ then $\sigma
^{Row}(\lambda ):=(0,0,\mu _{1},\mu _{2})$ and $c(\lambda ):=$ C4.
\end{itemize}

It is easy to verify that in all four cases:

\begin{enumerate}
\item[(1)]  $\lambda \cdot A(\sigma ^{Row}(\lambda ),\gamma )\leq
0=w(\lambda )$ for any action $\gamma $ of the Column player; and

\item[(2)]  $A(\sigma ^{Row}(\lambda ),c(\lambda ))=\left( \left| \lambda
_{1}\right| +\left| \lambda _{2}\right| \right) ^{-1}\widehat{\lambda }$,
where $\widehat{\lambda }:=(-\lambda _{2},\lambda _{1}).$
\end{enumerate}

\noindent Condition (1) implies, by (\ref{eq-w(lambda)}), that $\mathcal{C}$
is approachable by the Row player; and condition (2) means that $A(\sigma
^{Row}(\lambda ),c(\lambda ))$ is $90^{\circ }$ counterclockwise from $%
\lambda .$

Consider now the directional mapping $\Lambda $ given by $\Lambda
(x):=(x_{1}+\alpha x_{2},x_{2}-\alpha x_{1}),$ where $\alpha >0$ is a fixed
constant.\footnote{%
This will provide a counterexample for any value of $\alpha ,$ showing that
it is not an isolated phenomenon.} Then (D1) and (D3) hold (for the latter,
we have $x\cdot \Lambda (x)=\left( x_{1}\right) ^{2}+\left( x_{2}\right)
^{2}>0=w(\Lambda (x))$ for all $x\notin \mathcal{C}),$ but \emph{%
integrability} (D2)\emph{\ is not satisfied}. To see this, assume that $%
\nabla P(x)=\phi (x)\Lambda (x)$ for all $x\notin \mathcal{C},$ where $\phi
(x)>0$ is a continuous function. Consider the path $y(\eta ):=(\sin \eta
,\cos \eta )$ for $\eta \in [0,2\pi ].$ We have $0=P(y(2\pi
))-P(y(0))=\int_{0}^{2\pi }(dP(y(\eta ))/d\eta )d\eta =\int_{0}^{2\pi
}\nabla P(y(\eta ))\cdot y^{\prime }(\eta )d\eta .$ But the integrand equals 
$(\phi (y(\eta )))(\sin \eta +\alpha \cos \eta ,\cos \eta -\alpha \sin \eta
)\cdot (\cos \eta ,-\sin \eta )=\alpha \phi (y(\eta ))$ which is everywhere
positive, a contradiction.

We claim that if the Row player uses the $\Lambda $-strategy where at time $%
t $ he plays\footnote{%
For all $\lambda $ except those on the two axes (i.e., $\lambda _{1}=0$ or $%
\lambda _{2}=0),$ the mixed action $\sigma ^{Row}(\lambda )$ is uniquely
determined by (\ref{eq-directional-strategy}). If the Row player were to
play in these exceptional cases another mixed action satisfying (\ref
{eq-directional-strategy}), it is easy to verify that the Column player
could respond appropriately so that $\mathcal{C}$ is not approached. Thus,
no $\Lambda $-strategy guarantees approachability.} $\sigma ^{Row}(\Lambda (%
\overline{a}_{t-1})),$ and if the Column player chooses $c(\Lambda (%
\overline{a}_{t-1})),$ then the distance to the set $\mathcal{C}=\{(0,0)\}$
does not approach $0.$ Indeed, let $b_{t}:=E[a_{t}|h_{t-1}];$ then the
strategies played imply that the vector $b_{t}=$ $A(\sigma ^{Row}(\Lambda (%
\overline{a}_{t-1})),c(\Lambda (\overline{a}_{t-1})))$ is perpendicular to $%
\Lambda (\overline{a}_{t-1})$ and makes an acute angle with the vector $%
\overline{a}_{t-1}$. Specifically,\footnote{%
Writing $b$ for $b_{t};$ $x$ for $\overline{a}_{t-1};$ and $\lambda $ for $%
\Lambda (\overline{a}_{t-1})=\Lambda (x),$ we have: $x=(1+\alpha
^{2})^{-1}\left( \lambda +\alpha \widehat{\lambda }\right) $ (recall the
definition of $\Lambda $ and invert$),$ thus $b\cdot x=\left( \left| \lambda
_{1}\right| +\left| \lambda _{2}\right| \right) ^{-1}\widehat{\lambda }\cdot
(1+\alpha ^{2})^{-1}\left( \lambda +\alpha \widehat{\lambda }\right) =\alpha
\left( 1+\alpha ^{2}\right) ^{-1}\left( \left| \lambda _{1}\right| +\left|
\lambda _{2}\right| \right) ^{-1}\left\| \lambda \right\| ^{2}$ (we have
used (2), $\lambda \cdot \widehat{\lambda }=0$ and $\left\| \lambda \right\|
=\left\| \widehat{\lambda }\right\| $). Now $\left\| b\right\| =\left(
\left| \lambda _{1}\right| +\left| \lambda _{2}\right| \right) ^{-1}\left\|
\lambda \right\| $ and $\left\| x\right\| =(1+\alpha ^{2})^{-1/2}\left\|
\lambda \right\| ,$ thus completing the proof.} $b_{t}\cdot \overline{a}%
_{t-1}=\beta \left\| b_{t}\right\| \left\| \overline{a}_{t-1}\right\| ,$
where $\beta :=\alpha /\sqrt{1+\alpha ^{2}}\leq 1.$ Therefore 
\begin{eqnarray*}
\left( E[t\left\| \overline{a}_{t}\right\| |h_{t-1}]\right) ^{2} &\geq
&\left\| E[t\overline{a}_{t}|h_{t-1}]\right\| ^{2} \\
&=&(t-1)^{2}\left\| \overline{a}_{t-1}\right\| ^{2}+2(t-1)b_{t}\cdot 
\overline{a}_{t-1}+\left\| b_{t}\right\| ^{2} \\
&\geq &(t-1)^{2}\left\| \overline{a}_{t-1}\right\| ^{2}+2(t-1)\beta \left\|
b_{t}\right\| \left\| \overline{a}_{t-1}\right\| +\beta ^{2}\left\|
b_{t}\right\| ^{2} \\
&=&\left( (t-1)\left\| \overline{a}_{t-1}\right\| +\beta \left\|
b_{t}\right\| \right) ^{2}.
\end{eqnarray*}
Now $\left\| b_{t}\right\| \geq 1/\sqrt{2}$ (for instance, see (2)), and we
have thus obtained 
\[
E[t\left\| \overline{a}_{t}\right\| -(t-1)\left\| \overline{a}_{t-1}\right\|
|h_{t-1}]\geq \beta /\sqrt{2}>0, 
\]
from which it follows that $\lim \inf \left\| \overline{a}_{t}\right\| =\lim
\inf (1/t)\sum_{\tau \leq t}\left[ \tau \left\| \overline{a}_{\tau }\right\|
-(\tau -1)\left\| \overline{a}_{\tau -1}\right\| \right] \geq \beta /\sqrt{2}%
>0$ a.s., again by the Strong Law of Large Numbers. Thus the distance of the
average payoff vector $\overline{a}_{t}$ from the set $\mathcal{C}=\{(0,0)\}$
is, with probability one, bounded away from $0$ from some time on.

To get some intuition for this result, note that the direction of movement
from $\overline{a}_{t-1}$ to $E[\overline{a}_{t}|h_{t-1}]$ is at a fixed
angle $\theta \in (0,\pi /2)$ from $\overline{a}_{t-1},$ which, if the
dynamic were deterministic, would generate a counterclockwise spiral moving
away from $(0,0).$
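
The following Python sketch (ours, purely illustrative) iterates the
deterministic dynamic; its output confirms that $\left\| \overline{a}_{t}\right\| $
stays bounded away from $0$ (for $\alpha =1$ the bound $\beta /\sqrt{2}$
of the proof equals $1/2$):

\begin{verbatim}
# Illustrative simulation (ours) of the deterministic dynamic; by (2),
# the expected payoff b is 90 degrees counterclockwise from Lambda(x).
import math

alpha = 1.0
x = [1.0, 0.0]                     # start outside C = {(0,0)}
for t in range(2, 100001):
    lam = [x[0] + alpha * x[1], x[1] - alpha * x[0]]
    s = abs(lam[0]) + abs(lam[1])
    b = [-lam[1] / s, lam[0] / s]  # = A(sigma(lam), c(lam)), by (2)
    x = [((t - 1) * x[k] + b[k]) / t for k in range(2)]
print(math.hypot(x[0], x[1]))      # remains >= 1/2
\end{verbatim}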

\begin{example}
\label{ex-not-d3}The role of (D3).
\end{example}

Consider the $2$-dimensional vector payoff matrix $A$%
\[
\begin{tabular}{c|c|}
\cline{2-2}
R1 & $(0,1)$ \\ \cline{2-2}
R2 & $(-1,0)$ \\ \cline{2-2}
\end{tabular}
\quad , 
\]
(where $i$ is the Row player, and $-i$ has one action). The set $\mathcal{C}%
:=\Bbb{R}_{-}^{2}$ is approachable by the Row player (by playing ``R2
forever''). Consider the directional mapping $\Lambda $ defined on $\Bbb{R}%
^{2}\backslash \Bbb{R}_{-}^{2}$ by $\Lambda (x):=(1,0).$ Then (D1) and (D2)
are satisfied (with $P(x):=x_{1}),$ but (D3) is not: $\Lambda (0,1)\cdot
(0,1)=0=w(\Lambda (0,1)).$ Playing ``R1 forever'' is a $\Lambda $-strategy,
but the payoff is $(0,1)\notin \mathcal{C}.$

\section{Regrets\label{s-regret}}

\subsection{Model and Preliminary Results\label{ss-regr-model}}

In this section we consider standard $N$-person games in strategic form
(with \emph{scalar} payoffs for each player). The set of players is a finite
set $N,$ the action set of each player $i$ is a finite set $S^{i},$ and the
payoff function of $i$ is $u^{i}:S\rightarrow \Bbb{R}$, where $S:=\Pi _{j\in
N}S^{j};$ we will denote this game $\left\langle N,(S^{i})_{i},(u^{i})_{i}\right\rangle $ by $\Gamma .$

As in the previous section, the game is played repeatedly in discrete time $%
t=1,2,...$ ; denote by $s_{t}^{i}\in S^{i}$ the choice of player $i$ at time 
$t,$ and put $s_{t}=(s_{t}^{i})_{i\in N}\in S.$ The payoff of $i$ in period $%
t$ is $U_{t}^{i}:=u^{i}(s_{t}),$ and $\overline{U}_{t}^{i}:=(1/t)\sum_{\tau
\leq t}U_{\tau }^{i}$ is his average payoff up to $t.$

Fix a player $i\in N.$ Following Hannan [1957], we consider the \emph{regrets%
} of player $i$, namely, for each one of his actions $k\in S^{i},$ the
change in his average payoff if he were always to choose $k$ (while no one
else makes any change in his realized actions): 
\[
D_{t}^{i}(k):=\frac{1}{t}\sum_{\tau =1}^{t}u^{i}(k,s_{\tau }^{-i})-\overline{%
U}_{t}^{i}=u^{i}(k,z_{t}^{-i})-\overline{U}_{t}^{i}, 
\]
where $z_{t}^{-i}\in \Delta (S^{-i})$ is the empirical distribution of the
actions chosen by the other players in the past.\footnote{%
I.e., $z_{t}^{-i}(s^{-i}):=\left| \{\tau \leq t:s_{\tau
}^{-i}=s^{-i}\}\right| /t$ for each $s^{-i}\in S^{-i}$, where we write $%
\left| \Omega \right| $ for the cardinality of the set $\Omega .$} A
strategy of player $i$ is called \emph{Hannan-consistent} if, as $t$
increases, all regrets are guaranteed---no matter what the other players
do---to become almost surely non-positive in the limit; that is, with
probability one, $\lim \sup_{t\rightarrow \infty }D_{t}^{i}(k)\leq 0$ for
all $k\in S^{i}.$

Following Hart and Mas-Colell [2000], it is useful to view the regrets of $i$
as an $m$-dimensional vector payoff$,$ where $m:=|S^{i}|$. We thus define $%
A\equiv A^{i}:S\rightarrow \Bbb{R}^{m},$ \emph{the }$i$\emph{-regret
vector-payoff game associated to }$\Gamma ,$ by 
\[
\begin{array}{rcl}
A_{k}(s^{i},s^{-i}) & := & u^{i}(k,s^{-i})-u^{i}(s^{i},s^{-i})\text{ for all 
}k\in S^{i},\text{ and} \\ 
A(s^{i},s^{-i}) & := & \left( A_{k}(s^{i},s^{-i})\right) _{k\in S^{i}},
\end{array}
\]
for all $s=(s^{i},s^{-i})\in S^{i}\times S^{-i}=S.$ Rewriting the regret as 
\[
D_{t}^{i}(k)=\frac{1}{t}\sum_{\tau \leq t}\left[ u^{i}(k,s_{\tau
}^{-i})-u^{i}(s_{\tau }^{i},s_{\tau }^{-i})\right] 
\]
shows that the vector of regrets at time $t$ is just the average of the $A$
vector payoffs in the first $t$ periods: $D_{t}^{i}=(1/t)\sum_{\tau \leq
t}A(s_{\tau }).$ The existence of a Hannan-consistent strategy in $\Gamma $
is thus equivalent to the approachability by player $i$ of the non-positive
orthant $\Bbb{R}_{-}^{S^{i}}$ in the vector-payoff game $A$, and a strategy
is Hannan-consistent if and only if it guarantees that $\Bbb{R}_{-}^{S^{i}}$
is approached.
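
This bookkeeping is straightforward to carry out. The following Python
sketch (ours; the function \texttt{payoff}, standing for $u^{i}(\cdot
,\cdot ),$ is an assumed input) computes the regret vector $D_{t}^{i}$ from
the history of play:

\begin{verbatim}
# Sketch (ours) of the regret vector D_t^i as the average of the
# vector payoffs A(s_tau); payoff(k, s_minus_i) stands for the
# (assumed, user-supplied) function u^i(k, s^{-i}).
def regret_vector(history, m, payoff):
    """history: list of (s_i, s_minus_i) pairs; actions of i are 0..m-1."""
    t = len(history)
    D = [0.0] * m
    for s_i, s_minus_i in history:
        for k in range(m):
            # A_k(s) = u^i(k, s^{-i}) - u^i(s^i, s^{-i})
            D[k] += payoff(k, s_minus_i) - payoff(s_i, s_minus_i)
    return [d / t for d in D]
\end{verbatim}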

We now present two important results that apply in all generality to the
regret setup.

\begin{proposition}
\label{prop-reg-1}For any (finite) $N$-person game $\Gamma ,$ the
non-positive orthant $\Bbb{R}_{-}^{S^{i}}$ is approachable by player $i$ in
the associated $i$-regret vector-payoff game.
\end{proposition}

This Proposition follows immediately from the next one. Observe that the
approachability of $\Bbb{R}_{-}^{S^{i}}$ is equivalent, by the Blackwell
condition (\ref{eq-w(lambda)}), to: For every $\lambda \in \Delta (S^{i})$
there exists $\sigma ^{i}(\lambda )\in \Delta (S^{i}),$ a mixed action of
player $i,$ such that 
\begin{equation}
\lambda \cdot A(\sigma ^{i}(\lambda ),s^{-i})\leq 0\text{ for all }s^{-i}\in
S^{-i}  \label{eq-blackwell_for_neg_orthant}
\end{equation}
(indeed, $w(\lambda )$ equals $0$ for $\lambda \geq 0$ and it is infinite
otherwise). That is, the expected regret obtained by playing $\sigma
^{i}(\lambda )$ lies in the half-space (through the origin) with normal $%
\lambda $. In this regret setup, the mixture $\sigma ^{i}(\lambda )$ may be
actually chosen in a simple manner:

\begin{proposition}
\label{prop-reg-2}For any (finite) $N$-person game $\Gamma $ and every $%
\lambda \in \Delta (S^{i}),$ condition (\ref{eq-blackwell_for_neg_orthant})
is satisfied by $\sigma ^{i}(\lambda )=\lambda .$
\end{proposition}

%TCIMACRO{\TeXButton{Proof}{\proof}}
%BeginExpansion
\proof%
%EndExpansion
Given $\lambda \in \Delta (S^{i}),$ a $\sigma ^{i}\equiv (\sigma
_{k}^{i})_{k\in S^{i}}\in \Delta (S^{i})$ satisfies (\ref
{eq-blackwell_for_neg_orthant}) if and only if 
\begin{equation}
\sum_{k\in S^{i}}\lambda _{k}\sum_{j\in S^{i}}\sigma
_{j}^{i}[u^{i}(k,s^{-i})-u^{i}(j,s^{-i})]\leq 0  \label{eq-4}
\end{equation}
for all $s^{-i}\in S^{-i}.$ This may be rewritten as 
\[
\sum_{k\in S^{i}}u^{i}(k,s^{-i})[\lambda _{k}\sum_{j\in S^{i}}\sigma
_{j}^{i}-\sigma _{k}^{i}\sum_{j\in S^{i}}\lambda _{j}]=\sum_{k\in
S^{i}}u^{i}(k,s^{-i})[\lambda _{k}-\sigma _{k}^{i}]\leq 0. 
\]
Therefore, by choosing $\sigma ^{i}$ so that all coefficients in the square
brackets vanish---that is, by choosing $\sigma _{k}^{i}=\lambda _{k}$---we
guarantee (\ref{eq-4}) and thus (\ref{eq-blackwell_for_neg_orthant}) for all 
$s^{-i}$.%
%TCIMACRO{\TeXButton{End Proof}{\endproof}}
%BeginExpansion
\endproof%
%EndExpansion

\subsection{Regret-Based Strategies\label{ss-regr-proc}}

The general theory of Section \ref{s-approachability} is now applied to the
regret situation. A \emph{stationary regret-based }strategy for player $i$
is a strategy of $i$ such that the choices depend only on $i$'s regret
vector; that is, for every history $h_{t-1},$ the mixed action of $i$ at
time $t$ is a function\footnote{%
Notice that the time $t$ does not matter: The strategy is ``stationary.''}
of $D_{t-1}^{i}$ only: $\sigma _{t}^{i}=\sigma ^{i}(D_{t-1}^{i})\in \Delta
(S^{i})$. The main result of this section is:

\begin{theorem}
\label{th-regret}Consider a stationary regret-based strategy of player $i$
given by a mapping $\sigma ^{i}:\Bbb{R}^{S^{i}}\rightarrow \Delta (S^{i})$
that satisfies:

\begin{description}
\item[(R1)]  There exists a continuously differentiable function $P:\Bbb{R}%
^{S^{i}}\rightarrow \Bbb{R}$ such that $\sigma ^{i}(x)$ is positively
proportional to $\nabla P(x)$ for every $x\notin \Bbb{R}_{-}^{S^{i}};$ and

\item[(R2)]  $\sigma ^{i}(x)\cdot x>0$ for every $x\notin \Bbb{R}%
_{-}^{S^{i}}.$
\end{description}

Then this strategy is Hannan-consistent for any (finite) $N$-person game.
\end{theorem}

%TCIMACRO{\TeXButton{Proof}{\proof}}
%BeginExpansion
\proof%
%EndExpansion
Apply Theorem \ref{th-general} for $\mathcal{C}=\Bbb{R}_{-}^{S^{i}}$
together with Propositions \ref{prop-reg-1} and \ref{prop-reg-2}: (R1)
yields (D1) and (D2), and (R2) yields (D3).%
%TCIMACRO{\TeXButton{End Proof}{\endproof}}
%BeginExpansion
\endproof%
%EndExpansion

We have thus obtained a wide class of strategies that are Hannan-consistent.
It is noteworthy that these are ``universal'' strategies: The mapping $%
\sigma ^{i}$ is independent of the game (see also the ``variable game'' case
in Section 5).

Condition (R2) says that when $D_{t-1}^{i}\notin \Bbb{R}_{-}^{S^{i}}$%
---i.e., when some regret is positive---the mixed choice $\sigma _{t}^{i}$
of $i$ satisfies $\sigma _{t}^{i}\cdot D_{t-1}^{i}>0.$ This is equivalent to 
\begin{equation}
u^{i}(\sigma _{t}^{i},z_{t-1}^{-i})>\overline{U}_{t-1}^{i}.
\label{eq-better}
\end{equation}
That is, the expected payoff of $i$ from playing $\sigma _{t}^{i}$ against
the empirical distribution $z_{t-1}^{-i}$ of the actions chosen by the other
players in the past, is higher than his realized average payoff. Thus $%
\sigma _{t}^{i}$ is a \emph{better-reply}, where ``better'' is relative to
the obtained payoff. By comparison, fictitious play always chooses an action
that is a \emph{best-reply} to the empirical distribution $z_{t-1}^{-i}$.
For more on this ``better vs. best'' issue, see Subsection \ref{ss-better}
below and Hart and Mas-Colell [2000, Section 4(e)].

We now describe a number of interesting special cases, in order of
increasing generality.

\begin{enumerate}
\item  $l_{2}$ \emph{potential}: $P(x)=\left( \sum_{k\in S^{i}}(\left[
x_{k}\right] _{+})^{2}\right) ^{1/2}.$ This yields (after normalization) $%
\Lambda (x)=(1/\left\| [x]_{+}\right\| _{1})[x]_{+}$ for $x\notin \Bbb{R}%
_{-}^{S^{i}},$ and the resulting strategy is $\sigma
_{t}^{i}(k)=[D_{t-1}^{i}(k)]_{+}/\sum_{k^{\prime }\in
S^{i}}[D_{t-1}^{i}(k^{\prime })]_{+}$ when $D_{t-1}^{i}\notin \Bbb{R}%
_{-}^{S^{i}}.$ This is the Hannan-consistent strategy introduced in Hart and
Mas-Colell [2000, Theorem C], where the play probabilities are proportional
to the positive regrets.

\item  $l_{p}$ \emph{potential: }$P(x)=\left( \sum_{k\in S^{i}}(\left[
x_{k}\right] _{+})^{p}\right) ^{1/p}$ for some $1<p<\infty ,$ which yields $%
\sigma _{t}^{i}(k)=([D_{t-1}^{i}(k)]_{+})^{p-1}/\sum_{k^{\prime }\in
S^{i}}([D_{t-1}^{i}(k^{\prime })]_{+})^{p-1};$ i.e., play probabilities that
are proportional to a fixed positive power ($p-1>0)$ of the positive regrets.

\item  \emph{Separable potential: }A \emph{separable }strategy is one where $%
\sigma _{t}^{i}$ is proportional to a vector whose $k$-th coordinate depends 
\emph{only} on the $k$-th regret; i.e., $\sigma _{t}^{i}$ is proportional to
a vector of the form $(\psi _{k}(D_{t-1}^{i}(k)))_{k\in S^{i}}.$ Conditions
(R1) and (R2) result in the following requirements\footnote{%
Consider points $x$ with $x_{j}=\pm \varepsilon $ for all $j\neq k.$}: For
each $k$ in $S^{i},$ the function $\psi _{k}:\Bbb{R}\rightarrow \Bbb{R}$ is
continuous; $\psi _{k}(x_{k})>0$ for $x_{k}>0;$ and $\psi _{k}(x_{k})=0$ for 
$x_{k}\leq 0.$ The corresponding potential is $P(x)=\sum_{k\in S^{i}}\Psi
_{k}(x_{k}),$ where $\Psi _{k}(x):=\int_{-\infty }^{x}\psi _{k}(y)dy$. Note
that, unlike the previous two cases, the functions $\psi _{k}$ may differ
for different $k,$ and they need not be monotonic (thus a higher regret may
not lead to a higher probability).
\end{enumerate}

Finally, observe that in all the above cases, actions with negative or zero
regret are never chosen. This need no longer be true in the general
(non-separable) case; see Subsection \ref{ss-better} below.
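
The first two cases above translate directly into an algorithm. The
following Python sketch (ours) computes the mixed action for the $l_{p}$
potential; the uniform choice when no regret is positive is arbitrary,
since (R1) and (R2) impose no restriction there:

\begin{verbatim}
# Sketch (ours) of the l_p potential strategies: play probabilities
# proportional to the (p-1) powers of the positive regrets; p = 2
# gives the regret-matching strategy of case 1.
def lp_strategy(D, p=2.0):
    w = [max(d, 0.0) ** (p - 1) for d in D]
    total = sum(w)
    if total == 0.0:                   # all regrets non-positive:
        return [1.0 / len(D)] * len(D) # any mixed action will do
    return [wk / total for wk in w]
\end{verbatim}

For instance, \texttt{lp\_strategy([0.2, -0.1, 0.3])} returns the
regret-matching probabilities $(0.4,0,0.6).$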

\subsection{Counterexamples\label{ss-regr-examples}}

The counterexamples of Subsection \ref{ss-appr-examples} translate easily
into the regret setup.

\begin{itemize}
\item  \emph{The role of ``better'' (R2)}. Consider the 1-person game 
\[
\begin{tabular}{c|c|}
\cline{2-2}
R1 & $0$ \\ \cline{2-2}
R2 & $1$ \\ \cline{2-2}
\end{tabular}
\quad . 
\]

The resulting regret game is given in Example \ref{ex-not-d3}. The strategy
of playing ``R1 forever'' satisfies condition (R1) but not condition (R2)
(or (\ref{eq-better})), and it is indeed not Hannan-consistent.

\item  \emph{The role of continuity in (R1)}. Consider the simplest 2-person
coordination game (a well-known stumbling block for many strategies). 
\[
\begin{tabular}{ccc}
& C1 & C2 \\ \cline{2-3}
R1 & \multicolumn{1}{|c}{$1,1$} & \multicolumn{1}{|c|}{$0,0$} \\ \cline{2-3}
R2 & \multicolumn{1}{|c}{$0,0$} & \multicolumn{1}{|c|}{$1,1$} \\ \cline{2-3}
\end{tabular}
\quad . 
\]
The resulting regret game for the Row player is precisely the vector-payoff
game of Examples \ref{ex-d1-infinity} and \ref{ex-d1-1}, where we looked at
the approachability question for the non-positive orthant. The two
strategies we considered there---which we have shown not to be
Hannan-consistent---are not continuous. They correspond to the $l_{\infty }$
and the $l_{1}$ potentials, respectively, which are not differentiable.
(Note in particular that the $l_{\infty }$ case yields ``fictitious play,''
which is further discussed in Subsection \ref{ss-regr-fict_play} below.)

\item  \emph{The role of integrability in (R1)}. The vector-payoff game of
our Example \ref{ex-no-d2} can easily be seen to be a regret game. However,
the approachable set there was not the non-positive orthant. In order to get
a counterexample to the result of Theorem \ref{th-regret} when integrability
is not satisfied, one would need to resort to additional dimensions, that
is, more than two actions; we do not do so here, although it is plain that
such examples, though painful, are easy to construct.
\end{itemize}

\section{Fictitious Play and Better Play\label{s-better-fict}}

\subsection{Fictitious Play and Smooth Fictitious Play\label%
{ss-regr-fict_play}}

As we have already pointed out, fictitious play may be viewed as a
stationary regret-based strategy, corresponding to the $l_{\infty }$ mapping
(the directional mapping generated by the $l_{\infty }$ potential). It does
not guarantee Hannan-consistency (see Example \ref{ex-d1-infinity} and
Subsection \ref{ss-regr-examples}); the culprit for this is the lack of
continuity (i.e., (D1)).

Before continuing the discussion it is useful to note a property of
fictitious play: \emph{The play at time }$t$ \emph{does not depend on the
realized average payoff }$\overline{U}_{t-1}^{i}$. Indeed, $%
\max_{k}D_{t-1}^{i}(k)=\max_{k}u^{i}(k,z_{t-1}^{-i})-\overline{U}_{t-1}^{i},$
so an action $k\in S^{i}$ maximizes regret if and only if it maximizes the
payoff against the empirical distribution $z_{t-1}^{-i}$ of the actions of $%
-i.$ In the general approachability setup of Section \ref{s-approachability}
(with $\mathcal{C}=\Bbb{R}_{-}^{S^{i}})$, this observation translates into
the requirement that the directional mapping $\Lambda $ be invariant to
adding the same constant to all coordinates. That is, writing $%
e:=(1,1,...,1)\in \Bbb{R}^{S^{i}},$%
\begin{equation}
\Lambda (x)=\Lambda (y)\text{ for any }x,y\notin \Bbb{R}_{-}^{S^{i}}\text{
with }x-y=\alpha e\text{ for some scalar }\alpha .  \label{eq-e}
\end{equation}
Note that, as it should be, the $l_{\infty }$ mapping satisfies this
property (\ref{eq-e}).

\begin{proposition}
A directional mapping $\Lambda $ satisfies (D2), (D3) and (\ref{eq-e}) for $%
\mathcal{C}=\Bbb{R}_{-}^{m}$ if and only if it is equivalent to the $%
l_{\infty }$ mapping, i.e., its potential $P$ satisfies $P(x)=\phi
(\max_{k}x_{k})$ for some strictly increasing function $\phi .$
\end{proposition}

%TCIMACRO{\TeXButton{Proof}{\proof}}
%BeginExpansion
\proof%
%EndExpansion
Since $\mathcal{C}=\Bbb{R}_{-}^{m},$ the allowable directions are $\lambda
\geq 0,\lambda \neq 0.$ Thus $\nabla P(x)\geq 0$ for a.e. $x\notin \Bbb{R}%
_{-}^{m}$ by (D2)$,$ implying that the limit of $\nabla P(x)\cdot x$ is $%
\leq 0$ as $x$ approaches the boundary of\textrm{\ }$\Bbb{R}_{-}^{m}.$ But $%
\nabla P(x)\cdot x>0$ for a.e. $x\notin \Bbb{R}_{-}^{m}$ by (D3), implying
that the limit of $\nabla P(x)\cdot x$ is in fact $0$ as $x$ approaches $%
\mathrm{bd\ }\Bbb{R}_{-}^{m}.$ Because $P$ is Lipschitz, it follows that $P$
is constant on $\mathrm{bd\ }\Bbb{R}_{-}^{m},$ i.e., $P(x)=P(0)$ for every $%
x\in \mathrm{bd\ }\Bbb{R}_{-}^{m}.$ By (D3) again we have $P(x)>P(0)$ for
all $x\notin \Bbb{R}_{-}^{m}.$ Adding to this the invariance condition (\ref
{eq-e}) implies that the level sets of $P$ are all translates by multiples
of $e$ of $\mathrm{bd~}\Bbb{R}_{-}^{m}=\{x\in \Bbb{R}^{m}:\max_{k}x_{k}=0\}.$%
%TCIMACRO{\TeXButton{End Proof}{\endproof}}
%BeginExpansion
\endproof%
%EndExpansion

Since the $l_{\infty }$ mapping does not guarantee that $\mathcal{C}=\Bbb{R}%
_{-}^{m}$ is approached (again, see Example \ref{ex-d1-infinity} and
Subsection \ref{ss-regr-examples}), we have:

\begin{corollary}
There is no stationary regret-based strategy that satisfies (R1) and (R2)
and is independent of realized average payoff.
\end{corollary}

The import of the Corollary (together with the indispensability of
conditions (R1) and (R2), as shown by the counterexamples in Subsection \ref
{ss-regr-examples}) is that one cannot simultaneously have independence of
realized payoffs and guarantee Hannan-consistency in every game.

We must weaken one of the two properties. One possibility is to weaken the
Hannan-consistency requirement to $\varepsilon $-consistency: $\lim
\sup_{t}D_{t}^{i}(k)\leq \varepsilon $ for all $k$. Fudenberg and Levine
[1995] propose a smoothing of fictitious play that---like fictitious play
itself---is independent of realized payoffs. In essence, their function $P$
is convex, smooth, and satisfies the property that its level sets are
obtained from each other by translations along the $e=(1,...,1)$ direction
(see Figure \ref{fig-fud-lev}%
%TCIMACRO{\TeXButton{BeginFigure}{\begin{figure}[htb] \centering}}
%BeginExpansion
\begin{figure}[htb] \centering%
%EndExpansion
 \unitlength 1mm

\begin{picture}(105.00,124.00)(10,2)
\linethickness{1.0pt}

\put(60.00,10.00){\vector(0,1){110}}
\put(10.00,60.00){\vector(1,0){110.00}}

\put(12,64.6){\makebox(0,0)[lc]{$P=P(0)$}}
\put(12,82.5){\makebox(0,0)[lc]{$P=c_1$}}
\put(12,102.5){\makebox(0,0)[lc]{$P=c_2$}}
\put(47.5,50){\makebox(0,0)[lc]{$\mathcal{C}$}}

\put(7,61){\makebox(0,0)[cc]{$\varepsilon \{ $}}

\linethickness{0.7pt}
   \bezier{500}(72.50,80),(79.5,79.5),(79.9,72.50)
\bezier{500}(92.5,100),(99.5,99.5),(99.9,92.5)
\bezier{500}(54.6,62.1),(61.6,61.6),(62.0,54.6)

  \put(10.00,80.00){\line(1,0){62.50}}
  \put(80.00,10.00){\line(0,1){62.50}}
  \put(10,62.1){\line(1,0){45}}
  \put(62.1,10){\line(0,1){45}}
  \put(10.00,100.00){\line(1,0){82.50}}
  \put(100.00,10.00){\line(0,1){82.50}}
 
\linethickness{0.4pt}
\multiput(60.00,60.00)(0.12,0.12){17}{\line(0,1){0.12}}
  \multiput(64.00,64.00)(0.12,0.12){17}{\line(0,1){0.12}}
  \multiput(68.00,68.00)(0.12,0.12){17}{\line(0,1){0.12}}
  \put(70.00,70.00){\line(0,1){0.00}}
  \put(70.00,70.00){\line(0,1){0.00}}
  \put(70.00,70.00){\line(0,1){0.00}}
  \multiput(72.00,72.00)(0.12,0.12){17}{\line(0,1){0.12}}
  \multiput(84.00,84.00)(0.12,0.12){17}{\line(0,1){0.12}}
  \multiput(76.00,76.00)(0.12,0.12){17}{\line(0,1){0.12}}
  \multiput(88.00,88.00)(0.12,0.12){17}{\line(0,1){0.12}}
  \multiput(80.00,80.00)(0.12,0.12){17}{\line(0,1){0.12}}
  \multiput(92.00,92.00)(0.12,0.12){17}{\line(0,1){0.12}}
  \multiput(96.00,96.00)(0.12,0.12){17}{\line(0,1){0.12}}
  \multiput(100.00,100.00)(0.12,0.12){17}{\line(0,1){0.12}}
  \multiput(104.00,104.00)(0.12,0.12){17}{\line(0,1){0.12}}

\put(10,10){\line(1,3){17.37}}

\put(20,10){\line(1,3){17.37}}
\put(30,10){\line(1,3){17.37}}
\put(40,10){\line(1,3){17.2}}
\put(50,10){\line(1,3){12.1}}
\put(60,10){\line(1,3){2.1}}
\put(10,40){\line(1,3){7.37}}
  \end{picture}
\caption{The level sets of the potential of smooth fictitious
play\label{fig-fud-lev}}%
%TCIMACRO{\TeXButton{EndFigure}{\end{figure}}}
%BeginExpansion
\end{figure}%
%EndExpansion
).\footnote{%
Specifically, $P(x)=\max_{\sigma ^{i}\in \Delta (S^{i})}\{\sigma ^{i}\cdot
x+\varepsilon v(\sigma ^{i})\},$ where $\varepsilon >0$ is small and $v$ is
a smooth, strictly concave function, with gradient vector approaching
infinite length as one approaches the boundary of $\Delta (S^{i})$.} The
level set of $P$ through $0$ is therefore smooth; it is very close to the
boundary of the negative orthant but unavoidably distinct from it. The
resulting strategy approaches $\mathcal{C}=\{x:P(x)\leq P(0)\}$ (recall
Remark \ref{rem-convex-P} in Subsection \ref{ss-appr-main}: a set of the
form $\{x:P(x)\leq c\},$ for constant $c,$ is approachable when $c\geq P(0)$%
---since it contains $\Bbb{R}_{-}^{S^{i}}$---and is not approachable when $%
c<P(0)$---since it does not contain $0$). The set $\mathcal{C}$ is strictly
larger than $\Bbb{R}_{-}^{S^{i}};$ it is an $\varepsilon $-neighborhood of
the negative orthant $\Bbb{R}_{--}^{S^{i}}.$ Thus one obtains only $%
\varepsilon $-consistency.\footnote{%
Other smoothings have been proposed, including Hannan [1957], Foster and
Vohra [1993, 1998] and Freund and Schapire [1999] (in the latter---which
corresponds to exponential smoothing---the strategy is non-stationary, i.e.,
it depends not only on the point in regret space but also on the time $t;$
of course, non-stationary strategies where $\varepsilon $ decreases with $t$
may yield exact consistency).}$^{,}$\footnote{%
Smooth fictitious play may be equivalently viewed, in our framework, as
first taking a set $\mathcal{C}$ that is close to the negative orthant and
has smooth boundary, and then using the $l_{\infty }$ distance from $%
\mathcal{C}$ as a potential (recall Remark \ref{rem-norm} in Subsection \ref
{ss-appr-main}).}

The other possibility is to allow the strategy to depend also on the
realized payoffs. Then there are strategies that are close to fictitious
play and guarantee Hannan-consistency in any game. Take, for instance, the $%
l_{p}$ potential strategy for large enough $p$ (see Subsection \ref
{ss-regr-proc}).\footnote{%
This amounts to smoothing the norm and keeping $\mathcal{C}$ equal to the
negative orthant, whereas the previous construction smoothed the boundary of 
$\mathcal{C}$ and kept the $l_{\infty }$ norm. These are the two ``dual''
ways of generating a smooth potential (again, see Remark \ref{rem-norm} in
Subsection \ref{ss-appr-main}).}

\subsection{Better Play\label{ss-better}}

All the examples presented up to now satisfy an additional natural
requirement, namely, that only actions with positive regret are played
(provided, of course, that there are such actions). Formally, consider a
stationary regret-based strategy of player $i$ that is given by a mapping $%
\sigma ^{i}:\Bbb{R}^{S^{i}}\rightarrow \Delta (S^{i})$ (see Theorem \ref
{th-regret}); we add to (R1) and (R2) the following condition:\footnote{%
Note that the condition needs to be satisfied not only for $x\notin \Bbb{R}%
_{-}^{S^{i}},$ but also for $x\in \mathrm{bd\ }\Bbb{R}_{-}^{S^{i}}.$}

\begin{description}
\item[(R3)]  For every $x\notin \Bbb{R}_{--}^{S^{i}},$ if $x_{k}<0$ then $%
[\sigma ^{i}\left( x\right) ]_{k}=0.$
\end{description}

\noindent Since $x$ is the $i$-regret vector, (R3) means that $\sigma ^{i}$
gives probability $1$ to the set of actions with non-negative regret (unless
all regrets are negative, in which case there is no requirement\footnote{%
See Footnote \ref{ftnote30} below.}). This may be rewritten as 
\begin{equation}
\lbrack \sigma _{t}^{i}]_{k}>0\text{ only if }u^{i}(k,z_{t-1}^{-i})\geq 
\overline{U}_{t-1}^{i}.  \label{eq-r3}
\end{equation}
That is, only those actions $k$ are played whose payoff against the
empirical distribution $z_{t-1}^{-i}$ of the opponents' actions is at least
as large as the actual realized average payoff $\overline{U}_{t-1}^{i}$; in
short, only the ``better actions.''\footnote{%
Observe that (R2) (or (\ref{eq-better})) is a requirement on the average
over all played actions $k,$ whereas (R3) (or (\ref{eq-r3})) applies to each
such $k$ separately.} For an example where (R3) is \emph{not} satisfied, see
Figure \ref{fig-r3}%
%TCIMACRO{\TeXButton{BeginFigure}{\begin{figure}[htb] \centering}}
%BeginExpansion
\begin{figure}[htb] \centering%
%EndExpansion
 \unitlength 1mm

\begin{picture}(90.00,118.00)(25,12)
\linethickness{1.0pt}
\put(60.00,20.00){\vector(0,1){100.00}}
\put(20.00,60.00){\vector(1,0){100.00}}

\linethickness{0.6pt}

\bezier{480}(20.00,80.00)(80.00,80.00)(80.00,20.00)
\bezier{556}(20.00,110.00)(85.00,85.00)(110.00,20.00)

\put(50.00,75.00){\circle*{1.50}}
\put(50.00,75.00){\vector(1,2){6.00}}
\put(47.00,71.00){\makebox(0,0)[cc]{$x$}}
\put(50.00,87.00){\makebox(0,0)[cc]{$\Lambda (x)$}}

\linethickness{0.4pt}

\put(60.00,60.00){\line(-1,-3){13.00}}
\put(45.00,60.00){\line(-1,-3){13.00}}
\put(30.00,60.00){\line(-1,-3){10.00}}
\put(47.00,44.00){\makebox(0,0)[cc]{$\mathcal{C}$}}
\put(20.00,83.00){\makebox(0,0)[lc]{$P=c_1$}}
\put(20.00,113.00){\makebox(0,0)[lc]{$P=c_2$}}
\end{picture}
\caption{(R3) is not satisfied\label{fig-r3}}%
%TCIMACRO{\TeXButton{EndFigure}{\end{figure}}}
%BeginExpansion
\end{figure}%
%EndExpansion
.

The $l_{p}$ potential strategies, for $1<p<\infty ,$ and in fact all
separable strategies (see Subsection \ref{ss-regr-proc}) essentially%
\footnote{%
The condition in (R3) is automatically satisfied in these cases for $x\notin 
\Bbb{R}_{-}^{S^{i}};$ one needs to impose it explicitly for $x\in \mathrm{bd\ }%
\Bbb{R}_{-}^{S^{i}}$ (where, up to now, we had no requirements).} satisfy
(R3). Fictitious play (with the $l_{\infty }$ potential) also satisfies
(R3): The action chosen is a ``best'' one (rather than just ``better''). At
the other extreme, the $l_{1}$ potential strategy gives equal probability to 
\emph{all} better actions, so it also satisfies (R3). (However, these last
two do not satisfy (R1).)

Using condition (R3) yields the following result, which is a generalization
of Theorem A of Monderer, Samet and Sela [1997] for fictitious play:

\begin{proposition}
\label{p-max-regret}Consider a stationary regret-based strategy of player $i$
given by a mapping $\sigma ^{i}:\Bbb{R}^{S^{i}}\rightarrow \Delta (S^{i})$
that satisfies\footnote{%
Note that (R1) and (R2) are not assumed.} (R3). Then, in any (finite) $N$%
-person game, the maximal regret of $i$ is always non-negative: 
\[
\max_{k\in S^{i}}D_{t}^{i}(k)\geq 0\text{ for all }t. 
\]
\end{proposition}

%TCIMACRO{\TeXButton{Proof}{\proof}}
%BeginExpansion
\proof%
%EndExpansion
By induction, starting with $D_{0}^{i}(k)=0$ for all $k.$ Assume that $%
\max_{k}D_{t-1}^{i}(k)\geq 0$ (or, $D_{t-1}^{i}\notin \Bbb{R}_{--}^{S^{i}}).$
By (R3), $D_{t-1}^{i}(k)\geq 0$ for any $k$ chosen at time $t;$ since $%
A_{k}(k,s_{t}^{-i})=0$ it follows that $%
D_{t}^{i}(k)=(1/t)\left( (t-1)D_{t-1}^{i}(k)+0\right) \geq 0.$%
%TCIMACRO{\TeXButton{End Proof}{\endproof}}
%BeginExpansion
\endproof%
%EndExpansion
\bigskip

Thus, the vector of regrets never enters the negative orthant.\footnote{%
\label{ftnote30}Therefore at every period there are always actions with
non-negative regret---out of which the next action is chosen (and so the
condition $x\notin \Bbb{R}_{--}^{S^{i}}$ in (R3) always holds).} Recall that
the result of Theorem \ref{th-regret} is that the vector of regrets
approaches the non-positive orthant. To combine the two, we define \emph{%
better play }as any stationary regret-based strategy of player $i$ that is
given by a mapping $\sigma ^{i}$ satisfying (R1)--(R3)\emph{. }We thus have:

\begin{corollary}
\label{cor-max-regret=0}In any (finite) $N$-person game, if player $i$ uses
a better play strategy, then his maximal regret converges to $0$ a.s. 
\[
\lim_{t\rightarrow \infty }\max_{k\in S^{i}}D_{t}^{i}(k)=\lim_{t\rightarrow
\infty }\left( \max_{k\in S^{i}}u^{i}(k,z_{t}^{-i})-\overline{U}%
_{t}^{i}\right) =0\text{ a.s.} 
\]
\end{corollary}

That is, the average payoff $\overline{U}_{t}^{i}$ of player $i$ up to time $%
t$ is close, as $t\rightarrow \infty ,$ to $i$'s best-reply payoff against
the empirical distribution of the other players' actions. In particular, in
a two-person zero-sum game we obtain the following:

\begin{corollary}
In any (finite) two-person zero-sum game, if both players use better play
strategies, then:

\begin{description}
\item[(i)]  For each player, the empirical distribution of play converges to
the set of optimal actions.\footnote{%
That is, the set of mixed actions that guarantee the value.}

\item[(ii)]  The average payoff converges to the value of the game.
\end{description}
\end{corollary}

%TCIMACRO{\TeXButton{Proof}{\proof}}
%BeginExpansion
\proof%
%EndExpansion
Let $1$ be the maximizer and $2$ the minimizer, and denote by $v$ the
minimax value of the game. Then $\max_{k\in S^{1}}u^{1}(k,z_{t}^{2})\geq v,$
so by Corollary \ref{cor-max-regret=0} we have $\lim \inf_{t}\overline{U}%
_{t}^{1}\geq v.$ The same argument for player 2 yields the opposite
inequality, thus $\lim_{t}\overline{U}_{t}^{1}=v$. Therefore $%
\lim_{t}\max_{k\in S^{1}}u^{1}(k,z_{t}^{2})=v$ (apply the Corollary again),
hence any limit point of the sequence $z_{t}^{2}$ must be an optimal action
of player $2;$ similarly for player $1.$%
%TCIMACRO{\TeXButton{End Proof}{\endproof}}
%BeginExpansion
\endproof%
%EndExpansion
\bigskip

Thus, better play enjoys the same properties as fictitious play in
two-person zero-sum games (for fictitious play, see Robinson [1951] for the
convergence to the set of optimal strategies, and see Monderer, Samet and
Sela [1997, Theorem B] and Rivi\`{e}re [1997] for the convergence of the
average payoff).

\section{Discussion and Extensions}

In this section we discuss a number of extensions of our results.

\begin{enumerate}
\item[1.]  \emph{Conditional regrets}
\end{enumerate}

As stated in the Introduction, we have been led to the ``no regret''
Hannan-consistency property from considerations of ``no conditional regret''
that correspond to correlated equilibria (see Hart and Mas-Colell [2000]).
Given two actions $k$ and $j$ of player $i,$ the \emph{conditional regret}
from $j$ to $k$ is the change that would have occurred in the average payoff
of $i$ if he had played action $k$ in all those periods where he did play $j$
(and everything else is left unchanged). That is, 
\[
DC_{t}^{i}(j,k):=\frac{1}{t}\sum_{\tau \leq t:s_{\tau }^{i}=j}\left[
u^{i}(k,s_{\tau }^{-i})-u^{i}\left( s_{\tau }\right) \right] 
\]
for every\footnote{%
Note that each Hannan regret $D_{t}^{i}\left( k\right) $ is the sum of the
conditional regrets $DC_{t}^{i}(j,k)$ over $j\neq k$. Thus the set of
distributions of $N$-tuples of actions that satisfy the Hannan no-regret
conditions includes the set of correlated equilibrium distributions. The
inclusion is, in general, strict (the two sets coincide when every player
has only two actions).} $j,k\in S^{i}.$ The vector of regrets $DC_{t}^{i}$
is now in $\Bbb{R}^{L},$ where $L:=S^{i}\times S^{i}$, and the empirical
distribution of actions up to time $t$ constitutes a correlated equilibrium
if and only if $DC_{t}^{i}\leq 0$ for every player $i$. Thus, the set to be
approached is the
non-positive orthant $\Bbb{R}_{-}^{L}.$ The corresponding game with vector
payoffs $A$ is defined as follows: The $(j,k)$ coordinate of the vector
payoff $A(s^{i},s^{-i})\in \Bbb{R}^{L}$ is $u^{i}(k,s^{-i})-u^{i}(j,s^{-i})$
when $s^{i}=j,$ and it is $0$ otherwise; hence $DC_{t}^{i}=(1/t)\sum_{\tau
\leq t}A(s_{\tau }).$

As in Propositions \ref{prop-reg-1} and \ref{prop-reg-2} (see Hart and
Mas-Colell [2000, Section 3]), it can easily be verified that:

\begin{itemize}
\item  $\mathcal{C}=\Bbb{R}_{-}^{L}$ is always approachable.

\item  For every $\lambda \in \Bbb{R}_{+}^{L},$ the Blackwell
approachability condition for $\mathcal{C}=\Bbb{R}_{-}^{L}$ ((\ref
{eq-w(lambda)}) or (\ref{eq-blackwell_for_neg_orthant})) holds for any mixed
action $\sigma ^{i}=(\sigma _{k}^{i})_{k\in S^{i}}\in \Delta (S^{i})$ that
satisfies 
\begin{equation}
\sum_{j\in S^{i}}\sigma _{j}^{i}\lambda (j,k)=\sigma _{k}^{i}\sum_{j\in
S^{i}}\lambda (k,j)\text{ for all }k\in S^{i}.  \label{eq-eigenvector}
\end{equation}
Viewing $\lambda $ as an $S^{i}\times S^{i}$ matrix, condition (\ref
{eq-eigenvector}) says that $\sigma ^{i}$ is an invariant vector for the
(non-negative) matrix $\lambda $.

\item  For every $\lambda \in \Bbb{R}_{+}^{L},$ there exists a $\sigma
^{i}\in \Delta (S^{i})$ satisfying (\ref{eq-eigenvector}).
\end{itemize}

Applying Theorem \ref{th-general} yields a large class of strategies. For
example (as in Subsection \ref{ss-regr-proc}), if $P$ is the $l_{p}$
potential for some $1<p<\infty $, then $\sigma ^{i}$ is an invariant vector
of the matrix of the $p-1$ powers of the non-negative regrets.\footnote{%
For $p=2$ we get the matrix of regrets---which yields precisely Theorem A of
Hart and Mas-Colell [2000].} In the more general separable case, $\sigma
^{i} $ is an invariant vector of the matrix whose $(j,k)$ coordinate is $%
\psi _{(j,k)}(DC_{t-1}^{i}(j,k)),$ where $\psi _{(j,k)}$ is any real
continuous function that vanishes for $x\leq 0$ and is positive for $x>0.$
As in Hart and Mas-Colell [2000, Theorem A], if every player uses a strategy
in this class (of course, different players may use different types of
strategies), then the empirical distribution of play converges to the set of
correlated equilibria of $\Gamma .$

Since finding invariant vectors is by no means a simple matter, in Hart and
Mas-Colell [2000] much effort is devoted to obtaining simple adaptive
procedures, which use the matrix of regrets as a one-step transition matrix.
To do the same here, one would use instead the matrix $\Lambda \left(
DC_{t-1}^{i}\right) $.
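
As a concrete illustration, an invariant vector of a non-negative matrix
can be computed as the stationary distribution of an auxiliary Markov
chain. The following Python sketch is ours; the normalization constant and
the use of power iteration are our own choices:

\begin{verbatim}
# Sketch (ours): a sigma satisfying (eq-eigenvector), obtained as a
# stationary distribution of the transition matrix M built from the
# non-negative matrix lam, with diagonal adjusted so rows sum to one.
import numpy as np

def invariant_vector(lam, iters=10000):
    m = lam.shape[0]
    mu = lam.sum(axis=1).max() + 1.0   # large enough normalization
    M = lam / mu
    M[np.arange(m), np.arange(m)] += 1.0 - M.sum(axis=1)
    sigma = np.full(m, 1.0 / m)
    for _ in range(iters):             # power iteration: sigma M -> sigma
        sigma = sigma @ M
    return sigma
\end{verbatim}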

\begin{enumerate}
\item[2.]  \emph{Variable game}%
%TCIMACRO{\TeXButton{No Page Break}{\nopagebreak}}
%BeginExpansion
\nopagebreak%
%EndExpansion
\end{enumerate}

We noted in Subsection \ref{ss-regr-proc} that our strategies are
game-independent. This allows us to consider the case where, at each period,
a different game is being played (for example, a stochastic game). The
strategy set of player $i$ is the same set $S^{i}$ in all games, but he does
not know which game is currently being played. All our results---in
particular, Theorem \ref{th-regret}---continue to hold\footnote{%
Assuming the payoffs of $i$ are uniformly bounded.} provided player $i$ is
told, \emph{after} each period $t$, which game was played at time $t$ and
what were the chosen actions $s_{t}^{-i}$ of the other players. Indeed, as
in Section \ref{s-regret}, $i$ can then compute the vector $%
a:=A(s_{t}^{i},s_{t}^{-i})\in \Bbb{R}^{S^{i}},$ update his regret vector: $%
D_{t}^{i}=(1/t)((t-1)D_{t-1}^{i}+a),$ and then play $\sigma ^{i}(D_{t}^{i})$
in the next period, where $\sigma ^{i}$ is any mapping satisfying (R1) and
(R2).
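
A minimal sketch (ours) of this period-by-period procedure, reusing the
\texttt{lp\_strategy} mapping sketched in Subsection \ref{ss-regr-proc} as
the mapping $\sigma ^{i}$:

\begin{verbatim}
# Sketch (ours): after period t, the payoff function of the game that
# was actually played (revealed ex post) updates the regret vector,
# and the next action is drawn from sigma^i(D_t^i).
import random

def play_period(t, D, s_i, s_minus_i, payoff):
    m = len(D)
    a = [payoff(k, s_minus_i) - payoff(s_i, s_minus_i) for k in range(m)]
    D = [((t - 1) * D[k] + a[k]) / t for k in range(m)]  # update D_t^i
    sigma = lp_strategy(D)         # any mapping satisfying (R1) and (R2)
    return D, random.choices(range(m), weights=sigma)[0]
\end{verbatim}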

\begin{enumerate}
\item[3.]  \emph{Unknown game}
\end{enumerate}

When the player does not know the (fixed) game $\Gamma $ that is played and
is told, at each stage, only his own realized payoff (but not the choices of
the other players; this may be referred to as a ``stimulus-response''
model), Hannan-consistency may nonetheless be obtained (see Foster and Vohra
[1993, 1998], Auer \emph{et al.} [1995], Fudenberg and Levine [1998, Section
4.8], Hart and Mas-Colell [2000, Section 4(j)], and also Ba\~{n}os [1968]
and Megiddo [1980] for related work). For instance, one can replace the
regrets---which cannot be computed here---by appropriate estimates.\footnote{%
Specifically, use $\widehat{D}_{t}^{i}(k):=(1/t)\sum_{\tau \leq t:s_{\tau
}^{i}=k}(1/[\sigma _{\tau }^{i}]_{k})u^{i}(s_{\tau })-\overline{U}_{t}^{i}$
instead of $D_{t}^{i}(k)$ (see Hart and Mas-Colell [2000]).}
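
A sketch (ours) of this estimate: only the action played, its probability
under $\sigma _{\tau }^{i},$ and the realized payoff enter the computation.

\begin{verbatim}
# Sketch (ours) of the estimated regrets in the footnote above: an
# importance-weighted average of the player's own realized payoffs.
def estimated_regrets(plays, m):
    """plays: list of (k, prob, u) = (action played, its probability
    under sigma_tau^i, realized payoff u^i(s_tau))."""
    t = len(plays)
    U_bar = sum(u for _, _, u in plays) / t
    D_hat = [0.0] * m
    for k, prob, u in plays:
        D_hat[k] += u / prob       # importance-weighted payoff estimate
    return [D_hat[k] / t - U_bar for k in range(m)]
\end{verbatim}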

\begin{thebibliography}{99}
\bibitem{}  Auer P., N. Cesa-Bianchi, Y. Freund and R. E. Schapire [1995],
Gambling in a Rigged Casino: The Adversarial Multi-Armed Bandit Problem, 
\emph{Proceedings of the 36th Annual Symposium on Foundations of Computer
Science}, 322--331.

\bibitem{}  Ba\~{n}os, A. [1968], On Pseudo-Games, \emph{The Annals of
Mathematical Statistics} 39, 1932--1945.

\bibitem{}  Blackwell, D. [1956a], An Analog of the Minmax Theorem for
Vector Payoffs, \emph{Pacific Journal of Mathematics} 6, 1--8.

\bibitem{}  Blackwell, D. [1956b], Controlled Random Walks, \emph{%
Proceedings of the International Congress of Mathematicians} 1954, Vol. III,
North-Holland, 335--338.

\bibitem{}  Borodin, A. and R. El-Yaniv [1998], \emph{Online Computation and
Competitive Analysis}, Cambridge University Press.

\bibitem{}  Clarke, F. [1983], \emph{Optimization and Nonsmooth Analysis},
Wiley.

\bibitem{}  Foster, D. and R. V. Vohra [1993], A Randomized Rule for
Selecting Forecasts, \emph{Operations Research} 41, 704--709.

\bibitem{}  Foster, D. and R. V. Vohra [1997], Calibrated Learning and
Correlated Equilibrium, \emph{Games and Economic Behavior} 21, 40--55.

\bibitem{}  Foster, D. and R. V. Vohra [1998], Asymptotic Calibration, \emph{%
Biometrika} 85, 379--390.

\bibitem{}  Foster, D. and R. V. Vohra [1999], Regret in the On-Line
Decision Problem, \emph{Games and Economic Behavior} 29, 7--35.

\bibitem{}  Freund, Y. and R. E. Schapire [1999], Adaptive Game Playing
Using Multiplicative Weights, \emph{Games and Economic Behavior} 29, 79--103.

\bibitem{}  Fudenberg, D. and D. K. Levine [1995], Universal Consistency and
Cautious Fictitious Play, \emph{Journal of Economic Dynamics and Control}
19, 1065--1090.

\bibitem{}  Fudenberg, D. and D. K. Levine [1998], \emph{Theory of Learning
in Games}, MIT Press.

\bibitem{}  Fudenberg, D. and D. K. Levine [1999], Conditional Universal
Consistency, \emph{Games and Economic Behavior} 29, 104--130.

\bibitem{}  Hannan, J. [1957], Approximation to Bayes Risk in Repeated Play,
in \emph{Contributions to the Theory of Games}, Vol. III (\emph{Annals of
Mathematics Studies} 39), M. Dresher, A. W. Tucker and P. Wolfe (eds.),
Princeton University Press, 97--139.

\bibitem{}  Hart, S. and A. Mas-Colell [2000], A Simple Adaptive Procedure
Leading to Correlated Equilibrium, \emph{Econometrica}.

\bibitem{}  Littlestone, N. and M. K. Warmuth [1994], The Weighted Majority
Algorithm, \emph{Information and Computation} 108, 212--261.

\bibitem{}  Lo\`{e}ve, M. [1978], \emph{Probability Theory}, Vol. II, 4th
Edition, Springer-Verlag.

\bibitem{}  Luce, R. D. and H. Raiffa [1957], \emph{Games and Decisions},
Wiley.

\bibitem{}  Megiddo, N. [1980], On Repeated Games with Incomplete
Information Played by Non-Bayesian Players, \emph{International Journal of
Game Theory} 9, 157--167.

\bibitem{}  Monderer, D., D. Samet and A. Sela [1997], Belief Affirming in
Learning Processes, \emph{Journal of Economic Theory} 73, 438--452.

\bibitem{}  Rivi\`{e}re, P. [1997], Quelques Mod\`{e}les de Jeux
d'Evolution, Ph.D. thesis, Universit\'{e} Paris 6.

\bibitem{}  Robinson, J. [1951], An Iterative Method of Solving a Game, 
\emph{Annals of Mathematics} 54, 296--301.
\end{thebibliography}

\end{document}
