%Paper: ewp-game/9401001
%From: Eitan Altman <Eitan.Altman@sophia.inria.fr>
%Date: Tue, 25 Jan 1994 14:24:50 +0100

\documentstyle[11pt]{article}
\setlength{\unitlength}{1mm}
%\font\petcap=amcsc10
\setlength {\topmargin}{0in}
\setlength {\textheight}{9.0truein}
\setlength {\oddsidemargin}{0in}
\setlength {\evensidemargin}{0in}
\setlength {\textwidth}{6.5in}
% \setlength {\parindent}{0pt}
\parskip10pt
\footheight 0.15truein
\footskip 0.6truein
\headheight 0.0truein
\headsep 0.0truein
\newcommand{\ls}[1]
   {\dimen0=\fontdimen6\the\font 
    \lineskip=#1\dimen0
    \advance\lineskip.5\fontdimen5\the\font
    \advance\lineskip-\dimen0
    \lineskiplimit=.9\lineskip
    \baselineskip=\lineskip
    \advance\baselineskip\dimen0
    \normallineskip\lineskip
    \normallineskiplimit\lineskiplimit
    \normalbaselineskip\baselineskip
    \ignorespaces
   }
\newcommand {\beq} {\begin{equation}}
\newcommand {\eeq} {\end{equation}}
\newcommand {\bear} {\begin{eqnarray}} 
\newcommand {\eear} {\end{eqnarray}} 
\newcommand {\done} {\quad\vrule height4pt width4pt}
\newcommand {\phantomI} {\vrule height20pt width0pt}
\newcommand {\barr} {\begin{array}} 
\newcommand {\earr} {\end{array}} 
\newcommand {\bearn}{\begin{eqnarray*}}
\newcommand {\eearn}{\end{eqnarray*}}
\newcommand {\N} {\rm I\!N}
\newcommand {\R} {\rm I\!R}
\def\limi{\mathop{\underline{\rm lim}}}
\def\lims{\mathop{\overline{\rm lim}}}
\def\sp{\newline\noindent}
\def\sqr#1#2{{\vcenter{\hrule height.#2pt
      \hbox{\vrule width.#2pt height#1pt \kern#1pt
        \vrule width.#2pt}
    \hrule height.2pt}}}
\def\Prf{\noindent{\bf Proof. }}
\def\fA{{\bf A}}
\def\K{{\cal K}}
\def\fB{{\bf B}}
\def\fI{{\bf I}}
\def\fX{{\bf X}}
\def\IC{{\bf IC}}
\def\dfn{ \stackrel{\rm def}{=} }

\newtheorem{definition}{Definition}[section]
\newtheorem{lemma}{Lemma}[section]
\newtheorem{corollary}{Corollary}[section]
\newtheorem{proposition}{Proposition}[section]
\newtheorem{theorem}{Theorem}[section]
\newtheorem{remark}{Remark}[section]
\ls{1.5}

\begin{document}
\title{\bf APPROXIMATIONS IN DYNAMIC ZERO-SUM GAMES}

\author
{ Mabel M. TIDBALL \ \ 
% \thanks{This work was partially done during
%         a visit of the author at INRIA - Sophia Antipolis, France}\\
% Facultad de Ciencias Exactas, Ingenier\'{\i}a y Agrimensura\\
%  Universidad Nacional de Rosario\\
% Pellegrini 250,  (2000) Rosario\\
% Argentina 
and \ \ 
Eitan ALTMAN \\
        INRIA, Centre Sophia-Antipolis \\
	2004 Route des Lucioles, B.P.93 \\
	06902 Sophia-Antipolis Cedex \\
	France }
\date{Summer  1993}

\maketitle
\begin{abstract}
We develop a unifying approach
for approximating a ``limit'' zero-sum game by a sequence of approximating
games.
We discuss both the convergence of the values and the convergence of
optimal (or ``almost'' optimal) strategies. Moreover, based on optimal
policies for the limit game, we construct policies which are almost optimal for
the approximating games. We then apply the general framework to
state approximations of stochastic games, to the convergence
of finite horizon problems to infinite horizon problems, and to convergence
in the discount factor and in the immediate reward.
\end{abstract}
\vspace{1cm}
{\bf AMS Subject Classification: 90D05, 93E05}\\
{\bf Key words:} Zero-sum games, Approximations, Stochastic games.
%
\section{Introduction}
In many cases, one encounters dynamic games for which
time and space are continuous, and possibly unbounded.
In general, the numerical solution of such games involves
discretization both in time and in space.
In pursuit-evasion games in particular (see
Bardi et al. \cite{BFS} and Pourtallier and Tidball \cite{PTid}), and in
differential games in general (see Pourtallier and Tolwinsky
\cite{PT}, Tidball and Gonz\'alez \cite{TG}),
the time and space discretization
often leads to dynamic programming equations that have a stochastic game
interpretation. The numerical solution then typically requires
a finite state approximation.

Approximations in dynamic games have therefore been an
active area of research for several decades.
Several schemes for discretization of time and space and for
approximations have been developed for differential games
\cite{BFS,BS,PTid,PT,TG}. In stochastic games, much attention
has been devoted to approximations of infinite horizon problems
by (long) finite horizon ones, e.g. \cite{gillette,nowak3,S,vdv};
discretization of the action and state spaces has further been
considered by Whitt \cite{whitt1,whitt2}. Except for \cite{whitt1},
all the above references consider only the approximation of the value
function of the game.

The aim of this paper is to study in a systematic way
approximations in games, not only of the values but also
of the policies. We begin by developing a general
framework for establishing the convergence of the upper
and lower values of a sequence of games $G_n$, $n=1,2,...$,
to the value $R$ (which we assume exists)
of a limit game $G_\infty$. We are further interested in
the following questions: (i) do (almost) optimal policies converge
(in some sense)? (ii) Assume that $u$ and $v$ are (almost) optimal for some
approximating game $G_n$ (where $n$ is large enough in some sense).
Can we construct from them almost optimal policies for the
limit game? (iii)
Assume that $u$ and $v$ are (almost) optimal for the
limit game. Can we use them to construct almost optimal
policies for the approximating game $G_n$ for $n$ large enough?

Problem (ii) above arises in the following situation. Suppose
that two players use some approximating numerical scheme, e.g. a time
discretization, to obtain ``good'' policies.
Each player might be using a different discretization scheme.
Yet, each player would like to ensure that regardless of the discretization
scheme used by the other player, he or she can guarantee
some value, which would be ``almost'' the value of the non-discretized game.
In fact, since the real game that is played is the non-discretized one,
the desired discretization should perform well even if
the adversary uses an optimal policy for the non-discretized game.

Problem (iii), on the other hand, arises in the opposite situation,
and this serves as an additional motivation for studying approximations
in games. There are many examples of dynamic games where it is easier
to solve an infinite limiting game, in which problems related
to the boundaries are avoided.
Indeed, examples are given in Section \ref{sec:ss-ap} of stochastic games
with (large) finite state space for which the natural approach
for constructing almost optimal policies is 
to solve a limit game with a countable state space.

After establishing the general theory for approximations in Section
\ref{sec:key}, we apply it to several approximation problems 
in discrete time stochastic games with discounted reward and
denumerable state space. Applications to other dynamic games
are the subject of future research. 
The basic model of the stochastic games is presented
in Section \ref{sec:prob}. We then present three schemes for
state approximation in Section \ref{sec:app} for the case of
infinite horizon. This generalizes many
results on the convergence of the optimal value in Markov Decision Processes
(i.e. stochastic games with a single player), e.g. 
\cite{CCb,HLa,HL,white,W}. Other related works on finite state
approximations in Markov Decision Processes are \cite{me1,me2,TS}.
In Section \ref{sec:fh} 
we extend the results on state approximations for infinite horizon
to the case of finite horizon, by a transformation of the state space. 
In Section \ref{sec:safsa} we study the convergence of the
finite horizon problem to the infinite horizon one, and we
combine state approximations with approximation of the horizon.
In Section \ref{sec:disc-re} we study the stability of
stochastic games in the discount factor and in the immediate reward.
Applications of approximation methods developed in this paper
are presented in Section \ref{sec:ss-ap}, and
some generalizations are finally discussed in Section \ref{sec:gen}.

\section{Key Theorems for approximations}
\label{sec:key}

We consider the following sequence $ G_n =
(S_n, U_n , V_n )$, $n=1,2,...,\infty$,
of generic zero-sum games,
where $U_n$ is the set of strategies of player one
and $V_n$ is the set of strategies of player two for the
$n$th game.
We assume that both $U_n $ and $V_n $ are endowed with some topology.
$S_n :U_n  \times V_n  \to \R $ is a measurable function for all $n$.
We define the upper (lower) value of the game:
\[
\overline{R_n} = \inf_{v \in V_n} \sup_{u \in U_n}  S_n(u,v) \quad
\left (\underline{R_n} = \sup_{u \in U_n} \inf_{v \in V_n} S_n(u,v)
\right )
\]
$G=(S,U,V) \dfn (S_{\infty}, U_\infty ,  V_\infty )$ will be 
called the limit game. It will be assumed that it has a value
$R \dfn R_\infty$.

An example where $G_n$ does not have a value but $G$ does
will be given
in Subsection \ref{0000}, in connection with computing almost optimal
stationary policies for stochastic games with a long (but finite)
horizon.

A strategy $u^* \in U_n $ is said to be $\epsilon$-optimal
for player one in game $n$ if
\begin{equation}
\label{uop}
\inf_{v \in V_n} S_n (u^* , v ) 
\geq \inf_{v \in V_n } S_n (u , v ) - \epsilon
\qquad \forall u \in U_n
\end{equation} 
which is equivalent to 
$ \inf_{v \in V_n} S_n (u^* , v )  \geq \underline R_n - \epsilon $.
It is said to be strongly $\epsilon$-optimal
for player one in game $n$ if it satisfies
\[
\inf_{v \in V_n} S_n (u^* , v )  \geq \overline R_n - \epsilon 
\]
A strategy $v^* \in V_n $ is said to be $\epsilon$-optimal
for player two in game $n$ if
\begin{equation}
\label{vop}
\sup_{u \in U_n } S_n (u , v^* ) 
\leq \sup_{u \in U_n } S_n (u , v ) + \epsilon
\qquad \forall v \in V_n 
\end{equation}
which is equivalent to 
$ \sup_{u \in U_n } S_n (u , v^* )  \leq \overline R_n + \epsilon $.
It is said to be strongly $\epsilon$-optimal if
\[
\sup_{u \in U_n } S_n (u , v^* )  \leq \underline R_n + \epsilon 
\]
Note that strong $\epsilon$-optimality implies $\epsilon$-optimality.
If a game has a value $\underline R_n = \overline R_n $, then
strong $\epsilon$-optimality is equivalent to $\epsilon$-optimality.
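
As a minimal numerical illustration (not part of the formal development),
consider a game in which both players are restricted to finitely many pure
strategies, so that the infima and suprema above become minima and maxima.
The following Python sketch computes the upper and lower values for a
$2 \times 2$ payoff matrix and exhibits a game in which they differ:
\begin{verbatim}
import numpy as np

# S[u, v] = payoff from player two to player one under pure strategies u, v
S = np.array([[3., 1.],
              [0., 2.]])
upper = S.max(axis=0).min()   # upper value: inf_v sup_u S(u,v)  -> 2.0
lower = S.min(axis=1).max()   # lower value: sup_u inf_v S(u,v)  -> 1.0
assert lower <= upper         # here lower < upper: no value in pure strategies

# u* is eps-optimal iff          inf_v S(u*,v) >= lower - eps;
# u* is strongly eps-optimal iff inf_v S(u*,v) >= upper - eps.
eps = 0.0
print([u for u in range(2) if S[u].min() >= lower - eps])   # -> [0]
\end{verbatim}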

Assume that $(S_n , U_n , V_n)$ converge (in some sense) to $(S , U,V) $.
We are interested in the following questions: \\
(Q1) Convergence of the values: does $\underline R_n$ (or $\overline R_n$)
converge to $R$? \\
(Q2) Convergence of policies: Fix some $\epsilon \geq 0 $.
Let $\epsilon_n$ be a sequence of positive real numbers
such that $\lims_{n \to \infty} \epsilon_n \leq \epsilon $.
Assume that $u_n^*$ and $v_n^*$ 
are $\epsilon_n$-optimal policies for the $n$th game.
Are $u_n^*$ and $v_n^*$ ``almost" optimal for the limit game,
for all $n$ large enough? \\
(Q3)
Let $\overline u \in U$ (resp. $\overline v \in V$) be some limit point 
of $u_n^*$ (resp. $v_n^*$), defined above.
Is $\overline u$ (resp. $\overline v$) $\epsilon$-optimal for the 
limit game? \\
(Q4) Robustness of the optimal policy:
If $u^*$ (resp. $v^*$) is an $ \epsilon $-optimal for the limit game,
can we derive of it an ``almost" (strongly) optimal policy
for the $n$th approximating game, for all $n$ large enough?

In most of the applications that we discuss in this paper,
$U_n = U$ and $V_n = V$ do not depend on $n$. However,
in several applications this is not the case, e.g. in
approximations of pursuit-evasion games, see Bernhard and Shinar \cite{BS}.
Another example is given by a state approximation
scheme for solving stochastic games, see Subsection \ref{asIII}.

\begin{theorem}
\label{thm1}
Assume that there exists a sequence of functions, $\pi_n^1 : U_n \to U $,
$ \pi_n^2 : V_n \to V $, $\sigma_n^1 : U \to U_n $, 
$\sigma_n^2 : V \to V_n $, $n=1,2,...$ such that \\
({\bf A1}) $ \lims_{n \to \infty} 
[ S_n (u,\sigma_n^2 (v)) - S (\pi_n^1(u),v)]
\leq 0 $ 
uniformly in $u \in U_n$ for each $v \in V$.  \\
({\bf A2}) $ \limi_{n \to \infty} 
[ S_n (\sigma_n^1(u),v) - S (u,\pi_n^2(v)) ]
\geq 0 $ 
uniformly in $v \in V_n $ for each $u \in U $.  \\
Then \\
(1) 
$\lim_{n \to \infty} \underline 
R_n = \lim_{n \to \infty} \overline R_n = R $.  \\
(2) For any $\epsilon' > \epsilon$, there exists $N$ such that
$\pi_n^1 (u_n^*)$ (resp. $\pi_n^2 (v_n^*) $, see definitions in (Q2)) is 
$\epsilon'$-optimal for the limit game, for all $n \geq N$.  \\
(3) Let $u^*$ (resp. $v^*$) be $ \epsilon $-optimal for the limit game.
Then for all $\epsilon' > \epsilon $,
there exists $N (\epsilon' ) $ such that $\sigma_n^1 (u^*)$
(resp. $\sigma_n^2 (v^*)$) is strongly $\epsilon'$-optimal
for the $n$th approximating game, for all $n \geq N (\epsilon') $.  \\
(4) Suppose \\
({\bf A3}) $S(u,v)$ is an upper semicontinuous function in $u$, \\
({\bf A4}) $S(u,v)$ is a lower semicontinuous function in $v$.  \\
Suppose $\bar{u} \in U$ (resp. $\bar{v} \in V$) is a limit point of 
$\pi_n^1(u_n^*)$ (resp. $\pi_n^2 (v_n^*)$).
Then $\bar{u}$ (resp. $\bar{v}$) is $\epsilon$-optimal for the
limit game.  \\
\end{theorem}
\begin{remark}
Part (1) of Theorem \ref{thm1} is a generalization of
Lemma 1 in \cite{gillette}.
\end{remark}
\Prf
(1) Choose $\epsilon > 0$.
Let $u^* \in U$ and $v^* \in V$ be $\epsilon$-optimal for the limit game $S$,
and choose some sequence $\epsilon(n)$ such that 
$\lim_{n \to \infty} \epsilon(n) = 0 $.
Let $u_n \in U_n $ be an $\epsilon(n)$-best response
to the policy $\sigma_n^2 (v^* ) $ in game $G_n$ (i.e.,
$S_n (u_n , \sigma_n^2 ( v^*) ) \geq S_n( u , \sigma_n^2 ( v^* ) ) 
- \epsilon(n)$ for all $u \in U_n $). 
Similarly, let $v_n \in V_n$ be an $\epsilon(n)$-best response
to the policy $\sigma_n^1 ( u^* )$ in game $G_n$.  \\
Choose some $\delta>0$. By (A1), there exists
$N$ such that for all $n \geq N$ and $u \in U_n $,
$S_n (u, \sigma_n^2 ( v^*) ) - S (\pi_n^1 ( u ),v^*) < 
\delta $.  Then for all $n \geq N$ 
\bearn
\overline R_n - R 
%
& = & \inf_{v \in V_n } \sup_{u \in U_n } S_n ( u  , v  ) - 
\inf_{v \in V} \sup_{u \in U} S ( u , v ) \\
%
& \leq & \sup_{u \in U_n } S_n ( u , \sigma_n^2(v^*) ) - 
\sup_{u \in U} S( u , v^* ) + \epsilon  \\ 
%
& \leq & S_n ( u_n , \sigma_n^2(v^*) ) - 
S( \pi_n^1(u_n) , v^* ) + \epsilon + \epsilon(n) \\ 
%
& \leq & \delta + \epsilon + \epsilon(n) 
\eearn
hence $\lims_{n \to \infty} \overline R_n \leq R + \delta + 
\epsilon $.\\
Similarly, by (A2), 
one shows that $R \leq \limi_{n \to \infty} \underline 
R_n + \delta + \epsilon $. Since $\underline R_n \leq \overline R_n$,
this implies that
$ \lims_{n \to \infty} | R - \underline R_n | \leq \delta + \epsilon $ and
$ \lims_{n \to \infty} | R - \overline R_n | \leq \delta + \epsilon. $
The result follows since $\epsilon$ and $\delta$ can be chosen 
arbitrarily small.  \\
(2) Fix some $ \delta > 0 $.
By (A1), as $u_n^*$ is $\epsilon(n)$-optimal for $G_n$, and by part (1),
there exists $N(\epsilon, \delta)$ such that for $n > N(\epsilon, \delta)$ 
we have
\beq
\label{eps}
\forall v \in V: \ \ 
S_n (u_n^* , \sigma_n^2 ( v )) - S(\pi_n^1 (u_n^*) , v) < \delta , 
\qquad \epsilon(n) < \epsilon + \delta,
\qquad | \underline R_n  - R | < \delta 
\eeq
and thus
\bearn
\inf_{v \in V} S( \pi_n^1 ( u_n^*) ,v) \geq 
\inf_{v \in V} S_n  (u_n^*,\sigma_n^2 (v)) - \delta 
\geq \inf_{v \in V_n } S_n  (u_n^*,v) - \delta 
\geq \underline R_n -  \delta - \epsilon(n)
\geq R - 3 \delta - \epsilon .
\eearn
So $\pi_n^1 ( u_n^* )$ is $\epsilon'$-optimal for $S$ with $\epsilon'=
3 \delta + \epsilon$.\\
In the same way, by assumption (A2) and considering 
an $\epsilon(n)$-optimal policy $v_n^*$ for $G_n$, we obtain that 
$\pi_n^2 (v_n^* )$ is $\epsilon'$-optimal for $S$ for all large enough $n$.
The proof follows from the fact that $\delta$ was chosen arbitrarily.  \\
(3) Fix some $\delta > 0 $.
As $u^*$ is an $\epsilon$-optimal strategy in the limit game $G$,
by (A2) and part (1), for all $n$ large enough we have
\bearn
\inf_{v \in V_n } S_n( \sigma_n^1 (u^*) ,v) \geq
\inf_{v \in V_n } S(u^*, \pi_n^2 ( v ) ) - \delta 
\geq \inf_{v \in V} S(u^*,  v  ) - \delta 
\geq R - \delta -  \epsilon 
\geq \overline R_n - 2 \delta -  \epsilon .
\eearn
The proof for $v^*$ is obtained in the same way.  \\
(4) Let $\hat v \in V$ be such that
$ \inf_{v \in V} S(\bar{u},v) \geq S(\bar{u}, \hat{v}) - \delta $.
By (A1), (A3) and part (1) of the Theorem, for all $\delta > 0$
there exists $N(\delta)$ such that $n > N(\delta)$ implies
$\epsilon(n) < \epsilon + \delta $ and, along a subsequence for which
$\pi_n^1 (u_n^*)$ converges to $\bar u$,
\bearn
\inf_{v \in V} S(\bar{u},v) \geq S(\bar{u}, \hat{v}) - \delta 
& \geq & S(\pi_n^1 (u_n^*), \hat{v}) - 2 \delta \\
& \geq & S_n(u_n^*, \sigma_n^2( \hat{v}) ) - 3 \delta \\
& \geq & \underline R_n - 3 \delta - \epsilon(n)\\
& \geq & R - 4 \delta - \epsilon(n)
\eearn
and hence $\bar u$ is $\epsilon'$-optimal with 
$\epsilon' = \epsilon + 5 \delta $.
In the same way, by (A2) and (A4), we prove that $\bar{v}$
is $\epsilon'$-optimal for $S$.
The proof follows from the fact that $\delta$ was chosen arbitrarily.
\done
\begin{remark}
\label{rm1}
(i) In the rest of the paper,
whenever $U_n= U$ and $V_n=V$ do not depend on $n$, $\pi_n$ and $\sigma_n$
will be chosen as the identity maps. \\ (ii)
It follows from the proof of part (1) in the
above Theorem that if for all $G_n$, $n=1,2,...,\infty$
there exist optimal policies for both players and if $U_n = U$
and $V_n = V$ do not depend on $n$, then 
\[
| \overline R_n - R | \leq \sup_{u,v} | S_n (u,v) - S(u,v) |, \qquad
| \underline R_n - R | \leq \sup_{u,v} | S_n (u,v) - S(u,v) |
\]
\end{remark}
%
\section{Stochastic games: the model}
\label{sec:prob}
We are going to use the results of the previous section 
to study approximations of zero-sum stochastic 
games.

\begin{itemize}
\item Let $\fI $ be a denumerable set 
of states.
\item $\fA_i  \hspace{.1in} (\fB_i)$ is a compact set of actions 
for player I (resp. II) at state $i$. Let 
$K = \{ i , \fA_i , \fB_i \}_{i \in \fI}$.
\item $r: K \rightarrow \R$ is a bounded immediate reward function 
(the boundedness condition can be relaxed, see Section \ref{sec:gen}).
Let  $ M \dfn \sup_{i,a,b} | r(i,a,b) | $.
\item $P(a,b)= [ p(i,a,b,E) ]_{i,E}$, $a \in \fA_i$, $b \in \fB_i$, 
is a (sub)probability transition (from state $i$ to a set 
$E \subset \fI$) when the players use actions $a$ and $b$.
\item $\beta$ is the discount factor, satisfying $0 \leq \beta <1$.
\end{itemize}

We shall use the following standard assumption
(see e.g. Nowak \cite{nowak}): \\
${\bf ( M_1 ):}$ $r(i,\bullet , \bullet ) $ and 
$ p( i, \bullet , \bullet , E) $ are continuous
in both actions for any $E \subset \fI$.

The game is played in stages $t = 0,1,2,...$. 
If at some stage $t$ the state is $i $, then the players
independently choose actions 
$a \in \fA_i$, $b \in \fB_i$. Player II then pays player I 
the amount $r(i,a,b)$ and at stage $t +1$ the 
new state is chosen according to the transition probabilities 
$p(i,a,b,\bullet)$. The game continues at this new state.\\
Let $U$ and $V$ be the sets of behavioral strategies of players I and II,
respectively.
A strategy $u\in U$ is a sequence $u=(u_0 , u_1 , ... )$ where
$u_t$ is a probability measure over the available actions,
given the whole history of previous states and of previous
actions of both players, as well as the current state.

A Markov policy $q=\{q_0 , q_1 , ... \}$ is a policy (for either
player one or two) where $q_t$ is allowed to depend only
on $t$ and on the state at time $t$. 

A {\it stationary (mixed) policy} $g $ for player one
is characterized by a conditional distribution \\ 
$ p ( \bullet  \mid j )^g $
over $ \fA_j $, so that $ p_{\fA_j \mid j }^g =1 $,
which is interpreted as the distribution
over the actions available at state $j$ that player
I uses when the state is $j$.
With some abuse of notation, we shall set
$ g ( \bullet  \mid j ) = p ( \bullet  \mid j )^g $
for stationary $g$. Let $S^A$ be the
set of stationary policies for player one, and define similarly
the set $S^B$ of stationary policies for player two. If both players
use stationary policies, say $u$ and $v$, then $\{ X_t \}$
becomes a Markov chain with stationary transition probabilities, 
given by
\begin{equation}
\nonumber
p(j,u,v,k) = 
\int_{ \fA_j} \int_{ \fB_j} p(j,a,b,k) 
u(da|j) v(db|j) .
\end{equation}
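For finite action sets, the integrals above reduce to finite mixtures.
A minimal Python sketch of this computation, over a hypothetical finite
representation of the model (states and actions indexed by integers):
\begin{verbatim}
import numpy as np

def mixed_transition(p, u, v):
    # p(j,u,v,k) = sum_{a,b} u(a|j) v(b|j) p(j,a,b,k)
    # p: array (S, A, B, S); u: array (S, A); v: array (S, B)
    return np.einsum('ja,jb,jabk->jk', u, v, p)
\end{verbatim}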

We are concerned in the following section
with the infinite horizon discounted problem. It is known that
under ($M_1$), optimal stationary policies exist 
for both players; i.e., if $u$ and $v$ are optimal policies 
for both players when both of them are restricted to 
stationary policies, then each of these policies is also
optimal against an arbitrary policy of the opponent.
We shall therefore restrict to stationary
mixed policies, without loss of generality (see \cite{fed,HVW}).

Next, we introduce a topology on the sets of stationary policies.
For any set $\Gamma$, let $M(\Gamma)$ denote the set of probability
measures on $\Gamma$ endowed with the weak topology $\xi(\Gamma)$ 
(see \cite{nowak}). The class of pairs of stationary policies of the two
players can be identified with the set
$ \prod_{i \in \fI} M(\fA_i) \times M(\fB_i)$; moreover, it is compact with
respect to the product topology \\ 
$\prod_{i \in \fI} \xi( \fA_i ) \times \xi (\fB_i) $.

Let $(u,v)$ be a pair of strategies and let
$i \in \fI$ be a fixed initial state.
Let $I_t, A_t , B_t $, $t=0,1,...$, be the resulting stochastic process
of the states and actions of the players.
Let $E_i^{u,v}$ denote the expectation with respect to the measure defined
by $u,v,i$.  Define the {\bf $\beta$-discounted game payoff}
\beq
\label{Siuv}
S(i,u,v) 
= E_i^{u,v} \sum_{t=0}^{\infty} \beta^{t} r (I_t , A_t , B_t )
\eeq
Let $R(i)$ denote the value of the stochastic game for initial state $i$.
% The existence of the value, as well as the existence of
% of a saddle point among the stationary policies is well known
% (see e.g. \cite{nowak}).
For stationary policies $u$ and $v$,
let the expected current payoff be defined by:
\begin{equation}
r(i,u,v) 
= \int_{ \fA_i} \int_{ \fB_i} r(i,  a, b) 
u (da | i) v (db | i)
\end{equation}
Consider the following (contracting) map:
\begin{equation}
\label{operator}
\left ( T_{u,v} f \right )(i) \dfn
r(i,u,v) + \beta \sum_{j \in \fI}p(i,u,v,j) f(j)
\end{equation}
Then $S( \bullet ,u,v)$ is known to be the unique solution of the fixed
point equation $T_{u,v} f = f $, with $T_{u,v}$ defined in (\ref{operator}).
The value $R(i)$ is the unique solution of 
\beq
\label{vlvl}
R(i) = val \left[ r(i,a,b) + \beta \sum_{j \in \fI}p(i,a,b,j) R(j)
\right]
\eeq
Moreover, any stationary policies $u^*$ and $v^*$
that choose at any state $i$
the mixed strategies that are optimal for the matrix game
$ \left[ r(i,a,b) + \beta \sum_{j \in \fI}p(i,a,b,j) R(j) \right]_{a,b} $
are known to be optimal for the stochastic game $S$
(see \cite{nowak} for these statements).
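
For finite state and action sets, the fixed point equation (\ref{vlvl})
can be solved by iterating the Shapley operator, computing the value of
each matrix game by linear programming. The following Python sketch is a
minimal illustration under the assumption of such a finite model (it is
not part of the formal development):
\begin{verbatim}
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(A):
    # value of the zero-sum matrix game A (the row player maximizes)
    m, n = A.shape
    c = np.zeros(m + 1); c[-1] = -1.0            # maximize v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])    # v <= (A^T x)_j, all columns j
    A_eq = np.ones((1, m + 1)); A_eq[0, -1] = 0.0
    bounds = [(0, None)] * m + [(None, None)]    # x >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n),
                  A_eq=A_eq, b_eq=np.ones(1), bounds=bounds)
    return res.x[-1]

def shapley_value(r, p, beta, iters=500):
    # R(i) = val[ r(i,a,b) + beta * sum_j p(i,a,b,j) R(j) ]
    # r: (S, A, B); p: (S, A, B, S); 0 <= beta < 1
    R = np.zeros(r.shape[0])
    for _ in range(iters):                       # contraction of modulus beta
        R = np.array([matrix_game_value(r[i] + beta * p[i] @ R)
                      for i in range(r.shape[0])])
    return R
\end{verbatim}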

\begin{remark}
\label{continuous}
$S(i,\bullet,\bullet):S^A \times S^B \to \R$ is continuous 
for each state $i$. This follows from Corollary 2.2
in Borkar \cite{borkar}.
It will thus follow below that assumptions (A3) and (A4) hold.
\end{remark}

\section {State approximations: infinite horizon case}
\label{sec:app}
We introduce below several approximating schemes. All of them
involve some sequence $\fI_n \subset \fI $ of sets
of states, which are naturally chosen to be increasing.
We shall assume \\ 
$ ({\bf B1}) 
\hspace{.3in}
\displaystyle
\fI_n \subset \fI_{n+1}, \hspace{.3in} \bigcup_n \fI_n=\fI $ \\
The following property will imply conditions (A1)-(A2)
in the various schemes that we consider below: \\
$ ({\bf B2}) 
\hspace{.3in}
\epsilon(r,n) = \sup_{i\in \fI_r,a,b} \left \{
\sum_{j \notin \fI_n}
 p(i,a,b,j) \right \} \to 0
\: \mbox{ as } \: n \rightarrow \infty \, \, \, \,  \forall r $. 

\begin{remark}
\label{rmkB2}
Condition $(M_1)$ as well as the compactness of $\fA_i$
and $\fB_i$ imply (B2). Indeed, assume that (B2) does not hold.
Then, there exists some $\alpha > 0$ such that for some $i$,
\beq
\label{contr}
\lims_{n \to \infty} \max_{a,b} \left ( \sum_{j \in \fI}
p(i,a,b,j) 1 \{ j \notin \fI_n \} \right )
= \alpha 
\eeq
Let $a_n $ and $b_n$ be some actions achieving the max
(the fact that the max is achieved follows from the compactness
and continuity assumption ($M_1$)). Choose a subsequence 
$\{ n(\ell) \}, \ell=1,2,...$, along which the limsup is attained and along
which $a_n$ and $b_n$ converge to some actions $a^*$ and $b^*$.
Then $p(i,a_{n(\ell)}, b_{n(\ell)} , \bullet )$ converges (pointwise) to 
the probability $p(i,a^* , b^* , \bullet )$ as $\ell \to \infty $
(by ($M_1$)). But then it follows from the dominated convergence
theorem (\cite{royden} Ch. 11 Sec. 4) and from (B1) that
\[
\lim_{\ell \to \infty } \sum_{j \in \fI}
p(i,a_{n(\ell )} ,b_{n(\ell )} ,j) 
1 \{ j \notin \fI_{n(\ell )} \}  = 
\sum_{j \in \fI } p(i,a^*,b^*,j) \cdot 0 = 0
\]
which contradicts (\ref{contr}). Hence (B2) is established.
\end{remark}

For the case of a single player, (B2) was introduced
as an assumption for several approximating schemes
by Cavazos-Cadena \cite{CCb}. Note, however, that in \cite{CCb},
($M_1$) as well as the compactness of the action spaces
are not assumed. In order to obtain
conditions (A1) and (A2) for the approximating schemes below,
(and hence obtain statements (1), (2) and (3) in Theorem \ref{thm1})
one could relax the compactness assumption as well as ($M_1$);
in that case one would indeed need to impose (B2) as an assumption.
The compactness and ($M_1$) (or other similar assumptions, such as
($M_2$) or ($M_3$) from \cite{nowak})
are required however for establishing the continuity conditions
(A3) and (A4) required for establishing
statement (4) in Theorem \ref{thm1}).
 
Other typical assumptions that imply (B2) have
often been used in the literature, see
White \cite{white} and Hern\'andez-Lerma \cite{HLa}, as
well as ($B3$) introduced in Altman \cite{me2}
which will be used occasionally below: \\
({\bf B3}) From any state $k$, 
only a finite set of states $X_k$ can be reached.

In all approximations in this section, the approximating games
$G_n$ have a value, i.e. $R_n(i) = \underline R_n (i) =
\overline R_n (i) $. Moreover, they will have a saddle point among
the stationary policies.
%
\subsection{Approximation Scheme I}
\label{ss-as1}
We define:
\begin{equation}
\label{opop}
\left ( H^1_{u,v} f \right )(i) \dfn
                  \left \{ \begin{array}{ll}
               r(i,u,v) + \beta \sum_{j \in \fI_n }
                p(i,u,v,j) f(j)
               & \mbox{if $i \in \fI_n$}\\
               0 & \mbox{if $ i \notin \fI_n$}
               \end{array}
\right. 
\end{equation}
For this approximating problem we define $S^1_n(i,u,v)$ to be
the solution of 
\beq
\label{operator ap}
\left ( H^1_{u,v} f \right )(i) = f(i), \qquad \forall i \in \fI
\eeq
$S^1_n$ is thus the total discounted payoff (defined in
(\ref{Siuv})) for the stochastic game whose transition probabilities
are $\bar p$ instead of $p$, where  $\bar p(i,u,v,j) =
p(i,u,v,j)$ if $ i,j \in \fI_n $, and $\bar p(i,u,v,j) = 0$ otherwise.
The value of the game $G_n^1$ is the unique solution of
\beq
\label{vlvl2}
R_n (i) = 
\left\{
\begin{array}{lr}
val \left[ r(i,a,b) + \beta \sum_{j \in \fI_n }p(i,a,b,j) R_n (j) \right]
& \mbox{ if } i \in \fI_n \\
0
& \mbox{ if } i \notin \fI_n 
\end{array}
\right.
\eeq
Moreover, optimal stationary policies
$u^*_n$ and $v^*_n$ for the game $G_n^1$ 
are obtained by choosing at any state $i \in \fI_n $
the mixed strategies that are optimal for the matrix game
\[
\left[ r(i,a,b) + \beta \sum_{j \in \fI_n }p(i,a,b,j) R_n (j) \right]_{a,b} .
\]
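As an illustration (again assuming a finite model, and reusing the
\verb|matrix_game_value| helper from the sketch in Section \ref{sec:prob}),
the truncated value $R_n$ of (\ref{vlvl2}) can be computed by iterating
the truncated Shapley operator:
\begin{verbatim}
import numpy as np

def scheme_I_value(r, p, beta, In, iters=500):
    # R_n(i) = val[ r(i,a,b) + beta * sum_{j in I_n} p(i,a,b,j) R_n(j) ]
    # for i in I_n, and R_n(i) = 0 outside I_n; In: boolean mask on states
    R = np.zeros(r.shape[0])
    for _ in range(iters):
        newR = np.zeros_like(R)
        for i in np.flatnonzero(In):
            # transitions leaving I_n are dropped (substochastic game)
            newR[i] = matrix_game_value(r[i] + beta * (p[i][:, :, In] @ R[In]))
        R = newR
    return R
\end{verbatim}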
%
$ $\\
The proof of the following Theorem will enable us to evaluate the precision
of the approximation. More precisely, it will enable us to
get a bound on $|R_n ( \bullet ) - R ( \bullet )| $ which is uniform in
$i \in J$, where $J$ is an arbitrary fixed subset of $\fI$.
\begin{theorem}
\label{thm-ap1}
All statements of Theorem \ref{thm1} hold for approximating scheme I,
where the reward $S$ for the limit game $G$ is defined 
in (\ref{Siuv}) and for approximating game $G_n^1 $ it is $S_n^1$.
\end{theorem}
\Prf
The proof uses an idea by Cavazos-Cadena 
\cite{CCb}. We first need to introduce some definitions.
Let $\epsilon > 0$. We define:
\[
g^0(\epsilon,r)=r, \hspace{.3in} g^k(\epsilon,r)= 
g(\epsilon, g^{k-1}(\epsilon,r)) , \qquad k =1,2,...
\]
where
\[
g(\epsilon,r) = \min \left \{ m :\epsilon(r,m) 
\leq \epsilon \right \}
\]
and $\epsilon(r,m)$ is defined in property (B2).
To understand the meaning of $g^s(\epsilon , r)$, we consider
first $\epsilon = 0$. Then
$\fI_{g^{s+1}(0,r)}$ is the set of neighbors of
$\fI_{g^{s}(0,r)}$ in the sense that states that are not
contained in $\fI_{g^{s+1}(0,r)}$ are not reachable from any
state in $\fI_{g^{s}(0,r)}$. For $\epsilon > 0$,
$\fI_{g^{s+1}(\epsilon,r)}$ is the set of ``$\epsilon$-neighbors'' of
$\fI_{g^{s}(\epsilon,r)}$ in the sense that 
for any state $i$ in $\fI_{g^{s}(\epsilon,r)}$,
the states that are not contained in $\fI_{g^{s+1}(\epsilon,r)}$ are 
reachable from $i$ with probability smaller than or equal to $\epsilon$.
Note that for any $r \geq0 $,
\begin{eqnarray}
\label{mk}
\sum_{j \notin \fI_{g^{l+1}(\epsilon,r)}}p(i,u,v,j) 
\leq \epsilon, \hspace{.3in} \forall i \in \fI_{g^l(\epsilon,r)}, 
\: \forall l \geq 0 , \: \forall u,v .
\end{eqnarray}
Note also that $g^l(\epsilon , r)$ need not be increasing in $l$.

Let $J \subset \fI$, 
$\epsilon(J) = \min \left \{ m : J \subset \fI_m \right \}$,
and suppose $\epsilon(J) < + \infty$ (this is the case if $J$ is chosen
to be finite). We define % for all $i \in J$:
\begin{equation}
m_k(\epsilon,\epsilon(J))=
\max \left \{\epsilon(J), g(\epsilon,\epsilon(J)),...,
g^k(\epsilon,\epsilon(J)) \right \} \, \, \, k=0,1,2,...
\end{equation}
 
We show that assumptions (A1)-(A4) hold for 
$S^1_n(i,u,v)$ and $S(i,u,v)$ 
defined above. \\
Below, $\epsilon$ and $J$ are held fixed, so that for
simplicity of notation we shall write
$g^l$ instead of $g^l( \epsilon, \epsilon(J))$.
Let $n \geq m_k(\epsilon , \epsilon (J))$, then for all $i \in J$, 
\begin{eqnarray}
\label{tau1}
|S^1_n(i,u,v) - S(i,u,v)|   & \leq & 
\beta \sum_{j \in \fI_{g^1}} p(i,u,v,j) \: 
| S^1_n(j,u,v) - S(j,u,v) |
\nonumber \\ 
&+&\beta \sum_{j \notin \fI_{g^1}}
 p(i,u,v,j)\: | S^1_n(j,u,v) - S(j,u,v) |
\end{eqnarray}

Note that for any state $j$,
$| S^1_n(j,u,v) | \leq M/(1- \beta)$, and
$| S(j,u,v) | \leq M/(1- \beta)$.
Hence by (\ref{mk}), (\ref{tau1}) and (B2) we obtain:
\begin{eqnarray}
\nonumber
\lefteqn{
|S^1_n(i,u,v) - S(i,u,v)|  } \\
\nonumber
& \leq &
\beta \sum_{j \in \fI_{g^1}} p(i,u,v,j) \: 
| S^1_n(j,u,v) - S(j,u,v) |
\nonumber
+ \epsilon \frac{2 M \beta } {1 - \beta}  \\
& \leq & 
\beta \max_{j \in \fI_{g^1}} \: 
| S^1_n(j,u,v) - S(j,u,v) |
\nonumber
+ \epsilon \frac{2 M \beta } {1 - \beta} \\
& \leq & 
\beta \max_{j \in \fI_{g^1}} \: 
\left \{\beta \sum_{\ell \in \fI_{g^2}} 
p(j,u,v,\ell) \: 
| S^1_n(\ell,u,v) - S(\ell,u,v) |
\right.
\nonumber \\
&& 
\nonumber
\left.
+ \beta \sum_{\ell \notin \fI_{g^2}} 
p(j,u,v,\ell) \: 
| S^1_n(\ell,u,v) - S(\ell,u,v) |
\right \}  + \epsilon \frac{2 M \beta } {1 - \beta} \\
& \leq & 
\label{prff}
\beta^2 \max_{\ell \in \fI_{g^2}}  \: 
| S^1_n(\ell,u,v) - S(\ell,u,v) |
 +   \epsilon \frac{2 M \beta^2 } {1 - \beta}
+ \epsilon \frac{2 M \beta } {1 - \beta} \\
& \leq & 
\beta^k \max_{ \ell \in \fI_{g^k}} \,
| S^1_n(\ell,u,v) - S(\ell,u,v) |+
2 \epsilon \frac {M \beta } {1 - \beta} 
\sum _{\ell=0}^{k-1} \beta^{\ell}
\nonumber
\end{eqnarray}
The first inequality follows by (\ref{mk}) since $n \geq m_k \geq g^1 $
and since $i \in J \subset \fI_{g^0}$.
Similarly, (\ref{prff}) 
follows by (\ref{mk}) since $n \geq m_k \geq g^2 $
and since $\ell \in \fI_{g^1}$.
So we have, for all $i \in J$:
\begin{equation}
\label{tau3}
| S^1_n(i,u,v) - S(i,u,v) |
\leq 2 M \frac{\beta (1 - \beta^{k} )
\epsilon/(1 - \beta) + \beta^k } {1 - \beta}
\end{equation}
Hence, (A1) and (A2) hold true (whereas (A3)-(A4) were established in
Remark \ref{continuous}).
\done

Combining (\ref{tau3}) with Remark \ref{rm1} yields:
\begin{corollary}
\label{cor01}
For any $i \in J$, if $n$ is chosen such that
$ n \geq m_k ( \epsilon , \epsilon(J)) $, then
\beq
\label{eq-cor}
| R_n(i) - R(i) | 
\leq 2 M \frac{ \beta (1 - \beta^{k} )
{\epsilon}/(1 - \beta) + \beta^k } {1 - \beta} .
\eeq
\end{corollary}
%

In the previous Theorem, the sets $\{ \fI_n \}$ were given a priori.
Next we consider a special choice of $\{ \fI_n \}$, which will 
 be especially useful under assumption (B3), for a finite set $J$.
This construction will enable us to express in a simple way the
set $\fI_n$ needed in (\ref{opop}) in order to approximate $R$ by
$R_n$ with a given error. This is especially desirable when 
 it is not easy to compute $ m_k ( \epsilon , \epsilon(J)) $
(and thus Corollary \ref{cor01} cannot be used).

Let $J$ be a given set (for which we would like to get a computable uniform
bound on the error of the approximation), and set
$Y(i) = \left \{ j: p(i,u,v,j) > 0 \mbox{ for some } u,v \right \}$. Then
we define $\fI_n$ in the following  way:
\begin{equation}
\label{in}
\fI_0 = J, \hspace{.4in}
\fI_{n+1}= \fI_n \cup \bigcup_{i \in \fI_n} Y(i)
\end{equation}
%
\begin{remark}
\label{rl}
Note that if $J$ is finite and if (B3) holds, then all
sets $\fI_n$ are finite. This construction might be especially useful
if the number of states reachable from any given state is small.
In that case $\fI_n$ do not grow too quickly.
\end{remark}
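The recursion (\ref{in}) is straightforward to implement. A minimal Python
sketch, with $Y$ represented as a (hypothetical) dictionary mapping each
state to its set of reachable states:
\begin{verbatim}
def build_In(J, Y, n):
    # I_0 = J;  I_{k+1} = I_k  union  ( union of Y(i) over i in I_k )
    I = set(J)
    for _ in range(n):
        I |= {j for i in I for j in Y.get(i, ())}
    return I

# e.g. a birth-death chain: at most two neighbors per state, so (B3) holds
Y = {k: {max(k - 1, 0), k + 1} for k in range(1000)}
print(sorted(build_In({0}, Y, 3)))        # -> [0, 1, 2, 3]
\end{verbatim}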
We now consider $S_n(i,u,v)$ as the solution 
of (\ref{operator ap}) with $\fI_n$ defined in (\ref{in}).
The following theorem (the analogue of Theorem \ref{thm-ap1}) holds:

\begin{theorem}
\label{thm-ap2}
(i) Fix a state $i \in J$. Then
all statements of Theorem \ref{thm1} hold for approximating scheme I,
where the reward $S$ of limit game $G$ is defined in (\ref{Siuv}) and the
reward $S_n$ for the
approximating game $G_n $ is the solution of (\ref{operator ap})
with $\fI_n$ defined in (\ref{in}).
\sp
(ii) For any $i \in J$ and $n=0,1,...$, 
$| R_n (i) - R (i) | \leq 2 M \beta^n/(1-\beta) $. 
\end{theorem}

\Prf
It suffices to prove that (A1) and (A2) are satisfied. Let $i \in J$, then: 
\begin{eqnarray*}
|S_n(i,u,v) - S(i,u,v)| 
&\leq & \beta \sum_{j \in Y(i)}
p(i,u,v,j) |S_n(j,u,v) - S(j,u,v)| \\
&\leq & \beta \max_{j \in \fI_1}
 |S_n(j,u,v) - S(j,u,v)| \\
& \leq & \beta^2 \max_{j \in \fI_1}
 \sum_{k \in Y(j)}
p(j,u,v,k) |S_n(k,u,v) - S(k,u,v)| \\
&& \vdots \\
& \leq &  \beta^n \max_{j \in \fI_n}
|S_n(j,u,v) - S(j,u,v)| \leq \frac{2 M \beta^n}{1-\beta} 
\end{eqnarray*}
(ii) then follows from Remark \ref{rm1}.
\done

As suggested in Remark \ref{rl}, the above method is
useful especially when the approximating games have finite
state spaces (i.e. the $\fI_n$ are finite) and the typical number of states
(neighbors) reachable from a state is not too high.
If, however, the typical number of neighbors is high, then
the sets $\fI_n$ grow very rapidly, which suggests
that obtaining good estimates of the optimal value and
policies might require an unacceptably high computational
complexity.
We thus present an alternative, more general way
of constructing finite sets $\fI_n$ (even when (B3) does not
hold), which results in a simple expression 
for $m_k(\epsilon , \epsilon(J))$, and thus enables us to use
Corollary \ref{cor01} to obtain a uniform computable error bound for the
approximation for any $i \in J$.

We define a parametrized family $\{\fI_n ( \epsilon ) \}$, where
$\epsilon$ is a positive real number. Define $\fI_0 (\epsilon ) = J$.
$\{ \fI_n(\epsilon) \}$ are then chosen to be an arbitrary sequence increasing
to $\fI$ that satisfies the following. If for some $l > 0$, say $l = \hat l$,
\[
\sup_{a,b,i \in \fI_l ( \epsilon )}
\sum_{j \notin \fI_{l}(\epsilon ) }
p(i,a,b,j) \leq \epsilon 
\]
then $\fI_n(\epsilon) = \fI $ for all $n > \hat l$.
Otherwise, $\fI_{l+1} ( \epsilon )$ is chosen such that
\[
\sup_{a,b,i \in \fI_l ( \epsilon )}
\sum_{j \notin \fI_{l+1}(\epsilon ) }
p(i,a,b,j) \leq \epsilon 
\]
It follows that $ \epsilon(J)=0$, 
$g^0 = 0 $,
and hence $g^k =  k $ and 
$m_k ( \epsilon , \epsilon (J)) = g^k  = k $
for $k \leq \hat l $.
(The above quantities were defined in the proof of Theorem \ref{thm-ap1}).
If $J$ is finite, then it follows from the same arguments as in
Remark \ref{rmkB2} that
$ \fI_n (\epsilon)  $ can be chosen to be finite
for $n \leq \hat l$; in particular this holds for $ \fI_{\hat l} ( \epsilon ) $,
which is the truncated state space that should be used to perform
Approximation Scheme I in order to obtain a 
precision as in Corollary \ref{cor01}.

In this setting, Theorem
\ref{thm-ap2} (ii) becomes a special case
of Corollary \ref{cor01} with $\epsilon=0$.

\subsection{Approximation Scheme II}
In the previous approximation scheme, the dynamics 
are seen to be the result of transition 
probabilities that need not sum to one, even if
in the limit game they do sum to one. Indeed, (\ref{opop}) 
can be considered as a stochastic game where
we set $p(i,u,v,j)=0$ for $j \notin \fI_n $.
In many applications this may be undesirable,
and one would like $p(i,u,v,\bullet ) $ to remain a probability
measure. This is especially the case when we want to learn about
the optimal value and (almost) optimal policies of specific
given stochastic games with a large finite state space, by approximating them
through an infinite state game. Indeed, there are cases where an infinite
game is easier to solve, since some boundary problems are avoided.
Examples are given in Section \ref{sec:ss-ap}.

We assume that $\sum_{j \in \fI} p(i,a,b,j) = 1 $ for all $a \in \fA_i,
b \in \fB_i$.
We define the following sequence of games.
We let $\fI_n \subset \fI $ be an increasing sequence of sets, 
converging to $\fI$, as in the previous Section. Define
\begin{equation}
\label{op2}
\left ( H^2_{u,v} f \right )(i) =
                  \left \{ \begin{array}{ll}
               r(i,u,v) + \beta \sum_{j \in \fI_n}p^*(i,u,v,j) f(j)
               & \mbox{if $i \in \fI_ n$}\\
               0 & \mbox{if $ i \notin \fI_n$}
               \end{array}
\right. 
\end{equation}
where:
\begin{equation}
\label{pp}
p^*(i,u,v,j) =
               \left \{ \begin{array}{ll}
               p(i,u,v,j)+q_n(i,u,v,j) 
               & \mbox{if $i \in \fI_n$, $j \in \fI_n$}\\
               0 & \mbox{if $ i \notin \fI_n$ or $j \notin \fI_n $}
               \end{array}
\right. 
\end{equation}
$q_n(i,u,v, \bullet )$ is some nonnegative measure satisfying
$ \sum_{j \in \fI_n} \left [ p(i,u,v,j)+q_n(i,u,v,j) \right ]  = 1 $. Hence,
\beq
\label{hence}
\sum_{j \in \fI_n} q_n(i,u,v,j) =\sum_{j \notin \fI_n} p(i,u,v,j) 
\eeq
We define $S^2_n$ to be the solution of 
\beq
\label{operator ap2}
\left ( H^2_{u,v} f \right )(i) = f(i)
\eeq
$S^2_n$ is thus the total discounted payoff (defined in
(\ref{Siuv})) for the stochastic game whose transition probabilities
are $p^*$ instead of $p$. 
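
One admissible choice of $q_n$ among many is to return all the probability
mass that escapes $\fI_n$ to a single fixed state of $\fI_n$; the sketch
below (over a hypothetical finite representation, as before) builds the
corresponding $p^*$ of (\ref{pp}):
\begin{verbatim}
import numpy as np

def scheme_II_transitions(p, In):
    # restrict p to I_n and return the escaping mass
    # sum_{j not in I_n} p(i,a,b,j) to one fixed state j0 of I_n
    # (one admissible q_n, consistent with (hence))
    p_star = np.where(In[None, None, None, :], p, 0.0)  # drop j outside I_n
    leak = p.sum(axis=-1) - p_star.sum(axis=-1)         # mass escaping I_n
    j0 = int(np.flatnonzero(In)[0])                     # a fixed state of I_n
    p_star[..., j0] += leak
    p_star[~In] = 0.0                # rows i outside I_n are unused in (op2)
    return p_star
\end{verbatim}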

All the results of Subsection \ref{ss-as1} still hold.
We demonstrate this with the proof of the analogue of 
Theorem \ref{thm-ap1}. It suffices to show that (A1)-(A2) hold.
With the same notation as in Subsection \ref{ss-as1} we obtain: 
\begin{eqnarray*}
\lefteqn{
|S^2_n(i,u,v) - S(i,u,v)|  } \\
& \leq &
\beta \sum_{j \in \fI_{g^1}} p(i,u,v,j) \: 
| S^2_n(j,u,v) - S(j,u,v) |
+\beta \sum_{j \in \fI_{g^1}} q_n(i,u,v,j) \: 
| S^2_n(j,u,v) - S(j,u,v) | \\ && \qquad \qquad
+\beta \sum_{j \notin \fI_{g^1}} p(i,u,v,j) \: 
| S^2_n(j,u,v) - S(j,u,v) |
\end{eqnarray*}
and by (\ref{hence})
\begin{eqnarray*}
\lefteqn{ |S^2_n(i,u,v) - S(i,u,v)|  } \\
& \leq  &
\beta \sum_{j \in \fI_{g^1}} p(i,u,v,j) \: 
| S^2_n(j,u,v) - S(j,u,v) | 
+ 2 \beta \sum_{j \notin \fI_{g^1}} p(i,u,v,j) \: 
| S^2_n(j,u,v) - S(j,u,v) | \\
& \leq &
\beta \sum_{j \in \fI_{g^1}} p(i,u,v,j) \: 
| S^2_n(j,u,v) - S(j,u,v) |
+ 2 \epsilon \frac{2 M \beta } {1 - \beta }
\end{eqnarray*}
So, continuing as in the proof of Theorem \ref{thm-ap1}, we have:
\begin{equation}
\label{tau33}
| S^2_n(j,u,v) - S(j,u,v) |
\leq 2 M 
\frac{ 2 \beta
(1 - \beta^{k})  \epsilon/(1 - \beta) + \beta^k} 
{1 - \beta}
\end{equation}
for $n \geq m_k(\epsilon , \epsilon(J))$.
%
\subsection{Approximation Scheme III}
\label{asIII}
The basic idea of this approximation scheme is to fix some
stationary policies for both players, and use them in all states
except for a subset $\fI_n$. The problem is then to determine
the optimal mixed strategies for both players in the remaining
set of states $\fI_n$. We are interested in studying the asymptotic
behavior of this approach as $\fI_n \to \fI$. Similar approaches
were used in the framework of Markov decision processes (e.g.  \cite{me1}),
where the $\fI_n$ were assumed finite.
We first fix some arbitrary policies $\hat u \in U$,
$\hat v \in V$. We shall now use the framework of
Theorem \ref{thm1}. Define
\[
U_n = \{ u \in U : u(i) = \hat u (i), \  \forall i \notin \fI_n \},
\qquad
V_n = \{ v \in V : v(i) = \hat v (i), \  \forall i \notin \fI_n \}.
\]
Fix some $i \in \fI$.
The limit game is defined as $S(u,v) = S(i,u,v)$, where
$S(i,u,v)$ is given in (\ref{Siuv}).
For any $u \in U_n , v \in V_n$, define $S_n ( u , v ) = S(u , v) $.
We set $\pi_n^1$ and $\pi_n^2$ to be the identity mappings and 
\[
\sigma_n^1 (u) (i) = 
\left\{
\begin{array}{lr}
u (i) & \mbox{ if } i \in \fI_n ,\\
& \\
\hat u (i) & \mbox{ if } i \notin \fI_n ;
\end{array}
\right.
\qquad \qquad 
\sigma_n^2 (v) (i) = 
\left\{
\begin{array}{lr}
v (i) & \mbox{ if } i \in \fI_n ,\\
& \\
\hat v (i) & \mbox{ if } i \notin \fI_n .
\end{array}
\right.
\]
\begin{theorem}
Fix a state $i$. Then \\ 
(i) All statements of Theorem \ref{thm1} hold for approximating scheme III.  \\
(ii) $R_n$ is the unique fixed point of the equation
\beq
\label{abv}
R_n (k) = 
\left \{ 
\begin{array}{lr}
val \left[ r(k,a,b) + \beta \sum_{j \in \fI } p(k,a,b,j) R_n (j) \right]
& k \in \fI_n \\
& \\
r(k, \hat u , \hat v ) + \beta
\sum_{j \in \fI } p(k,\hat u , \hat v , j) R_n (j)
& k \notin \fI_n
\end{array}
\right.
\eeq
(iii) Optimal stationary policies $u_n$ and $v_n$ for both
players are obtained by using at any state $k \in \fI_n$
mixed strategies that achieve the value in (\ref{abv}).
\end{theorem}
The proof of this theorem is similar to the proof of Theorem 
\ref{thm-ap1}.
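
For a finite model, the fixed point (\ref{abv}) can again be computed
iteratively, applying the matrix game value on $\fI_n$ and the frozen
policies $\hat u , \hat v$ elsewhere. A sketch (reusing
\verb|matrix_game_value| from Section \ref{sec:prob}; \verb|u_hat[i]| and
\verb|v_hat[i]| are hypothetical probability vectors over the actions at
state $i$):
\begin{verbatim}
import numpy as np

def scheme_III_value(r, p, beta, In, u_hat, v_hat, iters=500):
    # val[.] at the states of I_n, frozen mixed policies outside
    R = np.zeros(r.shape[0])
    for _ in range(iters):
        newR = np.empty_like(R)
        for i in range(r.shape[0]):
            Q = r[i] + beta * p[i] @ R             # payoff matrix over (a, b)
            if In[i]:
                newR[i] = matrix_game_value(Q)     # optimize inside I_n
            else:
                newR[i] = u_hat[i] @ Q @ v_hat[i]  # play (u_hat, v_hat)
        R = newR
    return R
\end{verbatim}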

\section{State approximations for the case of finite horizon}
\label{sec:fh}

Consider the model in Section \ref{sec:prob} with, however,
a finite horizon reward criterion instead of (\ref{Siuv}):
\beq
\label{Siuvm}
S^{[m]} (i,u,v) = E_i^{u,v}  \sum_{t=0}^m \beta^t r(I_t , A_t , B_t ) 
\eeq
It is well known that there exist optimal policies for both
players within the class of Markov policies.
The value $R$ of $S^{[m]} $ is obtained by the recursion:
\begin{eqnarray}
\label{value}
R^{m+1} & \dfn & 0 \\
\nonumber
R^k (i) & \dfn & val  \left[ r(i,a,b) + \beta \sum_{j \in \fI}
p(i,a,b,j) R^{k+1} (j) \right ], \qquad k=0,...,m \\
\nonumber
R & \dfn & R^0
\end{eqnarray}
Define $  U[m]= (U^0 [m] , U^1 [m] , ... , U^m [m] )$,
$  V[m]= (V^0 [m] , V^1 [m] , ... , V^m [m] )$ where
$U^k[m] , V^k [m] $ are the sets of mixed strategies which are
optimal for the matrix game
\beq
\label{arg}
\left[ r(i,a,b) + \beta \sum_{j \in \fI}
p(i,a,b,j) R^{k+1} (j) \right ]_{a,b} , \qquad k=0,...,m ,
\eeq
for all $i \in \fI $.
Then, any Markov policies $(u,v)$ such that $u_t \in U^t [m]$,
$v_t \in V^t [m] $, $t=0,...,m$, are optimal for the stochastic game
$S^{[m]}$.
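
In a finite model, the recursion (\ref{value}) is a plain backward
induction; a sketch (reusing \verb|matrix_game_value| from Section
\ref{sec:prob}):
\begin{verbatim}
import numpy as np

def finite_horizon_value(r, p, beta, m):
    # R^{m+1} = 0;  R^k(i) = val[ r(i,a,b) + beta sum_j p(i,a,b,j) R^{k+1}(j) ]
    R = np.zeros(r.shape[0])                     # R^{m+1}
    for k in range(m, -1, -1):                   # k = m, m-1, ..., 0
        R = np.array([matrix_game_value(r[i] + beta * p[i] @ R)
                      for i in range(r.shape[0])])
    return R                                     # R^0, the value of S^[m]
\end{verbatim}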

In order to apply the results from Section \ref{sec:app}
to the finite horizon case
we make the following observation.
The finite horizon model is equivalent to the following
infinite horizon model with enlarged state space:
\begin{itemize}
\item $\hat \fI =  \fI \times \{ 0,...,m \} $;
\item 
$\hat \fA_{(i,k)} = \fA_i $,
$\hat \fB_{(i,k)} = \fB_i $;
\item 
$ \hat r ((i,k),a,b) = r(i,a,b) $;
\item
$\hat p ((i,k),a,b,(j,l)) = 
\left\{
\begin{array}{lr}
p(i,a,b,j) & \mbox{ if } k+1 = l \leq m \\
& \\
0 & \mbox{ otherwise }
\end{array}
\right.
$
\item
$ \hat \beta = \beta $;
\end{itemize}

Define
\[
\hat S ( \hat i,\hat u, \hat v) = E_{\hat i}^{\hat u , \hat v } 
\sum_{t=0}^\infty \beta^t r(\hat I_t , \hat A_t , \hat B_t )
\]

There is a one-to-one correspondence between stationary policies
in the new model and Markov policies in the original one;
if $ \hat u , \hat v$ are stationary in the new model, then
the corresponding Markov policies in the original model
are given by
\beq
\label{equiv}
u_t ( \bullet | x ) = \hat u ( \bullet | (x,t) ) , \qquad \qquad
v_t ( \bullet | x ) = \hat v ( \bullet | (x,t) ) .
\eeq
and vice versa. Moreover, we have
\[
S^{[m]} ( i, u , v ) = \hat S (  (i,0), \hat u , \hat v ) .
\]
Consequently, the state approximation schemes from the previous
section also hold for the case of finite horizon model.
The computation of the (approximating) values and (almost) optimal
policies can be done by using the above infinite horizon
model with enlarged state space, and then applying (\ref{equiv}).
$\hat \fI_n $ may be chosen, for example, as
$\hat {\fI}_n = \fI_n \times \left \{0,...,m \right \}$.

%
\section{Successive approximations}
\label{sec:safsa}
%
We study in this section several new aspects of successive
approximations. The convergence of the values under successive 
approximations is already well known \cite{nowak3,S,vdv}.
By applying Theorem \ref{thm1}, we establish in the following
subsection the convergence of (almost) optimal policies.
We then study the application of
both state approximation and finite horizon approximation.
Finally, we discuss the restriction of games with finite horizon
to stationary policies.

\subsection{Convergence of policies for successive approximations}
%
An interesting application of the results in the previous subsections
is the observation that successive approximations (or value iteration) can 
be viewed as a special case of state approximations. 
One can define game $G_n$ such that
$S_n = S^{[n]} $, where $S^{[n]} $ is given in
(\ref{Siuvm}), and consider $S$ as defined in (\ref{Siuv}).
Let $u_n^*$ and $v_n^*$ be a pair of optimal (or $\epsilon_n$-optimal,
where $ \lim_{ n \to \infty} \epsilon_n = 0$) Markov policies for $G_n$.
Let $u^*$ and $v^*$ be any $\epsilon$-optimal stationary (or Markov)
policies for the infinite horizon game $S$.
If we use for both $G_n$ and $G$ the equivalent infinite horizon
model with the enlarged
state space defined in Section \ref{sec:fh}, 
then $u^*_n , v_n^* , u^* $ and $v^*$ all have an equivalent 
representation as stationary policies. The problem becomes one of 
approximating the state space $\fI' = ( \fI \times \N )$ by the subsets
$ \fI_n' = ( \fI \times \{ 0,1,...,n \} ) $. Using then Theorems
\ref{thm1} and \ref{thm-ap1}, we conclude:
%
\begin{theorem}
(1) $ \lim_{n \to \infty } R_n = R $.  \\
(2) For any $\epsilon' > \epsilon$, there exists $N$ such that
$u_n^*$ (resp. $v_n^*$) is
$\epsilon'$-optimal for the infinite horizon game, for all $n \geq N$.  \\
(3) Let $\bar{u} \in U$ (resp. $\bar{v} \in V$) be a limit point of
$u_n^*$ (resp. $v_n^*$).
Then $\bar{u}$ (resp. $\bar{v}$) is $\epsilon$-optimal for the
limit game.  \\
(4) For all $\epsilon' > \epsilon $,
there exists $N (\epsilon' ) $ such that $u^*$ is $\epsilon^{'}$-optimal
for the $n$th approximating game, for all $n \geq N (\epsilon') $.
\end{theorem}

\subsection{Successive approximation and finite state approximation}
We use the above approach 
to combine state approximations with finite horizon
reward criterion. Such a combination 
may be especially useful for computational
purposes, where $\fI_n$ can be chosen to be finite.
We can now compute $R_n$ and (Markov) policies which are
optimal for $G_n$, using approximating
schemes introduced in the previous Sections, 
in order to approximate the optimal value
$R$ and an almost optimal strategy for the original limit game
$G$. Let again
$\fI_n \subset \fI $ be an increasing sequence of sets
of states, converging to $\fI$.
One can repeat the construction of a model with
enlarged state space (that includes both the original 
state space and the time), so that the state space for the
$n$th game $G_n$ is $ \hat \fI_n = ( \fI_n \times \{0,1,...,n\} ) $, and 
for the limit game $G$, it is
$ \hat \fI = ( \fI \times \N )$. This would establish the correctness
of approximations based on value iteration
for a problem with truncated state space.
For example, if we adopt the first approach in Section
\ref{sec:app}, we get the approximating values $R_n$ and
Markov policies by performing the following iterations:
%
\begin{eqnarray*}
R_n^{n+1} (i)& \dfn & 0 \\
R_n^k (i)& \dfn & 
          \left \{ \begin{array}{ll}
          val \left[ r(i,a,b) + \beta \sum_{j \in \fI_n }p(i,a,b,j) R_n^{k+1}(j)
          \right] & \mbox{if $i \in \fI_n$}\\
          0 & \mbox{ if $ i \notin \fI_n$}
          \end{array}
\right.  , \qquad k=0,...,n \\
R_n & \dfn & R_n^0
\end{eqnarray*}
Define $  U_n= (U_n^0  , U_n^1  , ... , U_n^n  )$,
$  V_n= (V_n^0  , V^1_n  , ... , V^n_n  )$, where
$U^k_n , V^k_n  $ are the sets of mixed strategies which are
optimal for the matrix game
\beq
\label{arg2}
\left[ r(i,a,b) + \beta \sum_{j \in \fI_n }p(i,a,b,j) R_n^{k+1}(j)
\right]_{a,b} , \qquad k=0,...,n ,
\eeq
for any $i \in \fI_n$.
Then, any Markov policies $(u,v)$ such that $u_t \in U_n^t$,
$v_t \in V_n^t $, $t=0,...,n$, are optimal for the stochastic game
$G_n $.
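
A sketch of this combined iteration in the finite setting (with a boolean
mask \verb|In| over the states and \verb|matrix_game_value| as before):
\begin{verbatim}
import numpy as np

def truncated_finite_horizon_value(r, p, beta, In, n):
    # R_n^{n+1} = 0; then the display above: val[.] on I_n, 0 outside
    R = np.zeros(r.shape[0])
    for k in range(n, -1, -1):
        newR = np.zeros_like(R)
        for i in np.flatnonzero(In):
            newR[i] = matrix_game_value(r[i] + beta * (p[i][:, :, In] @ R[In]))
        R = newR
    return R                                     # R_n^0 = R_n
\end{verbatim}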
%

\subsection{Finite horizon and stationary policies}
\label{0000}
For simplicity of implementation, one may be interested in
restricting to the class of stationary policies in a stochastic game,
rather than using Markov policies (or others). It is well known,
however, that finite horizon games do not have a value
within the class of stationary policies. However, it is 
immediate to see that the conditions (A1)-(A4) hold
when restricting to stationary policies, and thus we
conclude from Theorem \ref{thm1} that the optimal stationary policies
of the infinite horizon game are strongly almost optimal
(in the sense of Theorem \ref{thm1} (3)) for the finite horizon games,
when the horizon is long enough.
Moreover, the lower and upper values converge to the value of
the infinite horizon game.

\section{Convergence of the discount factor and immediate reward}
\label{sec:disc-re}
We establish in this section the robustness of the
values and optimal policies 
with respect to the discount factor and the immediate reward. This 
may be of importance in case these parameters are not known precisely.
One can similarly establish robustness for random time-varying
discount factors and immediate rewards. 
We consider a horizon $m$ which may be either infinite or finite.

We consider a sequence
of stochastic games $G_n$, $n=0,1,2,...$ where the quantities
defining each one of them are as in Section \ref{sec:prob}, except 
for the immediate reward and discount factor which are replaced by
$\beta_n = \beta + \delta_n $, $r_n = r + \rho_n $,
where $\delta_n$ and $\rho_n$ converge to zero as $n \to \infty$,
uniformly in the states and actions.
Denote by $S_n(i,u,v)$ the reward for game $G_n$ (as defined either in 
(\ref{Siuv}) or in (\ref{Siuvm})).
Then
\begin{eqnarray*}
\lefteqn{
| S(i,u,v) - S_n (i,u,v) |} \\
& \leq &
E_i^{u,v} \sum_{t=0}^m \left( \beta^t | \rho_n (I_t , A_t , B_t ) |
+ |  \beta^t - (\beta + \delta_n )^t   | M + 
 (\beta + \delta_n)^t  | \rho_n (I_t, A_t , B_t ) |  \right)
\end{eqnarray*}
and we obtain convergence to zero uniformly over all 
Markovian policies $u$ and $v$ (to which we may restrict, without loss
of generality, as in Section \ref{sec:fh}). 
This implies conditions (A1) and (A2) 
and hence, by Remark \ref{continuous}, we see that 
all statements of Theorem \ref{thm1} hold. This establishes the continuity of
the value of the stochastic game as a function of
the discount factor $\beta$ in
the open interval $\beta \in (0,1)$, and as a function of the immediate reward.
Moreover, it establishes the convergence of (almost) optimal policies
(in the sense of Theorem \ref{thm1}).
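
For instance, in the infinite horizon case ($m = \infty$), the above bound
can be made explicit (a crude estimate, using
$ | \beta^t - \beta_n^t | \leq t \gamma^{t-1} | \delta_n | $ with
$\gamma \dfn \max ( \beta , \beta_n )$ and
$\sum_{t \geq 0} t \gamma^{t-1} = (1-\gamma)^{-2}$):
\[
\sup_{u,v} | S(i,u,v) - S_n (i,u,v) |
\leq \frac{ \| \rho_n \|_\infty }{1 - \beta }
+ \frac{ \| \rho_n \|_\infty }{1 - \beta_n }
+ \frac{ M | \delta_n | }{ (1 - \gamma )^2 } ,
\]
where $\| \rho_n \|_\infty \dfn \sup_{i,a,b} | \rho_n (i,a,b) | $; combined
with Remark \ref{rm1}, this also bounds $ | R_n - R | $.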

An especially interesting case is the asymptotics of 
stochastic games as $\beta \to 1$. 
We restrict for simplicity to the case of finite state and
action spaces. The asymptotic behavior
of the value of the game was studied by 
Bewley and Kohlberg \cite{BK}.  In fact they establish
the convergence of $(1- \beta ) R_\beta (i)$ to the value
$R_{average}$ of the expected long run time-average game 
(where $R_\beta (i)$ is the value of the game with discount factor $\beta$ 
and initial state $i$). 

When trying to apply the approximation Theorem \ref{thm1} to
the limit as $\beta \to 1$, we are faced with the following
problems: \\
(i) The limit game does not have a value among the stationary (nor even the 
Markov) policies (see the ``big match'' by Blackwell and Ferguson
\cite{BF}).  \\
(ii) The value of the limit game (with the expected average reward) is 
in general not continuous in the policies (and thus assumptions
(A3) and (A4) do not hold in general). This is the case 
even for a single controller, for which it is known that the
value may exhibit discontinuity in the parameters (see
Gaitsgory and Pervozvanskii \cite{GP} p. 407).

However, both problems are avoided if we restrict
to either games with perfect information or to irreducible games 
(see Gillette \cite{gillette}).
Games with perfect information (resp. irreducible games)
have a saddle point within the
class of stationary deterministic policies
(stationary mixed policies, respectively), 
and (A1) and (A2) hold, see \cite{gillette}.
Within these classes of policies, (A3) and (A4) also hold;
indeed, for the perfect information case this follows from the
fact that there is only a finite number of stationary deterministic
policies. For the irreducible case, this follows e.g. from \cite{me2}.
%
\section{Applications}
\label{sec:ss-ap}
We present in this Section a few problems that motivated our research
on approximations in stochastic games. As mentioned in the
introduction, many discretization schemes for differential games
yield dynamic programming equations that can be interpreted as representing
some stochastic game. In some pursuit-evasion games, such as
the game of the two cars \cite{PT}, an additional finite-state 
approximation is then required. The calculations in \cite{PT}
were done following scheme I (introduced in Subsection \ref{ss-as1}).
The state transitions in the discretized model satisfied property (B3),
and, in fact, each state had at most four neighbors (i.e.
four states reachable in one transition). This feature motivates the use
of Theorem \ref{thm-ap2} for such applications, which not only
establishes the convergence, but also gives the rate
of convergence (or, more precisely, enables one to compute the
$n$ needed for any required precision).

Another application of the theory developed in the
previous sections is to stochastic games appearing in queueing systems.
Such problems may serve as models for situations of conflict between
users in telecommunication systems, or for worst-case
control situations in the presence of some unknown disturbance
(in production systems, or again in telecommunication
applications).
An interesting feature in the control of queueing networks
is that often, infinite queues are easier to handle than finite 
queues, as some boundary problems are avoided.
Moreover, the optimal policies for infinite horizon
problems, being stationary, are easier to implement than those
for finite horizon problems. 
In real applications, however, queues are
always finite; moreover, one is often interested in finite horizon
problems (e.g., controlling manufacturing during working hours,
etc). Our results may thus be applied to obtain almost optimal
policies for these cases.
Here are some examples: \\
(i) Altman considered in \cite{flow} a stochastic game with
an infinite state space in order to solve a flow control problem
with an infinite buffer.  The solution of
the problem with a finite buffer \cite{finite} seems more
involved, and was only obtained under an important restriction
on the actions of the flow controller (namely, its action set had to contain
an action that corresponds to rejecting arriving customers). \\
(ii) Altman and Koole \cite{AK} solved a game where one or more
servers have to be assigned to customers of different classes.
In a telecommunication context, the different classes may represent 
different traffic types, such as voice, video and data, and the
servers may represent a channel, through which the traffic has
to be transmitted. A controller has to decide which class of customer
will be served next (i.e., which traffic will have access to
the channel). The input traffic was assumed to be controlled
as well; e.g. it may have been the output of some dynamic routing mechanism
or dynamic flow control. The problem was posed as a zero-sum
stochastic game between the service controller and
``nature'', which represented the unknown input control mechanism.
Simple structural results were obtained for the case of infinite
queues. In the case of finite queues, the structure of optimal
policies is unknown, even in the case of uncontrolled input.
By applying our second state approximation scheme, it follows that
the policies obtained for the problem of infinite queues are
almost optimal for the case of finite queues which are large enough.

\section{Further Generalizations}
\label{sec:gen}

Although we considered in this paper bounded rewards,
it is well known that different sets of conditions exist under
which problems with unbounded rewards can be transformed into ones with
bounded rewards.  Such transformations have been used in the past for finite
state approximations of Markov Decision Processes,
see White \cite{W}. The generalization of such conditions
to games is straightforward (see e.g. Wessels \cite{wessels}).

There are many other useful directions in which the general approximation
Theorems are applicable, and on which we continue our investigation.
Among these are \\
(i) Differential games,
in which a standard problem is to discretize both space and time.
Several works have been done in this direction, see 
\cite{BFS,BS,PTid,PT,TG},
where the convergence (and rate of convergence, see \cite{PTid}) of 
the values of the approximating games has been established.
However, little is known about the convergence of policies.
Theorem \ref{thm1} seems to be a suitable tool for approaching these 
issues. \\
(ii) Discretization of stochastic games with general state
and action spaces. Some results were obtained in the case of
a single controller, see Sec. 6 of Hern\'andez-Lerma \cite{HL} and
references therein.
Further results for stochastic games on the convergence of the 
value, and some results on the convergence of policies, were obtained by
Whitt \cite{whitt1}, and then generalized to $N$-person games in
\cite{whitt2}.

\vspace{1cm}
\noindent
{\bf Acknowledgements.}
The authors wish to thank Odile Pourtallier and Pierre
Bernhard for fruitful and illuminating discussions.

\begin{thebibliography}{9999}
%
\bibitem{me1}
E. Altman,
``Denumerable Constrained Markov Decision Problems
and Finite Approximations",
to appear in {\it Math. of Operations Research}.
%
\bibitem{me2}
E. Altman,
``Asymptotic Properties of Constrained Markov Decision Processes",
{\it Zeitschrift f\"ur Operations Research},
Vol. 37, Issue 2, pp. 151-170, 1993.
%
\bibitem{flow}
E.~Altman, ``Flow control using the theory of
zero-sum Markov games,''
{\it Proceedings of the 31st IEEE Conference on Decision and
Control}, Tucson, Arizona, pp. 1632-1637, December 1992,
to appear in {\it IEEE-AC}.
%
\bibitem{finite}
E. Altman,
``Monotonicity of optimal policies in a zero sum game:
a flow control model", to appear in
{\it Annals of dynamic games}, Vol. 1, 1993.
%
\bibitem{AK}
E. Altman and G. Koole,
``Stochastic Scheduling Games with Markov Decision Arrival Processes",
{\it Journal Computers and Mathematics with Appl.},
third special issue of ``Differential Games", pp. 141-148, 1993.
%
\bibitem{BFS}
M. Bardi, M. Falcone and P. Soravia,
``Fully Discrete Schemes for the Value Function of Pursuit-Evasion 
Games'',
{\it Annals of dynamic games}, Vol. 1, 1993.
%
\bibitem{BS}
P. Bernhard and J. Shinar,
``On finite Approximation of a Game Solution With Mixed Strategies",
{\it Appl. Math. Lett.} {\bf 3} No. 1, pp. 1-4, 1990.
%
\bibitem{BK}
T. Bewley and E. Kohlberg,
``The Asymptotic Theory of Stochastic Games",
{\it Math. of Oper. Res.}, Vol. 1, No. 3, 1976.
%
\bibitem{BF}
D. Blackwell and T. Ferguson,
``The Big match",
{\it Annals of Mathematical Statistics}, {\bf 39},
pp. 159-163, 1968.
%
\bibitem{borkar}
V. S. Borkar,
``A convex analytic approach to Markov
decision processes", {\it Probab. Th. Rel. Fields},
Vol. 78, pp. 583-602, 1988.
%
\bibitem{CCb}
R. Cavazos-Cadena,
``Finite-state approximations for denumerable state discounted Markov
Decision Processes",
{\it J. Applied Mathematics and Optimization} {\bf 14} pp.  27-47, 1986.
%
\bibitem{fed}
A. Federgruen,
``On N-Person Stochastic Games with denumerable state space",
{\it Adv. Appl. Prob.} {\bf 10}, pp. 452-471, 1978.
%
\bibitem{GP}
V. A. Gaitsgory and A. A. Pervozvanskii,
``Perturbation Theory for Mathematical Programming Problems",
{\it JOTA}, 389-410, 1986.
%
\bibitem{gillette}
D. Gillette,
``Stochastic games with zero stop probabilities",
in {\it Contributions to the Theory of Games}, Vol. III,
M. Dresher, A. W. Tucker, P. Wolfe, eds., Princeton University
Press, Princeton, 1957, pp. 179-187.
%
\bibitem{HLa}
O. Hernandez-Lerma,
``Finite state approximations for denumerable multidimensional-state 
discounted Markov decision processes",
{\it J. Mathematical Analysis and Applications} {\bf 113} pp.
382-389, 1986.
%
\bibitem{HL}
O. Hernandez-Lerma,
{\it Adaptive Control of Markov Processes},  Springer Verlag, 1989.
%
\bibitem{HVW}
A. Hordijk, O. Vrieze and G. Wandrooij,
``Semi-Markov strategies in stochastic games",
{\it Mathematical Centre Report BW  68/76}, 1976.
%
\bibitem{S}
L. S. Shapley,
``Stochastic Games'',
{\it Proceedings Nat. Acad. of Science USA}, {\bf 39},
pp. 1095-1100, 1953.
%
\bibitem{nowak}
A. S. Nowak,
``On zero-sum stochastic games with general state space, I",
{\it Probability and Mathematical Statistics} IV, No. 1, pp. 13-32,
1984.
%
\bibitem{nowak3}
A. S. Nowak,
``Approximation Theorems for zero-sum nonstationary
stochastic games",
{\it Proc. of the American Math. Soc.}, Vol. 92, No. 3,
pp. 418-424, 1984.
%
\bibitem{PTid}
O. Pourtallier and M. Tidball,
``A Discrete Scheme of Pursuit-Evasion Games''.
Work in preparation.
%
\bibitem{PT}
O. Pourtallier and B. Tolwinsky,
``Discretization of Isaacs Equation: A Convergence Result'',
Work in preparation.
%
\bibitem{royden}
H. Royden, {\it Real Analysis}, New York: Macmillan, 1963.
%
\bibitem{TG}
M. Tidball and R. L. V. Gonz\'alez,
``Zero Sum Differential Games with Stopping Times. 
Some Results about its Numerical Resolution'',
{\it Annals of dynamic games}, Vol. 1, 1993.
%
\bibitem{TS}
L. C. Thomas and D. Stengos,
``Finite State Approximation Algorithms for Average Cost
Denumerable State Markov Decision Processes",
{\it OR Spectrum}, {\bf 7}, pp. 27-37, 1985.
%
\bibitem{vdv}
J. Van der Wal,
``Successive Approximations for Average Reward Markov Games",
{\it Int. J. of Game Theory},
Vol. 9, issue 1, pp. 13-24.
%
\bibitem{wessels}
J. Wessels,
``Markov Games with unbounded rewards",
{\it Dynamische Optimierung}, M. Sch\"al (editor)
Bonner Mathematische Schriften, Nr. 98, Bonn (1977).
%
\bibitem{white}
D. J. White,
``Finite State Approximations for Denumerable
State Infinite Horizon Discounted Markov Decision
Processes",
{\it J. Mathematical Analysis and Applications} {\bf 74},
pp. 292-295, 1980.
%
\bibitem{W}
D. J. White,
``Finite State Approximations for Denumerable
State Infinite Horizon Discounted Markov Decision
Processes with Unbounded Rewards",
{\it J. Mathematical Analysis and Applications} {\bf 86},
pp. 292-306, 1982.
%
\bibitem{whitt1}
W. Whitt,
``Approximations of Dynamic Programs, I",
{\it Mathematics of Operations Research},
Vol. 3 No. 3, pp. 231-243, 1978.
%
\bibitem{whitt2}
W. Whitt,
``Representation and Approximation of Noncooperative
Sequential Games",
{\it SIAM J. Control and Opt.}, Vol 18 No 1, pp. 33-43, 1980.
\end{thebibliography}
\end{document}
