Stochastic game

In game theory, a stochastic game (or Markov game) is a repeated game with probabilistic transitions played by one or more players. The game is played in a sequence of stages. At the beginning of each stage the game is in some state. The players select actions and each player receives a payoff that depends on the current state and the chosen actions. The game then moves to a new random state whose distribution depends on the previous state and the actions chosen by the players. The procedure is repeated at the new state and play continues for a finite or infinite number of stages. The total payoff to a player is often taken to be the discounted sum of the stage payoffs or the limit inferior of the averages of the stage payoffs.

Stochastic games were introduced by Lloyd Shapley in the early 1950s.^[1] They generalize Markov decision processes to multiple interacting decision makers, as well as strategic-form games to dynamic situations in which the environment changes in response to the players' choices.^[2]

Two-player games

Stochastic two-player games on directed graphs are widely used for modeling and analysis of discrete systems operating in an unknown (adversarial) environment. Possible configurations of a system and its environment are represented as vertices, and the transitions correspond to actions of the system, its environment, or "nature". A run of the system then corresponds to an infinite path in the graph. Thus, a system and its environment can be seen as two players with antagonistic objectives, where one player (the system) aims at maximizing the probability of "good" runs, while the other player (the environment) aims at the opposite.

In many cases, there exists an equilibrium value of this probability, but optimal strategies for both players may not exist.
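As an illustration of the graph model above, the following Python sketch approximates, for each vertex, the probability of reaching a set of "good" target vertices that the system can guarantee against the environment. It assumes a turn-based variant, often called a simple stochastic game, in which each vertex is owned by the maximizing player, the minimizing player, or nature ("random"). The function name `reachability_values`, the dictionary encoding of the graph, and the fixed iteration count are hypothetical choices for this example. Starting from 0 outside the target, value iteration approaches the value from below. In this finite turn-based setting, optimal strategies are known to exist, so the example does not exhibit the non-existence phenomenon mentioned above.

```python
def reachability_values(owner, succ, prob, target, iterations=1000):
    """Approximate, for every vertex, the probability of reaching `target`
    that the maximizing player can guarantee against the minimizing player.

    owner[u]: "max", "min" or "random" (nature).
    succ[u]:  list of successor vertices of u.
    prob[u]:  for random vertices, probabilities aligned with succ[u].
    """
    v = {u: 1.0 if u in target else 0.0 for u in owner}
    for _ in range(iterations):
        new_v = {}
        for u in owner:
            if u in target:
                new_v[u] = 1.0  # target reached
            elif owner[u] == "max":
                new_v[u] = max(v[w] for w in succ[u])
            elif owner[u] == "min":
                new_v[u] = min(v[w] for w in succ[u])
            else:  # nature moves at random
                new_v[u] = sum(p * v[w] for p, w in zip(prob[u], succ[u]))
        v = new_v
    return v


# Hypothetical system: from "start" the system chooses a risky "loop" or a
# "safe" gamble; the environment controls "m" and may send play back to
# "start" or to the "exit" gamble.
owner = {"start": "max", "safe": "random", "loop": "random", "m": "min",
         "exit": "random", "goal": "random", "sink": "random"}
succ = {"start": ["loop", "safe"], "safe": ["goal", "sink"],
        "loop": ["goal", "m"], "m": ["start", "exit"],
        "exit": ["goal", "sink"], "goal": ["goal"], "sink": ["sink"]}
prob = {"safe": [0.4, 0.6], "loop": [0.5, 0.5], "exit": [0.9, 0.1],
        "goal": [1.0], "sink": [1.0]}
values = reachability_values(owner, succ, prob, target={"goal"})
print(round(values["start"], 6))  # 0.95
```

Here the environment prefers sending play to "exit" (value 0.9) rather than back to "start", so the risky loop is worth $0.5+0.5\cdot 0.9=0.95$ to the system, which beats the safe gamble.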

We introduce basic concepts and algorithmic questions studied in this area, and we mention some long-standing open problems. Then, we mention selected recent results.

Theory

The ingredients of a stochastic game are: a finite set of players $I$; a state space $S$ (either a finite set or a measurable space $(S,{\mathcal {S}})$); for each player $i\in I$, an action set $A^{i}$ (either a finite set or a measurable space $(A^{i},{\mathcal {A}}^{i})$); a transition probability $P$ from $S\times A$ to $S$, where $A=\times _{i\in I}A^{i}$ is the set of action profiles and $P(B\mid s,a)$ is the probability that the next state is in the (measurable) set $B\subseteq S$ given the current state $s$ and the current action profile $a$; and a payoff function $g$ from $S\times A$ to $\mathbb{R}^{I}$, where the $i$-th coordinate of $g$, $g^{i}$, is the payoff to player $i$ as a function of the state $s$ and the action profile $a$.

The game starts at some initial state $s_{1}$. At stage $t$, players first observe $s_{t}$, then simultaneously choose actions $a_{t}^{i}\in A^{i}$, then observe the action profile $a_{t}=(a_{t}^{i})_{i}$, and then nature selects $s_{t+1}$ according to the probability $P(\cdot \mid s_{t},a_{t})$. A play of the stochastic game, $s_{1},a_{1},\ldots ,s_{t},a_{t},\ldots$, defines a stream of payoffs $g_{1},g_{2},\ldots$, where $g_{t}=g(s_{t},a_{t})$.
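The stage protocol can be simulated directly. The Python sketch below assumes finite state and action sets and, for brevity, stationary strategies: each player's mixed action depends only on the current state, whereas general strategies may depend on the whole history of play. The names `simulate_play` and `sample`, the nested-dictionary encoding of $P$ and $g$, and the matching-pennies usage example are all hypothetical.

```python
import random


def sample(dist, rng):
    """Draw one outcome from a finite distribution given as {outcome: probability}."""
    outcomes = list(dist)
    return rng.choices(outcomes, weights=[dist[o] for o in outcomes], k=1)[0]


def simulate_play(s1, strategies, P, g, T, rng):
    """Simulate the first T stages of a play starting from state s1.

    strategies[i][s]: {action: probability}, player i's mixed action in state s.
    P[s][a]:          {next_state: probability} for the action profile a (a tuple).
    g[s][a]:          tuple of stage payoffs, one entry per player.
    Returns the list of (s_t, a_t, g_t) for t = 1, ..., T.
    """
    s, history = s1, []
    for _ in range(T):
        # Players observe s and choose their actions simultaneously and independently.
        a = tuple(sample(sigma[s], rng) for sigma in strategies)
        history.append((s, a, g[s][a]))
        # Nature draws the next state from P(. | s, a).
        s = sample(P[s][a], rng)
    return history


# Hypothetical usage: matching pennies in state "x"; a match moves play to the
# absorbing state "y", where both players have a single action and payoff 0.
profiles = [(a1, a2) for a1 in "HT" for a2 in "HT"]
g = {"x": {a: (1, -1) if a[0] == a[1] else (-1, 1) for a in profiles},
     "y": {("H", "H"): (0, 0)}}
P = {"x": {a: {"y": 1.0} if a[0] == a[1] else {"x": 1.0} for a in profiles},
     "y": {("H", "H"): {"y": 1.0}}}
coin = {"H": 0.5, "T": 0.5}
strategies = [{"x": coin, "y": {"H": 1.0}}, {"x": coin, "y": {"H": 1.0}}]
play = simulate_play("x", strategies, P, g, T=10, rng=random.Random(0))
for s, a, payoff in play:
    print(s, a, payoff)
```

Passing an explicitly seeded `random.Random` instance makes a simulated play reproducible without touching the global random state.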

The discounted game $\Gamma _{\lambda }$ with discount factor $\lambda$ ($0<\lambda \leq 1$) is the game where the payoff to player $i$ is $\lambda \sum _{t=1}^{\infty }(1-\lambda )^{t-1}g_{t}^{i}$. The $n$-stage game $\Gamma _{n}$ is the game where the payoff to player $i$ is ${\bar {g}}_{n}^{i}:={\frac {1}{n}}\sum _{t=1}^{n}g_{t}^{i}$.
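Both payoff criteria can be evaluated on a given stream of stage payoffs for one player. In this Python sketch, `discounted_payoff` and `n_stage_average` are hypothetical helper names. Because the discounted sum is infinite, the code evaluates it on a finite prefix of length $T$. This is an approximation whose error is at most $(1-\lambda)^{T}\sup_t|g_t|$.

```python
def discounted_payoff(stream, lam):
    """Approximate lam * sum_{t>=1} (1 - lam)**(t - 1) * g_t using a finite prefix."""
    if not 0 < lam <= 1:
        raise ValueError("discount parameter must satisfy 0 < lam <= 1")
    return lam * sum((1 - lam) ** (t - 1) * g for t, g in enumerate(stream, start=1))


def n_stage_average(stream, n):
    """Average of the first n stage payoffs, (1/n) * sum_{t=1}^{n} g_t."""
    if n < 1 or n > len(stream):
        raise ValueError("need 1 <= n <= len(stream)")
    return sum(stream[:n]) / n


# Example: stage payoffs of one player along a finite prefix of a play.
stream = [1, 0, 1, 0, 1, 0]
print(discounted_payoff(stream, 0.5))  # 0.65625
print(n_stage_average(stream, 4))      # 0.5
```

The factor $\lambda$ makes the weights $\lambda(1-\lambda)^{t-1}$ sum to 1. As a result, a constant stream $c$ has discounted payoff $c$, which puts it on the same scale as the $n$-stage average.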

The value $v_{n}(s_{1})$, respectively $v_{\lambda }(s_{1})$, of a two-person zero-sum stochastic game $\Gamma _{n}$, respectively $\Gamma _{\lambda }$, with finitely many states and actions exists, and Truman Bewley and Elon Kohlberg (1976) proved that $v_{n}(s_{1})$ converges to a limit as $n$ goes to infinity and that $v_{\lambda }(s_{1})$ converges to the same limit as $\lambda$ goes to $0$.
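For finitely many states and actions, and following Shapley's original analysis, $v_{\lambda}$ is the unique fixed point of the map that sends a vector $v$ to the state-by-state value of the one-shot zero-sum matrix game with entries $\lambda g(s,a)+(1-\lambda)\sum_{s'}P(s'\mid s,a)\,v(s')$. This map is a contraction with modulus $1-\lambda$, so iterating it from any starting vector converges to $v_{\lambda}$. The Python sketch below assumes that each player has exactly two actions in every state, only so that the matrix-game step has a closed form. `value_2x2` and `shapley_iteration` are hypothetical names, not a library API. The usage example encodes the classic "Big Match", whose discounted value in the non-absorbing state is $1/2$ for every $\lambda$.

```python
def value_2x2(m):
    """Value of a 2x2 zero-sum matrix game; the row player maximizes."""
    (a, b), (c, d) = m
    lower = max(min(a, b), min(c, d))  # maximin over pure strategies
    upper = min(max(a, c), max(b, d))  # minimax over pure strategies
    if upper - lower < 1e-12:          # pure saddle point
        return lower
    # No saddle point: both players mix, and the closed form below applies.
    return (a * d - b * c) / (a + d - b - c)


def shapley_iteration(g, P, lam, tol=1e-10, max_iter=100_000):
    """Approximate v_lambda for a finite two-person zero-sum stochastic game.

    g[s][i][j]: stage payoff to player 1 in state s for actions i (player 1), j (player 2).
    P[s][i][j]: list of next-state probabilities, indexed by state.
    """
    n = len(g)
    v = [0.0] * n
    for _ in range(max_iter):
        new_v = []
        for s in range(n):
            # Matrix game at s: current payoff plus discounted continuation value.
            m = [[lam * g[s][i][j]
                  + (1 - lam) * sum(p * v[t] for t, p in enumerate(P[s][i][j]))
                  for j in range(2)] for i in range(2)]
            new_v.append(value_2x2(m))
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v
    return v


# Big Match: state 0 is non-absorbing; states 1 and 2 are absorbing with payoff 1 and 0.
# Row 0 ("top") absorbs the game; row 1 ("bottom") keeps it in state 0.
stay, win, lose = [1, 0, 0], [0, 1, 0], [0, 0, 1]
g = [[[1, 0], [0, 1]], [[1, 1], [1, 1]], [[0, 0], [0, 0]]]
P = [[[win, lose], [stay, stay]], [[win, win], [win, win]], [[lose, lose], [lose, lose]]]
for lam in (0.5, 0.1, 0.01):
    print(lam, round(shapley_iteration(g, P, lam)[0], 6))  # 0.5 in every case
```

For general finite action sets, the closed-form `value_2x2` step would have to be replaced by a linear-programming solution of each matrix game.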

The "undiscounted" game $\Gamma _{\infty }$ is the game where the payoff to player $i$ is the "limit" of the averages of the stage payoffs. Because these averages need not converge, some precautions are needed in defining the value of a two-person zero-sum $\Gamma _{\infty }$ and in defining equilibrium payoffs of a non-zero-sum $\Gamma _{\infty }$. The uniform value $v_{\infty }$ of a two-person zero-sum stochastic game $\Gamma _{\infty }$ exists if for every $\varepsilon >0$ there is a positive integer $N$ and a strategy pair $\sigma _{\varepsilon }$ of player 1 and $\tau _{\varepsilon }$ of player 2 such that for every strategy $\sigma$ of player 1, every strategy $\tau$ of player 2, and every $n\geq N$, the expectation of ${\bar {g}}_{n}^{1}$ (the average payoff to player 1) with respect to the probability on plays defined by $\sigma _{\varepsilon }$ and $\tau$ is at least $v_{\infty }-\varepsilon$, and the expectation of ${\bar {g}}_{n}^{1}$ with respect to the probability on plays defined by $\sigma$ and $\tau _{\varepsilon }$ is at most $v_{\infty }+\varepsilon$. Jean-François Mertens and Abraham Neyman (1981) proved that every two-person zero-sum stochastic game with finitely many states and actions has a uniform value.^[3]

If there is a finite number of players and the action sets and the set of states are finite, then a stochastic game with a finite number of stages always has a Nash equilibrium. The same is true for a game with infinitely many stages if the total payoff is the discounted sum.

The non-zero-sum stochastic game $\Gamma _{\infty }$ has a uniform equilibrium payoff $v_{\infty }$ if for every $\varepsilon >0$ there is a positive integer $N$ and a strategy profile $\sigma$ such that for every unilateral deviation by a player $i$, i.e., a strategy profile $\tau$ with $\sigma ^{j}=\tau ^{j}$ for all $j\neq i$, and every $n\geq N$, the expectation of ${\bar {g}}_{n}^{i}$ with respect to the probability on plays defined by $\sigma$ is at least $v_{\infty }^{i}-\varepsilon$, and the expectation of ${\bar {g}}_{n}^{i}$ with respect to the probability on plays defined by $\tau$ is at most $v_{\infty }^{i}+\varepsilon$. Nicolas Vieille has shown that all two-person stochastic games with finite state and action spaces have a uniform equilibrium payoff.^[4]

The non-zero-sum stochastic game $\Gamma _{\infty }$ has a limiting-average equilibrium payoff $v_{\infty }$ if for every $\varepsilon >0$ there is a strategy profile $\sigma$ such that for every unilateral deviation $\tau$ by a player $i$ (that is, $\tau ^{j}=\sigma ^{j}$ for all $j\neq i$), the expectation of the limit inferior of the averages of the stage payoffs to player $i$ with respect to the probability on plays defined by $\sigma$ is at least $v_{\infty }^{i}-\varepsilon$, and the expectation of the limit superior of the averages of the stage payoffs to player $i$ with respect to the probability on plays defined by $\tau$ is at most $v_{\infty }^{i}+\varepsilon$. Jean-François Mertens and Abraham Neyman (1981) proved that every two-person zero-sum stochastic game with finitely many states and actions has a limiting-average value,^[3] and Nicolas Vieille has shown that all two-person stochastic games with finite state and action spaces have a limiting-average equilibrium payoff.^[4] In particular, these results imply that these games have a value and an approximate equilibrium payoff, called the liminf-average (respectively, the limsup-average) equilibrium payoff, when the total payoff is the limit inferior (respectively, the limit superior) of the averages of the stage payoffs.
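A single payoff stream whose averages do not converge shows why the limit inferior and the limit superior have to be distinguished. The Python sketch below uses the hypothetical helper names `doubling_blocks` and `running_averages`. It builds the deterministic 0/1 stream made of blocks of lengths $1,2,4,8,\ldots$ that alternate between payoff 1 and payoff 0. At the end of every block of 0s the running average $\bar g_n$ equals exactly $1/3$, while at the end of the $k$-th block (a block of 1s, for even $k$) it equals $(2^{k+2}-1)/(3(2^{k+1}-1))$, which tends to $2/3$. The liminf-average and limsup-average criteria therefore assign the payoffs $1/3$ and $2/3$ to the same play.

```python
def doubling_blocks(num_blocks):
    """Stream 1, 0, 0, 1, 1, 1, 1, 0 (x8), ...: block k has length 2**k and
    payoff 1 if k is even, 0 if k is odd."""
    stream = []
    for k in range(num_blocks):
        stream.extend([1 if k % 2 == 0 else 0] * 2 ** k)
    return stream


def running_averages(stream):
    """Return [g_bar_1, g_bar_2, ...] with g_bar_n = (g_1 + ... + g_n) / n."""
    averages, total = [], 0
    for n, g in enumerate(stream, start=1):
        total += g
        averages.append(total / n)
    return averages


averages = running_averages(doubling_blocks(16))  # 2**16 - 1 stages
tail = averages[4095:]                             # stages n >= 4096
print(min(tail), max(tail))  # about 0.3333 and 0.6667
```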

Whether every stochastic game with finitely many players, states, and actions has a uniform equilibrium payoff, a limiting-average equilibrium payoff, or even a liminf-average equilibrium payoff is a challenging open question.

A Markov perfect equilibrium is a refinement of the concept of subgame perfect Nash equilibrium to stochastic games.

Stochastic games have been combined with Bayesian games to model uncertainty over player strategies.^[5] The resulting stochastic Bayesian game model is solved via a recursive combination of the Bayesian Nash equilibrium equation and the Bellman optimality equation.

Stopping games

E. B. Dynkin^[6] presented the following game-theoretic problem related to the stopping of stochastic processes. Suppose that ${\mathcal {F}}_{n},n=0,1,\ldots$, is an increasing sequence of σ-algebras in some probability space $(\Omega ,{\mathcal {F}},{\mathbf {P} })$, where ${\mathcal {F}}$ contains all the ${\mathcal {F}}_{n}$. Two players observe stochastic sequences $\{X_{n}\}_{n=1}^{\infty }$ and $\{\varPhi _{n}\}_{n=1}^{\infty }$ adapted to the filtration $\{{\mathcal {F}}_{n}\}_{n=0}^{\infty }$, i.e., $X_{n}$ and $\varPhi _{n}$ are ${\mathcal {F}}_{n}$-measurable. The game can be stopped at time $n$ by the first player if $\varPhi _{n}\geq 0$ and by the second player if $\varPhi _{n}<0$. If the game is stopped at time $n$, then the first player receives $X_{n}$ from the second player. Player 1 seeks to maximize the expected payoff, and player 2 seeks to minimize it.

Let $\tau$ be a Markov time (stopping time) with respect to the filtration $\{{\mathcal {F}}_{n}\}_{n=0}^{\infty }$ and let $\chi _{A}$ be the characteristic function of the event $A$. Denote by ${\mathcal {T}}$ the set of all stopping times with respect to the filtration $\{{\mathcal {F}}_{n}\}_{n=0}^{\infty }$, and let ${\textstyle \Lambda =\{\lambda =\tau \chi _{\varPhi _{\tau }\geqslant 0},\tau \in {\mathcal {T}}\}}$ and ${\textstyle \mathrm {M} =\{\mu =\tau \chi _{\varPhi _{\tau }<0},\tau \in {\mathcal {T}}\}}$. Let $\lambda \in \Lambda$ be the stopping time selected by the first player and $\mu \in \mathrm {M}$ the stopping time selected by the second player. The payoff, i.e., the payment from the second player to the first, is then defined as $R(\lambda ,\mu )={\mathbf {E} }X_{\lambda \land \mu }$.

Under the condition that ${\mathbf {E} }(\sup _{n}|X_{n}|)<\infty$, Dynkin^[6] proved that the game has a value, i.e., $v=\sup _{\lambda \in \Lambda }\inf _{\mu \in \mathrm {M} }R(\lambda ,\mu )=\inf _{\mu \in \mathrm {M} }\sup _{\lambda \in \Lambda }R(\lambda ,\mu )$. He constructed ε-optimal strategies and introduced several conditions for the existence of optimal strategies (for extensions, see Neveu^[7] and Yasuda^[8]).
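Dynkin's game is formulated over an infinite horizon, but a finite-horizon discretization shows how the alternating stopping rights interact. The Python sketch below rests on illustrative assumptions and is not Dynkin's construction. The information is generated by a simple symmetric random walk $W_n$ started at $0$, $X_n$ and $\varPhi_n$ are given functions of $(n,W_n)$, and the game is forced to stop at a horizon $N$ with payoff $X_N$. Backward induction then gives $V_N=X_N$ and, for $n<N$, $V_n=\max(X_n,\mathbf E[V_{n+1}\mid\mathcal F_n])$ when $\varPhi_n\geq 0$ (player 1 may stop), and $V_n=\min(X_n,\mathbf E[V_{n+1}\mid\mathcal F_n])$ when $\varPhi_n<0$ (player 2 may stop). The function name `dynkin_value` is hypothetical.

```python
from functools import lru_cache


def dynkin_value(X, Phi, horizon):
    """Value of a finite-horizon Dynkin stopping game driven by a simple
    symmetric random walk: W_0 = 0, W_{n+1} = W_n +/- 1 with probability 1/2 each.

    X(n, w):   payment from player 2 to player 1 if the game stops at time n with W_n = w.
    Phi(n, w): player 1 may stop if Phi(n, w) >= 0, player 2 if Phi(n, w) < 0.
    Assumption: if nobody stops before `horizon`, the game stops there.
    """

    @lru_cache(maxsize=None)
    def V(n, w):
        if n == horizon:
            return X(n, w)
        # Expected value of continuing, E[V_{n+1} | W_n = w].
        cont = 0.5 * V(n + 1, w + 1) + 0.5 * V(n + 1, w - 1)
        if Phi(n, w) >= 0:
            return max(X(n, w), cont)  # player 1 holds the stopping right
        return min(X(n, w), cont)      # player 2 holds the stopping right

    return V(0, 0)


print(dynkin_value(lambda n, w: abs(w), lambda n, w: w, 3))   # 1.25
print(dynkin_value(lambda n, w: abs(w), lambda n, w: 1, 3))   # 1.5 (only player 1 can stop)
print(dynkin_value(lambda n, w: abs(w), lambda n, w: -1, 3))  # 0 (player 2 stops at once)
```

With the payoff $|W_n|$, the maximizer alone prefers to wait until the horizon, the minimizer alone stops immediately, and letting the sign of $W_n$ decide who holds the stopping right gives an intermediate value.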

Applications

Stochastic games have applications in economics, evolutionary biology, and computer networks.^[9]^[10] They generalize repeated games, which correspond to the special case where there is only one state.

Notes

^ Shapley, L. S. (1953). "Stochastic games". PNAS. 39 (10): 1095–1100. Bibcode:1953PNAS...39.1095S. doi:10.1073/pnas.39.10.1095. PMC 1063912. PMID 16589380.
^ Solan, Eilon; Vieille, Nicolas (2015). "Stochastic Games". PNAS. 112 (45): 13743–13746. doi:10.1073/pnas.1513508112. PMC 4653174. PMID 26556883.
^ Mertens, J. F. & Neyman, A. (1981). "Stochastic Games". International Journal of Game Theory. 10 (2): 53–66. doi:10.1007/BF01769259. S2CID 189830419.
^ Vieille, N. (2002). "Stochastic games: Recent results". Handbook of Game Theory. Amsterdam: Elsevier Science. pp. 1833–1850. ISBN 0-444-88098-4.
^ Albrecht, Stefano; Crandall, Jacob; Ramamoorthy, Subramanian (2016). "Belief and Truth in Hypothesised Behaviours". Artificial Intelligence. 235: 63–94. arXiv:1507.07688. doi:10.1016/j.artint.2016.02.004. S2CID 2599762.
^ Dynkin, E. B. (1969). "A game-theoretic version of an optimal stopping problem". Dokl. Akad. Nauk SSSR. 185 (1): 16–19.
^ Neveu, J. (1975). Discrete-parameter martingales. North-Holland Mathematical Library, Vol. 10. Amsterdam; Oxford: North-Holland Publishing Company; New York: American Elsevier Publishing Company, Inc. pp. viii+236, Chapter 3.
^ Yasuda, M. (1985-12-01). "On a randomized strategy in Neveu's stopping problem". Stochastic Processes and their Applications. 21 (1): 159–166. doi:10.1016/0304-4149(85)90384-9. ISSN 0304-4149.
^ Altman, E.; Avratchenkov, K.; Bonneau, N.; Debbah, M.; El-Azouzi, R.; Menasche, D. S. "Constrained Stochastic Games in Wireless Networks".
^ Djehiche, Boualem; Tcheukam, Alain; Tembine, Hamidou (2017-09-27). "Mean-Field-Type Games in Engineering". AIMS Electronics and Electrical Engineering. 1: 18–73. arXiv:1605.03281. doi:10.3934/ElectrEng.2017.1.18. S2CID 16055840.

External links

Lecture on Stochastic Two-Player Games by Antonin Kucera
