% Commit 55fa71de, authored 10 months ago by Wolfgang Mulzer (parent fd689299):
% ``Add more about games.'' -- changes 59-games.tex (+257 lines) and skript.pdf.
\chapter{Graphs and Games}
Graph algorithms can be used to solve simple
puzzles and to play games. There are many kinds
of games, and there are many approaches for
solving them with a computer. We will look at a
few simple examples to illustrate the ideas; more
details will be covered in advanced classes on artificial
intelligence.
As a first example, we will look at \emph{deterministic}, \emph{one-player}
games with \emph{perfect information}.
Here, ``deterministic'' means that there is no randomness involved (i.e., no die is
cast), and ``perfect information'' means that at each point of the game,
the player has all the information about the current state (as opposed to,
e.g., the card game solitaire, where some cards are hidden).
Typical examples include:
\begin{itemize}
\item \emph{peg solitaire}: we are given a board with pegs. Initially,
there is a single hole. In each move, the player can jump with
one peg over an adjacent peg. The peg that was jumped over is removed
from the board, creating a new hole. The goal is to find a sequence
of moves that results in a board that contains only a single peg.
\item \emph{Sudoku}: we are given a $9\times 9$ grid that contains a few digits between
1 and 9 in its cells. In each move, the player can write a new digit
into a grid cell. The goal is to find a way to place the digits such that
each row, column, and small $3\times 3$ box contains every digit from 1 to 9 exactly
once.
\item \emph{sliding puzzles}: we are given a board that has the form of a $4\times 4$ grid,
with 15 movable blocks and one hole. In each move, the player can move a
block that is adjacent to the hole into the hole. The blocks carry a picture
that is scrambled. The goal is to find a sequence of moves that reconstitutes
the picture in its original shape.
\item \emph{Sokoban}: a warehouse worker needs to shift crates in a labyrinthine warehouse.
In each move, the worker can push a crate in a certain
direction. The goal is to arrange the crates in a given configuration.
\item \emph{FreeCell}: a well-known card game that was shipped with Windows~95.
\end{itemize}
We can notice that all these games have a common abstraction: there is a
\emph{board} that
contains all the information about the game, and at each point in time, the board can
be in a certain \emph{configuration}. The game proceeds in \emph{moves} that are executed by
the player, affecting the configuration of the board. The goal is to find a
sequence of moves that transforms a \emph{starting} configuration into a
\emph{winning} configuration.
With this abstraction in mind, it is easy to interpret these games as a reachability
problem in graphs: let $G = (V, E)$ be the \emph{game graph}. The nodes of $G$
are all possible configurations of the board. There is an edge from configuration $v$
to configuration $w$ if and only if a single move transforms $v$ into $w$. Depending on the
game, the graph $G$ can be directed or undirected (e.g., in peg solitaire, the graph is directed,
because a move cannot be undone; in the sliding puzzle, the graph is undirected, because
the possible block slides are symmetric).
Now, the problem is as follows: given a game graph $G$ and a starting configuration $s$,
find a path of moves in $G$ that leads to a winning configuration $t$. With our knowledge
of graph algorithms, this now seems like a very easy task: for example, we could use
BFS, DFS, A*-search, or any other graph search algorithm.
However, a closer consideration shows that things are not that easy. Up to now,
when describing our graph algorithms, we have always assumed that the graphs are
given \emph{explicitly}, as an adjacency list or an adjacency matrix.
However, in the setting of games, this assumption is no longer realistic. Game
graphs can be huge, and the time for explicitly generating all vertices and edges
would be prohibitive. There are several strategies to deal with this:
First, when executing a graph search, the typical situation is that we are at a current
node $v$, and that we need to consider all outgoing edges from $v$. So far, this
was done by simply looking at the adjacency list for $v$ or at the row for $v$ in the
adjacency matrix. Now, however, we need an algorithm that generates the out-neighbors of $v$,
once $v$ is given. In the context of games, this is usually easy to do: we represent $v$
in such a way that the corresponding configuration of the board can be deduced easily,
and then we use our knowledge of the rules of the game to execute all possible moves
from $v$, generating all the neighbors. Thus, we typically do not need to know the whole
graph in advance, but we can generate the edges whenever they are needed.
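For example, in the sliding puzzle, a configuration can be represented as a tuple that lists the blocks row by row, with 0 marking the hole, and the out-neighbors can be generated directly from the rules of the game. A minimal sketch in Python (the representation and names are our own choices, not fixed by the text):

```python
def neighbors(config):
    """Generate all configurations reachable by one move in the 15-puzzle.

    A configuration is a tuple of 16 entries: the blocks 1..15 and 0
    for the hole, listed row by row on the 4x4 board.
    """
    n = 4
    hole = config.index(0)
    row, col = divmod(hole, n)
    result = []
    # A block directly above, below, left, or right of the hole can slide in.
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < n and 0 <= c < n:
            block = r * n + c
            new = list(config)
            new[hole], new[block] = new[block], new[hole]
            result.append(tuple(new))
    return result
```

Each call inspects only the rules; the full game graph, with its $16!/2$ reachable configurations, is never materialized.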
Second, our graph search algorithms so far have assumed that we can store explicit
information with all the vertices in the graph (e.g., the \texttt{visited}-attribute
in BFS or DFS). In a game graph, this is no longer feasible, since the number of possible
vertices is just too large. Thus, we need another strategy to maintain this information.
For example, we could use a dictionary to store all the vertices that have actually been
visited, hoping that the graph search will succeed before too many vertices have been
explored. This could be combined with an A*-search with a good heuristic, again in the hope
of reducing the number of vertices that are explored. Another approach would be to give
up on the attributes altogether, e.g., by doing a variant of DFS that can visit vertices
multiple times (this is typically called \emph{backtracking}).
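Combining both ideas, a BFS over an implicitly given game graph can be sketched as follows: the vertices discovered so far are kept in a dictionary of predecessors, which doubles as the \texttt{visited} information. The $2\times 2$ sliding puzzle serves as a toy instance (our choice for illustration):

```python
from collections import deque

def solve(start, is_winning, neighbors):
    """BFS over an implicitly given game graph.

    Vertices are generated on demand via neighbors(); the dictionary of
    predecessors doubles as the visited information.  Returns the list
    of configurations from start to the first winning one, or None.
    """
    pred = {start: None}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if is_winning(v):
            path = []
            while v is not None:          # follow predecessors back to start
                path.append(v)
                v = pred[v]
            return path[::-1]
        for w in neighbors(v):
            if w not in pred:             # dictionary lookup replaces visited-flag
                pred[w] = v
                queue.append(w)
    return None

# Toy instance: the 2x2 sliding puzzle, 0 marking the hole.
def neighbors_2x2(config):
    hole = config.index(0)
    row, col = divmod(hole, 2)
    result = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < 2 and 0 <= c < 2:
            new = list(config)
            j = r * 2 + c
            new[hole], new[j] = new[j], new[hole]
            result.append(tuple(new))
    return result

path = solve((0, 2, 1, 3), lambda v: v == (1, 2, 3, 0), neighbors_2x2)
```

Since BFS explores configurations in order of distance, the returned move sequence is as short as possible.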
There are many possible heuristics and optimizations that we can try, but in the end,
the underlying problems can be very difficult to solve. The art of finding the right
heuristic for a given problem lies at the core of the field of artificial intelligence,
and it requires a lot of experience and creativity. In later classes, you will see more
of this (and also learn more about complexity theory, which tries to explain why these problems
are so hard).
Next, we consider a more general class of games:
deterministic, \emph{two-player} games with perfect information. The main difference now is that there
are \emph{two} players that play against each other. Again, \emph{deterministic}
means that there is no randomness (i.e., no die is cast, unlike, say, in Mensch-Ärgere-Dich-Nicht),
and \emph{perfect information} means that the complete state of the game is known to all the players at any point in
time (unlike, e.g., in UNO). Typical examples of such games are
\begin{enumerate}
\item Chess,
\item Checkers,
\item Go,
\item Tic-Tac-Toe,
\item Vier gewinnt (Connect Four).
\end{enumerate}
Again, we can give a common abstraction for all these games:
as before, there is a board that contains all the information about the game,
and at each point in time, the board can
be in a certain \emph{configuration}. The game proceeds in \emph{moves} that are executed by
one of the players, affecting the configuration of the board. The players take
\emph{turns}, i.e., the moves alternate between Player~1 and Player~2. The information
on which player moves next is part of the configuration.
There is a fixed \emph{starting configuration} that describes the initial state
of the board, and we assume that Player~1 moves first (i.e., in the starting configuration,
it is Player~1's turn).
There are certain configurations that are designated as \emph{final configurations}. Once
a final configuration is reached, the game is over. To determine who won, there is
a function $\Psi$ that assigns an integer to every final configuration. Typically,
this integer represents the \emph{score} for Player~1: if it is positive, then Player~1 wins,
if it is negative, then Player~1 loses, and if it is zero, the game has finished in a draw.
We assume that the game is \emph{zero-sum}, i.e., the score for Player~2 is the negation of
the score for Player~1.
These games are typically modelled as \emph{game trees}: the root represents the starting
configuration, and it is Player~1's turn. The children of the root are given by all possible
configurations that can be obtained by a single move of Player~1 in the starting configuration.
In every such child, it is Player~2's turn. For each node $v$ at the second level, the children
of $v$ consist of all configurations that result from a single move of Player~2, and in each
such child, it is Player~1's turn. This continues further down the tree, the layers
alternating between Player~1 and Player~2. Once a final configuration is reached, there are no
more children; these are the leaves of the tree.

In principle, the game tree can be infinite. This can happen if the game allows for
circular sequences of moves that do not result in any progress. In the following, however,
we will assume that the game tree is finite, e.g., by imposing a rule that a game is a draw
if there is no progress within a certain number of moves (e.g., this kind of rule is present
in chess).
Now, with the definition of the game tree, we can formally state an algorithmic
problem that we need to solve: suppose we are given a node $v$ of the game tree where it is the turn
of Player~$j$. Determine the best possible \emph{score} that Player~$j$ can achieve, \emph{assuming
that the opponent plays optimally}. Furthermore, determine an \emph{optimal move} that achieves
this score.
Some explanations are in order to understand what this means: first, note that since we
consider zero-sum games, a best possible score for Player~1 is a score that is \emph{as large
as possible}, whereas a best possible score for Player~2 is a score that is \emph{as small as possible}.
In other words, the goal of Player~1 is to \emph{maximize} the score, whereas the goal of Player~2
is to \emph{minimize} the score. For this reason, the nodes in the game tree where it is the
turn of Player~1 are called \emph{max-nodes}, and they are represented by upward triangles;
the nodes where it is the turn of Player~2 are called \emph{min-nodes}, and they
are represented by downward triangles. Second, let us explain the notion of a ``best possible
score''. Suppose we are in a node $v$, and it is the turn of Player~1. Then, Player~1 can
\emph{achieve} score $k$ in $v$ if and only if there is a move in node $v$ that leads into a node $v_2$
such that no matter which move Player~2 chooses in node $v_2$, there is always a
counter-move for Player~1 (depending on Player~2's choice) for which Player~1 can achieve
\emph{at least} score $k$.
Unrolling the definition, this means that \emph{there exists} a child $v_2$ of $v$
such that \emph{for every} child $v_3$ of $v_2$, \emph{there exists} a child $v_4$ of $v_3$,
such that \emph{for every} child $v_5$ of $v_4$, \emph{there exists} a child $v_6$, etc., such that eventually every such
sequence of configurations ends up in a final configuration with score at least $k$.
For Player~2, the definition is similar, the only difference being that the final score should be
\emph{at most} $k$. Now, the best possible score for node $v$ is the best possible score that
the current player can achieve at node $v$.
After this discussion, we can now derive a simple recursive algorithm that determines the
best possible score that can be achieved for a given node $v$ in the tree.
The recursion is very simple: first, suppose that $v$ represents a final configuration. Then,
the best possible score for $v$ is given by the final score $\Psi(v)$ for $v$.
Next, suppose that $v$ is a max-node. This means that it is the turn of Player~1, and the
goal is to achieve a score that is as large as possible. Let $w_1, \dots, w_j$ be the children
of $v$. All these children are min-nodes, where it is the turn of Player~2. Suppose that for
each child $w_1, \dots, w_j$, we can recursively compute the lowest possible
score $s_1, \dots, s_j$
that Player~2 can achieve when playing from this child. Then, the best score that Player~1 can
achieve from $v$ is obtained by making a move that maximizes this score, i.e., the move that is as bad for Player~2
as possible. Similarly, if $v$ is a min-node, it is the turn of Player~2, and the goal is to
achieve a score that is as small as possible. For each child of $v$, we can recursively
determine the largest possible score that Player~1 can achieve from this child, and we pick
the child that minimizes the score. The pseudocode for this algorithm is as follows:
\begin{verbatim}
// visit a final node
final-visit(v):
  // simply return the final score for v
  return psi(v)

// visit a max-node
max-visit(v):
  max <- -infty
  // for each child w, determine the lowest possible score
  // that Player 2 can achieve from w, and pick the child
  // where this lowest possible score is as large as possible
  for each child w of v do
    if w is a final configuration then
      child_score <- final-visit(w)
    else
      child_score <- min-visit(w)
    if child_score > max then
      max <- child_score
  return max

// visit a min-node
min-visit(v):
  min <- infty
  // for each child w, determine the largest possible score
  // that Player 1 can achieve from w, and pick the child
  // where this largest possible score is as small as possible
  for each child w of v do
    if w is a final configuration then
      child_score <- final-visit(w)
    else
      child_score <- max-visit(w)
    if child_score < min then
      min <- child_score
  return min
\end{verbatim}
This algorithm is called the \emph{minimax-algorithm}. It consists
of a simple post-order traversal of the tree and determines for each possible
configuration in the game tree the optimal score. Given the optimal scores, it
is also easy to determine the best possible move for each given configuration.
As in the single-player case, this solves the problem completely, except for the
fact that the game trees for realistic games are prohibitively large, so that
it is far from feasible to execute the minimax-algorithm in its entirety.
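On small instances, the pseudocode above translates directly into a runnable function. In the following Python sketch, a node counts as final iff it has no children; the tiny hand-built tree is purely for illustration:

```python
def minimax(v, maximizing, children, psi):
    """Post-order traversal of the game tree: a max-node takes the
    maximum over its children's scores, a min-node the minimum, and
    a final configuration (no children) returns its score psi(v)."""
    kids = children(v)
    if not kids:
        return psi(v)
    scores = [minimax(w, not maximizing, children, psi) for w in kids]
    return max(scores) if maximizing else min(scores)

# A tiny game tree: the root 'r' is a max-node, its children 'a' and 'b'
# are min-nodes, and the leaves carry the final scores for Player 1.
tree = {'r': ['a', 'b'], 'a': ['a1', 'a2'], 'b': ['b1', 'b2']}
leaf_score = {'a1': 3, 'a2': 5, 'b1': -2, 'b2': 9}

best = minimax('r', True, lambda v: tree.get(v, []), leaf_score.get)
```

Here the optimal move at the root is to `'a'`: Player 2 can force score 3 there, but would force score $-2$ below `'b'`.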
Again, there is a whole array of possible tricks that we can use to improve
the running-time of the minimax-algorithm and to reduce the size of the search
space. We list a few such tricks:
First, we can try to avoid unnecessary duplication of work: it can happen that certain
configurations are repeated throughout the game tree, because the same configuration
can be reached by different sequences of moves from the starting configuration. Instead
of recomputing the optimal score for each such repetition from scratch, we can maintain
a dictionary with all the configurations that we have processed so far, computing the
optimal scores only for new configurations. Going even further, there may be \emph{symmetries}
between configurations, i.e., there may be configurations that look different superficially,
but that are essentially the same (e.g., they can be obtained from each other by mirroring the
board). By checking for such symmetries, and by avoiding a recomputation if possible, we
can further reduce the number of distinct configurations that need to be processed.
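The dictionary trick can be grafted onto the minimax recursion with a few extra lines. As a toy example (our choice, not from the text), consider single-pile Nim: a move removes one or two stones, and whoever takes the last stone wins. The same configuration (stones remaining, player to move) is reached along many different move sequences:

```python
def best_score(stones, player1_to_move, memo=None):
    """Minimax with a dictionary of already processed configurations.

    Toy game: Nim with one pile, where a move removes 1 or 2 stones and
    whoever takes the last stone wins.  The score is +1 if Player 1
    wins and -1 if Player 2 wins.
    """
    if memo is None:
        memo = {}
    key = (stones, player1_to_move)
    if key in memo:
        return memo[key]       # configuration processed before: reuse its score
    if stones == 0:
        # final configuration: the *previous* player took the last stone
        score = -1 if player1_to_move else 1
    else:
        scores = [best_score(stones - take, not player1_to_move, memo)
                  for take in (1, 2) if take <= stones]
        score = max(scores) if player1_to_move else min(scores)
    memo[key] = score
    return score
```

Without the dictionary, the recursion explores exponentially many repeated configurations; with it, each of the linearly many distinct configurations is processed once. (The player to move loses exactly when the pile size is a multiple of three.)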
Second, we can try to limit the search depth. Instead of searching the whole game tree
until a final configuration is reached, we can cut off the search after a certain (pre-determined)
number of moves. Every configuration that is reached after this number of moves is
treated as a final configuration in the minimax-algorithm. The obvious problem now is that
we do not have a final score for these configurations. Instead, we need to introduce
a \emph{heuristic function} that assigns to each possible configuration of the board
a value that can be used as an estimator for the final score. Instead of the final score,
we use the heuristic score, and the minimax-algorithm then only provides a way to reach a configuration
that achieves the best possible heuristic score for the current player. This means in particular
that the minimax-algorithm is no longer guaranteed to be optimal; the quality of the result depends
on the quality of the heuristic. We achieve a faster algorithm at the cost of the quality of the solution.
Third, there is a way to eliminate moves from consideration without sacrificing the quality
of the solution. This strategy is called \emph{$(\alpha, \beta)$-pruning}. The idea is
as follows:
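During the traversal, we maintain two bounds: $\alpha$, the best score that Player~1 can already guarantee somewhere on the path to the root, and $\beta$, the best score that Player~2 can already guarantee. As soon as $\alpha \geq \beta$ at some node, its remaining children cannot influence the overall result and need not be explored. A sketch of this standard technique in Python (the tiny tree is only for illustration):

```python
import math

def alphabeta(v, maximizing, children, psi, alpha=-math.inf, beta=math.inf):
    """Minimax with (alpha, beta)-pruning: alpha is the score Player 1
    can already guarantee, beta the score Player 2 can already
    guarantee; once alpha >= beta, the remaining children are skipped."""
    kids = children(v)
    if not kids:
        return psi(v)
    if maximizing:
        value = -math.inf
        for w in kids:
            value = max(value, alphabeta(w, False, children, psi, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # Player 2 would never let the game reach this branch
        return value
    else:
        value = math.inf
        for w in kids:
            value = min(value, alphabeta(w, True, children, psi, alpha, beta))
            beta = min(beta, value)
            if alpha >= beta:
                break  # Player 1 would never let the game reach this branch
        return value

# Same result as plain minimax, but with pruning: after seeing b1 = -2 at the
# min-node 'b', we have alpha = 3 >= beta = -2, so b2 is never examined.
tree = {'r': ['a', 'b'], 'a': ['a1', 'a2'], 'b': ['b1', 'b2']}
leaf_score = {'a1': 3, 'a2': 5, 'b1': -2, 'b2': 9}
best = alphabeta('r', True, lambda v: tree.get(v, []), leaf_score.get)
```

The pruned search returns exactly the minimax score; only the amount of work is reduced.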