Third, there is a way to eliminate moves from consideration, without sacrificing the quality
of the solution. This strategy is called \emph{$(\alpha, \beta)$-pruning}. The idea is
as follows: suppose we explore the game tree starting from the root $r$, and suppose that
we are currently visiting a max-node $v$. For each node $w$ along the path from $r$ to $v$, we are
searching for the best possible move, and we have already processed some children of $w$, so we have
a \emph{tentative} value for the best possible score for $w$ (more precisely, this is represented by the
current values of the \texttt{max}- and the \texttt{min}-variables in the
\texttt{max-visit}/\texttt{min-visit} calls from $r$ to $v$). Now, suppose that we have just finished
processing a child of the current max-node $v$, and that this results in increasing the tentative score
of $v$ to $k$. Suppose further that along the path from $r$ to $v$, there is a min-node whose tentative
score is smaller than $k$. Then, we claim that we can immediately stop our exploration
of $v$ and return to the parent node. The reason is as follows: given that the tentative score for
node $v$ is at least $k$, we know that if the game reaches configuration $v$, Player~1 will certainly
have a move that ensures a final score of at least $k$. However, we know that in a configuration $w$ that
is encountered on the way to configuration $v$, there exists a move for Player~2 that ensures a
score that is less than $k$. Thus, we know that Player~2 can always force a score that is less than $k$,
and hence we will never reach configuration $v$ if Player~2 plays optimally.
To implement this idea, we introduce two additional parameters that are passed along during
the search of the game tree: $\alpha$ and $\beta$. Here, $\alpha$ is the highest possible
score that Player~1 was able to achieve so far, while $\beta$ is the lowest possible score that
Player~2 was able to achieve so far. While considering a max-node, we can abort the search
as soon as we find a move whose score is higher than $\beta$, and while considering a min-node,
we can abort as soon as we find a move whose score is lower than $\alpha$. The pseudo-code is as follows:
\begin{verbatim}
// visit a final node
final-visit(v):
  // simply return the final score for v
  return psi(v)

// visit a max-node
max-visit(v, alpha, beta):
  max <- -infty
  for each child w of v do
    if w is a final configuration then
      child_score <- final-visit(w)
    else
      child_score <- min-visit(w, alpha, beta)
    if child_score > max then
      max <- child_score
    // if we have found a move that is better than
    // the best move so far, we update alpha
    if max > alpha then
      alpha <- max
    // if the move is better than the best move that
    // Player 2 can achieve so far, we abort
    if max > beta then
      break
  return max

// visit a min-node
min-visit(v, alpha, beta):
  min <- infty
  for each child w of v do
    if w is a final configuration then
      child_score <- final-visit(w)
    else
      child_score <- max-visit(w, alpha, beta)
    if child_score < min then
      min <- child_score
    // if we have found a move that is better for Player 2
    // than the best move so far, we update beta
    if min < beta then
      beta <- min
    // if the move is worse than the best move that
    // Player 1 can achieve so far, we abort
    if min < alpha then
      break
  return min
\end{verbatim}
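For concreteness, here is a minimal, runnable sketch of $(\alpha, \beta)$-pruning in Python.
The tree representation (nested lists whose leaves are the final scores $\psi(v)$) is an
assumption made purely for illustration; it does not appear in the pseudo-code above.
\begin{verbatim}
import math

def alphabeta(node, alpha, beta, maximizing):
    # A leaf stands for a final configuration; its value is psi(v).
    if not isinstance(node, list):
        return node
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, best)   # update Player 1's best score so far
            if best > beta:            # Player 2 avoids this node: abort
                break
        return best
    else:
        best = math.inf
        for child in node:
            best = min(best, alphabeta(child, alpha, beta, True))
            beta = min(beta, best)     # update Player 2's best score so far
            if best < alpha:           # Player 1 avoids this node: abort
                break
        return best

# Example: a depth-2 game tree; the optimal score for Player 1 is 3.
tree = [[3, 5], [2, 9], [0, 7]]
print(alphabeta(tree, -math.inf, math.inf, True))   # prints 3
\end{verbatim}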
If we use $(\alpha, \beta)$-pruning, it makes a difference in which order
the children $w$ of a node $v$ are evaluated. If we investigate the more
promising moves first, it becomes more likely that the search for the less
favorable moves can be aborted at a later stage. Thus, $(\alpha, \beta)$-pruning is often combined
with a heuristic that determines the order in which the children in a game tree
are evaluated. In practice, we also combine $(\alpha, \beta)$-pruning with a bounded
search depth, in order to make sure that the number of moves under investigation
is not too large. A small sketch that combines both ideas follows.
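The following Python sketch combines move ordering and a bounded search depth.
The heuristic \texttt{h} (the average leaf score of a subtree) is a hypothetical stand-in,
chosen only so that the example is self-contained; a real game engine would use domain
knowledge instead.
\begin{verbatim}
import math

def h(node):
    # Hypothetical heuristic: the score of a leaf, or the average leaf
    # score of a subtree. A real engine would use domain knowledge.
    if not isinstance(node, list):
        return node
    return sum(h(c) for c in node) / len(node)

def ordered_alphabeta(node, depth, alpha=-math.inf, beta=math.inf,
                      maximizing=True):
    # Bounded search depth: below the cutoff, trust the heuristic.
    if not isinstance(node, list) or depth == 0:
        return h(node)
    # Move ordering: try the most promising children first, so that
    # alpha and beta tighten early and weaker moves are pruned.
    children = sorted(node, key=h, reverse=maximizing)
    best = -math.inf if maximizing else math.inf
    for child in children:
        score = ordered_alphabeta(child, depth - 1, alpha, beta,
                                  not maximizing)
        if maximizing:
            best = max(best, score)
            alpha = max(alpha, best)
            if best > beta:
                break
        else:
            best = min(best, score)
            beta = min(beta, best)
            if best < alpha:
                break
    return best

tree = [[3, 5], [2, 9], [0, 7]]
print(ordered_alphabeta(tree, depth=2))   # prints 3
\end{verbatim}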
The techniques described so far represent the state of the art of the late 1990s and
early 2000s. The pinnacle was reached in 1997, when the chess computer Deep Blue,
built by IBM, managed
to win a match against the reigning world chess champion, Garry Kasparov. Deep Blue
had special hardware to speed up the search of the game tree, and it used a
variant of $(\alpha, \beta)$-pruning that was powered by heuristics that IBM developed
together with several chess grandmasters. Furthermore, Deep Blue had large look-up tables with
known game sequences, e.g., standard openings and endgames. The match was very close,
but it was the first time that a computer had beaten a reigning world champion in a full match.
At the time, AI researchers were very satisfied with this success, but harder
and less structured games like Go seemed to be completely out of reach for this
paradigm.
However, in 2016, a stunning reversal took place: AlphaGo, a computer Go program
developed by DeepMind, decisively beat Lee Sedol, one of the strongest Go players
in the world. Unlike the Deep Blue match two decades earlier, the result was very clear:
AlphaGo won four games to one. Furthermore, AlphaGo did not rely on
game-specific special hardware and did not have extensive look-up tables and libraries for known game
sequences. Instead, AlphaGo relied on a conceptually simple technique for evaluating the game
tree, called \emph{Monte Carlo tree search}, which estimates the value of a configuration
by playing out many random continuations of the game and aggregating the resulting scores.
The heuristics
were not hardcoded, but obtained in an extensive training phase using \emph{deep learning}.
The victory of AlphaGo was one of the first impressive successes of a new paradigm
in artificial intelligence that we still see today: instead of building intricate,
specialized models that try to capture the knowledge of human experts, we use abstract, general-purpose
models that are trained on massive amounts of data. You will learn more about this
development in later classes.
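To make the idea of random playouts concrete, here is a minimal Python sketch of pure
Monte Carlo evaluation on the toy game trees from above. This shows only the core idea;
full Monte Carlo tree search (and certainly AlphaGo) additionally grows a search tree and
balances exploration against exploitation.
\begin{verbatim}
import random

def playout(node):
    # Play uniformly random moves until a final configuration is
    # reached, and return its score (leaves are plain numbers).
    while isinstance(node, list):
        node = random.choice(node)
    return node

def monte_carlo_move(node, simulations=1000):
    # Estimate each move by the average score of many random playouts
    # and pick the best one for Player 1. Note that this estimate
    # ignores optimal play by Player 2, so it may differ from the
    # minimax-optimal move.
    def estimate(child):
        return sum(playout(child) for _ in range(simulations)) / simulations
    return max(range(len(node)), key=lambda i: estimate(node[i]))

tree = [[3, 5], [2, 9], [0, 7]]
print(monte_carlo_move(tree))  # almost always 1 (averages 4, 5.5, 3.5)
\end{verbatim}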
\chapter{Conclusion and Outlook}
This concludes our course on algorithms and data structures.
In future classes, you will learn a lot more about these topics
and about other aspects of theoretical computer science.