diff --git a/59-games.tex b/59-games.tex
index c77c39a591e197e167d478cd13298b28bfa21eda..07f39a4e021118a53a3a67be8604357db2108b6e 100644
--- a/59-games.tex
+++ b/59-games.tex
@@ -257,5 +257,101 @@ on the quality of the heuristic. We achieve a faster algorithm at the cost of th
 Third, there is a way to eliminate moves from consideration, without sacrificing
 the quality of the solution. This strategy is called \emph{$(\alpha, \beta)$-pruning}. The idea is
-as follows:
+as follows: suppose we explore the game tree starting from the root $r$, and suppose that
+we are currently visiting a max-node $v$. For each node $w$ along the path from $r$ to $v$, we are
+searching for a best possible move, and we have already processed some children of $w$ and have
+a \emph{tentative} value for the best possible score for $w$ (more precisely, this is represented by the
+current values of the \texttt{max}- and the \texttt{min}-variables in the
+\texttt{max-visit}/\texttt{min-visit} calls from $r$ to $v$). Now, suppose that we have just finished
+processing a child of the current max-node $v$, and that this increases the tentative score
+of $v$ to $k$. Suppose further that along the path from $r$ to $v$, there is a min-node whose tentative
+score is smaller than $k$. Then, we claim that we can immediately stop our exploration
+of $v$ and return to the parent node. The reason is as follows: given that the tentative score for
+node $v$ is at least $k$, we know that if the game reaches configuration $v$, Player~1 will certainly
+have a move that ensures a final score of at least $k$. However, we know that in a configuration $w$ that
+is encountered on the way to configuration $v$, there exists a move for Player~2 that ensures a
+score that is less than $k$. Thus, Player~2 can always force a score that is less than $k$,
+and hence the game will never reach configuration $v$ if Player~2 plays optimally.
+
+To implement this idea, we introduce two additional parameters that are passed along during
+the search of the game tree: $\alpha$ and $\beta$. Here, $\alpha$ is the highest
+score that Player~1 can already guarantee, while $\beta$ is the lowest
+score that Player~2 can already guarantee. While considering a max-node, we can abort the search
+as soon as we find a move whose score is higher than $\beta$, and while considering a min-node,
+we can abort as soon as we find a move whose score is lower than $\alpha$.
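+
+To see the pruning rule in action, consider a small artificial example: let the root $r$ be
+a max-node with two min-children $a$ and $b$, where $a$ has final configurations with scores
+$5$ and $7$, and $b$ has final configurations with scores $3$ and $9$. After fully evaluating
+$a$, Player~1 can guarantee a score of $\min(5, 7) = 5$, so $\alpha = 5$. When we then visit
+$b$ and process its first child with score $3$, the tentative score of $b$ drops to
+$3 < \alpha$: Player~2 can force a score of at most $3$ by moving within $b$, while Player~1
+already has the move to $a$ that guarantees $5$. Hence, the second child of $b$ is never
+examined.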
+The pseudo-code is as follows:
+\begin{verbatim}
+// visit a final node
+final-visit(v):
+  // simply return the final score for v
+  return psi(v)
+
+
+// visit a max-node
+max-visit(v, alpha, beta):
+  max <- -infty
+  for each child w of v do
+    if w is a final configuration then
+      child-score <- final-visit(w)
+    else
+      child-score <- min-visit(w, alpha, beta)
+    if child-score > max then
+      max <- child-score
+    // if we have found a move that is better than
+    // the best move so far, we update alpha
+    if max > alpha then
+      alpha <- max
+    // if the move is better than the best score that
+    // Player 2 can already guarantee, Player 2 will
+    // avoid this configuration, and we abort
+    if max > beta then
+      break
+  return max
+
+// visit a min-node
+min-visit(v, alpha, beta):
+  min <- infty
+  for each child w of v do
+    if w is a final configuration then
+      child-score <- final-visit(w)
+    else
+      child-score <- max-visit(w, alpha, beta)
+    if child-score < min then
+      min <- child-score
+    // if we have found a move that is better for Player 2
+    // than the best move so far, we update beta
+    if min < beta then
+      beta <- min
+    // if the move is worse than the best score that
+    // Player 1 can already guarantee, Player 1 will
+    // avoid this configuration, and we abort
+    if min < alpha then
+      break
+  return min
+\end{verbatim}
+If we use $(\alpha, \beta)$-pruning, the order in which the children $w$ of a node $v$
+are evaluated makes a difference. If we investigate the more promising moves first,
+it becomes more likely that in later stages we can abort the search
+for a less favorable move. Thus, $(\alpha, \beta)$-pruning is often combined
+with a heuristic that determines the order in which the children in the game tree
+are evaluated. In practice, we also combine $(\alpha, \beta)$-pruning with a bounded
+search depth, in order to make sure that the number of moves under investigation
+does not become too large. (A runnable sketch of this pseudo-code is given at the
+end of the chapter.)
+
+The techniques described so far represent the state of the art of the late 1990s and
+early 2000s. A pinnacle was reached in 1997, when the chess computer Deep Blue,
+built by IBM, managed
+to win a match against the reigning world chess champion, Garry Kasparov. Deep Blue
+had special hardware to speed up the search of the game tree, and it used a
+variant of $(\alpha, \beta)$-pruning that was powered by heuristics that IBM developed
+together with several chess grandmasters. Furthermore, Deep Blue had large look-up tables with
+known game sequences, e.g., standard openings and endgames. The match was very close,
+but it was the first time that a computer had beaten a reigning world champion
+in a match under regular tournament conditions.
+At the time, AI researchers were very satisfied with this success, but harder
+and less structured games like Go seemed to be completely out of reach for the
+prevailing paradigm.
+
+However, in 2016, a stunning reversal took place: AlphaGo, a computer Go program
+developed by DeepMind, decisively beat Lee Sedol, one of the strongest Go players
+in the world. Unlike with Deep Blue
+two decades earlier, the result was very clear. Furthermore, AlphaGo did not rely on
+custom game-specific hardware and did not have extensive look-up tables and libraries
+of known game sequences. Instead, AlphaGo relied on a conceptually simple technique for
+evaluating the game tree, called \emph{Monte Carlo tree search}: the quality of a
+configuration is estimated by playing many randomized games from that configuration to
+the end and averaging the outcomes, while the search tree is grown incrementally towards
+the moves that look most promising. The heuristics that guide this search
+were not hardcoded, but obtained in an extensive training phase using \emph{deep learning}.
+The victory of AlphaGo was one of the first impressive successes of a new paradigm
+in artificial intelligence that we still see today: instead of building intricate,
+specialized models that try to capture the knowledge of human experts, we use abstract,
+general-purpose models that are trained on massive amounts of data. You will learn more
+about this development in later classes.
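+
+To make the pseudo-code above concrete, we close with a minimal runnable sketch of
+$(\alpha, \beta)$-pruning in Python; the representation of the game tree (tuples for
+internal nodes, plain numbers for the final scores) is chosen purely for illustration:
+\begin{verbatim}
+# A configuration is either a number (its final score psi)
+# or a tuple of child configurations; levels alternate between
+# max-nodes and min-nodes, with a max-node at the root.
+def alphabeta(v, alpha=float("-inf"), beta=float("inf"),
+              maximizing=True):
+    if not isinstance(v, tuple):   # final configuration
+        return v
+    if maximizing:
+        best = float("-inf")
+        for w in v:
+            best = max(best, alphabeta(w, alpha, beta, False))
+            alpha = max(alpha, best)  # best score for Player 1 so far
+            if best > beta:           # Player 2 avoids this configuration
+                break
+        return best
+    else:
+        best = float("inf")
+        for w in v:
+            best = min(best, alphabeta(w, alpha, beta, True))
+            beta = min(beta, best)    # best score for Player 2 so far
+            if best < alpha:          # Player 1 avoids this configuration
+                break
+        return best
+
+# The example tree from above: the child of b with score 9 is pruned.
+print(alphabeta(((5, 7), (3, 9))))    # prints 5
+\end{verbatim}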
diff --git a/61-conclusion.tex b/61-conclusion.tex
index cdfbac9b0f44fed29412d83bd1e9e75e34d83d4f..6490762cdc1d5f740ba88ae0d0bbb5d4172297fe 100644
--- a/61-conclusion.tex
+++ b/61-conclusion.tex
@@ -2,3 +2,7 @@
 \chapter{Conclusion and Outlook}
+This concludes our course on algorithms and data structures.
+In future classes, you will learn a lot more about these topics
+and about other aspects of theoretical computer science.
+
diff --git a/skript.pdf b/skript.pdf
index 2bfc7ce14608b8b2de9f4f6c2ff4a4c60616e997..147680244e7fb253b96d30203c8f08ade9d8050f 100644
Binary files a/skript.pdf and b/skript.pdf differ