Third, there is a way to eliminate moves from consideration, without sacrificing the quality
of the solution. This strategy is called \emph{$(\alpha, \beta)$-pruning}. The idea is
as follows: suppose we explore the game tree starting from the root $r$, and suppose that
we are currently visiting a max-node $v$. For each node $w$ along the path from $r$ to $v$, we are
searching for the best possible move, and we have already processed some children of $w$, so we have
a \emph{tentative} value for the best possible score for $w$ (more precisely, this is represented by the
current values of the \texttt{max}- and the \texttt{min}-variables in the
\texttt{max-visit}/\texttt{min-visit} calls from $r$ to $v$). Now, suppose that we have just finished
processing a child of the current max-node $v$, and that this results in increasing the tentative score
of $v$ to $k$. Suppose further that along the path from $r$ to $v$, there is a min-node whose tentative
score is smaller than $k$. Then, we claim that we can immediately stop our exploration
of $v$ and return to the parent node. The reason is as follows: given that the tentative score for
node $v$ is at least $k$, we know that if the game reaches configuration $v$, Player~1 will certainly
have a move that ensures a final score of at least $k$. However, we know that in a configuration $w$ that
is encountered on the way to configuration $v$, there exists a move for Player~2 that ensures a
score that is less than $k$. Thus, we know that Player~2 can always force a score that is less than $k$,
and hence we will never reach configuration $v$ if Player~2 plays optimally.
To implement this idea, we introduce two additional parameters that are passed along during
the search of the game tree: $\alpha$ and $\beta$. Here, $\alpha$ is the highest possible
score that Player~1 was able to achieve so far, while $\beta$ is the lowest possible score that
Player~2 was able to achieve so far. While considering a max-node, we can abort the search
as soon as we find a move whose score is higher than $\beta$, and while considering a min-node,
we can abort as soon as we find a move whose score is lower than $\alpha$. The pseudo-code is as follows:
\begin{verbatim}
// visit a final node
final-visit(v):
  // simply return the final score for v
  return psi(v)

// visit a max-node
max-visit(v, alpha, beta):
  max <- -infty
  for each child w of v do
    if w is a final configuration then
      child_score <- final-visit(w)
    else
      child_score <- min-visit(w, alpha, beta)
    if child_score > max then
      max <- child_score
    // if we have found a move that is better than
    // the best move so far, we update alpha
    if max > alpha then
      alpha <- max
    // if the move is better than the best move that
    // Player 2 can achieve so far, we abort
    if max > beta then
      break
  return max

// visit a min-node
min-visit(v, alpha, beta):
  min <- infty
  for each child w of v do
    if w is a final configuration then
      child_score <- final-visit(w)
    else
      child_score <- max-visit(w, alpha, beta)
    if child_score < min then
      min <- child_score
    // if we have found a move that is better for Player 2
    // than the best move so far, we update beta
    if min < beta then
      beta <- min
    // if the move is worse than the best move that
    // Player 1 can achieve so far, we abort
    if min < alpha then
      break
  return min
\end{verbatim}
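For concreteness, here is a minimal, runnable sketch of $(\alpha, \beta)$-pruning in Python.
The tree representation (nested lists whose leaves are the final scores $\psi(v)$) is an
assumption made purely for illustration; it does not appear in the pseudo-code above.
\begin{verbatim}
import math

def alphabeta(node, alpha, beta, maximizing):
    # A leaf stands for a final configuration; its value is psi(v).
    if not isinstance(node, list):
        return node
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, best)   # update Player 1's best score so far
            if best > beta:            # Player 2 avoids this node: abort
                break
        return best
    else:
        best = math.inf
        for child in node:
            best = min(best, alphabeta(child, alpha, beta, True))
            beta = min(beta, best)     # update Player 2's best score so far
            if best < alpha:           # Player 1 avoids this node: abort
                break
        return best

# Example: a depth-2 game tree; the optimal score for Player 1 is 3.
tree = [[3, 5], [2, 9], [0, 7]]
print(alphabeta(tree, -math.inf, math.inf, True))   # prints 3
\end{verbatim}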
If we use $(\alpha, \beta)$-pruning, it makes a difference in which order
the children $w$ of a node $v$ are evaluated. If we investigate the more
promising moves first, it becomes more likely that the search for the less
favorable moves can be aborted at a later stage. Thus, $(\alpha, \beta)$-pruning is often combined
with a heuristic that determines the order in which the children in a game tree
are evaluated. In practice, we also combine $(\alpha, \beta)$-pruning with a bounded
search depth, in order to make sure that the number of moves under investigation
is not too large. A small sketch that combines both ideas follows.
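The following Python sketch combines move ordering and a bounded search depth.
The heuristic \texttt{h} (the average leaf score of a subtree) is a hypothetical stand-in,
chosen only so that the example is self-contained; a real game engine would use domain
knowledge instead.
\begin{verbatim}
import math

def h(node):
    # Hypothetical heuristic: the score of a leaf, or the average leaf
    # score of a subtree. A real engine would use domain knowledge.
    if not isinstance(node, list):
        return node
    return sum(h(c) for c in node) / len(node)

def ordered_alphabeta(node, depth, alpha=-math.inf, beta=math.inf,
                      maximizing=True):
    # Bounded search depth: below the cutoff, trust the heuristic.
    if not isinstance(node, list) or depth == 0:
        return h(node)
    # Move ordering: try the most promising children first, so that
    # alpha and beta tighten early and weaker moves are pruned.
    children = sorted(node, key=h, reverse=maximizing)
    best = -math.inf if maximizing else math.inf
    for child in children:
        score = ordered_alphabeta(child, depth - 1, alpha, beta,
                                  not maximizing)
        if maximizing:
            best = max(best, score)
            alpha = max(alpha, best)
            if best > beta:
                break
        else:
            best = min(best, score)
            beta = min(beta, best)
            if best < alpha:
                break
    return best

tree = [[3, 5], [2, 9], [0, 7]]
print(ordered_alphabeta(tree, depth=2))   # prints 3
\end{verbatim}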
The techniques described so far represent the state of the art of the late 1990s and
early 2000s. The pinnacle was reached in 1997, when the chess computer Deep Blue,
built by IBM, managed
to win a match against the reigning world chess champion, Garry Kasparov. Deep Blue
had special hardware to speed up the search of the game tree, and it used a
variant of $(\alpha, \beta)$-pruning that was powered by heuristics that IBM developed
together with several chess grandmasters. Furthermore, Deep Blue had large look-up tables with
known game sequences, e.g., standard openings and endgames. The match was very close,
but it was the first time that a computer had beaten a reigning world champion in a full match.
At the time, AI researchers were very satisfied with this success, but harder
and less structured games like Go seemed to be completely out of reach for this
paradigm.
However, in 2016, a stunning reversal took place: AlphaGo, a computer Go program
developed by DeepMind, decisively beat Lee Sedol, one of the strongest Go players
in the world. Unlike the Deep Blue match two decades earlier, the result was very clear:
AlphaGo won four games to one. Furthermore, AlphaGo did not rely on
game-specific special hardware and did not have extensive look-up tables and libraries for known game
sequences. Instead, AlphaGo relied on a conceptually simple technique for evaluating the game
tree, called \emph{Monte Carlo tree search}, which estimates the value of a configuration
by playing out many random continuations of the game and aggregating the resulting scores.
The heuristics
were not hardcoded, but obtained in an extensive training phase using \emph{deep learning}.
The victory of AlphaGo was one of the first impressive successes of a new paradigm
in artificial intelligence that we still see today: instead of building intricate,
specialized models that try to capture the knowledge of human experts, we use abstract, general-purpose
models that are trained on massive amounts of data. You will learn more about this
development in later classes.
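To make the idea of random playouts concrete, here is a minimal Python sketch of pure
Monte Carlo evaluation on the toy game trees from above. This shows only the core idea;
full Monte Carlo tree search (and certainly AlphaGo) additionally grows a search tree and
balances exploration against exploitation.
\begin{verbatim}
import random

def playout(node):
    # Play uniformly random moves until a final configuration is
    # reached, and return its score (leaves are plain numbers).
    while isinstance(node, list):
        node = random.choice(node)
    return node

def monte_carlo_move(node, simulations=1000):
    # Estimate each move by the average score of many random playouts
    # and pick the best one for Player 1. Note that this estimate
    # ignores optimal play by Player 2, so it may differ from the
    # minimax-optimal move.
    def estimate(child):
        return sum(playout(child) for _ in range(simulations)) / simulations
    return max(range(len(node)), key=lambda i: estimate(node[i]))

tree = [[3, 5], [2, 9], [0, 7]]
print(monte_carlo_move(tree))  # almost always 1 (averages 4, 5.5, 3.5)
\end{verbatim}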
\chapter{Conclusion and Outlook}
This concludes our course on algorithms and data structures.
In future classes, you will learn a lot more about these topics
and about other aspects of theoretical computer science.