Commit 78ea1b4f authored by Wolfgang Mulzer

More on Bellman-Ford.

parent 658bc7c5
@@ -103,7 +103,7 @@ videos, etc. Depending on the type of data, different approaches are possible:
In this class, we will look closer at Huffman-codes, a data compression method that exploits the first idea:
varying the coding length. The other methods are treated in later classes on data compression.
\paragraph{Prefix-free codes and Huffman codes.}
Before discussing Huffman-codes, we first need some basic definitions and facts about
codes. Let $\Sigma$ be an alphabet, such that $\Sigma$ contains at least two symbols.\footnote{If
$\Sigma$ has only one symbol, data compression is not too interesting, since then we could represent
@@ -312,10 +312,10 @@ Now, let $A = \sum_{j=0}^{\ell-1} \left(\sigma_{i+j} - \tau_{1 + j} \right) |\Sigma|^{\ell - 1 - j}$.
Then, $A$ is a fixed number between $-|\Sigma|^\ell$ and $|\Sigma|^\ell$, and we have
that $A \equiv 0 \pmod p$ if and only if $p$ is a prime factor of $A$. Now, we observe that
a natural number $n$ can have at most $\log n$ distinct prime factors: each prime factor is
at least $2$, and if $n$ has $k$ distinct prime factors, then $n \geq 2^k$.
Thus, it follows that
$h(\sigma_i \dots \sigma_{i + \ell - 1}) = h(t)$ if and only if $p$ happens to be one
of the at most $\log |A| \leq \log |\Sigma|^\ell = \ell \log |\Sigma|$ many prime factors of $A$.
Now, by the prime number theorem, the number of distinct prime numbers between $2$ and
$\ell^2 \log (|\Sigma|\ell)$ is $\Omega(\ell^2 \log |\Sigma|)$. Thus, the probability that a random
prime number between $2$ and
@@ -332,6 +332,6 @@ find such a number. There are very efficient algorithms testing whether a given
number is prime. Thus, the time for this step is negligible.
\end{proof}
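To make the fingerprint idea concrete, the following Python sketch implements the search
under two assumptions that are not fixed in the text above: the alphabet is treated as bytes
($|\Sigma| = 256$, ASCII input), and the fingerprint is the standard Rabin-Karp hash
$h(x) = \sum_{j} x_j |\Sigma|^{\ell - 1 - j} \bmod p$, for a random prime $p$ in the range
from the analysis. All function names are ours.
\begin{verbatim}
import math
import random

def is_prime(n):
    # trial division; fast enough for the small moduli used here
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def rabin_karp(s, t, sigma=256):
    # report all occurrences of the pattern t in the text s (ASCII)
    ell = len(t)
    if ell == 0 or ell > len(s):
        return []
    # pick a random prime p between 2 and ell^2 * log(sigma * ell),
    # as in the analysis above, by resampling until we hit a prime
    upper = max(3, int(ell * ell * math.log2(sigma * ell)))
    p = random.randint(2, upper)
    while not is_prime(p):
        p = random.randint(2, upper)

    def h(chunk):
        # Horner evaluation of sum_j chunk[j] * sigma^(ell-1-j) mod p
        v = 0
        for c in chunk:
            v = (v * sigma + ord(c)) % p
        return v

    target = h(t)
    top = pow(sigma, ell - 1, p)  # sigma^(ell-1) mod p, for the rolling update
    cur = h(s[:ell])
    hits = []
    for i in range(len(s) - ell + 1):
        # equal fingerprints only suggest a match; verify to rule out
        # the false positives that the analysis bounds
        if cur == target and s[i:i + ell] == t:
            hits.append(i)
        if i + ell < len(s):
            # slide the window: drop s[i], append s[i + ell]
            cur = ((cur - ord(s[i]) * top) * sigma + ord(s[i + ell])) % p
    return hits

print(rabin_karp("abracadabra", "abra"))  # [0, 7]
\end{verbatim}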
String search is a vast topic in algorithms, and many other variants and solutions
for the problem exist. These will be covered in later classes, in particular in the
classes that deal with algorithmic bioinformatics.
@@ -3,6 +3,8 @@
\chapter{Shortest Paths with Negative Weights}
We now consider the SSSP-problem with negative edge weights.
\paragraph{Currency exchange.}
First, we consider an example where negative edge weights occur naturally:
suppose we have a set of $n$ currencies, $C_1, C_2, \dots, C_n$ (e.g.,
Euro, British Pound, Danish Kroner, Japanese Yen, Turkish Lira, etc.).
@@ -63,7 +65,159 @@ weights, the weight function $r'$ is well defined. Furthermore, let
$\pi: C_1 = C_{i_0}, C_{i_1}, \dots, C_{i_k} = C_n$ be a transaction sequence
from $C_1$ to $C_n$, and define the modified weight $r'(\pi)$ of $\pi$ as
\[
r'(\pi) = \prod_{j = 1}^{k} r'(C_{i_{j - 1}}, C_{i_j}).
\]
By definition, we have
\[
r'(\pi) = \prod_{j = 1}^{k} r'(C_{i_{j - 1}}, C_{i_j})
=
\prod_{j = 1}^{k} \frac{1}{r(C_{i_{j - 1}}, C_{i_j})} =
\frac{1}{\prod_{j = 1}^{k} r(C_{i_{j - 1}}, C_{i_j})}
=
\frac{1}{r(\pi)},
\]
and for any two transaction sequences $\pi_1$ and $\pi_2$, we have $r'(\pi_1) \leq r'(\pi_2)$
if and only if
$r(\pi_1) \geq r(\pi_2)$.
This means that if $\pi^*$ is a transaction sequence with \emph{maximum} weight with respect to $r$,
then $\pi^*$ is a transaction sequence with \emph{minimum} weight with respect to $r'$.
By changing the weights from $r$ to $r'$, we have changed our original problem into an equivalent
minimization problem. This addresses (i).
To address (ii), we need to go from multiplication to addition. To do that, we again
define a new weight function $r'': E \rightarrow \mathbb{R}$ as
\[
r''(C_i, C_j) = \log r'(C_i, C_j), \quad \text{for all $(C_i, C_j) \in E$}.
\]
Now, we may have negative edge weights, because the logarithm of a number strictly
between $0$ and $1$ is negative. Let
$\pi: C_1 = C_{i_0}, C_{i_1}, \dots, C_{i_k} = C_n$ be a transaction sequence
from $C_1$ to $C_n$, and define the weight $r''(\pi)$ of $\pi$ as
\[
r''(\pi) = \sum_{j = 1}^{k} r''(C_{i_{j - 1}}, C_{i_j}).
\]
With these weights, we have
\[
r''(\pi) =
\sum_{j = 1}^{k} r''(C_{i_{j - 1}}, C_{i_j})
=
\sum_{j = 1}^{k} \log{r'(C_{i_{j - 1}}, C_{i_j})} =
\log \left(\prod_{j = 1}^{k} r'(C_{i_{j - 1}}, C_{i_j})\right)
=
\log{r'(\pi)}.
\]
Since the logarithm is monotone increasing, it follows for any
two transaction sequences $\pi_1$ and $\pi_2$ that
$r''(\pi_1) \leq r''(\pi_2)$ if and only if $r'(\pi_1) \leq r'(\pi_2)$.
This means that a shortest path with respect to the weight function $r''$
is also a shortest path with respect to the weight function $r'$. So finding
an additive shortest path for $r''$ is equivalent to finding a multiplicative
shortest path for $r'$. Thus, we have also addressed (ii) and turned our original
problem into a classic shortest path problem with negative edge weights.\footnote{This
general process of transforming an instance of a new problem by stepwise modifications into
an equivalent instance of a known problem is called a \emph{reduction}. We will learn
much more about reductions in \emph{Grundlagen der Theoretischen Informatik}.}
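To illustrate the reduction, here is a small Python sketch that performs both steps
at once, via $r''(C_i, C_j) = \log (1/r(C_i, C_j)) = -\log r(C_i, C_j)$; the exchange
rates are invented for the example:
\begin{verbatim}
import math

# invented exchange rates: one unit of the first currency
# buys r units of the second
rates = {
    ("EUR", "USD"): 1.08,
    ("USD", "JPY"): 150.0,
    ("EUR", "JPY"): 161.0,
}

# both steps at once: r''(C_i, C_j) = log(1 / r) = -log r;
# rates above 1 give negative edge weights
weights = {edge: -math.log(r) for edge, r in rates.items()}

def rate(path):
    # multiplicative weight r(pi) of a transaction sequence
    out = 1.0
    for edge in zip(path, path[1:]):
        out *= rates[edge]
    return out

def weight(path):
    # additive weight r''(pi); minimizing it maximizes r(pi)
    return sum(weights[edge] for edge in zip(path, path[1:]))

# the indirect trade yields more yen, and accordingly it has
# the smaller (more negative) additive weight
print(rate(["EUR", "JPY"]), weight(["EUR", "JPY"]))                # 161.0 -5.081...
print(rate(["EUR", "USD", "JPY"]), weight(["EUR", "USD", "JPY"]))  # 162.0 -5.087...
\end{verbatim}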
\textbf{Remark}: In the second step, we have seen a general trick that uses
the logarithm in order to turn multiplication into addition, based on the
well-known rule $\log (a \cdot b) = \log a + \log b$. In other words, instead
of multiplying two numbers, we can take their logarithms and then perform an addition.
This trick is used extensively in the field of artificial intelligence. There,
we often need to deal with probabilities, and it happens often that probabilities
need to be multiplied together. The resulting numbers can get very small very quickly,
leading to problems with floating point precision. Thus, we often deal with the logarithms
of the probabilities instead. This has two advantages: (i) we can use addition instead of
multiplication; and (ii) the magnitudes of the numbers involved do not get too small.
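A short Python experiment with invented numbers shows the effect:
\begin{verbatim}
import math

probs = [1e-5] * 80   # 80 independent events, each with probability 10^-5

product = 1.0
for p in probs:
    product *= p      # underflows: the true value 10^-400 is below
                      # the smallest positive double

log_sum = sum(math.log(p) for p in probs)

print(product)        # 0.0
print(log_sum)        # about -921.03, i.e., log(10^-400)
\end{verbatim}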
\paragraph{The Bellman-Ford algorithm.}
Now, let us see how to solve the SSSP-problem on graphs with negative edge weights.
Let $G = (V, E)$ be a directed graph, and let $s \in V$ be the source node.
Let $\ell: E \rightarrow \mathbb{R}$ be a weight function that may have
negative edge weights, and suppose that $G$ does not contain any negative cycles.
In this setting, Dijkstra's algorithm does not work. The reason is that
in Dijkstra's algorithm, we require that once a vertex $v$ is removed from
the priority queue, the shortest path to $v$ is known. However, this does not
need to be the case if we have negative edge weights: now, there may be a shorter
path to $v$ that consists of a prefix that is longer than $d_G(s, v)$, followed by a
part of negative total weight. In this case, the shortest path to $v$ is discovered
only after the prefix has been computed, which may happen after $v$ has been removed
from the priority queue. In such a case, shortest paths that involve $v$ may not be
computed correctly.
For example, consider the graph with vertices $s$, $u$, $v$ and edges $(s, v)$ of
weight $1$, $(s, u)$ of weight $2$, and $(u, v)$ of weight $-2$. Dijkstra's algorithm
removes $v$ from the priority queue with tentative distance $1$, even though
$d_G(s, v) = 0$: the shortest path to $v$ goes through $u$, and it is discovered only
after $v$ has already been removed.
To address this, we take a more abstract view of Dijkstra's algorithm, as follows:
for every vertex $v \in V$, we have two attributes: the \emph{tentative distance} $v.\texttt{d}$
and the \emph{tentative predecessor} $v.\texttt{pred}$. Initially, all tentative predecessors
are $\perp$, the tentative distance $s.\texttt{d}$ is $0$, and all other tentative distances
$v.\texttt{d}$ are $\infty$. Our goal is to slowly improve the tentative distances and the predecessors
until we have a shortest path tree for $s$. The main tool for this is a function
\texttt{improve}$(v, w)$, which can be called for any edge $(v, w) \in E$.
The function \texttt{improve} uses the fact that there is an edge from $v$ to
$w$ to improve our current guess for the shortest path to $w$, given our current
guess for the shortest path to $v$:
\begin{verbatim}
improve(v, w):
  // is it better to reach w via v than via the current guess?
  if v.d + l(v, w) < w.d then
    w.d <- v.d + l(v, w)
    w.pred <- v
\end{verbatim}
If, according to our current guesses, it is better to first go to $v$ and then follow
the edge $(v, w)$, then we update the attributes for $w$ accordingly. Otherwise,
we do nothing.
We can think of Dijkstra's algorithm as a clever way to coordinate the calls to
\texttt{improve}. We use a priority queue and the fact that the edge weights are nonnegative
to call \texttt{improve} exactly once for every edge $(v, w) \in E$, namely at the time when we can
be sure that a shortest path to $v$ has been found.
If negative edge weights are allowed, it is no longer clear how to identify, for an
edge $(v, w) \in E$, the moment when the shortest path to $v$ has been found. But the
solution is simple: call \texttt{improve} \emph{multiple times} for each edge $e \in E$.
A simple strategy is to just call \texttt{improve} for every edge, and to repeat this until
no more changes take place. The pseudocode is as follows:
\begin{verbatim}
// Initialization, all distances are INFTY,
// all predecessors are NULL
for v in vertices() do
v.d <- INFTY; v.pred <- NULL
// only s has distance 0
s.d <- 0
do
// call improve on every edge
for e = (v, w) in edges() do
improve(v, w)
while at least one call to improve had an effect
\end{verbatim}
Each iteration of the \texttt{do}-\texttt{while}-loop takes
$O(|E|)$ time. To analyze the correctness, and the running time,
we need to understand how many iterations are necessary until the
attributes converge (and to show that they actually correspond to shortest
paths).
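As an illustration, here is a Python sketch of exactly this strategy; the representation
of the graph by dictionaries is our choice, and, as above, we assume that there are no
negative cycles (otherwise, the \texttt{do}-\texttt{while}-loop would never terminate):
\begin{verbatim}
import math

def bellman_ford(vertices, edges, length, s):
    # initialization: all distances INFTY, all predecessors NULL,
    # only s has distance 0
    d = {v: math.inf for v in vertices}
    pred = {v: None for v in vertices}
    d[s] = 0.0

    def improve(v, w):
        # relax the edge (v, w); report whether anything changed
        if d[v] + length[(v, w)] < d[w]:
            d[w] = d[v] + length[(v, w)]
            pred[w] = v
            return True
        return False

    changed = True
    while changed:                # the do-while-loop from the pseudocode
        changed = False
        for (v, w) in edges:      # call improve on every edge
            if improve(v, w):
                changed = True
    return d, pred

# the example from above where Dijkstra's algorithm fails:
# the shortest path to v goes through u and has length 0
V = ["s", "u", "v"]
E = [("s", "v"), ("s", "u"), ("u", "v")]
L = {("s", "v"): 1.0, ("s", "u"): 2.0, ("u", "v"): -2.0}
d, pred = bellman_ford(V, E, L, "s")
print(d)     # {'s': 0.0, 'u': 2.0, 'v': 0.0}
print(pred)  # {'s': None, 'u': 's', 'v': 'u'}
\end{verbatim}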
\begin{lemma}
Let $v \in V$. At any point in the \texttt{do}-\texttt{while}-loop,
we have $v.\texttt{d} \geq d_G(s, v)$.
\end{lemma}
\begin{proof}
The invariant holds initially, because we have $s.\texttt{d} = 0 = d_G(s, s)$,
and $v.\texttt{d} = \infty \geq d_G(s, v)$, for all $v \in V \setminus \{s \}$.
Now, we show that the invariant is maintained throughout the loop.
For this, suppose that we call the function \texttt{improve}$(v, w)$
on an edge $(v, w) \in E$.
This function call can only change the value of $w.\texttt{d}$.
If it does not change the value, the invariant is maintained,
because the invariant held before the call.
If the call changes the value, we now have
$w.\texttt{d} = v.\texttt{d} + \ell(v, w)$.
Since the invariant holds before the call,
we have $v.\texttt{d} \geq d_G(s, v)$, and hence
$w.\texttt{d} \geq d_G(s, v) + \ell(v, w)$.
Finally, we have
$d_G(s, v) + \ell(v, w) \geq d_G(s, w)$, because the
length of a shortest path from $s$ to $w$ is not longer
than the length that we get by following a shortest path from $s$ to $v$
and then taking the edge from $v$ to $w$.
It follows that also in this case, we have
$w.\texttt{d} \geq d_G(s, w)$ after the call, and the invariant is maintained.
\end{proof}