Commit 349f4dc1 authored by Wolfgang Mulzer

Fix a notation clash and some typos.

parent ccd6d400
......@@ -87,29 +87,30 @@ for different kinds of key sets.
would be ``1100101'' or ``10101`.
To define a hash code for this case, we first pick an arbitrary,
-but fixed numbering $| \cdot |$ of the symbols in $\Sigma$ with
+but fixed numbering $\| \cdot \|$ of the symbols in $\Sigma$ with
the numbers from $0$ to $|\Sigma| -1$.
For example, for the Latin alphabet, we could take
-$|A| = 0, |B| = 1, |C| = 2, \dots, |Z| = 25$, or for the
-binary alphabet, we could take $|0| = 0, |1| = 1$. Now,
+$\|A\| = 0, \|B\| = 1, \|C\| = 2, \dots, \|Z\| = 25$, or for the
+binary alphabet, we could take $\|0\| = 0, \|1\| = 1$. Now,
given a string $\tau$ over an alphabet $\Sigma$, we write
$\tau$ as a sequence of symbols: $\tau = \sigma_0 \sigma_1 \dots \sigma_{\ell - 1}$,
where $\ell$ is the total number of symbols in $\tau$.
For example, if $\tau = \text{HELLO}$,
we have $\ell = 5$ and
-$\sigma_0 = H, \sigma_1 = E, \sigma_2 = L, \sigma_3 = L, \sigma_4 = O$.
+$\sigma_0 = \text{H}, \sigma_1 = \text{E}, \sigma_2 = \text{L},
+\sigma_3 = \text{L}, \sigma_4 = \text{O}$.
Then we define
\[
-\text{hc}(\tau) = \sum_{i = 0}^{\ell - 1} |\sigma_i| \cdot |\Sigma|^i.
+\text{hc}(\tau) = \sum_{i = 0}^{\ell - 1} \|\sigma_i\| \cdot |\Sigma|^i.
\]
In other words, we interpret $\tau$ as a ``number'' to base $|\Sigma|$, where the
``digits'' are represented by the symbols of $\Sigma$. For example,
\begin{align*}
-\text{hc}(HALLO) &= |H| \cdot 26^0 +
-|A| \cdot 26^1 +
-|L| \cdot 26^2 +
-|L| \cdot 26^3 +
-|O| \cdot 26^4 \\
+\text{hc}(\text{HALLO}) &= \|\text{H}\| \cdot 26^0 +
+\|\text{A}\| \cdot 26^1 +
+\|\text{L}\| \cdot 26^2 +
+\|\text{L}\| \cdot 26^3 +
+\|\text{O}\| \cdot 26^4 \\
&= 7 \cdot 1 + 0 \cdot 26 + 11 \cdot 26^2 + 11 \cdot 26^3
+ 14 \cdot 26^4\\
&= 6.598.443.
......
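The hash code defined in this hunk can be checked numerically. The following is only a minimal sketch, assuming the numbering $\|A\| = 0, \dots, \|Z\| = 25$ from the text; the names `NUMBERING` and `hash_code` are hypothetical and do not come from the lecture notes.

```python
import string

# Assumed numbering ||.|| for the Latin alphabet: A -> 0, B -> 1, ..., Z -> 25.
NUMBERING = {symbol: index for index, symbol in enumerate(string.ascii_uppercase)}


def hash_code(tau, numbering=NUMBERING):
    """Return sum_i ||sigma_i|| * |Sigma|^i for the string tau."""
    base = len(numbering)  # |Sigma|
    total = 0
    power = 1  # |Sigma|^i, starting with i = 0
    for symbol in tau:
        total += numbering[symbol] * power
        power *= base
    return total


print(hash_code("HALLO"))  # 7 + 0*26 + 11*26^2 + 11*26^3 + 14*26^4
```

Note that the first symbol receives the lowest power of $|\Sigma|$, so the string is read as a number whose least significant digit comes first.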
......@@ -237,15 +237,15 @@ we described the following way for computing a hash function $h'$
for a string
$a = \alpha_0 \alpha_1 \dots \alpha_{\ell-1}$: pick a prime number $p$ and interpret the individual
symbols
-$\alpha_i$ as numbers between $0$ and $|\Sigma | - 1$. Then, set
+$\alpha_i$ as numbers $\| \alpha_i\|$ between $0$ and $|\Sigma | - 1$. Then, set
\[
-h'(a) = \left(\sum_{j=0}^{\ell-1} \alpha_j |\Sigma|^{j}\right) \bmod p.
+h'(a) = \left(\sum_{j=0}^{\ell-1} \|\alpha_j\||\Sigma|^{j}\right) \bmod p.
\]
For Rabin-Karp, it turns out to be advantageous to define the hash function slightly
differently:
\[
h(\sigma_i\dots \sigma_{i+\ell-1}) =
-\left(\sum_{j=0}^{\ell-1} \sigma_{i+j} |\Sigma|^{\ell-1-j}\right) \bmod p.
+\left(\sum_{j=0}^{\ell-1} \|\sigma_{i+j}\| |\Sigma|^{\ell-1-j}\right) \bmod p.
\]
There are two main differences between $h$ and $h'$:
\begin{enumerate}
......@@ -260,8 +260,8 @@ Now, the main point is that
Indeed, we have
\[
h(\sigma_{i+1}\dots \sigma_{i+\ell}) =
-\left(|\Sigma| \cdot h(\sigma_i\dots \sigma_{i+\ell-1})- |\Sigma|^\ell \cdot \sigma_i +
-\sigma_{i+\ell}\right) \bmod p.
+\left(|\Sigma| \cdot h(\sigma_i\dots \sigma_{i+\ell-1})- |\Sigma|^\ell \cdot \|\sigma_i\| +
+\|\sigma_{i+\ell}\|\right) \bmod p.
\]
(Note that we can precompute $|\Sigma|^\ell$ in advance and can reuse it every time
we update the hash function). We call $h$ a \emph{rolling hash}.
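The update rule can be sanity-checked against the direct definition of $h$. The sketch below assumes the symbols have already been mapped to their numbers $\|\sigma_i\|$; the function names `window_hash`, `roll`, and `all_window_hashes` are hypothetical, and reducing $|\Sigma|^\ell$ modulo $p$ up front is a choice made here that does not change the result.

```python
def window_hash(numbers, i, length, base, p):
    """Compute h(sigma_i ... sigma_{i+length-1}) directly from the definition.

    Horner's rule gives the first symbol the highest power base^(length-1).
    """
    value = 0
    for j in range(length):
        value = (value * base + numbers[i + j]) % p
    return value


def roll(prev_hash, outgoing, incoming, base_pow_len, base, p):
    """Shift the window by one position using the update formula from the text."""
    return (base * prev_hash - base_pow_len * outgoing + incoming) % p


def all_window_hashes(numbers, length, base, p):
    """Return the hashes of all windows of the given length, left to right."""
    if length > len(numbers):
        return []
    base_pow_len = pow(base, length, p)  # |Sigma|^length mod p, computed once
    hashes = [window_hash(numbers, 0, length, base, p)]
    for i in range(len(numbers) - length):
        hashes.append(
            roll(hashes[-1], numbers[i], numbers[i + length], base_pow_len, base, p)
        )
    return hashes
```

Each call to `roll` takes a constant number of arithmetic operations, whereas recomputing a window from scratch with `window_hash` takes time proportional to the window length.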
......@@ -327,7 +327,8 @@ It remains to explain how to find a random prime number between
$2$ and $\ell^2 \log (|\Sigma|\ell)$. For this, we simply take a random
number between $2$ and $\ell^2 \log (|\Sigma|\ell)$ and check if it is a prime number.
If the test fails, we repeat. By the prime number theorem, the probability that we find a
-prime number of $\Omega(1/$. Thus, we need $O(\log(\ell \Sigma))$ attempts in expectation to
+prime number is $\Omega(1/\log(\ell |\Sigma|))$.
+Thus, we need $O(\log(\ell |\Sigma|))$ attempts in expectation to
find such a number. There are very efficient algorithms testing whether a given number is a prime
number. Thus, the time for this step is negligible.
\end{proof}
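The sampling step described in the proof can be sketched as follows. This is an illustration under stated assumptions, not the method of the lecture notes: the logarithm is taken to be the natural logarithm, the bound is rounded down to an integer, and primality is tested by plain trial division instead of one of the efficient tests the text alludes to. The names `is_prime`, `prime_range_bound`, and `random_prime` are hypothetical.

```python
import math
import random


def is_prime(n):
    """Trial division; a simple stand-in for a fast primality test."""
    if n < 2:
        return False
    for d in range(2, math.isqrt(n) + 1):
        if n % d == 0:
            return False
    return True


def prime_range_bound(length, alphabet_size):
    """Upper end of the range, l^2 * log(|Sigma| * l), with natural log (assumed)."""
    return max(2, math.floor(length ** 2 * math.log(alphabet_size * length)))


def random_prime(upper, rng=random):
    """Draw uniform candidates from [2, upper] until one is prime.

    Returns the prime together with the number of attempts that were needed.
    """
    if upper < 2:
        raise ValueError("upper must be at least 2")
    attempts = 0
    while True:
        attempts += 1
        candidate = rng.randint(2, upper)
        if is_prime(candidate):
            return candidate, attempts
```

For example, `random_prime(prime_range_bound(5, 26))` draws a prime for patterns of length 5 over the Latin alphabet; the returned attempt count makes it easy to observe how many repetitions were needed.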
......