Feb. 6, 2017
3 lecture notes to study
Spent more than 2 hours to study the lecture notes - Lecture 6 about Rabin-Karp algorithm from MIT, and also lecture 7, resize of hashtable.
Will come back to write down some notes and also share something about the algorithm.
Hash functions
We call this function a hash function.
A good hash function
. satisfies ( approximately) the assumption of simple uniform hashing: each key is equally likely to hash to any of the m slots. The hash function shouldn't bias towards particular slots
. does not hash similar keys to the same slot (e.g. compiler's symbol table shouldn't hash variables i and j to the same slot since they are used in conjunction a lot)
. is quick to calculate, should have O(1) run time
. is deterministic. h(k) should always return the same value for a given k
Example 1: Division method
prime number vs should ok be a power of 2
h(k) = k mod m
if m = 2 p, then the h(k) only looks at the p lower bits of k, completely ignoring the rest of bits in k. A good choice for m with the division method is a prime number ( why are composite numbers bad?).
Example 2: Multiplication method
h(k) = floor(m(k A mod 1))
Collisions
Chaining -
load factor alpha
If there are n keys in a hash table with m slots, we can the load factor alpha for the hash table to be n/m.
Under the assumption of simple uniform hashing, the length of each linked list in the hash table is alpha.
Open addressing collisions
Linear probing
Linear probing resolves collisions by simply checking the next slot, i.e. if a collision occurred in slot j, the next slot to check would be slot j + 1. More formally, linear probing uses the hash function
h(k, i) = (h'(k) + i ) mod m.
Quadratic probing resolves collisions in a similar fashion:
h(k, i) = h'(k) + c1 i + c2 i2) mod m
for some constants c1, c2. Instead of linearly traversing through the hash table slots in the case of collisions, quadratic probing introduces more spacing between the slots we try in case of a collision, which reduces the clustering effect seen in linear probing.
Double hashing resolves collisions by using another hashing function to determine which slot to try next.
probe sequence
Performance of Open Addressing
linear probing
quadratic probing
double hashing
simple uniform hashing assumption (SUHA)
a hash function mapped to any slot from 0 to m -1 with equal probability
uniform hashing assumption (UHA)
a random permutation of the slots 0 to m-1
load balance alpha, p = 1 - alpha probability that the first probe will find an empty slot under UHA.
Universal hashing
Rolling hash
Hash table
Learn a few keywords:
Performance of Open Addressing
linear probing
quadratic probing
double hashing
simple uniform hashing assumption (SUHA)
a hash function mapped to any slot from 0 to m -1 with equal probability
uniform hashing assumption (UHA)
a random permutation of the slots 0 to m-1
load balance alpha, p = 1 - alpha probability that the first probe will find an empty slot under UHA.
Universal hashing
No comments:
Post a Comment