Monday, February 6, 2017

Hash function, hash table lecture notes study

Feb. 6, 2017

3 lecture notes to study 

Spent more than 2 hours to study the lecture notes - Lecture 6 about Rabin-Karp algorithm from MIT, and also lecture 7, resize of hashtable. 

Will come back to write down some notes and also share something about the algorithm.


Hash functions

Consider a function h(k) that maps the universe U of keys (specific to the hash table, keys could be integers, strings, etc. depending on the hash table) to some index 0 to m. 
We call this function a hash function. 

A good hash function

. satisfies ( approximately) the assumption of simple uniform hashing: each key is equally likely to hash to any of the m slots. The hash function shouldn't bias towards particular slots

. does not hash similar keys to the same slot (e.g. compiler's symbol table shouldn't hash variables i and j to the same slot since they are used in conjunction a lot)

. is quick to calculate, should have O(1) run time

. is deterministic. h(k) should always return the same value for a given k

Example 1: Division method 

prime number vs should ok be a power of 2 

h(k) = k mod m 

if m = 2 p, then the h(k) only looks at the p lower bits of k, completely ignoring the rest of bits in k. A good choice for m with the division method is a prime number ( why are composite numbers bad?). 

Example 2: Multiplication method


h(k) = floor(m(k A mod 1))

Collisions

Chaining - 

load factor alpha

If there are n keys in a hash table with m slots, we can the load factor alpha for the hash table to be n/m. 
Under the assumption of simple uniform hashing, the length of each linked list in the hash table is alpha. 

Open addressing collisions 


Linear probing 

Linear probing resolves collisions by simply checking the next slot, i.e. if a collision occurred in slot j, the next slot to check would be slot j + 1. More formally, linear probing uses the hash function 

  h(k, i) = (h'(k) + i ) mod m. 

Quadratic probing resolves collisions in a similar fashion: 

h(k, i) = h'(k) + c1 i + c2 i2) mod m

for some constants c1c2. Instead of linearly traversing through the hash table slots in the case of collisions, quadratic probing introduces more spacing between the slots we try in case of a collision, which reduces the clustering effect seen in linear probing. 

Double hashing resolves collisions by using another hashing function to determine which slot to try next. 



Lecture 6 notes 

Rabin-Karp algorithm

probe sequence  

Performance of Open Addressing

linear probing 
quadratic probing 
double hashing 

simple uniform hashing assumption (SUHA)
a hash function mapped to any slot from 0 to m -1 with equal probability 

uniform hashing assumption (UHA) 
a random permutation of the slots 0 to m-1 

load balance alpha, p = 1 - alpha probability that the first probe will find an empty slot under UHA. 

Universal hashing  

Rolling hash 

Hash table

Learn a few keywords:

probe sequence  

Performance of Open Addressing

linear probing 
quadratic probing 
double hashing 

simple uniform hashing assumption (SUHA)
a hash function mapped to any slot from 0 to m -1 with equal probability 

uniform hashing assumption (UHA) 
a random permutation of the slots 0 to m-1 

load balance alpha, p = 1 - alpha probability that the first probe will find an empty slot under UHA. 

Universal hashing  


No comments:

Post a Comment