Julia's coding blog - Practice makes perfect

From January 2015, she started to practice leetcode questions; she trains herself to stay focus, develops "muscle" memory when she practices those questions one by one. 2015年初, Julia开始参与做Leetcode, 开通自己第一个博客. 刷Leet code的题目, 她看了很多的代码, 每个人那学一点, 也开通Github, 发表自己的代码, 尝试写自己的一些体会. She learns from her favorite sports – tennis, 10,000 serves practice builds up good memory for a great serve. Just keep going. Hard work beats talent when talent fails to work hard.

Monday, February 6, 2017

Hash function, hash table lecture notes study

Feb. 6, 2017

3 lecture notes to study

Spent more than 2 hours to study the lecture notes - Lecture 6 about Rabin-Karp algorithm from MIT, and also lecture 7, resize of hashtable.

Will come back to write down some notes and also share something about the algorithm.

Lecture 5 notes:

Hash functions

Consider a function h(k) that maps the universe U of keys (specific to the hash table, keys could be integers, strings, etc. depending on the hash table) to some index 0 to m.
We call this function a hash function.

A good hash function

. satisfies ( approximately) the assumption of simple uniform hashing: each key is equally likely to hash to any of the m slots. The hash function shouldn't bias towards particular slots

. does not hash similar keys to the same slot (e.g. compiler's symbol table shouldn't hash variables i and j to the same slot since they are used in conjunction a lot)

. is quick to calculate, should have O(1) run time

. is deterministic. h(k) should always return the same value for a given k

Example 1: Division method

prime number vs should ok be a power of 2

h(k) = k mod m

if m = 2 ^p, then the h(k) only looks at the p lower bits of k, completely ignoring the rest of bits in k. A good choice for m with the division method is a prime number ( why are composite numbers bad?).

Example 2: Multiplication method

h(k) = floor(m(k A mod 1))

Collisions

Chaining -

load factor alpha

If there are n keys in a hash table with m slots, we can the load factor alpha for the hash table to be n/m.
Under the assumption of simple uniform hashing, the length of each linked list in the hash table is alpha.

Open addressing collisions

Linear probing

Linear probing resolves collisions by simply checking the next slot, i.e. if a collision occurred in slot j, the next slot to check would be slot j + 1. More formally, linear probing uses the hash function

h(k, i) = (h'(k) + i ) mod m.

Quadratic probing resolves collisions in a similar fashion:

h(k, i) = h'(k) + c₁ i + c₂ i²) mod m

for some constants c₁, c₂. Instead of linearly traversing through the hash table slots in the case of collisions, quadratic probing introduces more spacing between the slots we try in case of a collision, which reduces the clustering effect seen in linear probing.

Double hashing resolves collisions by using another hashing function to determine which slot to try next.

Lecture 6 notes

Rabin-Karp algorithm

probe sequence

Performance of Open Addressing

linear probing
quadratic probing
double hashing

simple uniform hashing assumption (SUHA)
a hash function mapped to any slot from 0 to m -1 with equal probability

uniform hashing assumption (UHA)
a random permutation of the slots 0 to m-1

load balance alpha, p = 1 - alpha probability that the first probe will find an empty slot under UHA.

Universal hashing

Rolling hash

Lecture 7 notes

Hash table

Learn a few keywords:

Julia's coding blog - Practice makes perfect

Monday, February 6, 2017

Hash function, hash table lecture notes study

No comments:

Post a Comment