Thursday, June 2, 2016

Leetcode 146: LRU cache - a practice makes difference (II)



Previous blog:

Warm up the algorithm: (a lot of hurdles to go through)

Questions and answers:

How to design an LRU cache? Which data structures, and why?

LRU

One of the most common cache systems is LRU (least recently used). In fact, another common interview question is to discuss data structures and design of an LRU cache. Let’s start with this approach.
The way an LRU cache works is quite simple. When the client requests resource A, the following happens:
  • If A exists in the cache, we return it immediately.
  • If not, and the cache has spare slots, we fetch resource A, return it to the client, and insert it into the cache.
  • If the cache is full, we evict the resource that is least recently used and replace it with resource A.
The strategy here is to maximize the chance that the requested resource already exists in the cache. So how can we implement a simple LRU?
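The three cases above can be sketched as a toy model of the policy itself (names and the Request helper are illustrative, not from any real source; a plain List holds keys from least to most recently used, so each request is O(n) — fine for seeing the behavior, not for a real cache):

```csharp
using System;
using System.Collections.Generic;

// Toy model of the LRU policy only, not an efficient implementation:
// 'order' keeps keys from least recently used (front) to most recently
// used (back).
public class LruPolicyDemo
{
    static readonly List<int> order = new List<int>();
    const int Capacity = 2;

    public static string Request(int key)
    {
        if (order.Remove(key))          // case 1: hit - move key to the back
        {
            order.Add(key);
            return "hit";
        }
        if (order.Count == Capacity)    // case 3: full - evict the front (LRU)
            order.RemoveAt(0);
        order.Add(key);                 // case 2: miss - insert the new key
        return "miss";
    }

    static void Main()
    {
        Console.WriteLine(Request(1));  // miss
        Console.WriteLine(Request(2));  // miss
        Console.WriteLine(Request(1));  // hit
        Console.WriteLine(Request(3));  // miss: evicts 2, the least recently used
        Console.WriteLine(Request(2));  // miss: 2 was evicted above
    }
}
```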


LRU design

An LRU cache should support three operations: lookup, insert, and delete. To achieve fast lookup, we need a hash table. By the same token, to make insert/delete fast, a linked list should come to mind. Since we also need to locate the least recently used item efficiently, we need an ordered structure such as a queue, stack, or sorted array.
Combining these analyses, we can use a queue implemented with a doubly linked list to store all the resources, plus a hash table with the resource identifier as key and the address of the corresponding queue node as value.
Here's how it works. When resource A is requested, we check the hash table to see whether A exists in the cache. If it does, we can immediately locate the corresponding queue node, move it to the most recently used end of the queue, and return the resource. If not, we add A to the cache: if there is enough space, we append A to the end of the queue and update the hash table; otherwise, we first delete the least recently used entry, which is easy — remove the head of the queue and the corresponding entry from the hash table.
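This design can be sketched in C# as follows (a minimal sketch, not the blog's original source code — class and member names are illustrative; the dummy head marks the least recently used end, the dummy tail the most recently used end):

```csharp
using System;
using System.Collections.Generic;

// Minimal sketch of the design above: a Dictionary gives O(1) lookup, and a
// doubly linked list (dummy head = LRU end, dummy tail = MRU end) gives O(1)
// insert/delete.
public class LruCache
{
    private class Node
    {
        public int Key, Value;
        public Node Prev, Next;
    }

    private readonly int capacity;
    private readonly Dictionary<int, Node> map = new Dictionary<int, Node>();
    private readonly Node head = new Node();  // dummy head (least recent side)
    private readonly Node tail = new Node();  // dummy tail (most recent side)

    public LruCache(int capacity)
    {
        this.capacity = capacity;
        head.Next = tail;   // empty list: the two dummies point at each other
        tail.Prev = head;
    }

    public int Get(int key)
    {
        if (!map.TryGetValue(key, out Node node)) return -1;
        Unlink(node);
        AddBeforeTail(node);   // a hit makes the entry most recently used
        return node.Value;
    }

    public void Put(int key, int value)
    {
        if (map.TryGetValue(key, out Node node))
        {
            node.Value = value;
            Unlink(node);
            AddBeforeTail(node);
            return;
        }
        if (map.Count == capacity)
        {
            Node lru = head.Next;   // evict the least recently used entry
            Unlink(lru);
            map.Remove(lru.Key);
        }
        node = new Node { Key = key, Value = value };
        map[key] = node;
        AddBeforeTail(node);
    }

    private static void Unlink(Node node)
    {
        node.Prev.Next = node.Next;
        node.Next.Prev = node.Prev;
    }

    private void AddBeforeTail(Node node)
    {
        node.Prev = tail.Prev;
        node.Next = tail;
        tail.Prev.Next = node;
        tail.Prev = node;
    }
}
```

For example, with capacity 2: Put(1,1), Put(2,2), then Get(1) returns 1 and makes key 1 most recent, so a subsequent Put(3,3) evicts key 2 and Get(2) returns -1.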


Eviction policy

When the cache is full, we need to remove existing items for new resources. In fact, deleting the least recently used item is just one of the most common approaches. So are there other ways to do that?
As mentioned above, the strategy is to maximize the chance that the requested resource exists in the cache. I'll briefly mention several approaches here:
  • Random Replacement (RR) – As the term suggests, we can just randomly delete an entry.
  • Least frequently used (LFU) – We keep a count of how frequently each item is requested and delete the one that is used least frequently.
  • W-TinyLFU – I'd also like to mention this modern eviction policy. In a nutshell, the problem with LFU is that an item may have been used frequently only in the past, yet LFU will still keep it around for a long while. W-TinyLFU solves this problem by calculating frequency within a time window. It also has various storage optimizations.
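The LFU idea can be sketched as a toy eviction demo (names are illustrative; a per-key request counter picks the victim, and the linear scan for the minimum is O(n) — real LFU caches use frequency buckets to make eviction O(1)):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy sketch of LFU eviction only: count requests per key and, when the
// cache is full, evict the key with the smallest count.
public class LfuPolicyDemo
{
    static readonly Dictionary<int, int> counts = new Dictionary<int, int>();
    const int Capacity = 2;

    // Returns the evicted key, or -1 if nothing was evicted.
    public static int Request(int key)
    {
        int evicted = -1;
        if (!counts.ContainsKey(key) && counts.Count == Capacity)
        {
            evicted = counts.OrderBy(kv => kv.Value).First().Key;
            counts.Remove(evicted);
        }
        counts.TryGetValue(key, out int c);   // c is 0 for a new key
        counts[key] = c + 1;
        return evicted;
    }

    static void Main()
    {
        Request(1); Request(1);   // key 1 requested twice
        Request(2);               // key 2 requested once
        int victim = Request(3);  // cache full: evicts 2, the least frequent
        Console.WriteLine(victim); // 2
    }
}
```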
Concurrency and distributed caching are skipped in this design. 
Next, the coding part - the data structures and the algorithm, in my own words:

C# implementation. 

int capacity - specifies the size of the cache. The cache should have a limited size, since resources are limited, especially for high-speed access. Source code on line 24.

int size - the current size of the cache, tracked to determine whether eviction is needed.
Source code on line 24.

Design a doubly linked list: the ListNode class is defined with two pointers, prev and next.
Source code from line 10 to line 22.

Every entry has a key and a value; use int for both to simplify the coding. Source code, line 12:
line 12   public int key, val; 

Also, we add two more variables, a dummy head and a dummy tail, to help maintain the doubly linked list without special-casing the ends. Source code on line 25.

Also, we need to be able to find a key in O(1) when it is in the cache. Extra space is used: maintain a hash map using the Dictionary class in C#:
line 27    private Dictionary<int, ListNode> map;

Let us count how many variables are inside the class LRUCache:
five variables - memorize the variable count (five) and try to recall it:

private int capacity, size; 
private ListNode dummyHead, dummyTail; 
private Dictionary<int, ListNode> map; 

ListNode class as a node in a doubly linked list: 
 public int key, value; 
 public ListNode prev, next; 

4 variables. 

Statistics: 
Time spent: 2 hours +

Research paper:    it feels so good to read a research paper after some coding. 

Reading blog: (June 30, 2016)
