Monday, September 5, 2016

Suffix Array and longest common prefix array (LCP array) - study

June 5, 2016

 It is the labor day long weekend, spent 3 hours (6:30am - 10:00am) to read suffix array from this favorite competitive programming book, and try to please herself, a new goal - score any point with suffix array work or LCP (even cannot remember the full name - called longest common prefix array), HackerRank practice or code sprint.

It is a lonely journey - reading the book, but it is perfect for physical recovery  - muscle and bones -
laying on the bed with the excellent one night sleep, just after 3+ hour tennis sports, and do not want to move, read a book.

Yesterday, Julia warmed up more than 1+ hour, one single match, one double match lasted more than one hour, until tie break. She lost double with 5 to 7 lost the match.  She suffered tennis elbow pain issues.

The competitive book about suffix array:

6.4 Suffix Tree and Suffix Array - page 114 - 119


Motivation talk
1. suffix array - who did the research to introduce the term - suffix array in 1993?

 https://en.wikipedia.org/wiki/LCP_array
 https://en.wikipedia.org/wiki/Udi_Manber

More reading:
Suffix array:
http://algs4.cs.princeton.edu/63suffix/

Play with some code first, get solid understanding suffix array - what are the benefits using suffix array? Shorten time complexity - 

suffix tree -> suffix array -> sorted array -> LCP - longest common prefix

Arguments:
1. Building efficient Suffix Tree under contest environment is a bit complex and risky
2. Suffix Array invented by Udi Manber and Gene Myers, has similar functionalities as Suffix Tree but simpler to implement, especially in programming contest setting
3. ...

Facts:
1. Suffix Array is an integer array that contains indices of sorted suffixes


1. Write C# version of this Java code:
http://algs4.cs.princeton.edu/63suffix/SuffixArray.java.html

2. Suffix array - longest repeated substring - using suffix array
http://algs4.cs.princeton.edu/63suffix/LongestRepeatedSubstring.java.html

3. Keyword in context (KWIC)
Given the suffix array, easy to search for a string or sentence via binary search. Memory is linear. Search is O(K log N) where K is the length of the string you are searching for. (Can be done in K + log N by using the lcp array.)

study the code:

More reading:
1. https://leetcode.com/articles/longest-common-prefix/

2. http://www.geeksforgeeks.org/longest-common-prefix-set-1-word-by-word-matching/
3. http://www.geeksforgeeks.org/longest-common-prefix-set-2-character-by-character-matching/
4. http://www.geeksforgeeks.org/longest-common-prefix-set-3-divide-and-conquer/
5. http://www.geeksforgeeks.org/longest-common-prefix-set-4-binary-search/

- Julia likes to calm down quickly when she gets nervous. When she has a negative self-talk, she will remind herself - "Everyone faces challenges on court and I'm no different." Replace with positive self-talk.

No comments:

Post a Comment