Monday, April 11, 2016

HackerRank: String Calculate function (III) - Suffix array (II)

Plan to work on LCP array later. 

Use C# implementation of suffix array in blog #1, still time out on hackerRank

SuffixArray implemented in C#, by MSFT, Google employee; Julia has to catch up C#, and learn 
how to write beautiful code like this:

Study C# code how to implement IEnumberable interface in suffixArray code:

how String.Compare api is designed
Julia also takes some time to work on suffix array, and get some drawing about suffix tree, and play with two case:
aaaaaa, 
banana, 
abcabcddd, 
For those two strings, what are suffix tree? Get so close to concrete examples, draw diagram, run test case, study code, and compare to other C# submission. 

Focus on how to find the range of suffix array to fit in the pattern: 


Walk through the examples how suffix array is implemented in C# code:


public SuffixArray Search(string str)

        {
            if (m_lower > m_upper)
                return this;
            // Otherwise search for boundaries that enclose all
            // suffixes that start with supplied string.
            var lower = Search(str, c_lower);
            var upper = Search(str, c_upper);
            return new SuffixArray(m_text, m_pos, lower + 1, upper);
        }

Example 1: string "aaaaaa", pattern string "aaa". 
suffix array {5, 4, 3, 2, 1, 0}
suffix string:
a
aa
aaa
aaaa
aaaaa
aaaaaa

suffix array is acending order, string.Compare("a","aa")  = -1, string.Compare("aa","aa") = 0. string.Compare("aa","a") = 1

suffix array is implemented using interface IEnumberable. 
we try to find 2 index, low, top, 
index = 1, "aa", string.compare("aa", 0,"aaa",0, 3) = -1, 
now, we need to find top index, 
assume that "abc" is in the array, string.compare("abc",0,"aaa",0, 3) = 1, first one is >-1, stop; 
aaa, aaaa, aaaaa, aaaaa, all computed value of comparison = 0 since we only check substring with length 3. 
So top index is 5. 
So, count of substring "aaa" is calculate by top index - low index + 1
top index = 5, 
low index = 1+1
so count = 4. 

Try to figure out how this binary search algorithm is used to calculate low index and top index, called twice, comparison value for low index = 0, for high index is -1. 
In other words, find last one is smaller than "aaa", and first one bigger than "aaa". 

Example 2: "banana", pattern "ban", 
 Read the content in the blog first:
http://www.geeksforgeeks.org/pattern-searching-set-8-suffix-tree-introduction/

And then, 
suffix strings:
banana
anana
nana
ana
na
a

Sort them ascending order:
a
ana
anana
banna
na
nana

pattern string "ban", first to find a string which is last one < "ban", -> "anana", index 2, "ana" < "ban"
and then, find first string which is bigger than "ban", only comparing length 3 substring, 
"banna" = "ban" based on comparison (substring len = 3), so
"na" - index = 4 > "ban"
2+1 = 3, 
top index = 3 
so, ban pattern match is  3-3 +1 = 1



"Work on Examples First" - Practice feels good!

The LCP-LR array helps improve this to O(m+logN)O(m+logN), in the following way


LCP array reading
https://www.hackerrank.com/challenges/pseudo-isomorphic-substrings/topics/lcp-array

https://en.wikipedia.org/wiki/LCP_array

Follow up: May 4, 2015
Read the article:

June 10, 2016 - Plan to spend 30 minutes to read the slides:

Julia, learn something here from the blog:

http://decomplexify.blogspot.ca/2014/07/lcp-array.html?view=classic

No comments:

Post a Comment