Plan to work on LCP array later.
Use C# implementation of suffix array in blog #1, still time out on hackerRank
SuffixArray implemented in C#, by MSFT, Google employee; Julia has to catch up C#, and learn
how to write beautiful code like this:
Study C# code how to implement IEnumberable interface in suffixArray code:
how String.Compare api is designed
Julia also takes some time to work on suffix array, and get some drawing about suffix tree, and play with two case:
aaaaaa,
banana,
banana,
abcabcddd,
For those two strings, what are suffix tree? Get so close to concrete examples, draw diagram, run test case, study code, and compare to other C# submission.
Focus on how to find the range of suffix array to fit in the pattern:
Walk through the examples how suffix array is implemented in C# code:
public SuffixArray Search(string str)
{
|
if (m_lower > m_upper)
|
return this;
|
// Otherwise search for
boundaries that enclose all
|
// suffixes that start with
supplied string.
|
var lower = Search(str, c_lower);
|
var upper = Search(str, c_upper);
|
return new SuffixArray(m_text, m_pos,
lower + 1, upper);
|
}
|
Example 1: string "aaaaaa", pattern string "aaa".
suffix array {5, 4, 3, 2, 1, 0}
suffix string:
a
aa
aaa
aaaa
aaaaa
aaaaaa
suffix array is acending order, string.Compare("a","aa") = -1, string.Compare("aa","aa") = 0. string.Compare("aa","a") = 1
suffix array is implemented using interface IEnumberable.
we try to find 2 index, low, top,
index = 1, "aa", string.compare("aa", 0,"aaa",0, 3) = -1,
now, we need to find top index,
assume that "abc" is in the array, string.compare("abc",0,"aaa",0, 3) = 1, first one is >-1, stop;
aaa, aaaa, aaaaa, aaaaa, all computed value of comparison = 0 since we only check substring with length 3.
So top index is 5.
So, count of substring "aaa" is calculate by top index - low index + 1
top index = 5,
low index = 1+1
so count = 4.
Try to figure out how this binary search algorithm is used to calculate low index and top index, called twice, comparison value for low index = 0, for high index is -1.
In other words, find last one is smaller than "aaa", and first one bigger than "aaa".
Example 2: "banana", pattern "ban",
Read the content in the blog first:
http://www.geeksforgeeks.org/pattern-searching-set-8-suffix-tree-introduction/And then,
suffix strings:
banana
anana
nana
ana
na
a
Sort them ascending order:
a
ana
anana
banna
na
nana
pattern string "ban", first to find a string which is last one < "ban", -> "anana", index 2, "ana" < "ban"
and then, find first string which is bigger than "ban", only comparing length 3 substring,
"banna" = "ban" based on comparison (substring len = 3), so
"na" - index = 4 > "ban"
2+1 = 3,
top index = 3
so, ban pattern match is 3-3 +1 = 1
"Work on Examples First" - Practice feels good!
The LCP-LR array helps improve this to O(m+logN)O(m+logN), in the following way
LCP array reading
https://www.hackerrank.com/challenges/pseudo-isomorphic-substrings/topics/lcp-array
https://en.wikipedia.org/wiki/LCP_array
Follow up: May 4, 2015
Read the article:
June 10, 2016 - Plan to spend 30 minutes to read the slides:
Julia, learn something here from the blog:
No comments:
Post a Comment