March 5, 2016
Problem statement:
https://www.hackerrank.com/contests/hourrank-6/challenges/bear-and-steady-gene
A gene is represented as a string of length (where is divisible by ), composed of the letters , , , and . It is considered to be steady if each of the four letters occurs exactly times. For example, and are both steady genes.
Study code using binary search:
https://gist.github.com/jianminchen/d01faa03ca9b06696db3
https://gist.github.com/jianminchen/395eb9e76fe19cc9338f
Comment:
Binary search is better than linear search using two pointers.
Brute force O(n^2) -> linear search O(n) -> Binary search O(logn)
Let us walk through this binary search algorithm using test case GAAATAAA.
int low = 0;
int high = n; // n = 8
int need = 8/4 = 2
int[] cnt = {6, 0, 1, 1} // "ACGT"
mid = 4,
so, copy cnt to tmp[],
and then, first half, take it away, see left half meet the standard:
tmp[i]<=2, i = 0, ..., 3
because first half contains 3 'A', and then, tmp[0] = 3,
deal with second half, go through a loop:
get confused, let us debug through the code.
debug through the code, still confuse!
Java code:
https://gist.github.com/jianminchen/d01faa03ca9b06696db3
C# code
https://gist.github.com/jianminchen/76dffc51880a80279f25
April 28, 2016
Come back to the problem on binary search solution, figure out the design:
First, go through two test cases: "GTTCCAAA" and "GAAATTCC"
1. "GTTCCAAA",
Binary search is doing this way:
Biggest value of search string length is 8.
First, divide range from 0 to 8 into half, 4
Find first string with length 4,
one is "GTTC", one is "CAAA",
and then, remove count of first half, rest string (two parts, seperated) "CAAA", since A is repeated 3 times, not in the range;
instead of stopping, wrongly conclude that 4 is not high value, go through all the possible substring with length 4, by sliding window of search string: "GTTC" forward from left to right, but keep the same window size
"GTTC" -> "TTCC"->"TCCA"->stop here, since "TCCA" is removed from counting of GENES="ACGT", fit into the requirement.
Next, low=0, high=4, mid = 2,
Because both test cases with one 'A' to replace, Julia figured out through the debugging:
The slide window of fixed length technique,
How to slide?
Which direction to slide?
Kind of clever in design.
Time complexity analysis: length search using binary search, n - length of string, O(log n) times; each search for length m, go over each string once, since using calculated counting array, only do add one/ remove one at a time, one char only does the work once. So, it is O(n) on this.
Total time complexity: O(n logn)
Conclusion: this binary search is not better than linear search is previous blog (IV).
From January 2015, she started to practice leetcode questions; she trains herself to stay focus, develops "muscle" memory when she practices those questions one by one. 2015年初, Julia开始参与做Leetcode, 开通自己第一个博客. 刷Leet code的题目, 她看了很多的代码, 每个人那学一点, 也开通Github, 发表自己的代码, 尝试写自己的一些体会. She learns from her favorite sports – tennis, 10,000 serves practice builds up good memory for a great serve. Just keep going. Hard work beats talent when talent fails to work hard.
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment