Thursday, April 28, 2016

HackerRank: Bear And Steady Gene - binary search algorithm (VI)

April 28, 2016

Previous blog on binary search on Bear and Gene algorithm:

Come back to the problem on binary search solution, figure out the design:


Problem statement:

https://www.hackerrank.com/contests/hourrank-6/challenges/bear-and-steady-gene  

A gene is represented as a string of length  (where  is divisible by ), composed of the letters , and . It is considered to be steady if each of the four letters occurs exactly  times. For example,  and are both steady genes.

Study code using binary search:

https://gist.github.com/jianminchen/d01faa03ca9b06696db3

https://gist.github.com/jianminchen/395eb9e76fe19cc9338f

Comment:
Binary search is better than linear search using two pointers.

Brute force O(n^2) -> linear search O(n) -> Binary search O(logn) <- Not True!

Java code:
https://gist.github.com/jianminchen/d01faa03ca9b06696db3
C# code
https://gist.github.com/jianminchen/76dffc51880a80279f25

First, go through two test cases: "GTTCCAAA" and "GAAATTCC"

1. "GTTCCAAA",

Binary search is doing this way:
Biggest value of search string length is 8.
First, divide range from 0 to 8 into half, 4
Find first string with length 4,
one is "GTTC", one is "CAAA",
and then, remove count of first half, rest string (two parts, seperated) "CAAA", since A is repeated 3 times, not in the range;
instead of stopping, wrongly conclude that 4 is not high value, go through all the possible substring with length 4, by sliding window of search string: "GTTC" forward from left to right, but keep the same window size
"GTTC" -> "TTCC"->"TCCA"->stop here, since "TCCA" is removed from counting of GENES="ACGT", fit into the requirement.

Next, low=0, high=4, mid = 2,

Because both test cases with one 'A' to replace, Julia figured out through the debugging:

The slide window of fixed length technique,
How to slide?
Which direction to slide?

Kind of clever in design.
Time complexity analysis: length search using binary search, n - length of string, O(log n) times; each search for length m, go over each string once, since using calculated counting array, only do add one/ remove one at a time, one char only does the work once. So, it is O(n) on this.

Total time complexity: O(n logn)
Conclusion: this binary search is not better than linear search is previous blog (IV). 

No comments:

Post a Comment