Saturday, March 5, 2016

HackerRank: Bear and Steady Gene algorithm (III)

March 5, 2016

Problem statement: 

  

A gene is represented as a string of length  (where  is divisible by ), composed of the letters , and . It is considered to be steady if each of the four letters occurs exactly  times. For example,  and are both steady genes.


Practice 


Julia's 1st practice:


Code is here

Score 15 out of 50, there are 2 run time error, failed a few of test cases.

The algorithm ends up in time complexity O(n2), close to brute force solution.

Code study 


C# code implementation to study:

Study code is here


Work on two pointers, sliding window, so time complexity is O(n).

Let us go over two pointers algorithm here using example:

Test case string "GAAATAAA", gene string "ACTG". 
Go over C# program a few lines of statements here:

Line 40
var a = ReadToken().Select(ch=>"ACTG".IndexOf(ch)).ToArray();

so the array a = [3, 0, 0, 0, 2, 0, 0, 0] is representing the sample string "GAAATAAA", since 'G' is the index of 3 in the string "ACTG", 'A' is 0, 'C' is 1, 'T' is 2. 

Line 44 - 45:
for(int i = 0; i < 4; i++)
   target[i] = Math.Max(0, a.Count(aa => aa == i) - n / 4); 

So the Target[] = [4, 0, 0, 0]. In other words, 4 of A is needed, but C, T, G do not matter.

Line 47 - 51:
if (target.All(t => t == 0))
{
      Write(0);
      return; 
}

If all element in Target is 0, then return 0. Learn to use Array.All method. 




Left, right two pointer (l, r), starting from 0

Declare a sum array

Start a loop, loop on r
   
Add a[r]’s char into sum array,

r ++, r moves to next one

Inside the first loop, another loop for r, continue to move forward

if sums do not reach target 4 of them >= target
 
The test case "GAAATAAA", find substring: "GAAATA". 

Inside the first loop, another loop for left pointer – l

condition: l  <  r

If sums array is bigger than target array, then remove left pointer char count,

then move left pointer to next one, then the substring is reaching the target,

"GAAATA" => "AAATA"

and compare to existing smallest length

"AAATA" is the smallest one.

This algorithm is much easy to follow than the one in the blog - bear and steady gene.   

 
Reference: 

C# code
 implementation to study, code is here



No comments:

Post a Comment