March 5, 2016

Problem
statement:

A gene is
represented as a string of length (where is
divisible by ),
composed of the letters , , , and . It is
considered to be

*steady*if each of the four letters occurs exactly times. For example, and are both steady genes.### Practice

Julia's 1st practice:

Code is here.

Score 15
out of 50, there are 2 run time error, failed a few of test cases.

The
algorithm ends up in time complexity O(n

^{2}), close to brute force solution.### Code study

Work on
two pointers, sliding window, so time complexity is O(n).

Let us go
over two pointers algorithm here using example:

Test case string "GAAATAAA", gene string
"ACTG".

Go over C# program a few lines of statements here:

var a = ReadToken().Select(ch=>"ACTG".IndexOf(ch)).ToArray();

Go over C# program a few lines of statements here:

**Line 40**:var a = ReadToken().Select(ch=>"ACTG".IndexOf(ch)).ToArray();

so the array a =
[3, 0, 0, 0, 2, 0, 0, 0] is representing the sample string "GAAATAAA", since 'G' is the index of 3 in the string "ACTG", 'A' is 0, 'C' is 1, 'T' is 2.

**Line 44 - 45**:

for(int i = 0; i < 4; i++)

target[i] = Math.Max(0, a.Count(aa => aa == i) - n / 4);

So the Target[] = [4, 0, 0, 0]. In other words, 4 of A is needed, but C, T, G do not
matter.

**Line 47 - 51**:

if (target.All(t => t == 0))

{

Write(0);

return;

}

If all
element in Target is 0, then return 0. Learn to use Array.All method.

Left, right two pointer (l, r), starting from 0

Declare a sum array

Start a loop, loop on r

Add
a[r]’s char into sum array,

r ++, r
moves to next one

Inside
the first loop, another loop for r, continue to move forward

if sums do
not reach target 4 of them >= target

The test case "GAAATAAA", find substring: "GAAATA".

Inside the first loop, another loop for left pointer – l

condition:
l < r

If sums
array is bigger than target array, then remove left pointer char count,

then move
left pointer to next one, then the substring is reaching the target,

**"GAAATA" => "AAATA"**

and compare to existing smallest length

"AAATA" is
the smallest one.

This algorithm is much easy to follow than the one in the blog - bear and steady gene.

**Reference:**

**implementation to study, code is here.**

C# code

C# code

## No comments:

## Post a Comment