Problem statement:
https://www.hackerrank.com/contests/hourrank-6/challenges/bear-and-steady-gene
A gene is represented as a string of length (where is divisible by ), composed of the letters , , , and . It is considered to be steady if each of the four letters occurs exactly times. For example, and are both steady genes.
Study code using binary search:
cannot figure out the design - confused!
https://gist.github.com/jianminchen/d01faa03ca9b06696db3
This one is easy to follow
https://gist.github.com/jianminchen/395eb9e76fe19cc9338f
Julia's C# practice code:
https://gist.github.com/jianminchen/af1b3c064c523444463f
Algorithm talk:
Work on test case GAAATAAA, let me explain the idea using binary search.
A1 = Math.Max(0, A1 - n / 4); // test case: GAAATAAA, A1 = 4
C1 = Math.Max(0, C1 - n / 4); // C1 = 0
G1 = Math.Max(0, G1 - n / 4); // G1 = 0
T1 = Math.Max(0, T1 - n / 4); // T1 = 0
if (A1 == 0 && C1 == 0 && G1 == 0 && T1 == 0)
{
return 0; //
}
int ans = n; // default value is maximum value - string length
for (int i = 0; i < n; i++) // go through a loop, start from 0 to n-1.
{
if (A[n] - A[i] < A1 || C[n] - C[i] < C1 || G[n] - G[i] < G1 || T[n] - T[i] < T1) // substring position from i to n, check the count of A, C, G, T
break;
int l = i + 1, r = n; // left, right two pointer
while (l < r)
{
int mid = (l + r - 1) / 2;
if (A[mid] - A[i] < A1 || C[mid] - C[i] < C1 || G[mid] - G[i] < G1 || T[mid] - T[i] < T1) // not enough
l = mid + 1; // set left pointer to mid + 1
else
r = mid;
}
ans = Math.Min(ans, l - i); // substring is from i to l,
}
return ans;
In other words, GAAATAAA,
i = 0, l = 1, r = 8,
find ans = Math.Min(ans, l-i) <- ans = 6, l = 6
i = 1,
find ans = Math.Min(ans, l-i) <- ans = 5, l = 6, i=1
i = 2,
find ans = Math.Min(ans, l-i) <- ans = 5, l = 7, i=2
i =3;
find ans = Math.Min(ans, l-i) <- ans = 5, l = 8, i=3
i=4
break the for loop
if (A[n] - A[i] < A1 || C[n] - C[i] < C1 || G[n] - G[i] < G1 || T[n] - T[i] < T1)
break;
since A[n] - A[i] = 3 < A1 = 4, TAAA, we need the substring at least 4 of A
comment:
This algorithm uses extra space for ACGT count for each i from 0 to n-1, total space is less than 1MB.
int[] A = new int[500500];
int[] C = new int[500500];
int[] G = new int[500500];
int[] T = new int[500500];
|
No comments:
Post a Comment