Tuesday, November 10, 2020

Leetcode 727 minimum window substring: C# + DP + n subproblems based on subsequence's length

 Nov. 10, 2020

Introduction

It is tough for me to write a C# algorithm using dynamic programming for 727 minimum window substring in less than 20 minutes. My first practice took me over 30 minutes. I like to work on more carefully and write down some good ideas how to solve this algorithm. 

Case study

Case study 1: string "sbsbc", pattern string "sbc"

First of all, the substring "sbsbc" contains subsequence "sbc" which keeps the order as well. But length is 5, shortest one should be "sbc". So in order to get minimum substring, any substring after the first one "sb" should be recorded as well. 

Based on the analysis, it is to work on dp[length + 1], for each i from 1 to length, the largest index containing subsequence "sbc" should be saved. There are length + 1 subproblems. 

In other words, pattern string "sbc", dp[1] is string "sbsbc"'s substring containing subsequence "s"'s start index; largest one should be saved since it can be next shortest length one.
dp[2] is string "sbsbc"'s substring containing subsequence "sb"'s start index; larger index value should replace smaller one. dp[2] starts from value 0, when "sbsbc" is iterated at index = 3, dp[2] is updated with value 2.

Let me walk through "sbsbc", 

first, iterate on first char 's', so dp[1] = 0, length =1 substring starts from index = 0. 

next, iterate on second char 'b', 'b' is second char in pattern string, dp[2] = dp[1] somce dp[1] > 0, so dp[2] = 1, and reset dp[1] = 0. 

thirdly, iterate on third char 's', s is the first char in pattern string, reset dp[1] = 2;

fourthly, iterate on fourth char 'b', b is the second char in pattern string, reset dp[2] = 2, and dp[1] = 0; 

last one, iterate on fifth char 'c', c is the third char in pattern string, set dp[3] = dp[2] since dp[2] > 0, dp[3] = 2. 

Return should be 5 - dp[3] = 5 - 2 = 3. 

Highlights of C# practice

  1. It is important to save pattern chars in count = HashSet<int>[26], since pattern string may have duplicate chars;
  2. More on step 1, pattern string "sbbc", count variable count[1] includes {1, 2}. It is not sorted, not allow duplicate, which is fine with index of position in the pattern string. Convert to list and then sort them. 
  3. Work on count[i] list in decreasing order. 
  4. More on step 1, it is better to use C# HashSet instead of C# SortedSet, no need to maintain sorted order until all are in HashSet first. 
  5. Work on a few more test cases. "sbsbbc", pattern string "sbbc"; one more test case: "sbsbabc", pattern string "sbbc". 

How to design a dp function?

The order of pattern string should be maintained. So dp[i] is designed with variable i being the index of pattern string. 

The string's start position is saved for pattern substring subsequence, which can be used to calculate the length of substring. That is dp[i]'s value. 

Go through each char in string, update matching positions in pattern string's dp values. Cover all subproblems's update. 

Once dp[i]'s value update on last position in pattern string, save the value since one subsequence is found. 

The code has bugs, and I will fix it. 

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace _727_minimum_window_subsequence
{
    class Program
    {
        static void Main(string[] args)
        {
            var result = FindMinimumWindowSubSequence("sbsbc", "sbc");
            Debug.Assert(result == 3); // "sbc" is found 

            var result2 = FindMinimumWindowSubSequence("sbsbbc", "sbbc");
            Debug.Assert(result2 == 4); // "sbc" is found 

            var result3 = FindMinimumWindowSubSequence("sbsbabc", "sbbc");
            Debug.Assert(result3 == 5); // "sbc" is found 

            var result4 = FindMinimumWindowSubSequence("sbacsbcsbbc", "sbc");
            Debug.Assert(result4 == 3); // "sbc" is found 
        }

        /// <summary>
        /// Nov. 10, 2020
        /// work on "sbsbc", and pattern string "sbc" first
        /// The idea is to define dp array to store start index. 
        /// dp[i] represent p.Substring(0, i) start index (keep updating with biggest one).
        /// For example, pattern string "sbc", dp[1] is to store 's''s information. 
        /// The challenge is to store all subproblems in dp array. 
        /// For example, string s = "sbsbc", when second char "s" shows up in string s, it should be
        /// recorded since it may start another "sbc" with shorter length. 
        /// My idea is to store all chars in pattern string in HashSet<int>[26], and then "b" is visited,
        /// all 'b''s positions in pattern string is traversed by decreasing order, update dp if need.
        /// 'b''s positions in "sbbc" are {1, 2}
        /// </summary>
        /// <param name="s"></param>
        /// <param name="p"></param>
        /// <returns></returns>
        public static int FindMinimumWindowSubSequence(string s, string p)
        {
            if (s == null || p == null || s.Length < p.Length)
            {
                return -1;
            }

            var pLength = p.Length;
            var dp = new int[pLength + 1];
            var found = new List<int>(); 

            for (int i = 0; i < pLength + 1; i++)
            {
                dp[i] = -1;
            }

            var count = new HashSet<int>[26];
            for (int i = 0; i < 26; i++)
            {
                count[i] = new HashSet<int>();
            }

            for (int i = 0; i < pLength; i++)
            {
                count[p[i] - 'a'].Add(i);
            }

            for (int i = 0; i < s.Length; i++)
            {
                var current = s[i];
                var index = current - 'a';

                var list = count[index].ToList();
                list.Sort();

                for (int j = 0; j < list.Count; j++)
                {
                    // reverse order
                    var next = list[list.Count - j - 1];
                    if (next == 0)
                    {
                        dp[1] = i;
                    }
                    else if (dp[next] >= 0)
                    {
                        dp[next + 1] = dp[next];
                        dp[next] = 0;
                    }

                    // work on next + 1, not next
                    if(next + 1 == pLength && dp[next + 1] >=0)
                    {
                        found.Add(i - dp[next + 1] + 1);
                    }
                }
            }

            return found.Count == 0? -1 : found.Min();
        }
    }
}

No comments:

Post a Comment