Monday, January 18, 2021

Leetcode discuss: 1081. Smallest Subsequence of Distinct Characters

 Here is the link. 

First practice - C# - Deque - Stumble over 7 times

Jan. 15, 2020
Introduction
It is the algorithm of Microsoft mock onsite interview. I could not come out working solution using linear time complexity. After careful thinking, I decide to count every char from right to left, so that I started a journey to find a working solution. It took me over a few hours.

Case study 1
"cbacdcbc", the answer is "adcb". How to find it?
cbacdcbc
cbacdcb1 <- last char is c, so distinct char total count is 1
cbacdc21 <- next to last one is b, so two distinct
cbacd221 <- third to last one is c, still two distinct
cbac3221 <- 4th to last one is d, three distinct

To find smallest subsequence, it is important to maintain deque data structure.
A deque is designed to maintain ascending order in the deque.
I failed so many test cases and then I decided to use C# LinkedList instead of Stack.

Case study 2
cbacdbbc
After counting distinct char from right to left, it is easy to work with the following:
cbacdcbc
44433221
c-> deque
'b' < 'c', remove last of deque, put 'b' into deque.
'a' < 'b', remove last of deque, put 'a' into deque
'a' has none left, so 'a' has to be selected to add into subsequence.

Case study 3
accaccbbcc
3333222211
The challenge of the test case is to understand i = 3, 'a', last 'a', so 'a' has to be added to subsequence, deque has two chars "ac", but 'a' is added to subsequence, stop. 'c' in deque is saved for competing for second char in subsequence.

Advice
Be patient. Once I come out the counting from back to front, the algorithm can be solved using linear time.

The idea is working, but I introduced a few factors to make it super hard to write and make it working. Just leave subsequence in the deque structure, and keep removing or adding from last of deque.

My last practice in Jan. 2019 is here.

Simplify
It is challenge for me to figure out a few steps in order to simplify the solution using one hashset, one stack, without a list to store one by one in subsequence, instead using stack. Those simplification makes the coding working less than 20 minutes.

Value
I tried to figure out what is benefit to make this solution working even if I had to take more than a few hours to figure out so many failed test cases, fix one by one.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace _1081_smallest_subsequence
{
    class Program
    {
        static void Main(string[] args)
        {
            var result = SmallestSubsequence("cbaacabcaaccaacababa");
        }        

        public static void RunTestcases()
        {
            //"cbacdcbc"
            //var result = SmallestSubsequence("cbacdcbc"); 
            //var result = SmallestSubsequence("leetcode");

            // "cbaacabcaaccaacababa" - should be abc, not acb
            // "cdadabcc", should be adbc, not adbcc
            //var result = SmallestSubsequence("cdadabcc"); 
            //var result = SmallestSubsequence("leetcode");
            var result = SmallestSubsequence("bcbcbcababa");
        }

        /// <summary>
        /// the important is to design linear time complexity
        /// </summary>
        /// <param name="text"></param>
        /// <returns></returns>
        public static string SmallestSubsequence(string text)
        {
            if (text == null || text.Length == 0)
            {
                return "";
            }
            
            // from back to front, count all chars 
            var distinctChars = new HashSet<char>(text.ToCharArray());

            var map = new Dictionary<int, HashSet<char>>();
            var digits = new int[26];

            var length     = text.Length;
            var digitCount = new int[26];
            
            var set = new HashSet<char>();
            var targetSize = distinctChars.Count;
            
            var list = new List<char>(); 

            // time complexity: O(26 * N) - optimal solution 
            for (int i = length - 1; i >= 0; i--)
            {
                var current = text[i];
                digitCount[current - 'a']++;

                set.Add(current);               
                map.Add(i, new HashSet<char>(set));
            }            

            int index = 0;            
            
            var deque = new LinkedList<int>(); 

            for (int i = 0; i < length; i++)
            {
                var target = targetSize - index;
                var value = map[i].Count;

                digitCount[text[i] - 'a']--;
                
                var c = text[i] - 'a';
                var found = new HashSet<char>(list);

                if (found.Contains(text[i]))
                {
                    continue;
                }

                var stackSet = new HashSet<int>(deque);
                while (!stackSet.Contains(c) && deque.Count > 0 && deque.Last() >= c)  // avoid duplicate, > should be >= "cdadabcc" - not "adbcc", "adbc"
                {
                    var removed = deque.Last();
                    deque.RemoveLast(); 

                    stackSet.Remove(removed); 
                }

                var existing = new HashSet<int>(deque);
                if (!existing.Contains(c))
                {
                    deque.AddLast(c);
                }

                // need to add all chars in stack to the list                   
                var lastOne = digitCount[text[i] - 'a'] == 0;
                if (i != length - 1)
                {
                    found.UnionWith(map[i + 1]);
                }

                if (lastOne || (i == length - 1 || found.Count < targetSize))
                {
                    int removed = 0; 
                    while(true)
                    {
                        var first = deque.First();
                        deque.RemoveFirst();
                        list.Add((char)(first + 'a'));

                        removed++;
                        if (first == text[i] - 'a')
                        {
                            break;
                        }
                    }                                                              
                       
                    index += removed;                                       
                }                                              
            }

            return new string(list.ToArray()); 
        }
    }
}

No comments:

Post a Comment