Wednesday, June 22, 2016

Leetcode 139: word break I - 3+ practices

June 22, 2016

Work on leetcode 139 - word break I

Problem statement:
Given a string s and a dictionary of words dict, determine if s can be segmented into a space-separated sequence of one or more dictionary words.
For example, given
s = "leetcode",
dict = ["leet", "code"].
Return true because "leetcode" can be segmented as "leet code".
Programming skills is like muscle, if you do not maintain it, it will go back to fat. When Julia tries to stretch DP muscle, she found out that she has nothing to stretch. She has to work on it, build up by hours, build up again. Leetcode 139: word break I (medium).
Julia read the code/ solution, but she could not figure out in detail, specially for those two loops - for DP algorithm. 
1 hour, play with 2 test cases. 
3 practices using C#; Set up a test case, run the code, and then, understand the algorithm.

Same code, different test cases
1st practice:
https://gist.github.com/jianminchen/f763ce5033ea4b152c7a56aaf98b7075

2nd practice:
https://gist.github.com/jianminchen/a1bafb4271b8bbdb18b92d0b3dad913e

3rd practice:
https://gist.github.com/jianminchen/bd9301c3ad38d801001d1d66a08256fe

Let us walk through a test case:
string test = "abcd",
and dictionary string array: {"a","bc","d"}

We know that, "a" can be constructed by word "a" in the dictionary.
"ab" cannot be constructed by the dictionary.

Now, what we can do to work on substring "abc", assuming that "a", "ab" are processed already.

How to approach the problem?
Use dynamic programming, reduce the work to minimum.

abc, so 'c' is the new comer, assuming that "a" and "ab" have been processed. By debugging the code, Julia put together the following comment:

so, step 1, "ab" cannot be constructed by the words in dictionary, so it does not matter if 'c'  is in dictionary. 
   Use array bool[] dp to store the cached value; 
  
   Cannot conclude that "abc" can be constructed by words in dictionary, need to go through all possible new words: {"c", "bc", "abc"}

    step 2, backward one more char, new word: "bc", 
 "a" is in cache - dp[1] = true, 
 "bc" is in the dictionary. 
 so,  "abc" can be constructed by words in dictionary. 
 Stop here, break the loop.     

    At most there are 3 words ending at 'c':
    "c",
    "bc",
    "abc",
    All other substrings are the substring of  "ab", assuming that DP is used, so no need to worry about.

4th C# practice:
https://gist.github.com/jianminchen/36de4c46292e79290af95098dfee3cb0

Question and Answer:
1. What is most time consuming part in coding?
Julia spent over 20 minutes to figure out what should be looped on - line 80, it is hard to figure out
meaningful ways to loop. She tried a few of ideas, then, she chose to loop on the position of right string.

   "abc",
   i = 2, then, 3 things are tried:
  1.  "ab", "c"
  2.  "a", "bc"
  3.  "",   "abc"

  two strings are in each case, right string's starting position in the original string "abc"
 line 80    for( int pos = i;  pos >= 0; pos--)
  i = 2, "c"
  i = 1, "bc"
  i = 0,  "abc"

4th C# practice:
 https://gist.github.com/jianminchen/36de4c46292e79290af95098dfee3cb0

Spend 10 minutes to write the program again.
5th C# practice:
https://gist.github.com/jianminchen/9766aea70c9015fabbea98fb71b56d6d

hightlight of changes:
1. line 41 - variable name is changed from "left" to "existingWord", existingWord is already processed, which can be looked up through cache array - true/ false.
2. line 42, newWord, each time the new char is processed, there are ith words ending at position i.
Need to check if it is in word dictionary.
Comparison between 4th practice and 5th practice:
comment:

code:
4th practice:
1. Do not know only work on substring 0-i, length i+1 substring, bottom up approach as DP - dynamic programming.
2. Confuse with string left, right, why left string needs to check cache, and then right string checks word dictionary.

Read blog to review dynamic programming:
https://en.wikipedia.org/wiki/Dynamic_programming

Review a few concepts:
Principle of Optimality
optimal substructure
larger problem - sub-problems
Bellman equation - optimization literature names relationship
optimal substructure and overlapping sub-problems
Divide and Conquer   vs dynamic programming
Top-down approach vs Bottom-up approach

Word break I uses bottom up solution -

6th practice:
https://gist.github.com/jianminchen/29758bad5d56d54e87c2a6f52d057835
comparison line by line:
    6th practice                                             5th practice

7th practice:
second loop on DP algorithm, new word loop starts from start position from 0 to i, in ascending order, line 43. 
https://gist.github.com/jianminchen/2e56665d934f9b9cbc2f94adc8e567bb

Statistics:
Time spent:
2 hour +


Train insane or remain the same - focus on training!

No comments:

Post a Comment