Friday, July 29, 2016

K largest elements in the array - various ideas

July 29, 2016

Top k values in the array - review the following article: 

http://www.geeksforgeeks.org/k-largestor-smallest-elements-in-an-array/

1. sorting the array   O(nlogn)
2. use selection sort O(nk)
3. use sorting
4. use min heap
5. use max heap
6. use temporary array
7. use order statistics

7A. randomized selection algorithm - a dertministic algorithm that runs in O(n) in the worst case

 http://www.cse.ust.hk/~dekai/271/notes/L05/L05.pdf
1. the idea is to divide n items into n/5 sets (denoting m sets), each contains 5 items. O(n)
2. Find the median of each of the m sets. O(n)
3. Take those m medians and put them in another array.  Use Dselection() to recurisively calculate the median of these medians. Call this x. T(n/5)
4. ...

7B. Use QuickSort partition algorithm to partition around the kth largest number O(n).

7C. Sort the k-1 elements (elements greater than the kth largest element) O(klogk). This step is needed only if sorted output is required.

Summary of study for Leetcode 347 - K most frequent element in the array

July 29, 2016



LINQ:

Simplify your programs with LINQ - Language Integrate Query (LINQ) study series IV

July 29, 2016

 LINQ - blogs to read:

Notes:

1. Initialize an array
int[] a = Enumerable.Repeat(-1,10).ToArray(); 
int[] b = Enumerable.Range(0,10).ToArray(); 
int[] c = Enumberable.Range(0,10).Select(i=>100+10*i).ToArray(); 

Enumerable.Repeat, Range, 
no for loops necessary, using LINQ 

2. Iterate over multiple arrays in a single loop

foreach(var x in array1)
  DoSomthing(x); 

foreach(var x in array2)
  DoSomething(x); 

LINQ
foreach(var x in array1.Concat(array2))
  DoSomething(x); 

LINQ operates at the enumerator level (?), it will not allocate a new array to hold elements of array1 and array2. So, space-efficient. 

3. Generate a random sequence

Random rand = new Random(); 
var randomSeq = Enumerable.Repeat(0,N).Select(i=>rand.Next()); 

lazy nature of LINQ, the sequence is not pre-computed and stored in an array, but instead random numbers are generated on-demand, as you 

iterate over randomSeq. 


4. Generate a string 

Generate a strign with the repeating pattern "ABCABCABC..." of length N. 

string str = new string(Enumerable.Range(0,N).Select(i => (char)('A' + i%3)).ToArray()); 

string values = string.Join(string.Empty, Enumerable.Repeat(pattern, N).ToArray()); 

5. Convert sequences or collections 


IEnumerable<string> strEnumerable = ...;
IEnumerable<object> objEnumerable = strEnumerable.Cast<object>(); 

List<string> strList = ...;
List<object> objList =  new List<object>(strLsit.Cast<object>()); 

var objList = strList.Cast<object>().ToList(); 

6. Convert a value to a sequence of length 1

IEnumerable<int> seq = Enumerable.Repeat(myValue, 1); 

7. Iterate over all subsets of a sequence

For small input, three problems:

subset sum
boolean satisfiability
knapsack problem

NP-complete problems - 


solve easily by iterating over all subsets of some sequence:

Generics in .NET framework - Language Integrate Query (LINQ) study series (III)

July 29, 2016

Generics - article reading:

Notes:

Advantages:
code reusability and type safety 
Better performance - no need to box the value types
type-safe callbacks
lightweight dynamic methods vs entire assemblies 


compiler will help to do type safety - enforce at compile time 
The need for type casting and the possibility of run-time errors are reduced. 

Generics streamline dynamically generated code. 


placeholder (type parameters) for one or more of the type that they store or use. 

.NET collections - Language Integrate Query - LINQ study series (II)

July 29, 2016

More reading about LINQ:
Google search keywords:
1. basic code patterns used for LINQ
2. .NET collections

Notes from the above article:

LINQ queries 
in-memory objects
object type 
IEnumerable
IEnumerable<T>
object type implements IEnumberable or IEnumerable<T>

LINQ common pattern for accessing data vs standard foreach loops
more concise and readable 

LINQ 3 capabilities: 
filtering
ordering
grouping capabilities

LINQ 
improve performance

common variations of data collections:
hash tables, 
queues, 
stacks, 
bags   ? 
dictionaries
lists

3 Interfaces:
ICollection
IList
IDictionary
generic conterparts

ICollection -> IList
ICollection -> IDictionary

3 examples based on IList
Array
ArrayList
List<T>

4 examples based on ICollection - 1 value
Queue, 
ConcurrentQueue<T>
Stack
ConcurrentStack<T>
LinkedList<T>

5 examples based on IDictionary - 2 values, both a key and a value
Hashtable
SortedList
Dictionary<TKey,TValue>
SoretdList<TKey, TValue> 
ConcurrentDictionary<TKey, Tvalue> 

Special one - a list of values with keys embedded within the values and, therefore, it behaves like a list and like a dictionary
KeyedCollection<TKey,TItem> 

class vs generic classes ? 

Generic collections
strong typing
Ready for this - memorize the claim: Julia likes it - reading is quickest way to become an expert!
Generic collections are the best solution to strong typing

-- Julia likes strong typing, find problems in compile time, fix bugs in static analysis, no debugging/test cases' help
-- Design the function to less error prone - as simple as possible
-- do one task only

Facts? Look into google: 
some languages does not support generics - which one, not C#

behaviors:
how they are sorted
how searches are performed
how comparisons are made


3+ tips - how to read technical articles quickly

July 29, 2016 

Small research  - how to save time and more efficient to read technical articles?

 Read technical articles - how to save time and more efficient?

Any order in the following:
1. Write down some notes:
keywords
things to remember
things not so clear
things to help memorize

2. Write down the ideas in the article -
short,
concise,
keywords only

And also,

sort by alphabetical order                                                                           (rate 10 out of 10)
Separate Facts/ Arguments/ Rules/ Tips                                                  (rate 9 out of 10)
short chars to iterate (ex: S.O.L.I.D. Principle, easy to  remember)     (rate 8 out of 10)

Search more on google...

3. Get organized, maybe, better/ clear/ more pleasant ways


Introduction to LINQ Queries (C#) - Language Integrate Query (LINQ) study series (I)

July 28, 2016

LINQ:


Study more on LINQ: 
1. LINQ - introduction

Languages for different data source: 

SQL    - relational database 
XQuery - XML

LINQ - 
a consistent model 
basic coding patterns
For example:  ? patterns - 

Five most popular data source:  (5 sources: A, N, S, X, any)

ADO.NET Datasets
.NET collections
SQL databases 
XML documents
any other format

2. More reading about LINQ:
Google search keywords:
1. basic code patterns used for LINQ
2. .NET collections

3.Generics in .NET framework
4. LINQ - blogs to read:

5.


Leetcode 347 - Find k most frequent numbers in the array - C++/ Java Solutions

July 28, 2016

Find k most frequent numbers in the array
Problem:
  // nums = [5, 3, 1, 1, 1, 3, 73, 1]
  // k = 1
  // return [1]

  // k = 2
  // return [1, 3]

  // k = 3

  // return [1, 3, 5]

Code study:    nothing can compare to code reading 
C++ code to study: 
use unordered_map and priority_queue
1. https://gist.github.com/jianminchen/cd9f536708f6ec42ae229d853c881361

2. use bucket sort - 
https://gist.github.com/jianminchen/c3d49c11c090b91c94ae7b05bdc11786

https://gist.github.com/jianminchen/88f3e2d441c62ecb7d9d706d1b9bda37

3. use Map + std::sort
https://gist.github.com/jianminchen/3e5bba61847c5c84d95e1ca7806c81be

Java code to study:

1.use a min-heap - Java PriorityQueue class - underneath heap sort 
https://gist.github.com/jianminchen/76b2b3196a4e58312e1ba1d0c288d941

2. use heap:
https://gist.github.com/jianminchen/2482dbacb59e464d94c4a4d16cbc6d3b

3. use bucket sort:
https://gist.github.com/jianminchen/0626a0cb673526d4b7b58698004b87b8

4. use Collections.sort - underneath merge sort, 
https://gist.github.com/jianminchen/3d8fab01965efd19a0f72b2109501c8d

5. use quicksort partition

https://gist.github.com/jianminchen/99d2dea800b28e24456827946409d32d

Conclusion: Based on the analysis, the time complexity should be better than O(nlogn), if using heap sort or merge sort, when k is big enough close to n, then, O(nlogk) is close to O(nlogn). 

However, the bucket sort is always O(n), so the time complexity is O(n), no matter k's size. 


Thursday, July 28, 2016

Leetcode 347: Find k most frequent numbers in the array - C# solution

July 28, 2016

Find k most frequent numbers in the array
Problem:
  // nums = [5, 3, 1, 1, 1, 3, 73, 1]
  // k = 1
  // return [1]

  // k = 2
  // return [1, 3]

  // k = 3

  // return [1, 3, 5]

 C# solution using Dictionary<int,int> and language Integrate Query (LINQ):

First, go through the array once, and keep the count for every distinct value:

We can use Language Integrated Query (LINQ) to call order by and then get Top k values.

Write a solution using C# first, post code here.
1. First practice, using LINQ, OrderByDescending, Take

LINQ reference:

Actionable Item:


Write one solution using bucket sort to get k most frequent numbers in the array. 


Binary Tree Path Sum - two with same value checking

July 28, 2016

Problem:


Write a function that given a tree, returns true if at least 2 paths down the tree have the same sum. A path is a series of nodes from the root to the leaf. 


// ex1:
//     2
//   / \
//   4   5
//  / 
// 1
// returns true // 2+4+1 = 2+5
// ex2:
//     3
//   /
//   4
//  / \
// 1   1
// returns true //3+4+1 = 3+4+1
// ex3:
//   1
//  / \
// 3   4
// returns false // 1+3 != 1+4

Write code for binary tree path sum - two with same value checking:


Julia's first writing using C# language: Code is here

checklist of code style and design issues:

C# practice is here

2.1. if statement is minimum - avoid using if statement.
 Tips: let it fall through base case.
2.2. code style:
   Readable code
   Clean code
2.3. use LINQ to code select clause like SQL statement
2.4. Avoid early return when the duplicate path sum is added. (line 147 - line 154)
2.5. Let main function to take care of discussion of two path sum with same value  (line 123 - 128)
2.6  It is part of the design, one path sum is added to the dictionary twice; so if there are two path sum with same value, the value of dictionary >=4 instead >=2. 

Questions and answers:
1. Which blogs helps you to shape the idea to solve the problem quickly? 
When Julia worked on lowest common ancestor, she used this blog to write a version as well. 

Lowest common ancestor binary tree on geeksforgeek.org, link is here

One of Julia's practice documented in the bog is here

2. What is most important lessons learned through the practice?

Julia tried to avoid if statement when she designed the solution. Let base case take care of if statement only. Make the code simple as possible, therefore, design of the recursive function is void. line 143 - 162, function pathSumTracking.



Follow up



June 14, 2017
Write C# code to improve a few things. C# practice code is here.

Work on Leetcode 112 - Path sum.





Tuesday, July 26, 2016

Lintcode 79: longest common substring

July 26 2016

Work on lintcode: longest common substring.

Problem statement:

Given two strings, find the longest common substring.
Return the length of it.
Note
The characters in substring should occur continiously in original string. This is different with subsequnce.

Algorithm Study

Study the blog - longest common substring (60 minutes reading first time/ 20 minutes review every 6 months) written by a facebook engineer, Ider Zheng. Julia likes the article written in Chinese, it is a well-written and very good thinking process about the problem solving.

Julia likes to repeat the process here in her own words.

1. Brute force solution -> from O(n4) to O(n3), great analysis in the above blog:

Good thinking addes value to your coding practice:

Warmup with a brute force solution:

Time complexity - O(n4)

For example, a string s1 = "abcdefg",

One way to think about brute force:

The length of string is 7.

How many substring in s1? Guess?

Substring - start position and end position, two variables. Each one has O( n ) choices. Total is O(n2) choices.

More detail, start position can be any i from 0 to 6, and then end position starts from i to n-1. The variation formula, Sum = (n - 1) + ( n - 2) + ... + 1 = (n-1) n / 2, so the total is O(n2);

Try to reduce brute force variation from O(n2) to O(n), instead of letting start position and end position
both varies, just work on start position only.

Small improvement based on brute force solution

Time complexity - O(n3)

Second way to think about using brute force:
The start position of substring is from 0 to n-1, so considering the start position, start a new search.

So, the total of search is O(n).

 Longest common substring - start from start-position, and then  compare both of chars are equal, if yes, continue, record length and compare to maximum length, else then break the search.

1. C++ code:

Code is from the blog written by a facebook engineer.

The time complexity is O( n). There is duplicated calculation.

Will write C# practice very soon.

Dynamic programming - optimal time complexity O(n2)


2. Use Dynamic programming, in the above blog, the analysis is very helpful.

Work on dynamic programming, improve time complexity from O(n4) or O(n3) to O(n2), using memorization, space O(n2), bottom up approach.

The idea is to find the formula of DP - dynamic programming.

table T(i, j) - common substrings, using end position as a variable.


one is in s1, ending at position s1[i];
one is in s2, ending at position s2[j].

We know that if T(i, j) >0, then, s1[i] = s2[j];
if(s1[i+1] == s2[j+1]), then, T[i+1, j+1] = T[i,j]+1,
otherwise, T[i+1,j+1] = 0.

Will think about to put together a graph here to explain the idea as well.

2. C++ code:


From blog: C++ code
DP solution, time complexity O(nm), space O(nm)


3. further improvement: C++ code
DP solution, time complexity O(nm), space O(nm) -> O(n+m) -> O(1)
because the recurrence formula tells us that the current position only relies on diagonal position - left-up corner ( i - 1, j - 1)
https://gist.github.com/jianminchen/0061dcf562bd0bdb091301241c38730f

From blog

Julia's practice:
1. brute force solution C#
A. first writing, static analysis catching 1 bug, left 2 bugs for debugging. (not so good!)

B. fix all bugs - final presentation: C#

2. Dynamic programming solution using C#:


Highlights of code writing and execution:

1. static analysis - find bugs
change made: line 56 - 61, if the longest common substring is with length 1 and also start from row 0 or col 0.

line 67 - 72

2. Test case failed on line longestCommonSubstring("abc1def","1ghijkl") - should be "1",

change made: move end1 variable to line 49, and set variable from line 56 - 61, line 67 - line 72

2nd version: add comments


3. DP solution with space reduction: O( nm ) -> O( n+m )  -> O( 1 )

practice later.

* Design issues:
         *
         * 4 variables - memo, longest, end1, searchFirstRowCol
         * 1. memorization using two dimension array - memo
         * 2. variable int longest - get maximum length
         * 3. variable int end1    - string s1 - end position - s1's substring end position
         * 4. variable bool searchFirstRowCol - check first row and first col to update maximum length

DP problems:

Follow up after 8 months


March 17, 2017

1. Read code review on longest common substring algorithm.
2. Read wiki article about "Algorithm Implementation/Strings/Longest common substring"
3. Blog formatting to make it more readable. 
4. Code review:

Ashton and String Hackerrank


Coding practice is like sports - I don't feel fear when I am on court. That's where I feel at home.

Monday, July 25, 2016

Factorial - facebook code lab

July 25, 2016

Given an integer n, return the number of trailing zeroes in n!.
Note: Your solution should be in logarithmic time complexity.
Example :
n = 5
n! = 120 
Number of trailing zeros = 1
So, return 1
Plan to work on the problem.

Blogs to read:
1. Find maximum length snake sequence:

http://www.geeksforgeeks.org/find-maximum-length-snake-sequence/

2. Root to leaf path sum equal to a given number
http://www.geeksforgeeks.org/root-to-leaf-path-sum-equal-to-a-given-number/

Blog to read:
1. How to construct a blog in 10 minutes? Using github and jekyII?

http://cenalulu.github.io/jekyll/how-to-build-a-blog-using-jekyll-markdown/

2. Buy a domain, buy a Alibaba cloud virtual computer, Nginx + WordPress - set up youownsite.com
http://storypku.com/2014/05/%E5%88%9B%E7%AB%99%E8%AE%B0/