Wednesday, December 6, 2017

Leetcode 393: UTF-8 Validation

Dec. 6, 2017


Introduction


The algorithm is to determine if the integer array has specified format with one to 4 byte data. I found out that best way to deal with nervousness is to read the problem statement again and again, more than four times. And then usually I understand the algorithm and bring back my confidence about bit manipulation problem solving. The problem statement is here.

Code review


It is time for me to review my practice of bit manipulation by looking up the blog using keyword: bit manipulation. The search result is here.

Review some basics from one of blogs.

Top coder - fun with bits, the article link is here.

Take some notes:
Use bits of an integer to represent a set. Not only does it produce an order-of-magnitude improvement in both speed and size, it can often simplify code at the same time.

Go over the most popular set manipulations in the following:

Set union         
A | B

Set intersection
A & B

Set subtraction
A & ~B

Set negation
ALL_BITS ^ A

Set bit
A |= 1 << bit

Clear bit
A &= ~(1 << bit)

Test bit
(A & 1 << bit) != 0

Extracting every last bit

Counting out the bits


It is time for Julia to learn bit manipulation again. What Julia likes to do is to read the article word by word and then spend 10 - 20 minutes to go over the terms and warm up the idea how to solve the problem to use integer to represent a set.

It is not so often Julia uses this technique at work, but it is easy to review again and get the idea.

Follow up 


Dec.16, 2017

I asked my mock interview partner to work on this algorithm together, and then we had some discussion. Here are the notes.

The input from the peer:

32 - Each integer has 32 bits

%256 - how to extract rightmost 8 bits?
/256   - move to next 8 bits
>>=8 - or right shift  8 bits

int = int[4] - one integer uses 4 integer

0****** < 128   - check the range
110**** >= 11000000 < 111000000
1110**** >= 11100000 < 11110000
11110*** >- 11110000 < 11111000
10****** >= 10000000 < 11000000

Julia's input:

- integer - bit mask
 set 8 bit - use an integer to represent a set, each integer 8 bits express different numbers

Set bit
A |= 1 << bit

Clear bit
A &= ~(1 << bit)

Test bit
(A & 1 << bit) != 0

Extracting every last bit

Counting out the bits

Apply the bit technique to current case:

test case 1 : (arr[i] & 1 << 7) == 0 - 0xxxxxxx  - figure out the leftmost bit is 0.

test case 2:   arr[i] >> 5 ^ 6  = 0,   judge  110xxxxx 10xxxxxx, there are four operations:
                  AND, OR, negative, XOR ->
The peer came out the idea to shift the first 8 bits to left 5 bits, and then XOR 110, it should be 0.

Reference:

Top coder article about bit manipulation. The link is here.

Editorial notes:


I worked with a peer with strong math and engineering background, the peer works hard; if I share some tips, the peer will come out the solution in less than 5 minutes.

No comments:

Post a Comment