Sunday, March 27, 2022

Ming Dao school: Mock interviews

 Ming Dao School conducts system design mock interviews online every Sunday. You are welcome to join us.

Here is the link. 

March 27, 2022

Mock interview - Senior engineer -  system design 

Requirements


Functional requirements

Inventory management like instacart

Make sure we don’t overshop

Shipment coming in

Browse, add to cart, at the same time update availability, no over selling


Out of scope

No payment, shipment


  1. Inventory service, we already have item in

  2. Neighboring team - catalog team. Already handle catalog

  3. Manage inventory and basket


0. Incoming shipment to add to inventory

1. User can browse, search. Search and browse for bare minimum

2. Add to cart, keep something (lock/reserve)

(Really want to focus on inventory management)


Non functional requirements

Availability

High throughput

Service reliability


System Design


10:00


External APIs


API:

  1. add_product(product, number)

  2. Add_to_cart(user, product, number)

  3. checkout(user, product, number)


11:00

Database design


User_purchse_table

User_id, product_id, number, status

Status: in_cart, bought


Product_table

Product_id, sku_number, total_number


Single service logic:


Add_to_cart: do db txn with 2 modification, and commit


For transaction: mysql. May have problem, the size may be too big


15:00

A truck coming in with a box of banana.  There is a person with scanner.


What does “add product” do?  Yes. we can call add product for the above situation


If we have never sold iPad, now we start to sell iPad.  


Maybe register product. But not sure distinction


User_purchase_table

product_table


Sharding with MySQL can be hard

Instead of one transaction to deal with 2 tables, we can do something different.


Server logic:

Modify product_table if products number >= N

If server logic succeed:

Modify user_purchase_table, then return success

Else:

Return true


Q: When you shard it. If it’s same db is fine to use transaction support.

Now we are using distributed transaction.


A: Break transaction into 2 phases.


Various crash scenarios, because data relevant to the transaction are separate in 2 databases


We have idempotent key for user behavior.  If first step succeed, second step may fail


Idempotent key has table: idempotent_key, status


If we need to give up after we reserve banana.  How do we rollback?


  1. Have global key for idempotent key

  2. Chron job to periodically check: consistency between idempotent_table and product_table

Treat as idemponent key: unique id: lock N products when number of products equal>=N. CAS

Are all APIs on the same server? Who is calling those APIs?


If there are many stores, and they are busy.  Talk me through sharding


Most important is the product_table


How do you shard?

We can shard based on category and product name



Downside: read-write heavy. Not good for hot products


We can cache the product



Interviewer and Audience Feedback


Audience feedback


Inventory management for grocery

Instacart: cannot sell more than the inventory.

Consistent.



Senior:

Leading the discussion

Which points are most important

Important points:

  • Managing inventory

  • Abandoned cart, expiry.  A hidden requirement


Transaction - consistency

Possible:

  • Distributed transaction

  • 2PC, saga

  • Idempotent

  • Can proactively provide the possible solutions

  • Roll-forward, rollback.

  • Touched on the points

  • Most of these are based on data.  Today, we are missing data estimation

  • Everybody has reservation.  How many update.  What’s the granularity update?  Can we use in-memory database? 

  • Which one is better?

  • Relational database.  Every store, every item, may not have a lot of entries.

  • We can shard on store.

  • Using business requirement to optimize


Shopper fulfillment and inventory

  • Shopper: more API. 

  • Should use picture


Shipment acceptance:

  • spike 

  • There are multiple solutions


Some may be very specific. But big companies just look at big picture but not specific skills


Requirement gathering.  We can simplify the solution, senior should drive simplification

Finding the key points of discussion.  QPS. storage. Hit rate, locality. Prove your own design.  

Present the big picture.  Search can be separated from this system. 


Weak pass for intermediate level


Interviewee


Q: Difference between adding a new item, vs adding a new product.

A: Add item to category is separate from add item to inventory.


Q: sharding. We can use SQL. For distributed transaction, how do we do?

A: NoSQL.

We can do distributed transaction. E.g. 

Orchestrator for distributed transaction

Complexity in team.

2-phase commit is relatively slow

QPS is not going to be very high


SAGA:  transaction log, decide where we are. Roll-forward or roll-backward.

Need to write transaction log. 

Inventory table - need versioning.


Need orchestrator, or transaction log.


Justify relation.  But we should do some calculation

Otherise, we can just subtract number from everyone’s cart

扬长避短


Distributed transaction, 2PC, SAGA


There may be service frontend for two services.


Product requirement

Why do we need to lock

Sometimes we may use substitute

Black store: we have more control.  Substitution - product impact

This is more related to black store / warehouse


Add cart, reserve.


Interviewee:

A: How to do cache?

B: traffic pattern, read/write ratio, locality, hot/cold data

Justification. 

Write-aside, write through

Which system writes to cache.

Invalidate.  Can provide more details


Interviewee:

Biggest impression: distributed transaction is hard.  We can take out some framework

We can provide some transaction methods

E.g. dynamoDB can provide distributed transactions


Most popular:

Saga


Assume distributed transaction can solve this.

If we don’t use RDBMS, what can we discuss?

We can discuss expiry.  In-memory.  We don’t have to compute the total number.


You can lead the discussion as a senior engineer.


2PC, spanner.

Before diving to details,draw the big picture

Need to draw UI


Should draw the UI

Bookkeeping are important


Dive too deep into transactions


Interviewer: thinks mysql can solve the problem

Sharding, replication, index.


Q: distributed transaction

2PC vs saga

It’s a design pattern.  Roll forward, rollback.

It does not guarantee isolation.  Temporarily the data is not right.


When we buy things, do we save this to backend cache?

Load balance: backend server.  How do we memorize these?


We ideally want to set up stateless server

Redis, elastic cache.  

Redis: single instance is the strongest.  If you need durability, you can use replicated/distributed redis.


Single master, single instance, quorum, paxos, raft.  Leader election mechanism.


Is it cache or storage? Redis is quite durable.


If we use RDBMS, then things are simple.

If there is no locality and miss rate is high, then cache is not useful.


Sticky routing: fairly rare.


Complexity, hardware cost, team cost. 

We can use long polling.  We don’t have to use realtime feedback.  Over-engineering


Cache or database.

High speed cache.

All carts have expiration.

In-memory database + replication (providing durability)


Every shop does not have a lot of people, so we can calculate in realtime.


Cache -> roadmap with transaction


=

Put all order in cart

userID + cart

Cart may have item A

Cart -> split into user, item, store, cart, reservation number. Multiple entries.

Reservation count storage

We don’t need to worry about expiration.


TTL.  How to add the reserved number?

If we see TTL expired, we can remove the item, and not add the count


Alternatively, we can use a sleeper to poll. Add the things back.



Reservation table, primary key?

Store, SKU, cart ID, product ID, reservation count

Primary key is concatenation of the 4

Cart: should add based on store, SKU.  Range scan.

Black warehouse.  Popular items.  Say apple is most popular.


Can we set store, sku as primary key, the rest are repeated.

It makes things more complex: Normalization, denomalization. Locking, optimistic locking.


DynamoDB is expensive: you may as well use RDBMS. RDBMS can be tuned as well.


==

If we store and shard by store, then retrieval is hard

You need to go through the shards.

We can materialize view.  Enrichment.  Precalculate the additional information.


Can we use cart as primary key?

Cart + map

Cart + reservation number


Secondary index: equivalent of more tables


Global secondary index: internally there are 2 tables. It’s eventual consistency.  It uses distributed transaction.


如何准备?

DDIA - backburner

Alex Xu - 2 books

Tech Dummies - it gives a brief introduction

Come to mock

以面代考


Pattern of distributed system

DDIA

New book - stock exchange, payment, google map

If we hit a product that you have not used.


If you don’t know the design at all:

Start from user, end with user

Big picture design

Find the things that I am familiar with, and discuss about it. E.g. how to split the microservice


15 companies

Mock, listen, read, real interview


Grokking: relatively shallower

Alex Xu: deeper, but still some area not deep enough

Mock with each other: can get real architecture


==


Excalidraw, googledocs

Multiple cameras

Open tethering

Excel

Work from alpha

Open your IDE


Debugging, coding

Coding more 

After SDE II, SDE III 

Example: organization level. Executive. Data point

Cross functional. L6


Cross functional


Situation of conflict, example requirements of the person from the other side, how we pull the other side back.




No comments:

Post a Comment