Ming Dao School conducts system design mock interviews online every Sunday. You are welcome to join us.
Here is the link.
March 27, 2022
Mock interview - Senior engineer - system design
Requirements
Functional requirements
Inventory management like instacart
Make sure we don’t overshop
Shipment coming in
Browse, add to cart, at the same time update availability, no over selling
Out of scope
No payment, shipment
Inventory service, we already have item in
Neighboring team - catalog team. Already handle catalog
Manage inventory and basket
0. Incoming shipment to add to inventory
1. User can browse, search. Search and browse for bare minimum
2. Add to cart, keep something (lock/reserve)
(Really want to focus on inventory management)
Non functional requirements
Availability
High throughput
Service reliability
System Design
10:00
External APIs
API:
add_product(product, number)
Add_to_cart(user, product, number)
checkout(user, product, number)
11:00
Database design
User_purchse_table
User_id, product_id, number, status
Status: in_cart, bought
Product_table
Product_id, sku_number, total_number
Single service logic:
Add_to_cart: do db txn with 2 modification, and commit
For transaction: mysql. May have problem, the size may be too big
15:00
A truck coming in with a box of banana. There is a person with scanner.
What does “add product” do? Yes. we can call add product for the above situation
If we have never sold iPad, now we start to sell iPad.
Maybe register product. But not sure distinction
User_purchase_table
product_table
Sharding with MySQL can be hard
Instead of one transaction to deal with 2 tables, we can do something different.
Server logic:
Modify product_table if products number >= N
If server logic succeed:
Modify user_purchase_table, then return success
Else:
Return true
Q: When you shard it. If it’s same db is fine to use transaction support.
Now we are using distributed transaction.
A: Break transaction into 2 phases.
Various crash scenarios, because data relevant to the transaction are separate in 2 databases
We have idempotent key for user behavior. If first step succeed, second step may fail
Idempotent key has table: idempotent_key, status
If we need to give up after we reserve banana. How do we rollback?
Have global key for idempotent key
Chron job to periodically check: consistency between idempotent_table and product_table
Treat as idemponent key: unique id: lock N products when number of products equal>=N. CAS
Are all APIs on the same server? Who is calling those APIs?
If there are many stores, and they are busy. Talk me through sharding
Most important is the product_table
How do you shard?
We can shard based on category and product name
Downside: read-write heavy. Not good for hot products
We can cache the product
Interviewer and Audience Feedback
Audience feedback
Inventory management for grocery
Instacart: cannot sell more than the inventory.
Consistent.
Senior:
Leading the discussion
Which points are most important
Important points:
Managing inventory
Abandoned cart, expiry. A hidden requirement
Transaction - consistency
Possible:
Distributed transaction
2PC, saga
Idempotent
Can proactively provide the possible solutions
Roll-forward, rollback.
Touched on the points
Most of these are based on data. Today, we are missing data estimation
Everybody has reservation. How many update. What’s the granularity update? Can we use in-memory database?
Which one is better?
Relational database. Every store, every item, may not have a lot of entries.
We can shard on store.
Using business requirement to optimize
Shopper fulfillment and inventory
Shopper: more API.
Should use picture
Shipment acceptance:
spike
There are multiple solutions
Some may be very specific. But big companies just look at big picture but not specific skills
Requirement gathering. We can simplify the solution, senior should drive simplification
Finding the key points of discussion. QPS. storage. Hit rate, locality. Prove your own design.
Present the big picture. Search can be separated from this system.
Weak pass for intermediate level
Interviewee
Q: Difference between adding a new item, vs adding a new product.
A: Add item to category is separate from add item to inventory.
Q: sharding. We can use SQL. For distributed transaction, how do we do?
A: NoSQL.
We can do distributed transaction. E.g.
Orchestrator for distributed transaction
Complexity in team.
2-phase commit is relatively slow
QPS is not going to be very high
SAGA: transaction log, decide where we are. Roll-forward or roll-backward.
Need to write transaction log.
Inventory table - need versioning.
Need orchestrator, or transaction log.
Justify relation. But we should do some calculation
Otherise, we can just subtract number from everyone’s cart
扬长避短
Distributed transaction, 2PC, SAGA
There may be service frontend for two services.
Product requirement
Why do we need to lock
Sometimes we may use substitute
Black store: we have more control. Substitution - product impact
This is more related to black store / warehouse
Add cart, reserve.
Interviewee:
A: How to do cache?
B: traffic pattern, read/write ratio, locality, hot/cold data
Justification.
Write-aside, write through
Which system writes to cache.
Invalidate. Can provide more details
Interviewee:
Biggest impression: distributed transaction is hard. We can take out some framework
We can provide some transaction methods
E.g. dynamoDB can provide distributed transactions
Most popular:
Saga
Assume distributed transaction can solve this.
If we don’t use RDBMS, what can we discuss?
We can discuss expiry. In-memory. We don’t have to compute the total number.
You can lead the discussion as a senior engineer.
2PC, spanner.
Before diving to details,draw the big picture
Need to draw UI
Should draw the UI
Bookkeeping are important
Dive too deep into transactions
Interviewer: thinks mysql can solve the problem
Sharding, replication, index.
Q: distributed transaction
2PC vs saga
It’s a design pattern. Roll forward, rollback.
It does not guarantee isolation. Temporarily the data is not right.
When we buy things, do we save this to backend cache?
Load balance: backend server. How do we memorize these?
We ideally want to set up stateless server
Redis, elastic cache.
Redis: single instance is the strongest. If you need durability, you can use replicated/distributed redis.
Single master, single instance, quorum, paxos, raft. Leader election mechanism.
Is it cache or storage? Redis is quite durable.
If we use RDBMS, then things are simple.
If there is no locality and miss rate is high, then cache is not useful.
Sticky routing: fairly rare.
Complexity, hardware cost, team cost.
We can use long polling. We don’t have to use realtime feedback. Over-engineering
Cache or database.
High speed cache.
All carts have expiration.
In-memory database + replication (providing durability)
Every shop does not have a lot of people, so we can calculate in realtime.
Cache -> roadmap with transaction
=
Put all order in cart
userID + cart
Cart may have item A
Cart -> split into user, item, store, cart, reservation number. Multiple entries.
Reservation count storage
We don’t need to worry about expiration.
TTL. How to add the reserved number?
If we see TTL expired, we can remove the item, and not add the count
Alternatively, we can use a sleeper to poll. Add the things back.
Reservation table, primary key?
Store, SKU, cart ID, product ID, reservation count
Primary key is concatenation of the 4
Cart: should add based on store, SKU. Range scan.
Black warehouse. Popular items. Say apple is most popular.
Can we set store, sku as primary key, the rest are repeated.
It makes things more complex: Normalization, denomalization. Locking, optimistic locking.
DynamoDB is expensive: you may as well use RDBMS. RDBMS can be tuned as well.
==
If we store and shard by store, then retrieval is hard
You need to go through the shards.
We can materialize view. Enrichment. Precalculate the additional information.
Can we use cart as primary key?
Cart + map
Cart + reservation number
Secondary index: equivalent of more tables
Global secondary index: internally there are 2 tables. It’s eventual consistency. It uses distributed transaction.
如何准备?
DDIA - backburner
Alex Xu - 2 books
Tech Dummies - it gives a brief introduction
Come to mock
以面代考
Pattern of distributed system
DDIA
New book - stock exchange, payment, google map
If we hit a product that you have not used.
If you don’t know the design at all:
Start from user, end with user
Big picture design
Find the things that I am familiar with, and discuss about it. E.g. how to split the microservice
15 companies
Mock, listen, read, real interview
Grokking: relatively shallower
Alex Xu: deeper, but still some area not deep enough
Mock with each other: can get real architecture
==
Excalidraw, googledocs
Multiple cameras
Open tethering
Excel
Work from alpha
Open your IDE
Debugging, coding
Coding more
After SDE II, SDE III
Example: organization level. Executive. Data point
Cross functional. L6
Cross functional
Situation of conflict, example requirements of the person from the other side, how we pull the other side back.
No comments:
Post a Comment