Here is the link.
First 28 minutes, sharding, consistent hashing, hotspot, going multi-zone, Cassandra.
19:50
Sharding
- split writes across master databases
- Each can have a slave, or many slaves, depending on workload
- One can avoid reading from the master if possible
- Picking the sharding key well is essential and fraught with peril
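To make the sharding bullets concrete, here is a minimal sketch of key-based routing: writes go to the shard's master, reads go to a slave when one exists. The host names, the shard list and the function names are all hypothetical and not taken from the talk.

```python
import hashlib

# Hypothetical layout: each shard has one master and zero or more slaves.
SHARDS = [
    {"master": "db0-master", "slaves": ["db0-slave-a", "db0-slave-b"]},
    {"master": "db1-master", "slaves": ["db1-slave-a"]},
    {"master": "db2-master", "slaves": []},
]


def shard_for(key: str) -> dict:
    """Map a sharding key to a shard with a stable hash.

    md5 is used only because it is stable across processes,
    unlike Python's built-in hash(), which is salted per run.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(SHARDS)
    return SHARDS[index]


def host_for_write(key: str) -> str:
    """Writes always go to the master of the key's shard."""
    return shard_for(key)["master"]


def host_for_read(key: str, attempt: int = 0) -> str:
    """Reads prefer a slave; fall back to the master if the shard has none."""
    shard = shard_for(key)
    slaves = shard["slaves"]
    if not slaves:
        return shard["master"]
    return slaves[attempt % len(slaves)]
```

Note that the modulo step is what makes the choice of key and shard count so risky: changing `len(SHARDS)` remaps most keys.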
Second class users
- logged out users get cached content
- CDN bears the brunt of the traffic
Building a data model
- What questions do you want to ask your data?
- Don't try and normalize anything
- Instead of changing a value keep a record of what happened
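The "keep a record of what happened" idea can be sketched as an append-only log from which the current value is derived. Votes are used here as an assumed example; the class and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VoteEvent:
    user_id: str
    post_id: str
    delta: int        # +1 for an upvote, -1 for a downvote
    timestamp: float  # epoch seconds


class EventLog:
    """Append-only log: events are never updated or deleted."""

    def __init__(self) -> None:
        self._events: List[VoteEvent] = []

    def append(self, event: VoteEvent) -> None:
        self._events.append(event)

    def score(self, post_id: str) -> int:
        """Derive the current score by replaying the recorded events."""
        return sum(e.delta for e in self._events if e.post_id == post_id)

    def history(self, post_id: str) -> List[VoteEvent]:
        """Answer 'what happened?', which a mutable counter cannot."""
        return [e for e in self._events if e.post_id == post_id]


log = EventLog()
log.append(VoteEvent("alice", "post-1", +1, 1.0))
log.append(VoteEvent("bob", "post-1", +1, 2.0))
log.append(VoteEvent("carol", "post-1", -1, 3.0))
print(log.score("post-1"))
```

In practice the derived value would likely be cached rather than recomputed on every read; that part is left out of the sketch.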
Data schemas
- Unless you are really really sure of your business model...
- The less schema the better
- reddit's database is literally just keys and values, despite being in Postgres
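A rough sketch of what "just keys and values" inside a relational database can look like. The stdlib `sqlite3` module stands in for Postgres so the example runs anywhere, and the table and column names are assumptions, not reddit's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One generic table: any object is a bag of (key, value) pairs.
# Adding a new attribute needs no schema migration.
conn.execute(
    """
    CREATE TABLE thing_data (
        thing_id TEXT NOT NULL,
        key      TEXT NOT NULL,
        value    TEXT,
        PRIMARY KEY (thing_id, key)
    )
    """
)


def set_attr(thing_id: str, key: str, value: str) -> None:
    """Store one attribute of a thing."""
    conn.execute(
        "INSERT OR REPLACE INTO thing_data (thing_id, key, value) VALUES (?, ?, ?)",
        (thing_id, key, value),
    )


def get_thing(thing_id: str) -> dict:
    """Load every attribute of a thing into a plain dict."""
    rows = conn.execute(
        "SELECT key, value FROM thing_data WHERE thing_id = ?",
        (thing_id,),
    ).fetchall()
    return dict(rows)


set_attr("link:1", "title", "Hello")
set_attr("link:1", "flair", "new")  # attribute added later, no ALTER TABLE
print(get_thing("link:1"))
```

The trade-off is that the database can no longer enforce types or required fields; the application has to.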
28:00 - Viewing data. The second presenter covers the architecture and its evolution; about 14 minutes of video.
I'd like to watch those 14 minutes 10 times. I need ideas on how to talk about system design for this kind of streaming data service, how to scale it, etc.
Why Cassandra?
- Availability over consistency
- Writes over reads
- We know Java
- Open source + support
Subscribers -
Virtuous cycle - viewing -> improved personalization -
Viewing data
who, what, when, where, how long
Real time data use cases
What have I watched?
Where was I?
What else am I watching?
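As a sketch of how the who/what/when/where/how long fields could answer the three real-time questions, assuming "where" means the device, "Where was I?" means the resume position, and "What else am I watching?" means other active streams on the account. The record layout and function names are hypothetical, not the presenters' design.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ViewingRecord:
    account_id: str       # who
    title_id: str         # what
    started_at: float     # when (epoch seconds)
    device: str           # where (assumed to mean the device)
    position_sec: int     # how long: last known playback position
    active: bool = False  # True while the stream is still playing


def watched(records: List[ViewingRecord], account_id: str) -> List[str]:
    """What have I watched? Distinct titles, oldest first."""
    seen: List[str] = []
    for r in sorted(records, key=lambda r: r.started_at):
        if r.account_id == account_id and r.title_id not in seen:
            seen.append(r.title_id)
    return seen


def resume_position(
    records: List[ViewingRecord], account_id: str, title_id: str
) -> Optional[int]:
    """Where was I? Position from the most recent session of a title."""
    matching = [
        r for r in records
        if r.account_id == account_id and r.title_id == title_id
    ]
    if not matching:
        return None
    return max(matching, key=lambda r: r.started_at).position_sec


def active_streams(
    records: List[ViewingRecord], account_id: str
) -> List[ViewingRecord]:
    """What else am I watching? Sessions still playing on the account."""
    return [r for r in records if r.account_id == account_id and r.active]
```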
Session analytics - buffering, quality
Generic architecture
Architecture evolution
Sessions - Oracle database - scale up - no scale out - ad hoc ... painful, not evolved
Real time data - gen 2 motivation
Scalability - scale out not up
Viewing service - 50 data partitions
Scale out - resharding was painful
Performance - hot spots
Disaster recovery
NoSQL -> Memcached, Cassandra
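The notes above mention consistent hashing, and resharding pain is the usual reason to reach for it. Below is a minimal sketch of a hash ring with virtual nodes, assuming md5 as the hash and hypothetical node names; it illustrates the technique and is not the partitioning code from the talk. Adding a node moves only the keys that the new node takes over, whereas `hash(key) % N` remaps most keys when N changes.

```python
import bisect
import hashlib
from typing import Dict, Iterable, List


def _hash(value: str) -> int:
    """Stable 64-bit hash derived from md5 (an assumed choice)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes: Iterable[str], vnodes: int = 100) -> None:
        self._vnodes = vnodes
        self._hashes: List[int] = []     # sorted positions on the ring
        self._owner: Dict[int, str] = {}  # ring position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        """Place vnodes points for the node on the ring.

        Many points per node spread load more evenly, which also
        softens hot spots caused by uneven partition sizes.
        """
        for i in range(self._vnodes):
            h = _hash(f"{node}#{i}")
            bisect.insort(self._hashes, h)
            self._owner[h] = node

    def lookup(self, key: str) -> str:
        """Return the node owning the first ring point clockwise of the key."""
        idx = bisect.bisect(self._hashes, _hash(key)) % len(self._hashes)
        return self._owner[self._hashes[idx]]


ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
keys = [f"session:{i}" for i in range(1000)]
before = {k: ring.lookup(k) for k in keys}

ring.add("node-e")
moved = [k for k in keys if ring.lookup(k) != before[k]]
print(f"{len(moved)} of {len(keys)} keys moved after adding a node")
```

Every key that moves lands on the new node; the existing nodes never trade keys with each other.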
gen 3 motivation
order of magnitude - include ...
Write / read stateful tier - active sessions, latest positions, view summary -> snapshot, viewing history, Memcached
Access - ...
gen 3 - requests scale
Real time data - redistributed...
Stateless microservices