Tuesday, August 11, 2020

Netflix tech talk: You Won't Believe How the Biggest Sites Build Scalable and Resilient Systems!

 Here is the link. 

First 28 minutes: sharding, consistent hashing, hot spots, going multi-zone, Cassandra. 

19:50

Sharding

- split writes across master databases

- Each can have a slave, or many slaves, depending on workload

- One can avoid reading from the master if possible

- Picking the sharding key well is essential and fraught with peril
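
To make the sharding notes concrete for myself, here is a minimal sketch of routing by a sharding key. It is not from the talk: the `Shard` class, the node names, and the choice of MD5 with modulo are all assumptions for illustration. Writes go to the shard's master; reads go to a replica (a slave, in the talk's wording) when one exists.

```python
import hashlib
import random


class Shard:
    """One master plus zero or more read replicas (hypothetical node names)."""

    def __init__(self, name, replica_count):
        self.name = name
        self.master = f"{name}-master"
        self.replicas = [f"{name}-replica-{i}" for i in range(replica_count)]


def shard_for(key, shards):
    # Hash the sharding key so the same key always lands on the same shard.
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


def node_for_write(key, shards):
    # All writes for a key go to that key's master.
    return shard_for(key, shards).master


def node_for_read(key, shards):
    # Avoid reading from the master when a replica is available.
    shard = shard_for(key, shards)
    if shard.replicas:
        return random.choice(shard.replicas)
    return shard.master


# Replica counts differ per shard, standing in for "based on workload".
SHARDS = [Shard("shard0", 2), Shard("shard1", 1), Shard("shard2", 0)]
```

A badly chosen key shows up here right away: if most traffic shares one key value, `shard_for` sends all of it to the same master.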

Second class users

- logged out users get cached content

- CDN bears the brunt of the traffic

Building a data model

- What questions do you want to ask your data?

- Don't try and normalize anything

- Instead of changing a value keep a record of what happened
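
A small sketch of the "keep a record of what happened" idea, as I understand it. The vote example, the `Vote` and `VoteLog` names, and the fields are my own assumptions, not something shown in the talk. Rather than incrementing a score in place, every vote is appended, and the score is derived by reading the log.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    user: str
    post: str
    direction: int  # +1 for an upvote, -1 for a downvote


class VoteLog:
    """Append-only log: records are added, never updated in place."""

    def __init__(self):
        self._votes = []

    def record(self, vote):
        self._votes.append(vote)

    def score(self, post):
        # The current value is computed from the history, not stored.
        return sum(v.direction for v in self._votes if v.post == post)

    def history(self, post):
        # The full history stays available for questions asked later.
        return [v for v in self._votes if v.post == post]
```

Because nothing is overwritten, a new question such as "who voted on this post?" can still be answered later from `history`.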

Data schemas 

- Unless you are really really sure of your business model...

- The less schema the better

- reddit's database is literally just keys and values, despite being in Postgres
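
Here is a rough sketch of what "just keys and values" inside a relational database could look like. This is not reddit's actual schema: the `kv` table, its columns, and the helper names are hypothetical, and I use the standard-library `sqlite3` module as a stand-in for Postgres so the example runs on its own.

```python
import sqlite3

# In-memory SQLite database, standing in for Postgres (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE kv ("
    " entity_id TEXT NOT NULL,"
    " key TEXT NOT NULL,"
    " value TEXT,"
    " PRIMARY KEY (entity_id, key))"
)


def set_attr(entity_id, key, value):
    # A new attribute needs no schema change: it is just another row.
    conn.execute(
        "INSERT OR REPLACE INTO kv (entity_id, key, value) VALUES (?, ?, ?)",
        (entity_id, key, value),
    )


def get_entity(entity_id):
    # Reassemble the entity as a dict from its key/value rows.
    rows = conn.execute(
        "SELECT key, value FROM kv WHERE entity_id = ?", (entity_id,)
    )
    return dict(rows.fetchall())
```

The trade-off is that the database no longer enforces types or required fields, so that checking moves into the application.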

28:00 - viewing data, second person to present architecture - Experience evolution, a 14-minute video. 

I would like to watch those 14 minutes 10 times. I need ideas for how to talk about system design for this kind of streaming data service, how to scale it, etc. 

Why Cassandra?

- Availability over consistency

- Writes over reads

- We know Java

- Open source + support


Subscribers - 

Virtuous cycle - viewing -> improved personalization - 

Viewing data 

who, what, when, where, how long 

Real time data use cases

What have I watched?

Where was I?
What else am I watching?
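
To practice talking about this, I sketched the viewing record and the three real-time questions in code. Everything here is my own reading of the notes, not the talk's data model: the `ViewEvent` fields, treating "where" as a device, treating "how long" as a playback position, and the 60-second activity window are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViewEvent:
    account_id: str    # who
    title_id: str      # what
    timestamp: int     # when (epoch seconds)
    device: str        # where (assumed to mean the device)
    position_sec: int  # how long (playback position reached)


def watched_titles(events, account_id):
    # "What have I watched?": distinct titles, in first-seen order.
    seen = []
    for e in sorted(events, key=lambda e: e.timestamp):
        if e.account_id == account_id and e.title_id not in seen:
            seen.append(e.title_id)
    return seen


def resume_position(events, account_id, title_id):
    # "Where was I?": position from the most recent event for this title.
    matching = [
        e for e in events
        if e.account_id == account_id and e.title_id == title_id
    ]
    if not matching:
        return None
    return max(matching, key=lambda e: e.timestamp).position_sec


def active_streams(events, account_id, now, window_sec=60):
    # "What else am I watching?": streams with a recent event on the account.
    return sorted({
        (e.device, e.title_id) for e in events
        if e.account_id == account_id and now - e.timestamp <= window_sec
    })
```

Scanning a list is only for illustration; at scale each question would presumably need its own index or precomputed view.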

Session analytics - buffering, quality 

Generic architecture 

Architecture evolution

Sessions - Oracle database - scale up - no scale out - ad hoc ... painful, not evolved 

Real time data - gen 2 motivation

Scalability - scale out not up 

Viewing service - 50 data partitions 

Scale out - resharding was painful 

Performance - hot spots 

Disaster recovery 

NoSQL -> Memcached, Cassandra
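
To see for myself why resharding a fixed set of partitions is painful, I wrote a small comparison. This is not Netflix's implementation: the partition names, the key format, MD5, and the 100 virtual nodes per partition are assumptions. It compares plain modulo partitioning with a consistent-hash ring (the technique from the first part of the talk) when growing from 50 to 51 partitions.

```python
import bisect
import hashlib


def h(value):
    # Stable hash, independent of Python's per-process hash randomization.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def modulo_partition(key, n):
    return h(key) % n


class HashRing:
    """Consistent-hash ring with virtual nodes to smooth out hot spots."""

    def __init__(self, nodes, vnodes=100):
        self._ring = sorted(
            (h(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key):
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect(self._points, h(key)) % len(self._ring)
        return self._ring[idx][1]


def moved_fraction(keys, before, after):
    # Share of keys whose partition changes between two layouts.
    moved = sum(1 for k in keys if before(k) != after(k))
    return moved / len(keys)


keys = [f"account-{i}" for i in range(10000)]
nodes50 = [f"partition-{i}" for i in range(50)]
ring50 = HashRing(nodes50)
ring51 = HashRing(nodes50 + ["partition-50"])

modulo_moved = moved_fraction(
    keys, lambda k: modulo_partition(k, 50), lambda k: modulo_partition(k, 51)
)
ring_moved = moved_fraction(keys, ring50.node_for, ring51.node_for)
print(f"modulo: {modulo_moved:.1%} of keys move, ring: {ring_moved:.1%}")
```

With modulo, nearly every key changes partition when one is added; with the ring, only about 1/51 of the keys are expected to move, and they all move to the new partition.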

gen 3 motivation

order of magnitude - include ... 

Write / read stateful tier - active sessions, latest positions, View summary -> snapshot, viewing history Memcached 

Access - ...

gen 3 - requests scale 

Real time data - redistributed...

Stateless microservices


