April 22, 2022
I think purchasing the course was a good choice. I love learning from the mock interview sections:
Section 11: Design a web crawler (aka Google Crawler).
I was asked to design a web crawler in a system design interview a few years ago, and I am always searching for good ideas to solve this question.
Current design | Uniqueness checker
Bloom filter | uniqueness design talk
- Memory requirements
- 1.5B sites; at 10 pages per site on average, that is 15B URLs
- ~50GB of RAM
- False positives are possible (a sketch of the idea follows)
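A minimal sketch of a Bloom-filter uniqueness checker in Python (my own illustration, not from the course; the capacity and error rate are placeholder values). The filter can only answer "definitely new" or "probably seen", which is exactly where the false positives come from:

```python
import hashlib
import math

class BloomFilter:
    """Probabilistic set membership for URL deduplication."""

    def __init__(self, capacity: int, error_rate: float = 0.01):
        # Standard sizing formulas for the target false-positive rate
        self.size = int(-capacity * math.log(error_rate) / (math.log(2) ** 2))
        self.num_hashes = max(1, int(self.size / capacity * math.log(2)))
        self.bits = bytearray((self.size + 7) // 8)

    def _positions(self, url: str):
        # Double hashing: derive k bit positions from one SHA-256 digest
        digest = hashlib.sha256(url.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # avoid a zero stride
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.size

    def add(self, url: str) -> None:
        for pos in self._positions(url):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, url: str) -> bool:
        # False means definitely unseen; True may be a false positive
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(url))

bf = BloomFilter(capacity=1_000_000)  # toy capacity, not 15B
bf.add("https://example.com/")
print(bf.might_contain("https://example.com/"))       # True
print(bf.might_contain("https://example.com/other"))  # False (almost surely)
```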
Key-value store? | uniqueness design talk
- Average URL = 50 bytes
- 15B URLs
- = 750,000,000,000 bytes
- ≈ 732,000,000 KB
- ≈ 715,000 MB
- ≈ 700 GB
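Redis is the usual concrete choice for the key-value option. A minimal sketch, assuming the redis-py client and a Redis server on localhost (both are my assumptions, not from the course); SET with NX is an atomic check-and-mark, so the answer is exact, unlike the Bloom filter:

```python
import redis

r = redis.Redis(host="localhost", port=6379)  # assumed local instance

def is_new_url(url: str) -> bool:
    """Atomically mark a URL as seen; True only the first time."""
    # SET ... NX writes the key only if it does not already exist,
    # so the return value doubles as an exact uniqueness check.
    return r.set(f"seen:{url}", 1, nx=True) is not None

print(is_new_url("https://example.com/"))  # True on first call
print(is_new_url("https://example.com/"))  # False on repeat
```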
Plain old DB?
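A plain relational database can do the same job with a unique constraint: throughput is limited by a disk write per check, but consistency is the strongest of the three options. A minimal sketch using SQLite for illustration (any database with a primary key behaves the same way):

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory DB for illustration
conn.execute("CREATE TABLE seen_urls (url TEXT PRIMARY KEY)")

def is_new_url(url: str) -> bool:
    """Insert the URL; the primary key rejects duplicates exactly."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO seen_urls (url) VALUES (?)", (url,))
    conn.commit()
    return cur.rowcount == 1  # one row inserted means the URL was new

print(is_new_url("https://example.com/"))  # True
print(is_new_url("https://example.com/"))  # False
```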
Summary | Three choices: Bloom filters, Redis, RDBMS
- Bloom filters: high throughput, weak consistency
- Redis: medium throughput, medium consistency
- RDBMS: low throughput, high consistency