Julia's coding blog - Practice makes perfect

In January 2015, Julia started practicing LeetCode questions and opened her first blog. She trains herself to stay focused and builds "muscle" memory by working through the questions one by one. Along the way she has read a lot of code, learning a little from everyone; she also opened a GitHub account, published her own code, and tried to write down her own insights. She learns from her favorite sport, tennis: 10,000 practice serves build up the memory for a great serve. Just keep going. Hard work beats talent when talent fails to work hard.

Wednesday, August 11, 2021

Bigtable in action (Google Cloud Next '17) | NoSQL best self-learning course

Aug. 11, 2021

Here is the link.

Google's billion-user services like Gmail and Google Maps depend on Bigtable to store data at massive scale and retrieve it with ultra-low latency. Today, many use cases, such as IoT, finance, mapping, advertising, and time-series data, face similar demands. In this video, you'll learn how to integrate Cloud Bigtable into your application architecture to solve the challenges of storing and retrieving data for these and other use cases. We'll cover the specifics of the Cloud Bigtable service and also dive into schema-level design considerations. You'll also hear how customers have successfully leveraged Bigtable to solve their problems at scale and the patterns they've implemented on top of Cloud Bigtable. Missed the conference? Watch all the talks here: https://goo.gl/c1Vs3h

Watch more talks about Infrastructure & Operations here: https://goo.gl/k2LOYG

My notes

Google research in data technologies

2002, GFS
2004, MapReduce
2006, Bigtable
2008, Dremel
2010 - 2011, Colossus, Flume, Megastore
2012, Spanner
MillWheel
2013, PubSub, F1

Data model & Usage

NoSQL (no-join) distributed key-value store, designed to scale out
has only one index (the row key)
supports atomic single-row transactions
unwritten cells do not take up any space

row key -

sparse - Do not worry about wasting space
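To make the data model concrete, here is a minimal Python sketch, assuming a plain in-memory dict stands in for a table. `ToyTable` and its methods are hypothetical names for illustration, not the Cloud Bigtable client API. It shows three points from the notes: only written cells are stored (sparse), the row key is the only index, and a mutation to one row is applied as a whole.

```python
from typing import Dict, List, Tuple

Column = Tuple[str, str]  # (column family, column qualifier)


class ToyTable:
    """Hypothetical in-memory model of a sparse key-value table (not the Bigtable API)."""

    def __init__(self) -> None:
        # row key -> {(family, qualifier): value}; cells never written simply don't exist
        self._rows: Dict[str, Dict[Column, bytes]] = {}

    def mutate_row(self, row_key: str, cells: Dict[Column, bytes]) -> None:
        # Build the new row state first, then swap it in, so the whole mutation lands at once.
        updated = dict(self._rows.get(row_key, {}))
        updated.update(cells)
        self._rows[row_key] = updated

    def read_row(self, row_key: str) -> Dict[Column, bytes]:
        # The row key is the only index, so this is a direct lookup.
        return dict(self._rows.get(row_key, {}))

    def find_rows_by_value(self, column: Column, value: bytes) -> List[str]:
        # No secondary indexes: matching on a cell value means scanning every row.
        return [key for key, cells in sorted(self._rows.items()) if cells.get(column) == value]

    def stored_cell_count(self) -> int:
        return sum(len(cells) for cells in self._rows.values())
```

If Alice's row has an email and a city but Bob's row only has a city, the toy stores three cells, not four: Bob has no empty email cell taking up space.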

Cells

every cell is versioned (by default, the version is a server-side timestamp)
garbage collection retains the latest version (configurable)
expiration (optional) can be set at the column-family level
periodic compaction reclaims unused space from cells
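The versioning and garbage-collection rules above can be sketched as a single cell that keeps timestamped versions, newest first. This is a toy model under stated assumptions: `VersionedCell`, `garbage_collect`, and its `max_versions` / `max_age_ms` parameters are hypothetical names that only loosely mirror a "keep N versions" rule and a column-family expiration.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


def now_ms() -> int:
    return int(time.time() * 1000)


@dataclass
class VersionedCell:
    """Hypothetical model of one cell that keeps several timestamped versions."""

    versions: List[Tuple[int, bytes]] = field(default_factory=list)  # newest first

    def put(self, value: bytes, timestamp_ms: Optional[int] = None) -> None:
        # With no explicit timestamp, stamp the version with the current time
        # (standing in for the server-side default).
        ts = timestamp_ms if timestamp_ms is not None else now_ms()
        self.versions.append((ts, value))
        self.versions.sort(key=lambda version: version[0], reverse=True)

    def latest(self) -> Optional[bytes]:
        return self.versions[0][1] if self.versions else None

    def garbage_collect(self, max_versions: int = 1, max_age_ms: Optional[int] = None,
                        current_ms: Optional[int] = None) -> None:
        # Keep at most `max_versions` of the newest versions...
        kept = self.versions[:max_versions]
        # ...and, if an expiration is configured, drop versions older than `max_age_ms`.
        if max_age_ms is not None:
            reference = current_ms if current_ms is not None else now_ms()
            kept = [(ts, value) for ts, value in kept if reference - ts <= max_age_ms]
        self.versions = kept
```

In this toy, collection happens only when `garbage_collect` is called; in the service described in the talk, cleanup runs in the background and compaction reclaims the space.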

Writing / reading

Writing

Put
Increment
Append
Conditional updates
Bulk import
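The single-row write operations listed above (put, increment, append, conditional update) can be sketched with an in-memory store guarded by a lock, so each operation is atomic for its row. `ToyRowStore` and its method names are hypothetical, not the Cloud Bigtable client API, and the sketch assumes counters are stored as 8-byte big-endian signed integers.

```python
import threading
from typing import Callable, Dict

Row = Dict[str, bytes]  # "family:qualifier" -> latest value


class ToyRowStore:
    """Hypothetical model of single-row write operations (not the Cloud Bigtable client API)."""

    def __init__(self) -> None:
        self._rows: Dict[str, Row] = {}
        # One lock for the whole toy store; it makes each single-row operation atomic.
        self._lock = threading.Lock()

    def put(self, key: str, column: str, value: bytes) -> None:
        with self._lock:
            self._rows.setdefault(key, {})[column] = value

    def increment(self, key: str, column: str, delta: int) -> int:
        # Read-modify-write on a counter stored as an 8-byte big-endian signed integer.
        with self._lock:
            row = self._rows.setdefault(key, {})
            current = int.from_bytes(row.get(column, bytes(8)), "big", signed=True)
            updated = current + delta
            row[column] = updated.to_bytes(8, "big", signed=True)
            return updated

    def append(self, key: str, column: str, suffix: bytes) -> bytes:
        # Read-modify-write that concatenates bytes onto the existing value.
        with self._lock:
            row = self._rows.setdefault(key, {})
            row[column] = row.get(column, b"") + suffix
            return row[column]

    def check_and_put(self, key: str, predicate: Callable[[Row], bool],
                      column: str, value: bytes) -> bool:
        # Conditional update: write only if the predicate holds for the current row.
        with self._lock:
            if not predicate(self._rows.get(key, {})):
                return False
            self._rows.setdefault(key, {})[column] = value
            return True
```

Because the check and the write happen under the same lock, two clients racing to claim the same row cannot both succeed with `check_and_put`.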

Reading

Gets
Range scan
Filter
Full scan
Export
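Gets, range scans, filters, and full scans all rely on rows being sorted by key. A minimal sketch, assuming a sorted list of string keys plus Python's `bisect` module stands in for the table; `SortedToyTable` and its methods are hypothetical names, not the real read API.

```python
import bisect
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Row = Dict[str, bytes]
RowFilter = Callable[[str, Row], bool]


class SortedToyTable:
    """Hypothetical read-only table whose row keys are kept in sorted order."""

    def __init__(self, rows: Dict[str, Row]) -> None:
        self._rows = dict(rows)
        self._keys: List[str] = sorted(self._rows)

    def get(self, key: str) -> Optional[Row]:
        # Point read by row key.
        return self._rows.get(key)

    def range_scan(self, start: str = "", end: Optional[str] = None,
                   row_filter: Optional[RowFilter] = None) -> Iterator[Tuple[str, Row]]:
        # Binary-search to the first key >= start, then walk forward while key < end.
        i = bisect.bisect_left(self._keys, start)
        while i < len(self._keys) and (end is None or self._keys[i] < end):
            key = self._keys[i]
            if row_filter is None or row_filter(key, self._rows[key]):
                yield key, self._rows[key]
            i += 1

    def prefix_scan(self, prefix: str,
                    row_filter: Optional[RowFilter] = None) -> Iterator[Tuple[str, Row]]:
        # Keys sharing a prefix are contiguous: scan [prefix, prefix with last char bumped).
        # Assumes the last character of the prefix is not the maximum code point.
        if not prefix:
            return self.full_scan(row_filter)
        end = prefix[:-1] + chr(ord(prefix[-1]) + 1)
        return self.range_scan(prefix, end, row_filter)

    def full_scan(self, row_filter: Optional[RowFilter] = None) -> Iterator[Tuple[str, Row]]:
        # Touches every row: fine for exports, expensive for interactive queries.
        return self.range_scan("", None, row_filter)
```

A filter here still visits every row in the scanned range; it only reduces what is returned, which is why a good row key (a narrow range) matters more than a clever filter.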

Schema design

Rows

updates are atomic, but only at the row level
store items related to a given entity in a single row
if atomic updates aren't needed and the entity is large, split it across multiple rows
rows are sorted lexicographically by row key
store related entities in adjacent rows
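Because rows sort lexicographically, numbers inside keys sort as strings, and the order of key components decides which rows end up next to each other. The sketch below uses a hypothetical `make_key` helper and assumes IDs are non-negative integers that fit in the padding width.

```python
from typing import Union


def make_key(*parts: Union[str, int], width: int = 10) -> str:
    """Hypothetical helper: join key parts with '#', zero-padding non-negative integers
    so that lexicographic (string) order matches numeric order."""
    return "#".join(f"{part:0{width}d}" if isinstance(part, int) else part for part in parts)


# Without padding, string order is not numeric order: "order#10" sorts before "order#2".
unpadded = sorted(f"order#{n}" for n in (2, 10, 1))

# With a customer-first, padded key, each customer's orders land in adjacent rows.
keys = sorted(make_key("customer", c, "order", o) for c, o in [(7, 2), (12, 1), (7, 10)])
```

Putting the customer ID before the order ID is what makes "all orders for customer 7" a single contiguous range that a prefix scan can read.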

Row key

determine a key strategy that facilitates common queries
choose keys that help distribute reads/writes and avoid hotspots
avoid solely monotonically increasing keys (timestamp or sequence)
a combined-key strategy is helpful
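As an illustration of a combined key for time-series data, here is a minimal sketch assuming readings keyed by a sensor ID and a millisecond timestamp. `sensor_row_key` and `MAX_TS_MS` are hypothetical names; reversing the timestamp is one common choice for newest-first reads, not something the talk notes prescribe.

```python
MAX_TS_MS = 10**13 - 1  # hypothetical ceiling so reversed timestamps always have 13 digits


def sensor_row_key(sensor_id: str, ts_ms: int) -> str:
    """Hypothetical combined key: sensor ID first, then a reversed millisecond timestamp.

    A key that is only a timestamp keeps increasing, so every new write lands at the end
    of the key space (a hotspot). Leading with the sensor ID spreads writes from many
    sensors across the key space, and reversing the timestamp makes a prefix scan on
    one sensor return its newest readings first.
    """
    if not 0 <= ts_ms <= MAX_TS_MS:
        raise ValueError("timestamp out of range")
    return f"{sensor_id}#{MAX_TS_MS - ts_ms:013d}"
```

The trade-off: reading "all sensors for the last minute" now needs one scan per sensor, so this layout fits best when most queries ask about one sensor at a time.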

Tall vs. wide tables

DBYO: don't build your own

abstractions and integration already exist for popular use cases:

JanusGraph - graph database
OpenTSDB - time-series database
Spotify Heroic - time-series database
GeoMesa - geospatial querying

Integration with Cloud Dataflow

Integration with Cloud Dataproc

Server Density founder and CEO on data availability (from 25:00)

Sami Zuhuruddin

https://www.linkedin.com/in/samizuh/
