Wednesday, August 11, 2021

Bigtable in action (Google Cloud Next '17) | NoSQL best self-learning course


Here is the link. 

Google's billion-user services like Gmail and Google Maps depend on Bigtable to store data at massive scale and retrieve it with ultra-low latency. Today, many use cases, such as IoT, finance, mapping, advertising and time-series data, face similar demands. In this video, you'll learn how to integrate Cloud Bigtable into your application architecture to solve the challenges of storing and retrieving data for these and other use cases. We'll cover the specifics of the Cloud Bigtable service and also dive into schema-level design considerations. You'll also hear how customers have successfully leveraged Bigtable to solve their problems at scale and the patterns they've implemented on top of Cloud Bigtable. Missed the conference? Watch all the talks here: https://goo.gl/c1Vs3h Watch more talks about Infrastructure & Operations here: https://goo.gl/k2LOYG

My notes 

Google research in data technologies 

  1. 2002, GFS 
  2. 2004, MapReduce
  3. 2006, Bigtable
  4. 2008, Dremel
  5. 2010 - 2011, Colossus, Flume, Megastore
  6. 2012, Spanner
  7. Millwheel
  8. 2013, PubSub, F1
Data model & Usage 
  • NoSQL (no-join) distributed key-value store, designed to scale-out
  • has only one index (the row-key)
  • supports atomic single-row transactions
  • unwritten cells do not take up any space

row key - 
sparse - Do not worry about wasting space 

Cells
  • every cell is versioned (default is timestamp on server)
  • garbage collection retains latest version (configurable)
  • expiration (optional) can be set at column-family level
  • periodic compaction reclaims unused space from cells
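
To make the versioning and garbage-collection notes above more concrete, here is a rough toy model in Python. It is not the Cloud Bigtable client API: the class `Cell`, its methods, and the parameters `max_versions` and `max_age_seconds` are hypothetical names chosen for illustration, and the policy shown (drop a version if it breaks either rule) is just one simplified way to combine the two rules.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    """Toy model of one cell: a list of (timestamp, value) versions."""
    versions: list = field(default_factory=list)

    def write(self, value, timestamp):
        # Each write adds a new version instead of overwriting the old one.
        self.versions.append((timestamp, value))
        # Keep newest first, so reads return the latest version by default.
        self.versions.sort(key=lambda v: v[0], reverse=True)

    def read_latest(self):
        return self.versions[0][1] if self.versions else None

    def garbage_collect(self, now, max_versions=1, max_age_seconds=None):
        # Rule 1: keep only the newest `max_versions` versions.
        kept = self.versions[:max_versions]
        # Rule 2 (optional): drop versions older than `max_age_seconds`.
        if max_age_seconds is not None:
            kept = [(ts, v) for ts, v in kept if now - ts <= max_age_seconds]
        self.versions = kept


cell = Cell()
cell.write("v1", timestamp=100)
cell.write("v2", timestamp=200)
cell.write("v3", timestamp=300)
# Keep at most two versions, and nothing older than 150 time units.
cell.garbage_collect(now=310, max_versions=2, max_age_seconds=150)
```

After the collection step only the two newest versions remain, and a plain read still returns the latest one.
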
Writing / reading 

Writing 
  • Put
  • Increment
  • Append
  • Conditional updates
  • Bulk import
Reading 
  • Gets
  • Range scan
  • Filter
  • Full scan
  • Export
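
The write operations listed above can be sketched with a small in-memory stand-in. This is only a toy model under my own assumptions, not the Bigtable API: `ToyTable` and all of its method names are invented for this example, and it ignores versions, column families and distribution.

```python
class ToyTable:
    """Toy single-process table: row key -> {column: value}."""

    def __init__(self):
        self.rows = {}

    def put(self, row_key, column, value):
        self.rows.setdefault(row_key, {})[column] = value

    def increment(self, row_key, column, amount=1):
        # Read-modify-write on a counter cell within a single row.
        row = self.rows.setdefault(row_key, {})
        row[column] = row.get(column, 0) + amount
        return row[column]

    def append(self, row_key, column, suffix):
        row = self.rows.setdefault(row_key, {})
        row[column] = row.get(column, "") + suffix

    def conditional_put(self, row_key, check_column, expected, column, value):
        # Write only if the checked cell currently holds `expected`.
        row = self.rows.get(row_key, {})
        if row.get(check_column) != expected:
            return False
        self.put(row_key, column, value)
        return True

    def get(self, row_key):
        return self.rows.get(row_key)


table = ToyTable()
table.put("user#42", "name", "Ada")
table.increment("user#42", "logins")
table.increment("user#42", "logins")
table.append("user#42", "log", "a")
table.append("user#42", "log", "b")
applied = table.conditional_put("user#42", "name", "Ada", "status", "active")
skipped = table.conditional_put("user#42", "name", "Bob", "status", "banned")
```

Every operation here touches exactly one row, which mirrors the single-row atomicity mentioned in the data model notes.
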
Schema design 

Rows 
  • updates are atomic - but only at the row level
  • store items related to a given entity in a single row
  • where atomic updates aren't needed and entity is large - then split it up
  • rows are sorted lexicographically by row-key
  • store related entities in adjacent rows
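
Because rows are sorted by row key, related entities that share a key prefix sit next to each other and can be read with one contiguous scan. A minimal sketch of that idea, using a sorted Python list in place of a table (the keys and the helper `prefix_scan` are hypothetical):

```python
import bisect

# Row keys kept in lexicographic order, as a stand-in for how rows are stored.
row_keys = sorted([
    "user#1001#profile",
    "user#1001#settings",
    "user#1002#profile",
    "device#77#status",
    "user#1001#orders",
])


def prefix_scan(sorted_keys, prefix):
    """Return all keys starting with `prefix` using two binary searches."""
    start = bisect.bisect_left(sorted_keys, prefix)
    # "\uffff" sorts after every character used in these example keys.
    end = bisect.bisect_right(sorted_keys, prefix + "\uffff")
    return sorted_keys[start:end]


user_1001_rows = prefix_scan(row_keys, "user#1001#")
```

The scan never looks at the `device#` or `user#1002#` rows, which is the benefit of keeping related rows adjacent.
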
Row-key
  • determine a key strategy that facilitates common queries
  • choose keys that help distribute reads/writes and avoid hotspots
  • avoid solely monotonically increasing keys (timestamp or sequence)
  • a combined-key strategy is helpful
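
One possible reading of the combined-key advice, sketched in Python. The key layout (entity id first, then a reversed timestamp), the constant `MAX_TS`, and the function names are my own assumptions for illustration, not something prescribed in the talk.

```python
MAX_TS = 10**13  # hypothetical upper bound on millisecond timestamps


def timestamp_only_key(ts_millis):
    # Anti-pattern: consecutive writes all land at the end of the key space.
    return f"{ts_millis:013d}"


def combined_key(device_id, ts_millis):
    # Lead with the entity id so writes spread across devices, then append a
    # reversed timestamp so the newest reading for a device sorts first.
    reversed_ts = MAX_TS - ts_millis
    return f"{device_id}#{reversed_ts:013d}"


readings = [
    ("sensor-b", 1628640000000),
    ("sensor-a", 1628640001000),
    ("sensor-b", 1628640002000),
    ("sensor-a", 1628640003000),
]

# Sorting the keys mimics the lexicographic row order of the table.
keys = sorted(combined_key(dev, ts) for dev, ts in readings)
```

With this layout all rows for one device are contiguous and the most recent reading comes first, so "latest N readings for a device" becomes a short scan from the device prefix.
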
Tall vs. wide tables 

DBYO: don't build your own

abstractions and integrations already exist for popular use cases:

  • JanusGraph - graph database
  • OpenTSDB - time-series database
  • Heroic (Spotify) - time-series database
  • GeoMesa - geospatial querying
Integration with Cloud Dataflow
Integration with Cloud Dataproc 

Server Density - founder and CEO talks about data availability (from 25:00)


Sami Zuhuruddin 

https://www.linkedin.com/in/samizuh/







