Thursday, September 16, 2021

Cloud Bigtable > Documentation > Guides: Writes | My 30+ minute study

Introduction

I could not believe how much I enjoy learning about NoSQL. I chose to study Google Cloud Bigtable, and I was surprised to realize that I should have studied this more than 10 years ago.

Writes | Google Cloud Bigtable documentation

I copied the content from the web page here. I like to highlight things for first-time readers like me.

Writes

 

This page lists the types of write requests you can send to Cloud Bigtable and describes when you should use them and when you should not.

The Cloud Bigtable Data API and client libraries allow you to programmatically write data to your tables. Bigtable sends back a response or acknowledgement for each write.

Each client library offers the ability to send the following types of write requests:

  • Simple writes
  • Increments and appends
  • Conditional writes
  • Batch writes

Bigtable client libraries have a built-in smart retries feature for simple and batch writes, which means that they seamlessly handle temporary unavailability. For example, if your application attempts to write data and encounters a temporary outage or network issue, the client library automatically retries until the write is committed or the request deadline is reached. This resilience works with both single-cluster and multi-cluster instances, with single-cluster routing or multi-cluster routing.
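The retry behavior described above can be sketched as a loop that keeps trying until the write succeeds or a deadline passes. This is a plain-Python illustration, not the client library's actual implementation: the names `TransientError` and `write_with_retries`, the exponential backoff, and the delay values are all assumptions made for the example.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for a temporary outage or network issue."""


def write_with_retries(send, deadline_seconds, base_delay=0.01,
                       clock=time.monotonic, sleep=time.sleep):
    """Call send() until it succeeds or the deadline would be exceeded.

    Returns (result, attempts). Raises TimeoutError when the next
    retry could not start before the deadline.
    """
    deadline = clock() + deadline_seconds
    delay = base_delay
    attempts = 0
    while True:
        attempts += 1
        try:
            return send(), attempts
        except TransientError:
            # Give up if waiting for the next attempt would pass the deadline.
            if clock() + delay > deadline:
                raise TimeoutError(f"deadline reached after {attempts} attempts")
            sleep(delay)
            delay *= 2  # assumed backoff policy, chosen only for illustration
```

A caller would pass a function that performs one write attempt. Any error other than `TransientError` propagates immediately, which mirrors the idea that only temporary failures are retried.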

For batch and streaming write operations, you can use the Dataflow connector for Bigtable.

Examples of each type of write are available for each Bigtable client library.

All write requests include the following basic components:

  • The name of the table to write to.
  • An app profile ID, which tells Bigtable how to route the traffic.
  • One or more mutations. A mutation consists of four elements:
    • Column family name.
    • Column qualifier.
    • Timestamp.
    • Value you are writing to the table.

The timestamp of a mutation has a default value of the current date and time. All mutations in a single write request have the same timestamp unless you override them. You can set the timestamp of all mutations in a write request to be the same or different from each other.
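To make the four elements of a mutation and the timestamp default concrete, here is a small plain-Python model. It is only a sketch: `Mutation` and `resolve_timestamps` are hypothetical names for this example and are not part of any Bigtable client library.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Mutation:
    family: str                            # column family name
    qualifier: bytes                       # column qualifier
    value: bytes                           # value to write
    timestamp: Optional[datetime] = None   # None means "use the request default"


def resolve_timestamps(mutations, now=None):
    """Return (family, qualifier, timestamp, value) tuples.

    Mutations without an explicit timestamp all receive the same
    default time; explicit timestamps are kept as given.
    """
    shared = now or datetime.now(timezone.utc)
    return [(m.family, m.qualifier, m.timestamp or shared, m.value)
            for m in mutations]
```

In this model, two mutations that leave the timestamp unset end up with identical timestamps, while a third mutation with an override keeps its own.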

Simple writes

You can write a single row to Bigtable with a MutateRow request that includes the table name, the ID of the app profile that should be used, a row key, and up to 100,000 mutations for that row. A single-row write is atomic. Use this type of write when you are making multiple mutations to a single row.

For code samples that demonstrate how to send simple write requests, see Performing a simple write.

Simple writes are not the best way to write data for the following use cases:

  • You are writing a batch of data that will have contiguous row keys. In this case, you should use batch writes instead of consecutive simple writes, because a contiguous batch can be applied in a single backend call.

  • You want high throughput (rows per second or bytes per second) and don't require low latency. Batch writes will be faster in this case.

Increments and appends

If you want to append data to an existing value or increment an existing numeric value, submit a ReadModifyWriteRow request. This request includes the table name, the ID of the app profile that should be used, a row key, and a set of rules to use when writing the data. Each rule includes the column family name, column qualifier, and either an append value or an increment amount.

Rules are applied in order. For example, if your request includes a rule to increment the value for a column by 2, and a later rule in the same request increments that same column by 1, the column is incremented by 3 in this single atomic write. The later rule does not overwrite the earlier rule.

A value can be incremented only if it is encoded as a 64-bit big-endian signed integer. Bigtable treats an increment to a value that is empty or does not exist as if the value is zero. ReadModifyWriteRow requests are atomic. They are not retried if they fail for any reason.
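The encoding requirement and the in-order application of rules can be illustrated with Python's `struct` module, where the format `">q"` means a big-endian signed 64-bit integer. This is a local simulation on a dictionary, not a call to Bigtable; `apply_rules` and the rule tuples are hypothetical shapes chosen for the example.

```python
import struct


def encode_int64(n: int) -> bytes:
    """Encode n as a 64-bit big-endian signed integer (8 bytes)."""
    return struct.pack(">q", n)


def decode_int64(raw: bytes) -> int:
    """Decode an 8-byte big-endian signed integer; empty counts as zero."""
    if raw == b"":
        return 0
    if len(raw) != 8:
        raise ValueError("value is not encoded as a 64-bit integer")
    return struct.unpack(">q", raw)[0]


def apply_rules(row: dict, rules: list) -> dict:
    """Apply ("increment", column, amount) or ("append", column, data)
    rules in order and return the updated copy of the row.

    Working on a copy means a failing rule leaves the input untouched,
    which loosely mimics the all-or-nothing nature of the request.
    """
    updated = dict(row)
    for kind, column, arg in rules:
        current = updated.get(column, b"")
        if kind == "increment":
            updated[column] = encode_int64(decode_int64(current) + arg)
        elif kind == "append":
            updated[column] = current + arg
        else:
            raise ValueError(f"unknown rule kind: {kind}")
    return updated
```

Starting from an empty row, the rules `("increment", "views", 2)` followed by `("increment", "views", 1)` leave `views` holding the 8-byte encoding of 3, matching the example in the text.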

For code samples that demonstrate how to append a value to a cell, see Incrementing an existing value.

You should not send ReadModifyWriteRow requests in the following situations:

  • You are using an app profile that has multi-cluster routing.

  • You are using multiple single-cluster app profiles and sending writes that could conflict with data written to the same row and column in other clusters in the instance. With single-cluster routing, a write request is sent to a single cluster and then replicated.

  • You rely on the smart retries feature provided by the client libraries. Increments and appends are not retriable.

  • You are writing large amounts of data and you need the writes to complete quickly. A request that reads and then modifies a row is slower than a simple write request. As a result, this type of write is often not the best approach at scale. For example, if you want to count something that will number in the millions, such as page views, you should consider recording each view as a simple write rather than incrementing a value. Then you can use a Dataflow job to aggregate the data.
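The page-view suggestion in the last bullet can be sketched as follows: each view becomes its own row, and a later pass counts the rows per page. The row key layout `view#<page>#<event id>` is an assumption made up for this example (it also assumes page names contain no `#`), and the `Counter` loop only stands in for the Dataflow aggregation job.

```python
from collections import Counter


def view_row_key(page: str, event_id: str) -> str:
    """Build a hypothetical row key for one recorded page view."""
    return f"view#{page}#{event_id}"


def aggregate_views(row_keys):
    """Count recorded views per page from the row keys alone."""
    counts = Counter()
    for key in row_keys:
        _prefix, page, _event_id = key.split("#", 2)
        counts[page] += 1
    return counts
```

Each recorded view is an independent simple write, so no row needs to be read before it is written.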

Conditional writes

If you want to check a row for a condition and then, depending on the result, write data to that row, submit a CheckAndMutateRow request. This type of request includes a row key and a row filter. A row filter is a set of rules that you use to check the value of existing data. Mutations are then committed to specific columns in the row only when certain conditions, checked by the filter, are met. This process of checking and then writing is completed as a single, atomic action.

A filter request must include one or both of two types of mutations:

  • True mutations, or the mutations to apply if the filter returns a value.
  • False mutations, which are applied if the filter yields nothing.

You can supply up to 100,000 of each type of mutation--true and false--in a single write, and you must send at least one. Bigtable sends a response when all mutations are complete.
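The choice between true and false mutations can be modeled on a plain dictionary. This is only a sketch of the semantics described above, not the CheckAndMutateRow API: `check_and_mutate` is a hypothetical function, and the "filter" is simplified to a predicate over (column, value) pairs.

```python
def check_and_mutate(row: dict, predicate, true_mutations=(), false_mutations=()):
    """Apply true_mutations if any cell matches the predicate,
    otherwise apply false_mutations.

    Returns (matched, updated_row). At least one mutation must be given.
    """
    if not true_mutations and not false_mutations:
        raise ValueError("supply at least one true or false mutation")
    # The filter "returns a value" when at least one cell passes it.
    matched = any(predicate(column, value) for column, value in row.items())
    chosen = true_mutations if matched else false_mutations
    updated = dict(row)
    for column, value in chosen:
        updated[column] = value
    return matched, updated
```

For a row `{"status": b"pending"}` and a predicate that looks for that exact cell, the true mutations are applied. For an empty row, the predicate matches nothing and the false mutations are applied instead.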

For code samples that demonstrate how to send conditional writes, see Conditionally writing a value.

You cannot use conditional writes for the following use cases:

  • You are using an app profile that has multi-cluster routing.

  • You are using multiple single-cluster app profiles and sending writes that could conflict with data written to the same row and column in other clusters in the instance. With single-cluster routing, a write request is sent to a single cluster and then replicated.

Batch writes

You can write more than one row with a single call by using a MutateRows request. MutateRows requests contain a set of up to 100,000 entries that are each applied atomically. Each entry consists of a row key and at least one mutation to be applied to the row. A batch write request can contain up to 100,000 mutations spread across all entries. For example, a batch write could include any of the following permutations:

  • 100,000 entries with 1 mutation in each entry.
  • 1 entry with 100,000 mutations.
  • 1,000 entries with 100 mutations each.

Each entry in a MutateRows request is atomic, but the request as a whole is not. Bigtable sends a response when all entries have been written.
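The 100,000-mutation limit is counted across all entries in the request, which a short client-side check can make explicit. This is a sketch with hypothetical names (`validate_batch`, and entries modeled as `(row_key, mutations)` pairs); it does not call Bigtable.

```python
MAX_MUTATIONS_PER_BATCH = 100_000  # limit stated in the text above


def validate_batch(entries):
    """Check a batch of (row_key, mutations) entries against the limits.

    Returns the total number of mutations, or raises ValueError if an
    entry has no mutations or the total exceeds the limit.
    """
    total = 0
    for row_key, mutations in entries:
        if not mutations:
            raise ValueError(f"entry {row_key!r} has no mutations")
        total += len(mutations)
    if total > MAX_MUTATIONS_PER_BATCH:
        raise ValueError(f"{total} mutations exceed the batch limit")
    return total
```

All three permutations listed above (100,000 entries with 1 mutation, 1 entry with 100,000 mutations, and 1,000 entries with 100 mutations each) add up to exactly 100,000 and pass this check.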

For code samples that demonstrate how to send batch writes, see Performing batch writes.

Batch writes are not the best way to write data for the following use cases:

  • You are writing bulk data to rows that are not close to each other. Bigtable stores data lexicographically by row key, the binary equivalent of alphabetical order. Because of this, when row keys in a request are not similar to each other, Bigtable handles them sequentially, rather than in parallel. The throughput will be high, but latency will also be high. To avoid that high latency, use MutateRows when row keys are similar and Bigtable will be writing rows that are near each other. Use MutateRow, or simple writes, for rows that are not near each other.

  • You are requesting multiple mutations to the same row. In this case, you will see better performance if you perform all the mutations in a single simple write request. This is because in a simple write, all changes are committed in a single atomic action, but a batch write is forced to serialize mutations to the same row, causing latency.
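One way to act on the second bullet is to collect pending mutations by row key before sending, so that all changes to one row travel together. The helper below is a hypothetical plain-Python sketch; sorting the keys simply mirrors the lexicographic row order mentioned above.

```python
from collections import defaultdict


def group_by_row(pending):
    """Group (row_key, mutation) pairs into one entry per row.

    Returns a list of (row_key, [mutations]) sorted by row key, with
    mutations kept in their original order within each row.
    """
    grouped = defaultdict(list)
    for row_key, mutation in pending:
        grouped[row_key].append(mutation)
    return sorted(grouped.items())
```

A row that ends up with several mutations is then a natural candidate for a single simple write rather than several separate entries.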

The time it takes for the data that you write to be available for reads depends on several factors, including the number of clusters in your instance and the type of routing that your app profile uses. With a single-cluster instance, the data can be read immediately, but if an instance has more than one cluster, meaning it's using replication, Bigtable is eventually consistent. You can achieve read-your-writes consistency by routing requests to the same cluster.

You can create and use a consistency token after you've sent write requests. The token checks for replication consistency. In general, you create a consistency token either after a batch of writes has been sent or after a certain interval, such as an hour. Then you can hand the token off to be used by another process, such as a module making a read request, which uses the token to check to make sure all the data has been replicated before it attempts to read.

If you use a token right after you create it, it can take up to a few minutes to check for consistency the first time you use it. This delay is because every cluster checks every other cluster to make sure no more data is coming. After the initial use, or if you wait several minutes to use the token for the first time, the token succeeds immediately every time it's used.

Each cell value in a Bigtable table is uniquely identified by the four-tuple (row key, column family, column qualifier, timestamp). In the rare event that two writes with the exact same four-tuple are sent to two different clusters, Bigtable automatically resolves the conflict using an internal last write wins algorithm based on the server-side time. The Bigtable "last write wins" implementation is deterministic, and when replication catches up, all clusters have the same value for the four-tuple.
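The idea of a deterministic "last write wins" merge can be illustrated with a toy model. Bigtable's real algorithm is internal and not described in the text, so everything here is an assumption: each cluster is a dictionary from the four-tuple to a `(server_time, value)` pair, and ties on server time are broken by comparing values so that the result does not depend on merge order.

```python
def merge_clusters(cluster_a: dict, cluster_b: dict) -> dict:
    """Merge two cluster views, keeping the latest write per four-tuple.

    Keys are (row_key, family, qualifier, timestamp) tuples and values
    are (server_time, value) pairs. Comparing the whole pair makes the
    outcome identical regardless of which cluster is merged first.
    """
    merged = dict(cluster_a)
    for cell, write in cluster_b.items():
        if cell not in merged or write > merged[cell]:
            merged[cell] = write
    return merged
```

Merging A into B and B into A gives the same result in this model, which is the property the text calls deterministic: once replication catches up, every cluster holds the same value for the four-tuple.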
