Amazon DynamoDB

DynamoDB is a fully managed NoSQL database service that supports key-value and document data models; document data is typically represented as JSON. Data is stored on SSDs and automatically replicated across three geographically distinct Availability Zones within a region. DynamoDB transactions provide the ability to perform ACID operations. A DynamoDB database consists of tables, items and attributes. It is often a good choice when designing a serverless application, since it integrates well with Lambda and can be configured to scale automatically. If you want to get a basic understanding of serverless, please read this post. Below are some key highlights,

  • Fast and flexible NoSQL database with single-digit millisecond latency at any scale.
  • Highly available, with replication across multiple Availability Zones (AZs)
  • Integrated with IAM for security, authorization and administration
  • Enables event-driven programming with DynamoDB Streams
  • Has a Time To Live (TTL) feature to set an expiry for data. Once expired, the item is marked for deletion.
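TTL expects the expiry time as a Unix epoch timestamp (in seconds) stored in a numeric attribute of your choosing. A minimal sketch of building such an item, assuming a hypothetical `expireAt` attribute configured as the table's TTL attribute:

```python
import time

def item_with_ttl(user_id: str, payload: str, ttl_seconds: int) -> dict:
    """Build a DynamoDB item (typed format) that expires ttl_seconds from now.

    DynamoDB TTL expects a Number attribute holding a Unix epoch timestamp
    in seconds; the attribute name ("expireAt" here) is whichever attribute
    TTL was enabled on for the table.
    """
    return {
        "userId": {"S": user_id},
        "payload": {"S": payload},
        "expireAt": {"N": str(int(time.time()) + ttl_seconds)},
    }

item = item_with_ttl("u1", "hello", 3600)  # expires roughly one hour from now
```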

Some key things to remember,

  • Primary key must be decided at creation time
  • Each table can store a virtually unlimited number of items
  • Max size of an item is 400 KB

Primary Keys

DynamoDB stores and retrieves data based on a primary key. It has two types of primary keys,

  • Partition Key
  • Composite key (partition key + sort key)

Partition Key
It’s a unique attribute. The value of the partition key is the input to an internal hash function, which determines the partition (the physical location) where the data is stored.
Composite key
This is a combination of a partition key and a sort key.
We can use a composite key as the primary key when the partition key alone cannot be unique. For example, a user posts multiple messages on a social media site or forum; userId (partition key) and timestamp (sort key) together form the primary key (composite key).
All items with the same partition key are stored together and then sorted according to the sort key.
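The storage behaviour above can be modelled with a small sketch, assuming the userId/timestamp example (table and attribute names are illustrative, not a real DynamoDB client):

```python
from collections import defaultdict

class MessagesTable:
    """Toy model of a table keyed by (userId, timestamp): items sharing a
    partition key live together and are ordered by the sort key."""

    def __init__(self):
        self._partitions = defaultdict(dict)  # userId -> {timestamp: item}

    def put_item(self, user_id: str, timestamp: int, body: str) -> None:
        # The full primary key (partition + sort) identifies one item, so a
        # second put with the same key pair overwrites the first.
        self._partitions[user_id][timestamp] = {
            "userId": user_id, "timestamp": timestamp, "body": body,
        }

    def query(self, user_id: str, ascending: bool = True) -> list:
        # A query reads a single partition and returns its items sorted by
        # the sort key (ascending by default, like ScanIndexForward).
        items = self._partitions[user_id]
        return [items[ts] for ts in sorted(items, reverse=not ascending)]

table = MessagesTable()
table.put_item("alice", 1700000300, "third")
table.put_item("alice", 1700000100, "first")
table.put_item("alice", 1700000200, "second")
```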

IAM in DynamoDB

Authentication & access control is managed using AWS IAM.
IAM Permissions – we can create IAM users within our AWS account with specific permissions to access and create DynamoDB tables.
IAM Roles – we can create IAM roles to enable temporary access to DynamoDB.
Special IAM condition – we can also use a special IAM condition to restrict users to only their own records, giving a fine-grained level of access. This is done with the IAM condition key “dynamodb:LeadingKeys”

...
...
"Condition": {
 "ForAllValues:StringEquals" : {
  "dynamodb:LeadingKeys": [
   "${www.mygame.com:user_id}"
  ]
 }
}
...
...
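For context, a complete policy statement using that condition might look like the following sketch; the actions, account ID, region, table name and identity-provider prefix are all placeholders:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["dynamodb:GetItem", "dynamodb:Query", "dynamodb:PutItem"],
      "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/GameScores",
      "Condition": {
        "ForAllValues:StringEquals": {
          "dynamodb:LeadingKeys": ["${www.mygame.com:user_id}"]
        }
      }
    }
  ]
}
```

With this in place, a user can only touch items whose partition key equals their own federated user ID.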

Indexes

Indexes help us query based on an attribute that is not the primary key. DynamoDB allows us to run queries on non-primary-key attributes using global secondary indexes and local secondary indexes. A secondary index allows us to perform fast queries on specific attributes of a table.

Local secondary index – same partition key as the base table but a different sort key. It can only be created at table-creation time and cannot be added, removed or modified later.
Global secondary index – different partition key and a different sort key. It can be created or deleted at any time.
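As a sketch, the parameter dictionary one might pass to boto3's create_table to declare a local secondary index could look like this (table, attribute and index names are made up for illustration):

```python
# Parameters for boto3 create_table declaring a local secondary index:
# same partition key as the table, different sort key. Names are examples.
create_table_kwargs = {
    "TableName": "Messages",
    "AttributeDefinitions": [
        {"AttributeName": "userId", "AttributeType": "S"},
        {"AttributeName": "timestamp", "AttributeType": "N"},
        {"AttributeName": "channel", "AttributeType": "S"},
    ],
    "KeySchema": [
        {"AttributeName": "userId", "KeyType": "HASH"},      # partition key
        {"AttributeName": "timestamp", "KeyType": "RANGE"},  # sort key
    ],
    "LocalSecondaryIndexes": [
        {
            "IndexName": "ChannelIndex",
            "KeySchema": [
                {"AttributeName": "userId", "KeyType": "HASH"},    # same partition key
                {"AttributeName": "channel", "KeyType": "RANGE"},  # different sort key
            ],
            "Projection": {"ProjectionType": "ALL"},
        }
    ],
    "BillingMode": "PAY_PER_REQUEST",
}
# boto3.client("dynamodb").create_table(**create_table_kwargs)  # needs AWS credentials
```

Note that the index's HASH key must match the table's, which is exactly the LSI constraint described above.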

Scan vs Query API calls

Query
A query finds items in a table based on the primary key. We can use an optional sort key name and value to refine the results.
Results are always sorted by the sort key, in ascending order by default. We can reverse the order by setting the ScanIndexForward parameter to false. By default, queries are eventually consistent; we need to explicitly request a strongly consistent read.
Scan
A scan operation examines every item in the table, so try to avoid Scan operations. Design tables so that you can use the Query, GetItem or BatchGetItem APIs instead.

By default, a scan returns all data attributes. We can apply a filter on top of the returned data to see only the items we want, and use the ProjectionExpression parameter to refine the scan to return only the attributes we want: for example, only the email address rather than all the attributes.
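Conceptually, a projection keeps only the requested attributes of each item that was read; a minimal sketch of that behaviour (the function name is made up, this is not the boto3 API):

```python
def project(item: dict, attributes: list) -> dict:
    # What ProjectionExpression does conceptually: the Scan still reads
    # the full item, but only the requested attributes are returned.
    return {k: v for k, v in item.items() if k in attributes}

item = {"userId": "u1", "email": "u1@example.com", "bio": "long text..."}
```

Note that this is why projections and filters do not reduce the read capacity a Scan consumes: the full item is read either way.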

Points to remember to improve performance

  • Set a smaller page size. Running a larger number of smaller operations allows other requests to succeed without throttling.
  • By default, a Scan operation processes data sequentially, returning data in 1 MB increments before moving on to retrieve the next 1 MB, scanning one partition at a time. We can configure DynamoDB to use parallel scans instead by logically dividing a table or index into segments and scanning each segment in parallel.
  • It is best to avoid parallel scans if a table or index is already under heavy read or write activity from other applications.
  • Isolate scan operations to specific tables and segregate them from mission-critical traffic.

Read/Write capacity mode

DynamoDB has two read/write capacity modes for processing reads and writes on tables. If you want to read more about this, please visit the AWS docs.

  • On-demand
  • Provisioned (default, free-tier eligible)

On-demand mode

For on-demand mode tables, we don’t need to specify how much read and write throughput we expect the application to perform. Charges are based on read request units and write request units.

Read Request Unit

  • One read request unit represents one strongly consistent read request, or two eventually consistent read requests, for an item up to 4 KB in size.
  • Two read request units represent one transactional read for items up to 4 KB.

Write Request Unit

  • One write request unit represents one write for an item up to 1 KB in size.
  • Transactional write requests require 2 write request units to perform one write for items up to 1 KB.

Provisioned mode

If we choose provisioned mode, we have to specify the number of reads and writes per second that the application requires. We can also use auto scaling to adjust the table’s provisioned capacity automatically in response to traffic changes. Throughput capacity is specified in terms of read capacity units (RCUs) and write capacity units (WCUs).

Read Capacity Unit

  • One read capacity unit represents one strongly consistent read per second, or two eventually consistent reads per second, for an item up to 4 KB in size.
  • Transactional read requests require two read capacity units to perform one read per second for items up to 4 KB.

Write Capacity Unit

  • One write capacity unit represents one write per second for an item up to 1 KB in size.
  • Transactional write requests require 2 write capacity units to perform one write per second for items up to 1 KB.
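The RCU/WCU rules above reduce to a small sizing calculation. For example, a strongly consistent read of a 6 KB item needs ceil(6/4) = 2 RCUs, while the same read done eventually consistently needs only 1. A sketch of that arithmetic (function names are ours, not an AWS API):

```python
import math

def read_capacity_units(item_size_kb: float, strongly_consistent: bool = True,
                        transactional: bool = False) -> float:
    # One RCU = one strongly consistent read per second of an item up to 4 KB.
    units = math.ceil(item_size_kb / 4)
    if transactional:
        return units * 2   # transactional reads cost double
    if not strongly_consistent:
        return units / 2   # eventually consistent reads cost half
    return units

def write_capacity_units(item_size_kb: float, transactional: bool = False) -> float:
    # One WCU = one write per second of an item up to 1 KB.
    units = math.ceil(item_size_kb)
    return units * 2 if transactional else units
```

The same arithmetic applies to on-demand request units; the difference is only whether you pre-provision the rate or pay per request.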

DynamoDB Accelerator(DAX)

DAX is a fully managed, clustered in-memory cache for DynamoDB. By default, cached items have a 5-minute TTL. A cluster can have up to 10 nodes and can be deployed in a multi-AZ setup.

DAX vs ElastiCache

DAX mostly caches DynamoDB objects such as query and scan results. ElastiCache is used to store data from the application itself; for example, we can compute on data retrieved from the DB and store the result in ElastiCache.

DynamoDB Streams

DynamoDB Streams captures a time-ordered sequence of item-level modifications in any DynamoDB table and stores this information in a log for up to 24 hours.
A DynamoDB stream is an ordered flow of information about changes to items in a DynamoDB table.
Stream records can be sent to Kinesis Data Streams, and can be read by AWS Lambda and Kinesis Client Library applications.

The stream can be configured to capture the following views of a modified item,

  • KEYS_ONLY – only the key attributes of the modified item.
  • NEW_IMAGE – the entire item, as it appears after the modification.
  • OLD_IMAGE – the entire item, as it appeared before the modification.
  • NEW_AND_OLD_IMAGES – both the new and the old images of the item.

DynamoDB Streams are made up of shards, just like Kinesis Data Streams, but shard management is automated by AWS; we don’t need to do anything with shards. Streams are accessed through a dedicated endpoint.
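A Lambda consumer receives stream records carrying the item images in DynamoDB's typed JSON format. A minimal sketch of extracting the new image from one record, assuming the stream view includes NEW_IMAGE (only string and number types are handled here):

```python
def extract_new_image(record: dict) -> dict:
    """Deserialize the NewImage of one stream record.

    A stream record carries attributes in DynamoDB's typed format,
    e.g. {"userId": {"S": "u1"}, "score": {"N": "42"}}. This minimal
    deserializer handles only the S and N type descriptors.
    """
    typed = record["dynamodb"].get("NewImage", {})
    out = {}
    for name, wrapper in typed.items():
        (dtype, value), = wrapper.items()
        if dtype == "S":
            out[name] = value
        elif dtype == "N":
            out[name] = float(value) if "." in value else int(value)
    return out

# Trimmed shape of one entry in a Lambda event["Records"] list.
sample_record = {
    "eventName": "MODIFY",
    "dynamodb": {
        "Keys": {"userId": {"S": "u1"}},
        "NewImage": {"userId": {"S": "u1"}, "score": {"N": "42"}},
    },
}
```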

Error Handling with DynamoDB

A ProvisionedThroughputExceededException means the number of requests is too high: when the request rate exceeds the read/write capacity provisioned on the DynamoDB table, we get this exception. If we use an AWS SDK, it will automatically retry the request until it succeeds. The AWS SDKs use an exponential backoff algorithm for better flow control.

Exponential Backoff

The concept behind exponential backoff is to use progressively longer waits between retries for consecutive error responses. For example, up to 50 milliseconds before the first retry, up to 100 milliseconds before the second, up to 200 milliseconds before the third, and so on. However, if the request has not succeeded after a minute, the problem might be the request size exceeding your provisioned throughput, and not the request rate.

Set the maximum number of retries to stop around one minute. If the request is not successful, investigate your provisioned throughput options.
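The retry loop the SDKs implement can be sketched roughly as follows; the delays and retry cap are illustrative, and real SDKs combine the exponential cap with random jitter much like this:

```python
import random
import time

def call_with_backoff(operation, max_retries: int = 8, base_delay: float = 0.05):
    """Retry `operation` with exponential backoff: up to 50 ms, 100 ms, 200 ms, ...

    `operation` is any zero-argument callable that raises on a throttling
    error (e.g. a ProvisionedThroughputExceededException from DynamoDB).
    """
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_retries:
                raise  # give up; time to investigate provisioned throughput
            # Full jitter: sleep a random amount up to the exponential cap.
            time.sleep(random.uniform(0, base_delay * (2 ** attempt)))
```

Choosing max_retries so that the cumulative waits land around one minute gives the stopping point described above.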