Exploring AWS Database Options: RDS, DynamoDB, and Redshift Explained

Amazon RDS

  • Easy to set up, operate, and scale a relational database in the cloud.

  • Provides cost-efficient and resizable capacity while automating time-consuming administration tasks such as hardware provisioning, database setup, patching and backups.

  • Frees you to focus on your applications so you can give them the fast performance, high availability, security, and compatibility they need.

  • Primary use case is a transactional database (rather than analytical)

Why Managed RDS vs. Databases on Self-Managed Servers

  • Automated Scaling

  • Easy to administer

  • Highly scalable

  • Available and durable

  • Fast

  • Secure

  • Inexpensive

  • Automated and Manual Backups

Different types of database engines supported in AWS RDS

  • MySQL

  • MariaDB

  • PostgreSQL

  • Oracle

  • Microsoft SQL Server

  • Amazon Aurora

Creating an RDS Database

To create an RDS database, search for RDS in the AWS search bar and select the first result.

Navigate to Databases from the left navigation menu, and click on Create Database.

Select "Easy create" and choose the engine option. Configure the remaining settings, then click on "Create database."

It will take a few minutes to create the RDS database.

Manually Connecting from an EC2 Instance

Create an EC2 instance, then connect to it.

Follow this guide to connect to the database: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_GettingStarted.CreatingConnecting.MySQL.html#CHAP_GettingStarted.Connecting.MySQL

Install the MariaDB client with the command sudo dnf install mariadb105.

Then connect with: mysql -h <endpoint> -P 3306 -u admin -p, replacing <endpoint> with your database's endpoint.

Once the MySQL prompt appears, the EC2 instance is connected to the database.

Database Read replicas

  • When the primary database's read I/O capacity is exhausted but heavy/intensive read applications still need more, RDS read replicas can help

  • A read replica is a copy of the primary database that can be used only for read operations

Multi-AZ Deployment

  • Multi-AZ for RDS provides high availability, data durability, and fault tolerance for DB instances

  • You can select the Multi-AZ option during RDS DB instance launch, or modify an existing standalone RDS instance

  • AWS creates a secondary database in a different Availability Zone in the same region for high availability

  • You cannot Insert/Update/Select data on the secondary (standby) RDS database

  • OS patching, system upgrades, and DB scaling are performed on the standby DB first, then on the primary

Encrypting Amazon RDS Resources

  • You can encrypt your Amazon RDS DB instances and snapshots at rest by enabling the encryption option for your Amazon RDS DB instances.

  • You can’t disable encryption on an encrypted DB

  • You cannot enable encryption on an existing, unencrypted database instance directly, but there is a workaround:

    • Create a snapshot of the DB

    • Copy the snapshot and choose to encrypt it during the copy process

    • Restore the encrypted copy into a new DB instance

Alternative to Amazon RDS

  • If your use case isn’t supported on RDS, you can run databases on Amazon EC2.

  • Consider the following points when running a DB on EC2:

    • You can run any database you like with full control and ultimate flexibility.

    • You must manage everything like backups, redundancy, patching and scaling.

    • Good option if you require a database not yet supported by RDS, such as SAP HANA.

    • Good option if it is not feasible to migrate to an AWS-managed database

In-memory (Cache)

In-memory databases are used for applications that require real-time access to data. By storing data directly in memory, these databases provide microsecond latency where millisecond latency is not enough.

Used for: Caching, gaming leaderboards, and real-time analytics.

AWS Offerings:

  • Amazon ElastiCache for Redis

  • Amazon ElastiCache for Memcached

ElastiCache

  • Amazon ElastiCache allows you to seamlessly set up, run, and scale popular open-source compatible in-memory data stores in the cloud

  • Build data-intensive apps or boost the performance of your existing databases by retrieving data from high-throughput, low-latency in-memory data stores

  • ElastiCache can be used if data stores have areas of data that are frequently accessed but seldom updated

    • Additionally, querying a database will always be slower and more expensive than locating a key in a key-value pair cache.
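The frequently-accessed-but-seldom-updated pattern above is typically implemented as a cache-aside (lazy loading) strategy. Below is a minimal sketch in plain Python, where an in-process dict stands in for ElastiCache and the class names (SlowDatabase, CacheAside) are made up for illustration:

```python
# Cache-aside (lazy loading) sketch. In production, the dict would be an
# ElastiCache Redis/Memcached cluster; SlowDatabase stands in for RDS.
class SlowDatabase:
    def __init__(self):
        self.rows = {"user:1": {"name": "Alice"}}
        self.queries = 0                    # count expensive DB round trips

    def query(self, key):
        self.queries += 1
        return self.rows.get(key)

class CacheAside:
    def __init__(self, db):
        self.db = db
        self.cache = {}                     # stands in for the in-memory store

    def get(self, key):
        if key in self.cache:               # cache hit: no DB round trip
            return self.cache[key]
        value = self.db.query(key)          # cache miss: read through to the DB
        self.cache[key] = value             # populate the cache for next time
        return value

db = SlowDatabase()
store = CacheAside(db)
store.get("user:1")                         # miss -> hits the DB
store.get("user:1")                         # hit  -> served from memory
print(db.queries)                           # 1
```

Repeated reads of the same key cost only one database query; this is exactly the "frequently accessed but seldom updated" workload where a cache pays off.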

Use cases

  • Session Stores

  • Gaming

  • Real-Time Analytics

  • Queuing

Features

  • Extreme performance by allowing retrieval of information from a fast, managed, in-memory system (instead of reading from the DB itself)

  • Improves response times for user transactions and queries

  • It offloads the read workload from the main DB instances (less I/O load on the DB)

    • It does this by storing the results of frequently accessed pieces of data (or computationally intensive calculations) in-memory
  • Fully managed

  • Scalable

  • Supports two caching engines

    • Memcached (a pure cache, not a data store/DB)

    • Redis (can also be used as a DB/data store)

Amazon ElastiCache for Memcached

  • Is not persistent

  • Can not be used as a data store

  • If the node fails, the cached data (in the node) is lost

  • Ideal front-end for data stores (RDS, DynamoDB, etc.)

  • Does not support Multi-AZ failover, replication, or snapshots for backup/restore

    • Node failure means data loss
  • You can, however, place your Memcached nodes in different AZs to minimize the impact of an AZ failure and to contain the data loss in such an incident
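Spreading nodes across AZs limits the blast radius because each key lives on exactly one node. The toy sketch below (plain Python with a stable hash, not the real Memcached client; node names are hypothetical) shows that losing one of three nodes loses only about a third of the cached keys:

```python
import zlib

# One Memcached node per AZ (illustrative names)
nodes = ["node-az-a", "node-az-b", "node-az-c"]

def node_for(key):
    # Stable hash so the same key always maps to the same node
    return nodes[zlib.crc32(key.encode()) % len(nodes)]

keys = [f"session:{i}" for i in range(3000)]
placement = {k: node_for(k) for k in keys}

# If the AZ hosting node-az-a fails, only the keys on that node are lost:
lost = sum(1 for n in placement.values() if n == "node-az-a")
print(f"{lost}/{len(keys)} keys lost")
```

Since Memcached has no replication, the lost keys simply become cache misses and are reloaded from the backing data store on next access.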

Use cases

  • Cache contents of a DB

  • Cache data from dynamically generated webpages

Amazon ElastiCache for Redis

  • Is persistent, using the snapshot feature

  • At any time, you can restore your data by creating a new Redis cluster and populating it with data from a backup

  • Supports Redis replication (primary/replica)

  • Supports snapshots (automatic and manual) to S3 (managed by AWS)

  • The backup can be used to restore a cluster or to seed a new cluster

  • The backup includes cluster metadata and all data in the cluster

Amazon DynamoDB

  • It is a key-value and document database that delivers single-digit millisecond performance at any scale

  • It's a fully managed, multi-region, multi-master database with built-in security, backup and restore, and in-memory caching for internet-scale applications

  • Can handle more than 10 trillion requests per day and support peaks of more than 20 million requests per second

  • Many of the world's fastest growing businesses such as Lyft, Airbnb, and Redfin as well as enterprises such as Samsung, Toyota, and Capital One depend on the scale and performance of DynamoDB to support their mission-critical workloads

Benefits

  • Performance at scale: DynamoDB supports some of the world’s largest scale applications by providing consistent, single-digit millisecond response times at any scale

  • Serverless: there are no servers to provision, patch, or manage and no software to install, maintain, or operate

Use cases

  • Serverless Web Applications

  • Microservices Data Store

  • Mobile Back ends

  • Gaming

  • IoT

Tables

  • DynamoDB tables are schemaless, which means that neither the attributes nor their data types need to be defined beforehand

  • Each item can have its own distinct attributes

  • DynamoDB does not support

    • Complex relational queries or joins

    • Complex transactions
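The schemaless model means two items in the same table need only share the key attribute. A small illustration with plain Python dicts (the key name "pk" and all attribute values are made up):

```python
# Two items in the same hypothetical DynamoDB table. Only the primary key
# attribute "pk" is common; each item carries its own distinct attributes.
item_a = {"pk": "user#1", "name": "Alice", "email": "alice@example.com"}
item_b = {"pk": "song#9", "title": "Blue", "duration_sec": 214, "genre": "jazz"}

shared = set(item_a) & set(item_b)
print(shared)  # {'pk'}
```

This flexibility is what lets a single table hold heterogeneous entities, at the cost of the joins and complex queries a relational schema would give you.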

Durability and performance

  • DynamoDB automatically keeps data across three facilities (data centers) in a region for high availability and data durability

  • It also partitions your DB over a sufficient number of servers according to read/write capacity

  • Performs automatic failover in case of any failure

  • DynamoDB runs exclusively on SSD volumes, which provide

    • Low latency

    • Predictable performance

    • High I/O throughput

DynamoDB Basic Components

Tables

  • Like all other DBs, DynamoDB stores data in tables

  • A table is a collection of data items

  • Each table can hold a virtually unlimited number of data items

Items

  • Each table contains multiple data items

  • A data item consists of a primary or composite key and a flexible number of attributes

  • There is no limit to the number of items you can store in a table

Attributes

  • Each item is composed of one or more attributes

  • An attribute consists of the attribute name and a value or a set of values

  • An attribute is a fundamental data element

  • Attributes in DynamoDB are similar to fields or columns in other database systems

Read Capacity Units

  • One read capacity unit represents one strongly consistent read per second, or two eventually consistent reads per second, for an item up to 4 KB in size

  • If you need to read an item that is larger than 4 KB, DynamoDB will need to consume additional read capacity units

  • The total number of read capacity units required depends on the item size, and whether you want an eventually consistent or strongly consistent read.
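The RCU rules above can be turned into a small calculation. A sketch, assuming the standard rounding behavior (item size rounded up to 4 KB blocks, eventually consistent reads costing half as much):

```python
import math

def read_capacity_units(reads_per_second, item_size_kb, strongly_consistent=True):
    """Provisioned RCUs needed: one RCU = 1 strongly consistent read/s
    (or 2 eventually consistent reads/s) of an item up to 4 KB."""
    units_per_read = math.ceil(item_size_kb / 4)   # round size up to 4 KB blocks
    total = reads_per_second * units_per_read
    if not strongly_consistent:
        total = math.ceil(total / 2)               # eventual consistency halves the cost
    return total

print(read_capacity_units(10, 4))                  # 10
print(read_capacity_units(10, 6))                  # 20 (6 KB rounds up to two 4 KB blocks)
print(read_capacity_units(10, 4, strongly_consistent=False))  # 5
```

Note how the 6 KB item costs twice as much per read as the 4 KB item: capacity is consumed in whole 4 KB blocks, not proportionally.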

Write Capacity Units

  • One write capacity unit represents one write per second for an item up to 1 KB in size

  • If you need to write an item that is larger than 1 KB, DynamoDB will need to consume additional write capacity units

  • The total number of write capacity units required depends on the item size
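The WCU rule is the same idea with a 1 KB block size and no consistency discount. A sketch under the same rounding assumption:

```python
import math

def write_capacity_units(writes_per_second, item_size_kb):
    """Provisioned WCUs needed: one WCU = 1 write/s of an item up to 1 KB."""
    return writes_per_second * math.ceil(item_size_kb)  # round size up to 1 KB blocks

print(write_capacity_units(10, 1))    # 10
print(write_capacity_units(10, 1.5))  # 20 (1.5 KB rounds up to 2 KB)
```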

Scalability

  • It provides push-button scaling on AWS: you can increase or decrease the read/write throughput and AWS will scale it for you without downtime or performance degradation

  • You can scale the provisioned capacity of your DynamoDB table at any time

  • There is no limit to the number of items, or the total amount of data, you can store in a DynamoDB table

DynamoDB Accelerator

  • Amazon DynamoDB Accelerator (DAX) is a fully managed, highly available, in-memory cache for DynamoDB that delivers up to a 10x performance improvement – from milliseconds to microseconds – even at millions of requests per second

  • Now you can focus on building great applications for your customers without worrying about performance at scale

  • You can enable DAX with just a few clicks

Amazon Redshift

  • Amazon Redshift is a fully managed, petabyte-scale data warehouse service in the cloud

  • A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing

  • It usually contains historical data derived from transaction data, but it can include data from other sources

  • To perform analytics, you need a data warehouse rather than a regular transactional database

  • OLAP (Online Analytical Processing) is characterized by a relatively low volume of transactions

    • Queries are often very complex and involve aggregations (grouping the data)
  • RDS (MySQL, etc.) is an OLTP database, with detailed, current data and a schema used to store transactional data

Data Security

At Rest

  • Supports encryption of data at rest using hardware-accelerated AES-256 (Advanced Encryption Standard)

  • By default, AWS Redshift takes care of encryption key management

  • You can choose to manage your own keys through HSM (Hardware Security Modules), or AWS KMS (Key Management Service)

In-Transit

  • Supports SSL Encryption, in-transit, between client applications and Redshift data warehouse cluster

  • You can't access your Redshift cluster nodes directly; however, you can access the cluster through client applications

Redshift Cluster

  • No upfront commitment, you can start small and grow as required

    • You can start with a single 160 GB Redshift data warehouse node
  • For a multi-node deployment (Cluster), you need a leader node and compute node(s)

    • The leader node manages client connections and receives queries

    • The compute nodes store data and perform queries and computations

    • You can have up to 128 compute nodes in a cluster
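The leader/compute split can be pictured as scatter-gather: the leader slices the work, each compute node aggregates its slice, and the leader combines the partial results. This is only a conceptual sketch in plain Python, not Redshift's actual execution engine:

```python
# Conceptual scatter-gather, in the spirit of Redshift's leader/compute split.
def compute_node(rows):
    # Each compute node aggregates its own slice of the data
    return sum(rows)

def leader(query_rows, n_nodes=4):
    # The leader distributes slices to compute nodes...
    slices = [query_rows[i::n_nodes] for i in range(n_nodes)]
    partials = [compute_node(s) for s in slices]
    # ...then combines the partial results into the final answer
    return sum(partials)

print(leader(list(range(100))))  # 4950
```

Real Redshift distributes data across node slices at load time and compiles queries into per-node execution plans, but the division of labor (clients talk only to the leader; compute nodes do the heavy lifting) is the same.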

Back-Up Retention

  • Amazon Redshift automatically patches and backs up (snapshots) your data warehouse, storing the backups for a user-defined retention period in Amazon S3

    • By default it keeps backups for one day (24 hours), but you can configure the retention from 0 to 35 days

    • Automatic backups are stopped if you choose a retention period of 0

    • You have access to these automated snapshots during the retention period

  • If you delete the cluster

    • You can choose to have a final snapshot to use later

    • Manual backups are not deleted automatically; if you do not delete them, you will be charged standard S3 storage rates

  • AWS Redshift currently supports only one AZ (no Multi-AZ option)

  • You can restore from your backup to a new Redshift cluster in the same or a different AZ

    • This is helpful in case the AZ hosting your cluster fails

Availability and Durability

  • Redshift automatically replicates all your data within your data warehouse cluster

  • Redshift always keeps three copies of your data

    • The original one

    • A replica on compute nodes (within the cluster)

    • A backup copy on S3

Cross Region Replication

  • Redshift can asynchronously replicate your snapshots to S3 in another region for disaster recovery (DR)

  • Amazon Redshift automatically detects and replaces a failed node in your data warehouse cluster

  • The data warehouse cluster is unavailable for queries and updates until a replacement node is provisioned and added to the DB

  • Amazon Redshift makes your replacement node available immediately and loads the most frequently accessed data from S3 first to allow you to resume querying your data as quickly as possible