Exploring AWS Database Options: RDS, DynamoDB, and Redshift Explained
Amazon RDS
Easy to set up, operate, and scale a relational database in the cloud.
Provides cost-efficient and resizable capacity while automating time-consuming administration tasks such as hardware provisioning, database setup, patching and backups.
It lets you focus on your applications so you can give them the fast performance, high availability, security, and compatibility they need.
Primary use case is a transactional database (rather than analytical)
Why Managed RDS vs. Databases on Your Own Servers
Automated Scaling
Easy to administer
Highly scalable
Available and durable
Fast
Secure
Inexpensive
Automated and manual backups
Different types of database engines supported in AWS RDS
MySQL
MariaDB
PostgreSQL
Oracle
Microsoft SQL Server
Amazon Aurora
Creation of RDS database
To create an RDS database, search for RDS in the AWS search bar and select the first result.
Navigate to Databases from the left navigation menu, and click on Create Database.
Select "Easy create" and choose the engine option. Configure the remaining settings, then click on "Create database."
It will take a few seconds or minutes to create the RDS database.
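The same "Easy create" flow can be sketched as API parameters for boto3's rds.create_db_instance call; the identifier, instance class, and credentials below are placeholder assumptions, not values from these notes:

```python
# Sketch: the console's "Easy create" flow expressed as API parameters.
# Identifier, class, and credentials are placeholders.
def build_rds_params(db_id, engine="mysql", username="admin", password="change-me"):
    """Build the parameter dict you would pass to
    boto3.client("rds").create_db_instance(**params)."""
    return {
        "DBInstanceIdentifier": db_id,
        "Engine": engine,                  # e.g. mysql, mariadb, postgres
        "DBInstanceClass": "db.t3.micro",  # small, free-tier-eligible size
        "AllocatedStorage": 20,            # GiB
        "MasterUsername": username,
        "MasterUserPassword": password,
    }

params = build_rds_params("my-first-db")
# A real call would be: boto3.client("rds").create_db_instance(**params)
```

A real deployment would also set networking details (VPC, subnet group, security groups), which the console's Easy create option chooses for you.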
Manually connecting to EC2 Server
Create an EC2 instance, then connect to the EC2 server.
Follow this link for connection to database : https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_GettingStarted.CreatingConnecting.MySQL.html#CHAP_GettingStarted.Connecting.MySQL
Use the command: sudo dnf install mariadb105
Then use the command: mysql -h endpoint -P 3306 -u admin -p, replacing endpoint with your database endpoint.
As you can see, the database is connected to the EC2 instance.
Database Read replicas
When you have reached your provisioned read I/O capacity but heavy/read-intensive applications still need more, RDS read replicas can help
A read replica is a replica of the primary database that can be used only for read actions
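A minimal sketch of how an application might split traffic between the primary and its read replicas; the endpoint names here are hypothetical, and real endpoints come from the RDS console:

```python
# Sketch: application-side routing between a primary and read replicas.
import itertools

class ReplicaRouter:
    def __init__(self, primary, replicas):
        self.primary = primary
        self._cycle = itertools.cycle(replicas)  # round-robin over replicas

    def endpoint_for(self, sql):
        # Reads can be spread across replicas; writes (and anything
        # non-SELECT) must go to the primary.
        if sql.lstrip().upper().startswith("SELECT"):
            return next(self._cycle)
        return self.primary

router = ReplicaRouter("primary.example.rds.amazonaws.com",
                       ["replica-1.example.rds.amazonaws.com",
                        "replica-2.example.rds.amazonaws.com"])
```

In practice a connection pooler or driver feature often does this routing for you; the point is only that replicas serve read traffic.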
Multi-AZ Deployment
Multi-AZ for RDS provides high availability, data durability, and fault tolerance for DB instances
You can select the Multi-AZ option during RDS DB instance launch, or modify an existing standalone RDS instance
AWS creates a secondary database in a different Availability Zone within the same region for high availability
It is not possible to insert, update, or even select data on the secondary (standby) RDS database
OS patching, system upgrades, and DB scaling are done on the standby DB first and then on the primary
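The failover behavior above can be modeled with a toy sketch; in a real Multi-AZ deployment AWS simply repoints the instance's DNS endpoint at the promoted standby, but the model shows that only the current primary ever serves traffic:

```python
# Toy sketch of Multi-AZ behavior; AZ names are illustrative.
class MultiAZInstance:
    def __init__(self, primary_az, standby_az):
        self.primary_az = primary_az
        self.standby_az = standby_az

    def serve(self, az):
        # Only the primary handles reads/writes; the standby is passive.
        return az == self.primary_az

    def failover(self):
        # On failure, AWS automatically promotes the standby to primary.
        self.primary_az, self.standby_az = self.standby_az, self.primary_az

db = MultiAZInstance("us-east-1a", "us-east-1b")
```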
Encrypting Amazon RDS Resources
You can encrypt your Amazon RDS DB instances and snapshots at rest by enabling the encryption option for your Amazon RDS DB instances.
You can’t disable encryption on an encrypted DB
You cannot enable encryption on an existing, unencrypted database instance directly, but there is a workaround:
Create a snapshot of the DB
Copy the snapshot and choose to encrypt it during the copy process
Restore the encrypted copy into a New DB
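The three-step workaround can be sketched as the parameters for the corresponding RDS API calls (create_db_snapshot, copy_db_snapshot, restore_db_instance_from_db_snapshot); the identifiers and the KMS key below are placeholders:

```python
# Sketch of the three-step encryption workaround as boto3 API parameters.
# Identifiers and the KMS key are placeholders.
def encrypt_existing_db_steps(db_id, kms_key_id):
    """Return parameter dicts for, in order: create_db_snapshot,
    copy_db_snapshot, restore_db_instance_from_db_snapshot."""
    snap_id = f"{db_id}-snap"
    encrypted_snap_id = f"{db_id}-snap-encrypted"
    return [
        # 1. Snapshot the unencrypted DB
        {"DBSnapshotIdentifier": snap_id,
         "DBInstanceIdentifier": db_id},
        # 2. Copy the snapshot, enabling encryption via a KMS key
        {"SourceDBSnapshotIdentifier": snap_id,
         "TargetDBSnapshotIdentifier": encrypted_snap_id,
         "KmsKeyId": kms_key_id},
        # 3. Restore the encrypted copy into a new DB instance
        {"DBInstanceIdentifier": f"{db_id}-encrypted",
         "DBSnapshotIdentifier": encrypted_snap_id},
    ]
```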
Alternative to Amazon RDS
If your use case isn’t supported on RDS, you can run databases on Amazon EC2.
Consider the following points when running a DB on EC2:
You can run any database you like with full control and ultimate flexibility.
You must manage everything like backups, redundancy, patching and scaling.
Good option if you require a database not yet supported by RDS, such as SAP HANA.
Good option if it is not feasible to migrate to an AWS-managed database
In-memory (Cache)
In-memory databases are used for applications that require real time access to data. By storing data directly in memory, these databases provide microsecond latency where millisecond latency is not enough.
Used for: Caching, gaming leaderboards, and real-time analytics.
AWS Offerings:
Amazon ElastiCache for Redis
Amazon ElastiCache for Memcached
ElastiCache
Amazon ElastiCache allows you to seamlessly set up, run, and scale popular open-source compatible in-memory data stores in the cloud
Build data-intensive apps or boost the performance of your existing databases by retrieving data from high-throughput, low-latency in-memory data stores
ElastiCache is a good fit when data stores have areas of data that are frequently accessed but seldom updated
- Additionally, querying a database will always be slower and more expensive than locating a key in a key-value pair cache.
Use cases
Session Stores
Gaming
Real-Time Analytics
Queuing
Features
Extreme performance by allowing for the retrieval of information from a fast, managed, in-memory system (instead of reading from the DB itself)
Improves response times for user transactions and queries
It offloads the read workload from the main DB instances (less I/O load on the DB)
- It does this by storing the results of frequently accessed pieces of data (or computationally intensive calculations) in-memory
Fully managed
Scalable
Supports two caching engines
Memcached (not a data store [DB], only a cache)
Redis (can be used as a DB / data store)
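The cache-aside pattern that ElastiCache is typically used for can be sketched in a few lines, with plain dicts standing in for the cache and the database:

```python
# Sketch: the cache-aside pattern. A dict stands in for Redis/Memcached
# and another dict for the slower, authoritative database.
database = {"user:1": "Alice", "user:2": "Bob"}  # authoritative store
cache = {}                                        # fast in-memory cache
stats = {"hits": 0, "misses": 0}

def get(key):
    if key in cache:               # cache hit: skip the DB entirely
        stats["hits"] += 1
        return cache[key]
    stats["misses"] += 1
    value = database[key]          # cache miss: read from the DB...
    cache[key] = value             # ...and populate the cache for next time
    return value
```

The first read of a key misses and hits the database; every later read of that key is served from memory, which is what offloads read I/O from the DB.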
Amazon ElastiCache for Memcached
Is not persistent
Can not be used as a data store
If the node fails, the cached data (in the node) is lost
Ideal front-end for data stores (RDS, DynamoDB…etc)
Does not support Multi-AZ failover, replication, or snapshots for backup/restore
- Node failure means data loss
You can, however, place your Memcached nodes in different AZs to minimize the impact of an AZ failure and to contain the data loss in such an incident
Use cases
Cache contents of a DB
Cache data from dynamically generated webpages
Amazon ElastiCache for Redis
Is persistent, using the snapshot feature
At any time, you can restore your data by creating a new Redis cluster and populating it with data from a backup
Supports Redis primary/replica (master/slave) replication
Supports snapshots (automatic and manual) to S3 (managed by AWS)
The backup can be used to restore a cluster or to seed a new cluster
The backup includes cluster metadata and all data in the cluster
Amazon DynamoDB
It is a key-value and document database that delivers single-digit millisecond performance at any scale
It's a fully managed, multi-region, multi-master database with built-in security, backup and restore, and in-memory caching for internet-scale applications
Can handle more than 10 trillion requests per day and support peaks of more than 20 million requests per second
Many of the world's fastest growing businesses such as Lyft, Airbnb, and Redfin as well as enterprises such as Samsung, Toyota, and Capital One depend on the scale and performance of DynamoDB to support their mission-critical workloads
Benefits
Performance at scale: DynamoDB supports some of the world’s largest scale applications by providing consistent, single-digit millisecond response times at any scale
Serverless: there are no servers to provision, patch, or manage and no software to install, maintain, or operate
Use cases
Serverless Web Applications
Microservices Data Store
Mobile Back ends
Gaming
IoT
Tables
DynamoDB tables are schemaless, which means that neither the attributes nor their data types need to be defined beforehand
Each item can have its own distinct attributes
DynamoDB does not support
Complex relational querying or joins
Complex transactions
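A quick sketch of what "schemaless" means in practice; the table and attribute names here are illustrative, not from the notes:

```python
# Sketch: each item in the same table can carry different attributes.
users_table = [
    {"user_id": "u1", "name": "Alice", "email": "alice@example.com"},
    # Same table, different attributes: no email, but a premium flag
    {"user_id": "u2", "name": "Bob", "premium": True},
]

# Only the key attribute (here user_id) must be present on every item
all_have_key = all("user_id" in item for item in users_table)
```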
Durability and performance
DynamoDB automatically keeps data across three facilities (data centers) in a region for high availability and data durability
It also partitions your data over a sufficient number of servers according to read/write capacity
It performs automatic failover in case of any failure
DynamoDB runs exclusively on SSD volumes, which provide
Low latency
Predictable performance
High I/O throughput
DynamoDB basic components
Tables
Like all other DBs, DynamoDB stores data in tables
A table is a collection of data items
Each table can have an infinite number of data items
Items
Each table contains multiple data items
A data item consists of a primary or composite key and a flexible number of attributes
There is no limit to the number of items you can store in a table
Attributes
Each item is composed of one or more attributes
An attribute consists of the attribute name and a value or a set of values
An attribute is a fundamental data element
Attributes in DynamoDB are similar to fields or columns in other database systems
Read Capacity Units
One read capacity unit represents one strongly consistent read per second, or two eventually consistent reads per second, for an item up to 4 KB in size
If you need to read an item that is larger than 4 KB, DynamoDB will need to consume additional read capacity units
The total number of read capacity units required depends on the item size, and whether you want an eventually consistent or strongly consistent read.
Write Capacity Units
One write capacity unit represents one write per second for an item up to 1 KB in size
If you need to write an item that is larger than 1 KB, DynamoDB will need to consume additional write capacity units
The total number of write capacity units required depends on the item size
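The RCU/WCU rules above amount to simple ceiling arithmetic, sketched here (treating eventually consistent reads as half the strongly consistent cost, rounded up to whole provisioned units):

```python
import math

# Sketch: capacity-unit math from the rules above.
def read_capacity_units(item_size_kb, strongly_consistent=True):
    # One RCU = one strongly consistent read/sec (or two eventually
    # consistent reads/sec) for an item up to 4 KB.
    units = math.ceil(item_size_kb / 4)
    return units if strongly_consistent else math.ceil(units / 2)

def write_capacity_units(item_size_kb):
    # One WCU = one write/sec for an item up to 1 KB.
    return math.ceil(item_size_kb / 1)
```

For example, a strongly consistent read of an 8 KB item needs 2 RCUs per read per second, while an eventually consistent read of the same item needs only 1.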
Scalability
It provides push-button scaling on AWS: you can increase or decrease the read/write throughput and AWS will scale it for you without downtime or performance degradation
You can scale the provisioned capacity of your DynamoDB table any time you want
There is no limit to the number of items (data) you can store in a DynamoDB table
There is no limit on how much data you can store per DynamoDB table
DynamoDB Accelerator
Amazon DynamoDB Accelerator (DAX) is a fully managed, highly available, in-memory cache for DynamoDB that delivers up to a 10x performance improvement – from milliseconds to microseconds – even at millions of requests per second
Now you can focus on building great applications for your customers without worrying about performance at scale
You can enable DAX with just a few clicks
AWS Redshift
Redshift is a fully managed, petabyte-scale AWS data warehouse service in the cloud
A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing
It usually contains historical data derived from transaction data, but it can include data from other sources
To perform analytics you need a data warehouse, not a regular database
OLAP (Online Analytical Processing) is characterized by a relatively low volume of transactions
- Queries are often very complex and involve aggregations (grouping the data)
RDS (MySQL, etc.) is an OLTP database, which holds detailed, current data and uses a schema designed to store transactional data
Data Security
At Rest
Supports encryption of data “at rest” using hardware-accelerated AES-256 (Advanced Encryption Standard)
By default, AWS Redshift takes care of encryption key management
You can choose to manage your own keys through HSM (Hardware Security Modules), or AWS KMS (Key Management Service)
In-Transit
Supports SSL Encryption, in-transit, between client applications and Redshift data warehouse cluster
You can’t have direct access to your AWS Redshift cluster nodes; however, you can access the cluster through client applications
Redshift Cluster
No upfront commitment, you can start small and grow as required
- You can start with a single, 160 GB Redshift data warehouse node
For a multi-node deployment (Cluster), you need a leader node and compute node(s)
The leader node manages client connections and receives queries
The compute nodes store data and perform queries and computations
You can have up to 128 compute nodes in a cluster
Back-Up Retention
Amazon Redshift automatically patches and backs up (Snapshots) your data warehouse, storing the backups for a user-defined retention period in AWS S3
It keeps backups by default for one day (24 hours), but you can configure the retention from 0 to 35 days
Automated backups are stopped if you choose a retention period of 0
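A small sketch of how the retention-period setting behaves, using the values from these notes (default of one day, valid range 0 to 35, where 0 disables automated backups):

```python
# Sketch: Redshift automated-backup retention behavior.
DEFAULT_RETENTION_DAYS = 1  # the default: one day (24 hours)

def automated_backups_enabled(retention_days):
    if not 0 <= retention_days <= 35:
        raise ValueError("retention must be between 0 and 35 days")
    # A retention period of 0 disables automated backups entirely
    return retention_days > 0
```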
You have access to these automated snapshots during the retention period
If you delete the cluster
You can choose to have a final snapshot to use later
Manual backups are not deleted automatically; if you do not manually delete them, you will be charged standard S3 storage rates
AWS Redshift currently supports only one AZ (no Multi-AZ option)
You can restore from your backup to a new Redshift cluster in the same or a different AZ
- This is helpful in case the AZ hosting your cluster fails
Availability and Durability
Redshift automatically replicates all your data within your data warehouse cluster
Redshift always keeps three copies of your data
The original one
A replica on compute nodes (within the cluster)
A backup copy on S3
Cross Region Replication
Redshift can asynchronously replicate your snapshots to S3 in another region for DR
Amazon Redshift automatically detects and replaces a failed node in your data warehouse cluster
The data warehouse cluster will be unavailable for queries and updates until a replacement node is provisioned and added to the DB
Amazon Redshift makes your replacement node available immediately and loads the most frequently accessed data from S3 first, so you can resume querying your data as quickly as possible