How to Use Trusted Advisor, CloudTrail, and CloudWatch for Cloud Optimization

Trusted Advisor

  • AWS Trusted Advisor acts like your customized cloud expert, and it helps you provision your resources by following best practices.

  • Trusted Advisor inspects your AWS environment and finds opportunities to save money, improve system performance and reliability, or help close security gaps.

Cost Optimization checks

  • Low Utilization Amazon EC2 Instances: Checks the Amazon Elastic Compute Cloud (Amazon EC2) instances that were running at any time during the last 14 days and alerts you if the daily CPU utilization was 10% or less and network I/O was 5 MB or less on 4 or more days

  • Underutilized Amazon EBS Volumes: Checks Amazon Elastic Block Store (Amazon EBS) volume configurations and warns when volumes appear to be underused

  • Unassociated Elastic IP Addresses: Checks for Elastic IP addresses (EIPs) that are not associated with a running Amazon Elastic Compute Cloud (Amazon EC2) instance

  • Amazon RDS Idle DB Instances: Checks the configuration of your Amazon Relational Database Service (Amazon RDS) for any DB instances that appear to be idle. If a DB instance has not had a connection for a prolonged period of time, you can delete the instance to reduce costs
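The Low Utilization Amazon EC2 Instances rule above can be expressed as a small predicate. The following is only a sketch of the stated thresholds, not Trusted Advisor's actual implementation; the `DailyUsage` type and the `is_low_utilization` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DailyUsage:
    """One day of usage for a single instance (hypothetical shape)."""
    cpu_percent: float    # daily CPU utilization, in percent
    network_io_mb: float  # daily network I/O, in MB


def is_low_utilization(days: List[DailyUsage],
                       cpu_max: float = 10.0,
                       network_max_mb: float = 5.0,
                       min_low_days: int = 4) -> bool:
    """Apply the rule: CPU <= 10% and network I/O <= 5 MB on 4 or more
    of the last 14 days."""
    window = days[-14:]  # only the most recent 14 days count
    low_days = sum(
        1 for d in window
        if d.cpu_percent <= cpu_max and d.network_io_mb <= network_max_mb
    )
    return low_days >= min_low_days


# Made-up sample data: 10 busy days followed by idle days.
busy = [DailyUsage(55.0, 900.0)] * 10
idle = [DailyUsage(3.0, 1.2)] * 4

print(is_low_utilization(busy + idle))      # True  (4 low days)
print(is_low_utilization(busy + idle[:3]))  # False (only 3 low days)
```

Both conditions must hold on the same day for that day to count, which is why a quiet CPU with heavy network traffic is not flagged.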

Performance Checks

  • High Utilization Amazon EC2 Instances: Checks the Amazon Elastic Compute Cloud (Amazon EC2) instances that were running at any time during the last 14 days and alerts you if the daily CPU utilization was more than 90% on 4 or more days

  • Large Number of Rules in an EC2 Security Group: Checks each Amazon Elastic Compute Cloud (EC2) security group for an excessive number of rules. If a security group has a large number of rules, performance can be degraded

  • Overutilized Amazon EBS Magnetic Volumes: Checks for Amazon Elastic Block Store (EBS) Magnetic volumes that are potentially overutilized and might benefit from a more efficient configuration

Security Checks

  • Security Groups - Specific Ports Unrestricted: Checks security groups for rules that allow unrestricted access (0.0.0.0/0) to specific ports

  • Amazon EBS Public Snapshots: Checks the permission settings for your Amazon Elastic Block Store (Amazon EBS) volume snapshots and alerts you if any snapshots are marked as public

  • AWS CloudTrail Logging: Checks for your use of AWS CloudTrail. CloudTrail provides increased visibility into activity in your AWS account by recording information about AWS API calls made on the account
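To make the "Specific Ports Unrestricted" check more concrete, here is a rough sketch of how such a scan could work. It assumes the rules are given as dictionaries shaped like the `IpPermissions` entries returned by the EC2 `describe_security_groups` API; the `unrestricted_ports` function and the set of watched ports are illustrative assumptions, not the list Trusted Advisor actually uses.

```python
from typing import Dict, List, Set


def unrestricted_ports(permissions: List[Dict], watched_ports: Set[int]) -> Set[int]:
    """Return the watched ports that are reachable from 0.0.0.0/0."""
    exposed: Set[int] = set()
    for perm in permissions:
        open_to_world = any(
            r.get("CidrIp") == "0.0.0.0/0" for r in perm.get("IpRanges", [])
        )
        if not open_to_world:
            continue
        from_port = perm.get("FromPort")
        to_port = perm.get("ToPort")
        if from_port is None or to_port is None:
            # Assumption: a rule without a port range covers all ports.
            exposed |= watched_ports
            continue
        exposed |= {p for p in watched_ports if from_port <= p <= to_port}
    return exposed


# Made-up rules for illustration.
rules = [
    {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
     "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    {"IpProtocol": "tcp", "FromPort": 3306, "ToPort": 3306,
     "IpRanges": [{"CidrIp": "10.0.0.0/16"}]},
    {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
     "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
]

print(unrestricted_ports(rules, {22, 3306, 3389}))  # {22}
```

Port 443 is open to the world in the sample but is not in the watched set, so it is not reported; port 3306 is watched but restricted to a private range.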

Fault Tolerance Checks

  • Amazon EBS Snapshots: Checks the age of the snapshots for your Amazon Elastic Block Store (Amazon EBS) volumes (available or in-use)

  • Amazon EC2 Availability Zone Balance: Checks the distribution of Amazon Elastic Compute Cloud (Amazon EC2) instances across Availability Zones in a region

  • Amazon RDS Backups: Checks for automated backups of Amazon RDS DB instances

  • Amazon RDS Multi-AZ: Checks for DB instances that are deployed in a single Availability Zone

  • Amazon S3 Bucket Logging: Checks the logging configuration of Amazon Simple Storage Service (Amazon S3) buckets

Service Limit Checks

  • EBS Active Snapshots: Checks for usage that is more than 80% of the EBS Active Snapshots Limit

  • EC2 Elastic IP Addresses: Checks for usage that is more than 80% of the EC2 Elastic IP Addresses Limit

  • RDS DB Instances: Checks for usage that is more than 80% of the RDS DB Instances Limit

  • VPC: Checks for usage that is more than 80% of the VPC Limit

  • VPC Internet Gateways: Checks for usage that is more than 80% of the VPC Internet Gateways Limit
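All of the service limit checks above share the same 80% rule, so one helper can illustrate them. This is a sketch only; the `limit_usage_status` function and its status labels are hypothetical and do not correspond to Trusted Advisor's own status names.

```python
def limit_usage_status(used: int, limit: int, warn_percent: int = 80) -> str:
    """Classify usage against a service limit.

    Integer arithmetic is used so that exactly 80% is not treated as
    "more than 80%" because of floating-point rounding.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    if used >= limit:
        return "at_limit"
    if used * 100 > limit * warn_percent:
        return "warning"
    return "ok"


print(limit_usage_status(3, 5))  # ok        (60%)
print(limit_usage_status(4, 5))  # ok        (exactly 80%, not more than 80%)
print(limit_usage_status(9, 10)) # warning   (90%)
print(limit_usage_status(5, 5))  # at_limit  (100%)
```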

AWS CloudTrail

  • AWS CloudTrail is an AWS service that helps you enable governance, compliance, and operational and risk auditing of your AWS account

  • Actions taken by a user, role, or an AWS service are recorded as events in CloudTrail

  • CloudTrail is enabled on your AWS account when you create it

  • You can identify

    • who or what took which action

    • what resources were acted upon

    • when the event occurred

    • other details to help you analyze and respond to activity in your AWS account

  • You can create up to 5 trails per region (this limit cannot be increased)
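The "who, what, when" details listed above map onto fields of a CloudTrail event record, which is delivered as JSON. The sketch below pulls those fields out of a single record. The sample record is made up for illustration (the account ID, user name, and IP address are placeholders), and `summarize_event` is a hypothetical helper.

```python
import json

# A made-up, trimmed-down event record for illustration.
SAMPLE_RECORD = """
{
  "eventTime": "2024-01-01T12:00:00Z",
  "eventSource": "ec2.amazonaws.com",
  "eventName": "StopInstances",
  "awsRegion": "us-east-1",
  "sourceIPAddress": "203.0.113.10",
  "userIdentity": {
    "type": "IAMUser",
    "arn": "arn:aws:iam::111122223333:user/example-user",
    "userName": "example-user"
  },
  "resources": [
    {"ARN": "arn:aws:ec2:us-east-1:111122223333:instance/i-0123456789abcdef0"}
  ]
}
"""


def summarize_event(record: dict) -> dict:
    """Extract who did what, to which resources, and when."""
    identity = record.get("userIdentity", {})
    return {
        "who": identity.get("arn") or identity.get("type", "unknown"),
        "action": f'{record.get("eventSource")}:{record.get("eventName")}',
        "when": record.get("eventTime"),
        "region": record.get("awsRegion"),
        "source_ip": record.get("sourceIPAddress"),
        # Not every event carries a "resources" list, so default to empty.
        "resources": [r.get("ARN") for r in record.get("resources", [])],
    }


summary = summarize_event(json.loads(SAMPLE_RECORD))
for key, value in summary.items():
    print(f"{key}: {value}")
```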

CloudTrail Events

  • An event in CloudTrail is the record of an activity in an AWS account

  • This activity can be an action taken by a user, role, or service that is monitorable by CloudTrail

  • There are two main types of events that can be logged in CloudTrail (Insights events, described further below, are an optional additional type)

    • Management Events

    • Data Events

  • By default, trails log management events, but not data events

  • CloudTrail does not log events for every AWS service, and some services do not support logging of all their APIs and events

Management Events

  • Management events provide information about management operations that are performed on resources in your AWS account. These are also known as control plane operations

  • Example

    • Registering devices

    • Configuring rules for routing data

Data Events

  • Data events provide information about the resource operations performed on or in a resource. These are also known as data plane operations.

  • Data events are often high-volume activities

  • Example

    • Amazon S3 object-level API activity

  • Data events are disabled by default when you create a trail. To record CloudTrail data events, you must explicitly add to a trail the supported resources or resource types for which you want to collect activity
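Because data events have to be added to a trail explicitly, it can help to see what such a configuration might look like. The sketch below builds an event selector dictionary in the shape accepted by the CloudTrail `put_event_selectors` API. It only constructs the dictionary and does not call AWS; the function name `s3_data_event_selector` and the bucket name are hypothetical.

```python
def s3_data_event_selector(bucket_name: str,
                           read_write_type: str = "All",
                           include_management_events: bool = True) -> dict:
    """Build an event selector that records S3 object-level data events
    for every object in one bucket."""
    if read_write_type not in {"All", "ReadOnly", "WriteOnly"}:
        raise ValueError(f"unsupported ReadWriteType: {read_write_type}")
    return {
        "ReadWriteType": read_write_type,
        "IncludeManagementEvents": include_management_events,
        "DataResources": [
            {
                "Type": "AWS::S3::Object",
                # The trailing slash selects all objects in the bucket.
                "Values": [f"arn:aws:s3:::{bucket_name}/"],
            }
        ],
    }


selector = s3_data_event_selector("example-bucket", read_write_type="WriteOnly")
print(selector)

# With boto3 and credentials available, this could be applied roughly as:
#   boto3.client("cloudtrail").put_event_selectors(
#       TrailName="example-trail", EventSelectors=[selector])
```

Since data events are often high-volume, limiting the selector to write operations or to a single bucket is one way to keep log volume manageable.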

Insights Events

  • CloudTrail Insights events capture unusual activity in your AWS account.

  • If you have Insights events enabled, and CloudTrail detects unusual activity, Insights events are logged to a different folder or prefix in the destination S3 bucket for your trail

CloudTrail Event History

  • CloudTrail event history provides a viewable, searchable, and downloadable record of the past 90 days of CloudTrail events.

  • You can use this history to gain visibility into actions taken in your AWS account in the AWS Management Console, AWS SDKs, command line tools, and other AWS services

Encryption

  • By default, AWS CloudTrail encrypts all log files delivered to your specified Amazon S3 bucket using Amazon S3 server-side encryption (SSE)

  • Optionally, you can add a layer of security to your CloudTrail log files by encrypting them with your AWS Key Management Service (AWS KMS) key

Creating a Custom CloudTrail Trail

To create a trail, search for "CloudTrail" in the AWS Management Console and select the first option from the search results.

Go to the trails section from the CloudTrail left navigation menu, and click on "Create trail."

Choose the trail attributes according to your needs and click "Next."

Choose the necessary log events and click "Next."

Review the chosen attributes and click on "Create trail."
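The console steps above also have a programmatic equivalent. The following sketch assumes a client object that exposes `create_trail` and `start_logging` in the way the boto3 CloudTrail client does. To keep the example runnable without AWS credentials, it uses a small stand-in client; `create_and_start_trail`, `RecordingClient`, and all names and ARNs are hypothetical.

```python
from typing import Any, Optional


def create_and_start_trail(client: Any, name: str, bucket: str,
                           kms_key_id: Optional[str] = None,
                           multi_region: bool = True) -> str:
    """Create a trail that delivers logs to `bucket`, then start logging."""
    params = {
        "Name": name,
        "S3BucketName": bucket,
        "IsMultiRegionTrail": multi_region,
    }
    if kms_key_id:
        # Optional extra encryption layer with a KMS key.
        params["KmsKeyId"] = kms_key_id
    response = client.create_trail(**params)
    client.start_logging(Name=name)
    return response["TrailARN"]


class RecordingClient:
    """Stand-in client that records calls instead of contacting AWS."""

    def __init__(self) -> None:
        self.calls = []

    def create_trail(self, **kwargs):
        self.calls.append(("create_trail", kwargs))
        return {"TrailARN": "arn:aws:cloudtrail:us-east-1:111122223333:trail/"
                            + kwargs["Name"]}

    def start_logging(self, **kwargs):
        self.calls.append(("start_logging", kwargs))
        return {}


client = RecordingClient()  # with boto3: boto3.client("cloudtrail")
arn = create_and_start_trail(client, "example-trail", "example-log-bucket")
print(arn)
print([call_name for call_name, _ in client.calls])
```

With a real client, the S3 bucket must already exist and have a bucket policy that allows CloudTrail to write to it, otherwise trail creation fails.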

CloudWatch

  • Amazon CloudWatch monitors your Amazon Web Services (AWS) resources and the applications you run on AWS in real time

  • The CloudWatch home page automatically displays metrics about every AWS service you use

  • You can additionally create custom dashboards to display metrics about your custom applications by installing the CloudWatch agent on the instance/server

  • You can create alarms which watch metrics and send notifications or automatically make changes to the resources you are monitoring when a threshold is breached

    • For example, you can monitor the CPU usage and disk reads and writes of your Amazon EC2 instances and then use this data to determine whether you should launch additional instances to handle increased load. You can also use this data to stop under-used instances to save money

  • With CloudWatch, you gain system-wide visibility into resource utilization, application performance, and operational health

AWS Services Integrated with CloudWatch

  • Amazon Simple Notification Service (Amazon SNS)

  • Amazon EC2

  • Auto Scaling

  • AWS CloudTrail

  • AWS Identity and Access Management (IAM)

How Amazon CloudWatch Works

  • Amazon CloudWatch is basically a metrics repository

  • An AWS service, such as Amazon EC2, puts metrics into the repository, and you retrieve statistics based on those metrics

  • If you put your own custom metrics into the repository, you can retrieve statistics on these metrics as well

  • You can use metrics to calculate statistics and then present the data graphically in the CloudWatch console

  • You can configure alarm actions to stop, start, or terminate an Amazon EC2 instance when certain criteria are met

    • In addition, you can create alarms that initiate Amazon EC2 Auto Scaling and Amazon Simple Notification Service (Amazon SNS) actions on your behalf
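The "metrics in, statistics out" idea above can be illustrated with a few lines of plain Python. This sketch groups raw data points into fixed periods and computes the usual aggregate statistics for each period. It is a simplified model of the concept, not CloudWatch's implementation; `statistics_by_period` and the sample values are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, Tuple


def statistics_by_period(datapoints: Iterable[Tuple[datetime, float]],
                         period_seconds: int = 300) -> Dict[datetime, dict]:
    """Group (timestamp, value) pairs into periods and aggregate each one."""
    buckets = defaultdict(list)
    for ts, value in datapoints:
        # Round the timestamp down to the start of its period.
        start = int(ts.timestamp()) // period_seconds * period_seconds
        buckets[start].append(value)

    result = {}
    for start in sorted(buckets):
        values = buckets[start]
        result[datetime.fromtimestamp(start, tz=timezone.utc)] = {
            "SampleCount": len(values),
            "Sum": sum(values),
            "Average": sum(values) / len(values),
            "Minimum": min(values),
            "Maximum": max(values),
        }
    return result


def at(minute: int) -> datetime:
    """Helper for the made-up sample: 2024-01-01 12:<minute> UTC."""
    return datetime(2024, 1, 1, 12, minute, tzinfo=timezone.utc)


# Made-up CPU readings.
points = [(at(0), 20.0), (at(1), 40.0), (at(4), 60.0), (at(5), 90.0)]

for period_start, stats in statistics_by_period(points).items():
    print(period_start.isoformat(), stats)
```

With a 5-minute period, the first three readings fall into the 12:00 period and the last reading starts the 12:05 period.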

CloudWatch Concepts

Metric

  • Metrics are the fundamental concept in CloudWatch

  • A metric represents a time-ordered set of data points that are published to CloudWatch

  • Think of a metric as a variable to monitor, and the data points as representing the values of that variable over time

  • For example, the CPU usage of a particular EC2 instance is one metric provided by Amazon EC2. The data points themselves can come from any application or business activity from which you collect data

  • By default, several services provide free metrics for resources (such as Amazon EC2 instances, Amazon EBS volumes, and Amazon RDS DB instances). You can also enable detailed monitoring for some resources
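Since data points can come from any application, it may help to see what a single custom data point could look like. The sketch below builds one entry in the shape accepted by the `MetricData` parameter of the CloudWatch `put_metric_data` API. It only constructs the dictionary; `build_metric_datum`, the metric name, and the dimension values are hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict, Optional


def build_metric_datum(name: str, value: float, unit: str = "None",
                       dimensions: Optional[Dict[str, str]] = None,
                       timestamp: Optional[datetime] = None) -> dict:
    """Build one data point for a custom metric."""
    return {
        "MetricName": name,
        # Dimensions identify which resource or component the value describes.
        "Dimensions": [
            {"Name": key, "Value": val}
            for key, val in sorted((dimensions or {}).items())
        ],
        "Timestamp": timestamp or datetime.now(timezone.utc),
        "Value": float(value),
        "Unit": unit,
    }


datum = build_metric_datum(
    "QueueDepth", 17, unit="Count",
    dimensions={"Service": "orders"},
    timestamp=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
)
print(datum)

# With boto3 and credentials available, this could be published roughly as:
#   boto3.client("cloudwatch").put_metric_data(
#       Namespace="ExampleApp", MetricData=[datum])
```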

Alarm

  • You can create a CloudWatch alarm that watches a single CloudWatch metric or the result of a math expression based on CloudWatch metrics

  • The alarm performs one or more actions based on the value of the metric or expression relative to a threshold over a number of time periods.

  • The action can be an Amazon EC2 action, an Amazon EC2 Auto Scaling action, or a notification sent to an Amazon SNS topic.

  • You can also add alarms to CloudWatch dashboards and monitor them visually.

  • When an alarm is on a dashboard, it turns red when it is in the ALARM state, making it easier for you to monitor its status proactively.

  • After an alarm invokes an action due to a change in state, its subsequent behavior depends on the type of action that you have associated with the alarm.

  • An alarm has the following possible states:

    • OK - The metric or expression is within the defined threshold.

    • ALARM - The metric or expression is outside of the defined threshold.

    • INSUFFICIENT_DATA - The alarm has just started, the metric is not available, or not enough data is available for the metric to determine the alarm state.
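The three states above can be illustrated with a deliberately simplified evaluator. This sketch treats the alarm as breaching only when every one of the most recent evaluation periods is above the threshold, and treats any missing period as insufficient data. Real CloudWatch alarms offer more options (other comparison operators, "M out of N" evaluation, configurable handling of missing data), so this is an assumption-laden model; `evaluate_alarm` is a hypothetical function.

```python
from typing import Optional, Sequence


def evaluate_alarm(datapoints: Sequence[Optional[float]],
                   threshold: float,
                   evaluation_periods: int = 3) -> str:
    """Return OK, ALARM, or INSUFFICIENT_DATA for a greater-than alarm.

    `datapoints` holds one value per period, oldest first; None marks a
    period with no data.
    """
    recent = [v for v in datapoints[-evaluation_periods:] if v is not None]
    if len(recent) < evaluation_periods:
        # Alarm just started, or data is missing for a recent period.
        return "INSUFFICIENT_DATA"
    if all(v > threshold for v in recent):
        return "ALARM"
    return "OK"


# Made-up CPU percentages with an 80% threshold.
print(evaluate_alarm([50, 85, 90, 95], threshold=80))  # ALARM
print(evaluate_alarm([85, 90, 70], threshold=80))      # OK
print(evaluate_alarm([90], threshold=80))              # INSUFFICIENT_DATA
print(evaluate_alarm([90, None, 95], threshold=80))    # INSUFFICIENT_DATA
```

Requiring several consecutive breaching periods is what keeps a single short spike from triggering an action.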