1. Navigate to CloudWatch in the Management Console
Log in to the AWS Management Console for the cloudexploration prod account and select the us-east-1 Region.
From the Services menu, select CloudWatch under the Management & Governance section.
This will take you to the CloudWatch Dashboard, where you can access different CloudWatch features such as Metrics, Logs, Alarms, Events, and Dashboards.
On the CloudWatch Dashboard, you can view an overview of your metrics, alarms, and logs. You can create custom dashboards to monitor specific aspects of your AWS environment.
Metrics:
Select Metrics from the left-hand menu to explore a list of services and their associated metrics. You can select any service (e.g., EC2, RDS, Lambda) to view specific performance and usage metrics.
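The same metrics shown in the console can also be read through the API. The following is a minimal sketch, assuming boto3 is installed and credentials for the account are configured; the function names and the 3-hour window are illustrative choices, not part of this procedure. It builds the request for the CloudWatch get_metric_statistics call separately from the call itself, so the request can be inspected without touching AWS.

```python
from datetime import datetime, timedelta, timezone


def cpu_stats_request(instance_id, hours=3, period=300, now=None):
    """Build keyword arguments for CloudWatch get_metric_statistics."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/EC2",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": period,  # seconds covered by each datapoint
        "Statistics": ["Average", "Maximum"],
    }


def fetch_cpu_stats(instance_id, region="us-east-1"):
    """Call CloudWatch. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder above works without boto3

    client = boto3.client("cloudwatch", region_name=region)
    response = client.get_metric_statistics(**cpu_stats_request(instance_id))
    # Datapoints are returned unordered, so sort by timestamp before use.
    return sorted(response["Datapoints"], key=lambda d: d["Timestamp"])
```

Calling fetch_cpu_stats with an instance ID from the account returns the datapoints in time order, which is roughly what the console graph plots.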
Alarms:
Go to Alarms to review the alarms that have been set up. Alarms can be configured to monitor metrics and send notifications when thresholds are breached.
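When an account has many alarms, a scripted review can complement the console view. The sketch below is one possible approach, assuming boto3 and read access to CloudWatch; the helper names are hypothetical. It groups alarms by state and flags alarms that would not notify anyone, using the fields returned by the describe_alarms call.

```python
from collections import defaultdict


def group_alarms_by_state(metric_alarms):
    """Map each state (OK, ALARM, INSUFFICIENT_DATA) to its alarm names."""
    grouped = defaultdict(list)
    for alarm in metric_alarms:
        grouped[alarm["StateValue"]].append(alarm["AlarmName"])
    return dict(grouped)


def alarms_without_actions(metric_alarms):
    """Return alarms that have actions disabled or no alarm action at all."""
    return [
        alarm["AlarmName"]
        for alarm in metric_alarms
        if not alarm.get("ActionsEnabled", True) or not alarm.get("AlarmActions")
    ]


def list_metric_alarms(region="us-east-1"):
    """Fetch all metric alarms. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the helpers above work without boto3

    client = boto3.client("cloudwatch", region_name=region)
    alarms = []
    for page in client.get_paginator("describe_alarms").paginate():
        alarms.extend(page["MetricAlarms"])
    return alarms
```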
Logs:
Click on Logs to access the Log Groups. Each log group contains the log streams for various services (e.g., EC2, Lambda). Explore the log groups to review logs generated by different AWS services.
3. Exploring the AWS Well-Architected Framework Pillars
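In the Metrics section, explore metrics for key services, such as CPU utilization for EC2, duration and error counts for Lambda, and request latency for API Gateway. Lambda does not publish memory usage as a standard metric; it is available only if Lambda Insights is enabled. Regular monitoring of these metrics helps ensure your services operate efficiently.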
Alarms:
In the Alarms section, review the list of configured alarms to see what aspects of your infrastructure are being monitored. Verify that alarms are set up for critical metrics, such as high CPU usage, low disk space, or increased error rates. Disk space and memory metrics for EC2 are published only when the CloudWatch agent is installed on the instance.
Dashboards:
Select Dashboards to view custom dashboards created for monitoring specific applications or resources. Dashboards provide a centralized view of key metrics, aiding in operational oversight.
Events and Automation:
Check Rules under the Events section to see whether rules are set up for automated responses to changes in the environment. CloudWatch Events is now Amazon EventBridge, so the console may redirect you there. For example, a rule that triggers a Lambda function when an instance enters a specific state can automate operational tasks.
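A rule of the kind described above is defined by an event pattern. The sketch below shows what such a pattern might look like for EC2 instance state changes, with a Lambda function as the target. It assumes boto3 and suitable permissions; the rule name, target ID, and Lambda ARN are placeholders, not values from this environment.

```python
import json


def ec2_state_change_pattern(states=("stopped", "terminated")):
    """Event pattern matching EC2 instances entering the given states."""
    return {
        "source": ["aws.ec2"],
        "detail-type": ["EC2 Instance State-change Notification"],
        "detail": {"state": list(states)},
    }


def create_state_change_rule(rule_name, lambda_arn, region="us-east-1"):
    """Create the rule and attach a Lambda target. Requires boto3."""
    import boto3  # imported lazily so the pattern builder works without boto3

    events = boto3.client("events", region_name=region)
    events.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(ec2_state_change_pattern()),
        State="ENABLED",
    )
    # The Lambda function also needs a resource-based permission that
    # allows events.amazonaws.com to invoke it; that step is not shown.
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "state-change-handler", "Arn": lambda_arn}],
    )
```

Comparing the pattern printed by ec2_state_change_pattern with the Event pattern shown for an existing rule in the console is a quick way to confirm what a rule actually matches.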
In the Logs section, explore Log Groups for services like VPC Flow Logs, CloudTrail, and AWS WAF. These logs provide insights into network traffic, API activity, and web application firewall events, helping detect security incidents.
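For VPC Flow Logs in particular, a CloudWatch Logs Insights query is often quicker than reading raw streams. The following sketch assumes the flow logs use the default format (so fields such as srcAddr and action are discovered automatically) and that boto3 is available; the log group name passed in is whatever your environment uses.

```python
import time

# Top source addresses by number of rejected connections.
REJECTED_TRAFFIC_QUERY = (
    'filter action = "REJECT" '
    "| stats count(*) as numRejections by srcAddr "
    "| sort numRejections desc "
    "| limit 20"
)


def insights_request(log_group, minutes=60, now=None):
    """Build keyword arguments for the CloudWatch Logs start_query call."""
    end = int(now if now is not None else time.time())
    return {
        "logGroupName": log_group,
        "startTime": end - minutes * 60,  # start_query expects epoch seconds
        "endTime": end,
        "queryString": REJECTED_TRAFFIC_QUERY,
    }


def run_rejected_traffic_query(log_group, region="us-east-1"):
    """Start the query and poll until it finishes. Requires boto3."""
    import boto3  # imported lazily so the builder above works without boto3

    logs = boto3.client("logs", region_name=region)
    query_id = logs.start_query(**insights_request(log_group))["queryId"]
    while True:
        result = logs.get_query_results(queryId=query_id)
        if result["status"] not in ("Scheduled", "Running"):
            return result
        time.sleep(1)
```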
Alarms for Security Events:
In the Alarms section, look for alarms related to security, such as failed login attempts, unauthorized API calls, or changes to security group settings. Monitoring these events helps enhance the security of your environment.
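Security alarms like these are usually backed by a metric filter on the CloudTrail log group. The sketch below shows one way the pair could be defined for unauthorized API calls, using a filter pattern commonly recommended for this purpose. The filter name, alarm name, and the Security/CloudTrail namespace are assumptions for illustration, and the log group and SNS topic are whatever your account uses.

```python
# Matches CloudTrail events that were denied for lack of permission.
UNAUTHORIZED_PATTERN = (
    '{ ($.errorCode = "*UnauthorizedOperation") || ($.errorCode = "AccessDenied*") }'
)


def unauthorized_calls_filter(log_group, namespace="Security/CloudTrail"):
    """Keyword arguments for the CloudWatch Logs put_metric_filter call."""
    return {
        "logGroupName": log_group,
        "filterName": "UnauthorizedApiCalls",  # hypothetical name
        "filterPattern": UNAUTHORIZED_PATTERN,
        "metricTransformations": [
            {
                "metricName": "UnauthorizedApiCalls",
                "metricNamespace": namespace,
                "metricValue": "1",  # count one per matching log event
            }
        ],
    }


def unauthorized_calls_alarm(sns_topic_arn, namespace="Security/CloudTrail", threshold=1):
    """Keyword arguments for the CloudWatch put_metric_alarm call."""
    return {
        "AlarmName": "unauthorized-api-calls",  # hypothetical name
        "Namespace": namespace,
        "MetricName": "UnauthorizedApiCalls",
        "Statistic": "Sum",
        "Period": 300,
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        "TreatMissingData": "notBreaching",  # no matching events means no breach
        "AlarmActions": [sns_topic_arn],
    }
```

With boto3, these dictionaries would be passed as keyword arguments to put_metric_filter on the logs client and put_metric_alarm on the cloudwatch client. When reviewing existing alarms, check that the alarm's namespace and metric name match those of a metric filter that actually exists.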
Compliance Checks:
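If AWS Config rules are enabled in your account, check whether changes in compliance status are being monitored. AWS Config sends compliance change events to EventBridge (CloudWatch Events), where a rule can notify you or trigger remediation when resources become non-compliant with security policies.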
Use the Metrics section to check the health of key AWS services. For example, monitor CPUUtilization and StatusCheckFailed metrics for EC2 instances to ensure they are running reliably.
Alarms for Availability:
Review Alarms to see if there are any configured to monitor the availability of critical services. For example, alarms for high latency or request failures in API Gateway can indicate issues affecting reliability.
Log Analysis:
In the Logs section, access log groups to troubleshoot issues by reviewing system logs (e.g., application logs, database logs). Analyzing these logs helps identify the root cause of failures, improving overall reliability.
Explore Metrics for services like RDS (e.g., FreeStorageSpace) and EC2 (e.g., CPUUtilization). These metrics help identify underused resources, such as EC2 instances running at low CPU utilization, which can be scaled down to optimize costs.
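The check for underused instances can be reduced to a small rule applied to the CPU datapoints. The sketch below is one possible rule, not a recommendation from this procedure: the 10 percent threshold and the minimum of 12 datapoints are arbitrary assumptions, and the input is expected to be average CPUUtilization values already retrieved from CloudWatch.

```python
def underused_instances(cpu_by_instance, threshold=10.0, min_datapoints=12):
    """Return {instance_id: peak_cpu} for instances that never exceed threshold.

    cpu_by_instance maps an instance ID to a list of average CPUUtilization
    values (percent). Instances with too few datapoints are skipped rather
    than flagged, because a short history says little about real usage.
    """
    flagged = {}
    for instance_id, samples in cpu_by_instance.items():
        if len(samples) < min_datapoints:
            continue
        peak = max(samples)
        # Using the peak rather than the mean avoids flagging instances
        # that are idle most of the time but busy during short bursts.
        if peak < threshold:
            flagged[instance_id] = peak
    return flagged
```

For example, an instance whose 12 hourly averages are all 2.0 would be flagged, while one with a single 80.0 reading would not.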
Alarms for Resource Usage:
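In the Alarms section, review alarms monitoring cost-related metrics, such as EstimatedCharges in the AWS/Billing namespace or BucketSizeBytes for S3 bucket storage. Billing metrics are published only in us-east-1, and only when billing alerts are enabled for the account. Setting alarms for resource utilization ensures that you are alerted when resource consumption exceeds defined thresholds.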
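If no cost alarm exists yet, a billing alarm could be defined roughly as follows. This is a sketch under assumptions: the alarm name format and the SNS topic are placeholders, and the account must have billing alerts enabled for the EstimatedCharges metric to exist.

```python
def billing_alarm(monthly_limit_usd, sns_topic_arn):
    """Keyword arguments for put_metric_alarm on estimated charges."""
    return {
        "AlarmName": f"estimated-charges-over-{monthly_limit_usd}-usd",
        "Namespace": "AWS/Billing",
        "MetricName": "EstimatedCharges",
        "Dimensions": [{"Name": "Currency", "Value": "USD"}],
        "Statistic": "Maximum",
        "Period": 21600,  # 6 hours; billing data is updated a few times a day
        "EvaluationPeriods": 1,
        "Threshold": float(monthly_limit_usd),
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }


def create_billing_alarm(monthly_limit_usd, sns_topic_arn):
    """Create the alarm. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder above works without boto3

    # Billing metrics live in us-east-1 regardless of where resources run.
    client = boto3.client("cloudwatch", region_name="us-east-1")
    client.put_metric_alarm(**billing_alarm(monthly_limit_usd, sns_topic_arn))
```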
Log Retention Policies:
In the Logs section, explore each Log Group and check the Retention settings. Ensure that logs have an appropriate retention period to avoid unnecessary storage costs. Shortening the retention period for less critical logs can help save costs.
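Checking retention one log group at a time is slow in the console, so a script may help. The sketch below assumes boto3 and permission to read and update log groups; the 90-day limit is an example value, not a policy stated in this document. In the describe_log_groups response, a log group with no retentionInDays field never expires its logs.

```python
def groups_needing_retention(log_groups, max_days=90):
    """Return names of log groups with no retention or retention above max_days."""
    needing = []
    for group in log_groups:
        days = group.get("retentionInDays")  # absent means "never expire"
        if days is None or days > max_days:
            needing.append(group["logGroupName"])
    return needing


def list_log_groups(region="us-east-1"):
    """Fetch all log groups. Requires boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the helper above works without boto3

    logs = boto3.client("logs", region_name=region)
    groups = []
    for page in logs.get_paginator("describe_log_groups").paginate():
        groups.extend(page["logGroups"])
    return groups


def apply_retention(names, days=90, region="us-east-1"):
    """Set the retention period on each named log group."""
    import boto3

    logs = boto3.client("logs", region_name=region)
    for name in names:
        # retentionInDays must be one of the values CloudWatch Logs accepts
        # (for example 30, 60, 90, 365).
        logs.put_retention_policy(logGroupName=name, retentionInDays=days)
```

Reviewing the output of groups_needing_retention before calling apply_retention is advisable, since shortening retention deletes older log events.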
In the Metrics section, explore performance-related metrics such as Latency and Count for API Gateway, Duration for Lambda, and ReadLatency, WriteLatency, and ReadThroughput for RDS. Monitoring these metrics allows you to identify and address performance bottlenecks.
Dashboards for Real-Time Monitoring:
Use Dashboards to display real-time performance metrics in a consolidated view. This enables quick identification of performance issues across various services.
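Dashboards are stored as a JSON document, so a consolidated view can be kept in source control and recreated when needed. The sketch below builds a two-widget dashboard body; the widget titles, the layout, and the choice of metrics are illustrative assumptions, and the instance ID and API name come from your own environment.

```python
import json


def metric_widget(title, metrics, x, y, stat="Average", period=300, region="us-east-1"):
    """One metric widget; the dashboard grid is 24 units wide."""
    return {
        "type": "metric",
        "x": x,
        "y": y,
        "width": 12,
        "height": 6,
        "properties": {
            "title": title,
            "metrics": metrics,
            "stat": stat,
            "period": period,
            "region": region,
            "view": "timeSeries",
        },
    }


def dashboard_body(instance_id, api_name):
    """Return the JSON string expected by the put_dashboard call."""
    widgets = [
        metric_widget(
            "EC2 CPU", [["AWS/EC2", "CPUUtilization", "InstanceId", instance_id]], x=0, y=0
        ),
        metric_widget(
            "API p99 latency",
            [["AWS/ApiGateway", "Latency", "ApiName", api_name]],
            x=12,
            y=0,
            stat="p99",
        ),
    ]
    return json.dumps({"widgets": widgets})
```

With boto3, the result would be passed as DashboardBody to put_dashboard on the cloudwatch client, together with a DashboardName of your choosing.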
Alarms for Performance Thresholds:
In the Alarms section, review any alarms set for performance-related metrics, such as high latency or increased error rates. By setting these alarms, you can proactively address performance issues before they impact the user experience.
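For latency, alarming on a percentile is often more useful than alarming on the average, because a slow tail can hide behind a healthy mean. The sketch below shows how a p99 latency alarm for an API Gateway REST API might be defined. The alarm name format, the 1000 ms threshold, and the 3-out-of-5 evaluation are assumptions for illustration.

```python
def p99_latency_alarm(api_name, sns_topic_arn, threshold_ms=1000):
    """Keyword arguments for put_metric_alarm on API Gateway p99 latency."""
    return {
        "AlarmName": f"{api_name}-p99-latency",
        "Namespace": "AWS/ApiGateway",
        "MetricName": "Latency",  # reported in milliseconds
        "Dimensions": [{"Name": "ApiName", "Value": api_name}],
        # Percentiles use ExtendedStatistic; it replaces the Statistic field.
        "ExtendedStatistic": "p99",
        "Period": 60,
        "EvaluationPeriods": 5,
        "DatapointsToAlarm": 3,  # alarm when 3 of the last 5 minutes breach
        "Threshold": threshold_ms,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # no traffic is not a breach
        "AlarmActions": [sns_topic_arn],
    }
```

Requiring several breaching datapoints before alarming reduces noise from a single slow minute, at the cost of a slightly later notification.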
CloudWatch integrates with other AWS services like Auto Scaling, Lambda, and SNS. In the Events and Alarms sections, explore these integrations to see if they are set up to automate responses based on operational metrics.
Log in to the AWS Management Console for the securitytooling account.
AWS Security Hub:
If AWS Security Hub is available in your environment, review the findings related to CloudWatch configurations. Security Hub can provide insights into whether your monitoring and logging practices align with security best practices.