On the MSK Dashboard, select Clusters in the left-hand menu to view a list of Kafka clusters.
Click on a cluster name to view its details, including Configuration, Monitoring, Networking, Security, Broker nodes, and Cluster settings.
Cluster Details:
The Cluster details page provides insights into cluster settings such as Broker version, Cluster type (Provisioned or Serverless), Availability Zones, and Storage settings.
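The same details can be read programmatically. The following is a minimal sketch, assuming the boto3 Kafka client and its list_clusters_v2 response shape; the function names summarize_cluster and list_cluster_summaries are illustrative and not part of this lab.

```python
def summarize_cluster(cluster_info):
    """Pick out the fields shown on the Cluster details page for one cluster."""
    cluster_type = cluster_info.get("ClusterType", "UNKNOWN")
    summary = {
        "name": cluster_info.get("ClusterName"),
        "type": cluster_type,
        "state": cluster_info.get("State"),
    }
    # Broker version and broker count are only reported for Provisioned clusters.
    if cluster_type == "PROVISIONED":
        provisioned = cluster_info.get("Provisioned", {})
        software = provisioned.get("CurrentBrokerSoftwareInfo", {})
        summary["kafka_version"] = software.get("KafkaVersion")
        summary["brokers"] = provisioned.get("NumberOfBrokerNodes")
    return summary


def list_cluster_summaries(region_name):
    """Fetch and summarize every cluster in a region (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so summarize_cluster can be used offline

    client = boto3.client("kafka", region_name=region_name)
    summaries, kwargs = [], {}
    while True:
        page = client.list_clusters_v2(**kwargs)
        summaries.extend(summarize_cluster(c) for c in page.get("ClusterInfoList", []))
        token = page.get("NextToken")
        if not token:
            return summaries
        kwargs["NextToken"] = token
```

Calling list_cluster_summaries with your Region name returns one dictionary per cluster, which can be compared against what the console shows.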
3. Exploring the AWS Well-Architected Framework Pillars
Under the Monitoring tab in the cluster details, review key metrics like Broker CPU utilization, Memory usage, Active connections, and Bytes in/out. Regularly monitoring these metrics helps ensure the Kafka cluster operates smoothly.
Broker Configuration:
In the Configuration tab, examine the Broker configuration settings. Verify that the broker's log retention policies and partition settings are appropriately configured to align with your operational needs.
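As a rough illustration of this review, the sketch below parses broker properties text (the key=value format used by MSK configurations) and pulls out the retention and partition settings. The helper names are hypothetical, and the list of properties checked is an assumption about which settings matter for your workload.

```python
def parse_server_properties(text):
    """Parse key=value broker properties, skipping blank lines and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


# Standard Kafka broker properties; which ones to review is an assumption.
REVIEWED_KEYS = ("log.retention.hours", "log.retention.bytes", "num.partitions")


def retention_and_partition_settings(props):
    """Return the reviewed settings; None means the property is not set explicitly."""
    return {key: props.get(key) for key in REVIEWED_KEYS}
```

With boto3, the properties text could be obtained from the ServerProperties field returned by describe_configuration_revision (decoded from bytes) and passed to parse_server_properties.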
Cluster Version:
On the Cluster details page, check the Kafka version. Keeping your Kafka version up to date helps ensure access to the latest features and performance improvements.
Logging:
Under Configuration, review the Logging settings to ensure that logs are being sent to CloudWatch Logs, S3, or Kinesis Data Firehose. Proper logging aids in troubleshooting and maintaining operational health.
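The logging check can also be scripted. This is a minimal sketch, assuming the ClusterInfo dictionary returned by the boto3 Kafka client's describe_cluster_v2 call; enabled_log_destinations is an illustrative name.

```python
def enabled_log_destinations(cluster_info):
    """List the broker log destinations that are switched on for a Provisioned cluster."""
    broker_logs = (
        cluster_info.get("Provisioned", {})
        .get("LoggingInfo", {})
        .get("BrokerLogs", {})
    )
    # Keys are expected to be "CloudWatchLogs", "Firehose", and "S3".
    return sorted(name for name, cfg in broker_logs.items() if cfg.get("Enabled"))
```

An empty list means no broker log delivery is configured, which is worth flagging during the review.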
In the Cluster details section, verify if Encryption is enabled:
Encryption at rest: Check if the data stored on the brokers is encrypted using AWS KMS.
Encryption in transit: Ensure that TLS is enabled for communication between clients and brokers and between brokers to secure data while in transit.
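The two encryption checks above can be sketched in code. The example assumes the ClusterInfo dictionary from the boto3 Kafka client's describe_cluster_v2 call; the function name and finding messages are illustrative.

```python
def encryption_findings(cluster_info):
    """Return a list of encryption concerns for a Provisioned cluster (empty if none)."""
    enc = cluster_info.get("Provisioned", {}).get("EncryptionInfo", {})
    findings = []

    # Encryption at rest: a KMS key ID should be reported for the data volumes.
    if not enc.get("EncryptionAtRest", {}).get("DataVolumeKMSKeyId"):
        findings.append("no KMS key reported for encryption at rest")

    in_transit = enc.get("EncryptionInTransit", {})
    # Client-to-broker traffic: "TLS" only; "TLS_PLAINTEXT" and "PLAINTEXT" allow cleartext.
    if in_transit.get("ClientBroker") != "TLS":
        findings.append("client-broker traffic is not restricted to TLS")
    # Broker-to-broker traffic: InCluster should be True.
    if in_transit.get("InCluster") is not True:
        findings.append("in-cluster traffic is not encrypted")
    return findings
```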
Access Control:
Review the Access control method to determine whether the cluster uses IAM, SASL/SCRAM, or mutual TLS for client authentication, and confirm that unauthenticated access is not enabled unless it is explicitly required. An appropriate access control method ensures that only authorized Kafka clients can connect to the cluster.
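A short sketch of the same check in code, assuming the ClientAuthentication structure returned by the boto3 Kafka client's describe_cluster_v2 call for a Provisioned cluster; enabled_auth_methods is an illustrative name.

```python
def enabled_auth_methods(cluster_info):
    """Return the client authentication methods that are enabled on the cluster."""
    auth = cluster_info.get("Provisioned", {}).get("ClientAuthentication", {})
    sasl = auth.get("Sasl", {})
    methods = []
    if sasl.get("Iam", {}).get("Enabled"):
        methods.append("IAM")
    if sasl.get("Scram", {}).get("Enabled"):
        methods.append("SASL/SCRAM")
    if auth.get("Tls", {}).get("Enabled"):
        methods.append("TLS")
    # Unauthenticated access is reported separately and deserves attention.
    if auth.get("Unauthenticated", {}).get("Enabled"):
        methods.append("unauthenticated")
    return methods
```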
Network Settings:
Under the Networking tab, review the VPC, Subnets, and Security groups associated with the cluster. Ensure the cluster is deployed in a private subnet with security groups that restrict access to only necessary IP addresses and ports.
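To illustrate the security group review, the sketch below flags inbound rules that are open to any address. It assumes the SecurityGroups list returned by the EC2 describe_security_groups call; the function name is hypothetical.

```python
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def open_ingress_rules(security_groups):
    """Return (group id, from port, to port) for every inbound rule open to the world."""
    open_rules = []
    for group in security_groups:
        for perm in group.get("IpPermissions", []):
            cidrs = {r.get("CidrIp") for r in perm.get("IpRanges", [])}
            cidrs |= {r.get("CidrIpv6") for r in perm.get("Ipv6Ranges", [])}
            if cidrs & OPEN_CIDRS:
                open_rules.append(
                    (group.get("GroupId"), perm.get("FromPort"), perm.get("ToPort"))
                )
    return open_rules
```

The security group IDs to inspect are the ones listed on the Networking tab; any rule this function returns exposes a broker port more widely than the guidance above recommends.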
Audit and Logging:
In the Configuration tab, check the logging settings, and confirm that AWS CloudTrail is recording Amazon MSK API calls in the account. Broker logs and CloudTrail events together help track client connections and API calls, which can help detect and respond to unauthorized access attempts.
In the Cluster details section, verify the Availability Zones used for the cluster. Ensure that the cluster is set up across multiple availability zones to enhance fault tolerance and availability.
Broker Scaling:
Check the Broker node count under Cluster details to verify the number of brokers in the cluster. Having multiple brokers improves the cluster's fault tolerance and data replication, enhancing overall reliability.
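The Availability Zone and broker count checks can be combined in a small helper. This is a sketch only: it assumes the ClusterInfo dictionary from describe_cluster_v2 and treats the number of client subnets as the number of Availability Zones, on the assumption that each subnet sits in a different zone.

```python
def availability_summary(cluster_info):
    """Summarize broker count and zone spread for a Provisioned cluster."""
    provisioned = cluster_info.get("Provisioned", {})
    subnets = provisioned.get("BrokerNodeGroupInfo", {}).get("ClientSubnets", [])
    brokers = provisioned.get("NumberOfBrokerNodes", 0)
    zones = len(subnets)  # assumption: one client subnet per Availability Zone
    return {
        "brokers": brokers,
        "zones": zones,
        "multi_az": zones >= 2,
        "brokers_per_zone": brokers // zones if zones else None,
    }
```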
Automatic Recovery:
Review the Monitoring section to check if Automated monitoring is in place. MSK provides built-in monitoring and automated recovery mechanisms to restart failed broker nodes, ensuring consistent data streaming.
Data Replication:
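In the Broker configuration settings, ensure that the replication factor for topics is set appropriately (for example, through the default.replication.factor property). A higher replication factor provides better fault tolerance: with a replication factor of 3, each partition is stored on three brokers, so its data survives the loss of up to two of them.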
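One way to sanity-check replication settings is sketched below. It takes broker properties as a dictionary of strings plus the broker count. The min.insync.replicas check is an addition not mentioned in this lab, and the function name and warning texts are illustrative.

```python
def replication_warnings(props, broker_count):
    """Return warnings about replication-related broker properties (empty if none)."""
    rf_raw = props.get("default.replication.factor")
    if rf_raw is None:
        return ["default.replication.factor is not set; the service default applies"]

    warnings = []
    rf = int(rf_raw)
    if rf < 2:
        warnings.append("replication factor below 2: one broker failure makes partitions unavailable")
    if rf > broker_count:
        warnings.append("replication factor exceeds the number of brokers")

    # With acks=all, writes need min.insync.replicas replicas in sync, so it
    # should stay below the replication factor to tolerate a broker outage.
    isr_raw = props.get("min.insync.replicas")
    if isr_raw is not None and rf > 1 and int(isr_raw) >= rf:
        warnings.append("min.insync.replicas is not below the replication factor")
    return warnings
```

For a three-broker cluster, a replication factor of 3 with min.insync.replicas of 2 produces no warnings in this sketch.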
In the Cluster details section, check the Cluster type (Provisioned or Serverless). Using MSK Serverless can optimize costs for workloads with unpredictable traffic patterns, while Provisioned clusters allow for more control over resource allocation.
Broker Instance Types:
Review the Broker instance type in the Cluster details. Ensure the selected instance type matches your workload's requirements to avoid over-provisioning resources, which can lead to unnecessary costs.
Storage Settings:
Under Configuration, examine the Storage settings. Set appropriate storage limits and consider enabling Tiered Storage for older data, which can reduce costs by moving less frequently accessed data to lower-cost storage tiers.
Scaling:
For Provisioned clusters, monitor metrics like CPU usage and Network throughput in the Monitoring tab. Use these metrics to adjust the number of brokers or their instance types to optimize costs.
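The scaling decision can be sketched as a simple rule over CPU samples. Total broker CPU is taken here as the CpuUser plus CpuSystem metrics; the 60 percent upper bound follows the commonly cited MSK guidance, while the 20 percent lower bound and the function name are assumptions chosen for illustration.

```python
def scaling_hint(samples, high=60.0, low=20.0):
    """Suggest a scaling direction from (CpuUser, CpuSystem) percentage samples."""
    if not samples:
        raise ValueError("at least one CPU sample is required")
    # Average total CPU (user + system) across all samples.
    average = sum(user + system for user, system in samples) / len(samples)
    if average > high:
        return "scale up or out"
    if average < low:
        return "consider smaller instances or fewer brokers"
    return "no change"
```

In practice the samples would come from CloudWatch over a representative period, not from a handful of data points.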
In the Monitoring tab, review performance-related metrics like Throughput (bytes in/out), Request latency, Partition count, and Replication lag. Regularly monitoring these metrics helps optimize performance by identifying bottlenecks.
Broker Configurations:
Under the Configuration tab, inspect broker configurations related to Topic settings, Partitioning, and Replication. Properly configured broker parameters ensure efficient data streaming and processing.
Networking and Subnet Placement:
In the Networking section, verify that the brokers are placed in private subnets and, where clients connect from other VPCs, that private connectivity (such as VPC endpoints) is used instead of public access. Efficient network configurations reduce latency and improve data transfer performance.
Tiered Storage:
If applicable, check for the use of Tiered Storage in the cluster’s Storage settings. Using tiered storage helps manage performance by separating frequently accessed data from older, less frequently accessed data.
Use CloudWatch (accessible via the Monitoring tab) to set up alarms based on critical metrics such as CPU Utilization, Network throughput, and Disk usage. This proactive monitoring helps maintain optimal cluster performance.
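As an illustration, the helper below builds the parameters for a per-broker CloudWatch alarm. It assumes the AWS/Kafka namespace with the "Cluster Name" and "Broker ID" dimensions; the alarm name format, period, and evaluation settings are placeholder choices, not values prescribed by this lab.

```python
def broker_alarm_params(cluster_name, broker_id, metric_name, threshold):
    """Build keyword arguments for the CloudWatch put_metric_alarm call."""
    return {
        "AlarmName": f"msk-{cluster_name}-broker-{broker_id}-{metric_name}",
        "Namespace": "AWS/Kafka",
        "MetricName": metric_name,  # e.g. "CpuUser" or "KafkaDataLogsDiskUsed"
        "Dimensions": [
            {"Name": "Cluster Name", "Value": cluster_name},
            {"Name": "Broker ID", "Value": str(broker_id)},
        ],
        "Statistic": "Average",
        "Period": 300,  # seconds per data point
        "EvaluationPeriods": 3,  # alarm after three consecutive breaches
        "Threshold": float(threshold),
        "ComparisonOperator": "GreaterThanThreshold",
    }
```

With boto3 installed and credentials configured, the result could be passed as boto3.client("cloudwatch").put_metric_alarm(**broker_alarm_params("demo", 1, "KafkaDataLogsDiskUsed", 85)).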
Log in to the AWS Management Console using the securitytooling account.
AWS Config and Security Hub:
If AWS Config and Security Hub are enabled, review compliance findings related to MSK to ensure that your clusters adhere to security best practices, such as data encryption and restricted network access.
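Findings exported from Security Hub can be narrowed to MSK with a small filter. This sketch assumes findings in the AWS Security Finding Format and assumes that MSK clusters appear with the resource type "AwsMskCluster"; verify the type against your own findings before relying on it. The function name is illustrative.

```python
def failed_msk_findings(findings, resource_type="AwsMskCluster"):
    """Return (title, severity label) for failed findings that reference an MSK cluster."""
    selected = []
    for finding in findings:
        types = {res.get("Type") for res in finding.get("Resources", [])}
        failed = finding.get("Compliance", {}).get("Status") == "FAILED"
        if resource_type in types and failed:
            selected.append(
                (finding.get("Title"), finding.get("Severity", {}).get("Label"))
            )
    return selected
```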