Well-Architected Framework: Monitoring and Analysis

Last Updated: Sep 25, 2023

Monitoring cloud resources and the security posture of the system, identifying vulnerabilities in business systems, responding to alerts about suspicious activity, and tracing security events in day-to-day operations are all essential to ensuring the confidentiality, integrity, and availability of the business.

Monitoring and Control

The monitoring and control methods available in the cloud make it possible to detect, analyze, and respond to threats at different levels. Monitoring and detection controls can be customized to specific business requirements. The following best practices are recommended:

  • Network Management: Creating isolated and layered networks helps to logically group similar network components and limits the potential impact of unauthorized network access. For a Virtual Private Cloud (VPC), ECS security groups, network ACLs, and flow logs can be used to control network traffic and help keep cloud services secure and reliable. Access control policies for VPC resources can also be implemented through Resource Access Management (RAM).

  • Access Management: Access management is crucial for businesses. Because different roles need different permissions, granting each user only the minimum permissions necessary helps prevent unauthorized operations and reduces the failures and security risks caused by improper operations.

  • Cloud Config: When resources are used at scale, the compliance of their configuration needs to be supervised. With Cloud Config, resource changes under the cloud account can be monitored, configuration change history can be tracked, and real-time compliance auditing can be performed.

  • ActionTrail: ActionTrail records user logins and resource access operations under the cloud account, which supports security analysis, intrusion detection, resource change tracking, and compliance auditing. Events recorded under the account can be downloaded, or delivered to Log Service (SLS) or Object Storage Service (OSS), for further behavior analysis.

  • DDoS Protection: Distributed Denial of Service (DDoS) attacks consume the performance or network bandwidth of target servers, causing them to fail to provide services properly. The threats of DDoS attacks can be mitigated through the following measures:

      • Reduce exposure by isolating resources and unrelated businesses.

      • Optimize business architecture.

      • Design systems with elastic scalability and disaster recovery switching capabilities leveraging the features of public clouds.

      • Implement effective business monitoring and emergency response plans.

      • Utilize DDoS protection products.

  • Intrusion Detection: Intrusion detection mainly helps prevent data leaks and the destruction of business systems. By configuring appropriate detection and alarm mechanisms, threats to cloud servers or cloud products can be identified, such as attacks from malicious IP addresses or signs that an asset has already been compromised.
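
As an illustration of the network management practice above, the following minimal Python sketch flags security-group ingress rules that expose sensitive ports to every address. The `IngressRule` shape, the `SENSITIVE_PORTS` set, and the group names are assumptions made for this example; they are not the schema or API of any cloud service, and real rules would first have to be exported from the provider.

```python
import ipaddress
from dataclasses import dataclass


# Hypothetical rule shape for illustration; not a cloud provider's API schema.
@dataclass(frozen=True)
class IngressRule:
    group: str
    protocol: str
    port_start: int
    port_end: int
    source_cidr: str


# Assumed set of ports that should not be reachable from the whole internet.
SENSITIVE_PORTS = {22, 3306, 3389, 6379}


def is_world_open(cidr: str) -> bool:
    """Return True when the CIDR covers every address (0.0.0.0/0 or ::/0)."""
    return ipaddress.ip_network(cidr, strict=False).prefixlen == 0


def find_risky_rules(rules):
    """Return (rule, exposed_ports) pairs for world-open rules covering sensitive ports."""
    findings = []
    for rule in rules:
        if not is_world_open(rule.source_cidr):
            continue
        exposed = sorted(
            port for port in SENSITIVE_PORTS
            if rule.port_start <= port <= rule.port_end
        )
        if exposed:
            findings.append((rule, exposed))
    return findings


if __name__ == "__main__":
    sample_rules = [
        IngressRule("sg-web", "tcp", 443, 443, "0.0.0.0/0"),
        IngressRule("sg-db", "tcp", 3306, 3306, "0.0.0.0/0"),
        IngressRule("sg-admin", "tcp", 1, 65535, "10.0.0.0/8"),
    ]
    for rule, ports in find_risky_rules(sample_rules):
        print(f"{rule.group}: ports {ports} are open to {rule.source_cidr}")
```

A check like this could run on a schedule against exported rules, with its findings fed into the alerting described in the next section.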

Log Alerts

Logs provide firsthand information about services and applications. Preserving security event logs makes it possible to audit and investigate security events and to maintain system security, which in turn supports the availability and normal operation of resources deployed in the cloud. Proper log management is therefore a primary requirement for business security.

Log management includes log collection, secure storage, querying, analysis, and alarm generation. The following guidelines are recommended for log management:

  • Log collection: Logs should be collected from various cloud resources, services, and applications. The collection process should be as non-intrusive as possible.

  • Secure storage: The retention period should be flexible and configurable, and should be set based on security requirements, compliance requirements, and the characteristics of different cloud products. Log storage should also be secure and tamper-proof. Permissions on the logs should be strictly controlled for every identity, especially write and delete permissions.

  • Log querying: Select a reasonable and implementable log querying mechanism based on operational, business, and security requirements.

  • Log analysis: Logs from heterogeneous sources need to be standardized into a common format before they can be analyzed. Comprehensive analysis across these sources should then be performed to understand the security events affecting the current system.

  • Alarm generation: Alarms should be real-time, accurate, and deliverable, ideally through multiple notification mechanisms. In addition, each type of alarm should have an actionable contingency plan.
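
The analysis and alarm guidelines above can be sketched in a few lines of Python: records from two differently shaped sources are normalized into one common format, and a simple threshold rule then raises alarms that are sent to several notification channels. The field names (`eventTime`, `user`, `eventName`, `errorCode`), the application log line layout, the threshold, and the channel functions are all assumptions made for illustration; they do not describe the actual output format of ActionTrail, Log Service, or any other product.

```python
from collections import Counter
from datetime import datetime, timezone


def normalize_audit_event(raw: dict) -> dict:
    """Map a hypothetical audit-style JSON record to the common format."""
    return {
        "time": datetime.fromisoformat(raw["eventTime"]).astimezone(timezone.utc),
        "source": "audit",
        "actor": raw["user"],
        "action": raw["eventName"],
        "outcome": "failure" if raw.get("errorCode") else "success",
    }


def normalize_app_line(line: str) -> dict:
    """Map a hypothetical 'epoch actor action outcome' application log line."""
    epoch, actor, action, outcome = line.split()
    return {
        "time": datetime.fromtimestamp(int(epoch), tz=timezone.utc),
        "source": "app",
        "actor": actor,
        "action": action,
        "outcome": outcome,
    }


def failure_alarms(events, threshold=3):
    """Return {actor: failure_count} for actors at or above the threshold."""
    failures = Counter(e["actor"] for e in events if e["outcome"] == "failure")
    return {actor: count for actor, count in failures.items() if count >= threshold}


def notify(alarms, channels):
    """Send every alarm to every channel; one failing channel does not block the rest."""
    for actor, count in alarms.items():
        message = f"{count} failed operations by {actor}"
        for channel in channels:
            try:
                channel(message)
            except Exception as exc:  # keep the remaining channels reachable
                print(f"notification channel failed: {exc}")


if __name__ == "__main__":
    events = [
        normalize_audit_event({"eventTime": "2023-09-25T10:00:00+00:00",
                               "user": "alice", "eventName": "ConsoleSignin",
                               "errorCode": "Forbidden"}),
        normalize_audit_event({"eventTime": "2023-09-25T10:00:05+00:00",
                               "user": "alice", "eventName": "ConsoleSignin",
                               "errorCode": "Forbidden"}),
        normalize_app_line("1695636010 alice login failure"),
        normalize_app_line("1695636020 bob login success"),
    ]
    # Two stand-in channels; real ones might send email, SMS, or a webhook call.
    notify(failure_alarms(events), [print, lambda m: print("[pager]", m)])
```

Counting failures without a time window keeps the sketch short; a production rule would normally evaluate events within a sliding window and tie each alarm type to its contingency plan.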