Well-Architected Framework: Incident response

Last Updated: Dec 24, 2025

Incident response is a structured process, activated during or after a security event, for slowing or stopping an attack.

It involves three phases:

  1. Before an incident: Create a classification system for security events, develop response plans, and write playbooks. This is often the most difficult part of incident response.

  2. During an incident: Use monitoring to detect security events in real time. Activate the corresponding response plan to quickly block or mitigate the risk.

  3. After an incident: Conduct a post-incident review. Use its findings to optimize and update your incident response procedures, plans, and playbooks.

How should I classify cloud security incidents and assign severity levels?

Classify incidents and assign severity levels based on the type of security event. This lets you respond quickly and in an organized way, using predefined plans and playbooks.

A standard practice is to categorize events by attack type, based on common cloud security threats:

| Incident category | Example | Description | Severity level | Rationale |
| --- | --- | --- | --- | --- |
| Application security events | Web intrusion | A server is targeted by a SQL injection attack. | High | Application security events can be detected or blocked by security devices, such as a Web Application Firewall (WAF). WAF alerts include a severity level for the event. The recommended level depends on the attack category. |
| Network security events | DDoS attack | A DDoS or CC (Challenge Collapsar) attack targets a server, making the service unavailable. | High | A DDoS attack is typically a high-severity event due to its business impact. These attacks directly affect service stability and reliability. |
| System security events | Ransomware | The system is infected with ransomware, and core data is encrypted. | High | System events are often reported by Alibaba Cloud Security Center, which classifies intrusions based on threat intelligence. The classification should be based on the guidelines in Security Center. |
| Stability and reliability events | Cloud stability event | The network or an application is down. | High | Stability incidents are usually high-risk events. |
| Data security events | Data breach | External intelligence monitoring or public sentiment indicates that internal, confidential data has leaked. | High | The severity of a data breach depends on the content and authenticity of the leaked data, as well as the business and public relations risks. |
| Vulnerability events | Log4j vulnerability | High-impact vulnerability | High | The severity of a vulnerability event depends on its impact. For example, Security Center publishes alerts for critical vulnerabilities. If you discover such a vulnerability, treat it as a high-priority incident. |
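The categories and severity levels listed above can be encoded as a small lookup so that alerting code and playbooks share one classification. The following Python sketch is illustrative only: the category keys, the `SecurityEvent` type, and the rule that a severity reported by the detecting product (for example a WAF or Security Center alert) takes precedence over the default are assumptions, not part of any Alibaba Cloud API.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Severity(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Default severity per incident category, mirroring the classification above.
# The short category keys are hypothetical names chosen for this sketch.
DEFAULT_SEVERITY = {
    "application": Severity.HIGH,
    "network": Severity.HIGH,
    "system": Severity.HIGH,
    "stability": Severity.HIGH,
    "data": Severity.HIGH,
    "vulnerability": Severity.HIGH,
}


@dataclass
class SecurityEvent:
    category: str
    example: str
    # Severity carried by the detecting product's alert, if any (assumption).
    reported_severity: Optional[Severity] = None


def classify(event: SecurityEvent) -> Severity:
    """Return the reported severity if present, else the category default."""
    if event.category not in DEFAULT_SEVERITY:
        raise ValueError(f"unknown incident category: {event.category}")
    if event.reported_severity is not None:
        return event.reported_severity
    return DEFAULT_SEVERITY[event.category]
```

For example, `classify(SecurityEvent("network", "DDoS attack"))` falls back to the default for network security events, while an event that already carries a severity from its alert keeps that value.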

What are the essential phases of an incident response plan?

An effective incident response plan outlines the specific procedures for handling a security event. At a minimum, the plan should include the following phases:

  1. Monitor and detect security events.

  2. Confirm that the vulnerability or event is genuine.

  3. Identify the scope of impact, responsible personnel, and affected services.

  4. Establish a response strategy, including mitigation and containment.

  5. Analyze the event, perform source tracing, and document all information.

  6. Conduct a post-incident review.
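One way to keep responders on this sequence is to record each phase against the incident and refuse to skip ahead. The sketch below assumes a strictly linear workflow, which is a simplification; the `Phase` names and the `IncidentRecord` class are hypothetical and only mirror the six phases listed above.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import List, Optional, Tuple


class Phase(IntEnum):
    DETECT = 1    # Monitor and detect security events
    CONFIRM = 2   # Confirm the vulnerability or event is genuine
    SCOPE = 3     # Identify impact, owners, and affected services
    CONTAIN = 4   # Establish mitigation and containment
    ANALYZE = 5   # Analyze, trace the source, and document
    REVIEW = 6    # Conduct a post-incident review


@dataclass
class IncidentRecord:
    title: str
    # Completed phases with a free-text note, in the order they were done.
    completed: List[Tuple[Phase, str]] = field(default_factory=list)

    def next_phase(self) -> Optional[Phase]:
        """Return the phase that must be completed next, or None when done."""
        if len(self.completed) == len(Phase):
            return None
        return Phase(len(self.completed) + 1)

    def complete(self, phase: Phase, note: str) -> None:
        """Record a phase; reject it if an earlier phase was skipped."""
        expected = self.next_phase()
        if phase != expected:
            name = expected.name if expected is not None else "none"
            raise ValueError(f"expected phase {name}, got {phase.name}")
        self.completed.append((phase, note))
```

The notes collected in `completed` double as the documentation called for in phase 5 and as input to the post-incident review.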

How can I automate incident response for common security events?

Automated incident response playbooks help your security and operations teams take immediate action when an event occurs. You can configure these playbooks to trigger based on event classification and integrate them with your Security Information and Event Management (SIEM) or other alerting systems.
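Triggering by classification can be as simple as a registry that maps an incident category to the playbooks that should run. The following is a minimal sketch, assuming alerts arrive as dictionaries with a `category` field; the `playbook` decorator, the alert fields, and the returned action strings are hypothetical and stand in for calls to your SIEM and cloud services.

```python
from typing import Callable, Dict, List

Playbook = Callable[[dict], str]

# Registry of playbooks keyed by incident category (hypothetical structure).
_PLAYBOOKS: Dict[str, List[Playbook]] = {}


def playbook(category: str) -> Callable[[Playbook], Playbook]:
    """Decorator that registers a function as a playbook for a category."""
    def register(func: Playbook) -> Playbook:
        _PLAYBOOKS.setdefault(category, []).append(func)
        return func
    return register


@playbook("network")
def request_scrubbing(alert: dict) -> str:
    # A real playbook would call the traffic-scrubbing service here.
    return f"request traffic scrubbing for {alert['target']}"


def dispatch(alert: dict) -> List[str]:
    """Run every playbook registered for the alert's category."""
    return [run(alert) for run in _PLAYBOOKS.get(alert["category"], [])]
```

Alerts in a category with no registered playbook return an empty list, which is a useful signal that the event needs manual handling.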

Common use cases for automated response playbooks include:

  1. DDoS attacks: When a DDoS attack occurs, you can trigger a playbook to automatically route traffic through Anti-DDoS Pro and Anti-DDoS Premium for traffic scrubbing.

  2. Vulnerabilities: Depending on the vulnerability type and whether a system restart is required, you can configure a playbook to automatically patch a group of servers. Schedule the execution within a defined maintenance window.

  3. Network attacks: Based on the attack's severity, you can automatically block the attacker's IP address. Configure a playbook to extract the source IP address from the attack alert and add it to the blocklists of your firewall, WAF, and Server Load Balancer.
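The network attack case above can be sketched as follows. This is an illustration under stated assumptions: the alert is a dictionary with `severity` and `source_ip` fields, the blocklists are in-memory sets standing in for the firewall, WAF, and Server Load Balancer APIs, and `NEVER_BLOCK` is a hypothetical safeguard that keeps internal ranges from being blocked by mistake.

```python
import ipaddress
from typing import Dict, Optional, Set

SEVERITY_ORDER = {"low": 1, "medium": 2, "high": 3}

# Hypothetical safeguard: ranges that must never be blocked automatically.
NEVER_BLOCK = [ipaddress.ip_network("10.0.0.0/8")]


def extract_source_ip(alert: dict) -> Optional[str]:
    """Return the alert's source IP as a string, or None if unusable."""
    raw = alert.get("source_ip")
    if not isinstance(raw, str):
        return None
    try:
        ip = ipaddress.ip_address(raw)
    except ValueError:
        return None  # Malformed address: leave for manual review.
    if any(ip in net for net in NEVER_BLOCK):
        return None
    return str(ip)


def block_attacker(alert: dict,
                   blocklists: Dict[str, Set[str]],
                   min_severity: str = "high") -> Optional[str]:
    """Add the source IP to every blocklist if the alert is severe enough."""
    level = SEVERITY_ORDER.get(alert.get("severity"), 0)
    if level < SEVERITY_ORDER[min_severity]:
        return None
    ip = extract_source_ip(alert)
    if ip is None:
        return None
    for entries in blocklists.values():
        entries.add(ip)  # In practice, one API call per product.
    return ip
```

Validating the address before blocking matters because alert fields can be attacker-influenced; an unparseable or internal address is skipped rather than pushed to every enforcement point.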

What is the best way to validate an incident response plan?

The best way to validate your incident response plan and playbooks is to run a Red Team/Blue Team exercise: a simulated attack against your core systems. The exercise tests the effectiveness of your overall response capabilities.

  • Red Team: Also known as the attack team, the Red Team plays the role of an attacker. Taking an adversarial perspective and using frameworks such as MITRE ATT&CK, the Red Team attempts to breach target systems. This process validates both your security defenses and your team's capabilities for detection, monitoring, and response.

  • Blue Team: Also known as the defense team, the Blue Team consists of members from your Security Operations Center (SOC). This team defines, classifies, detects, monitors, analyzes, and responds to security events. During a Red Team/Blue Team exercise, the Blue Team uses predefined monitoring rules, analysis techniques, and incident response procedures to respond to alerts from the simulated attack. This provides realistic, hands-on training for the team.

Use these attack simulations to validate your defense and response capabilities, then optimize and adjust your strategies accordingly.
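To turn an exercise into concrete follow-up work, it can help to summarize which simulated steps the Blue Team detected and how quickly. The sketch below is one possible way to do this; the `SimulatedStep` record, the technique labels, and the chosen metrics (detection rate, mean time to detect, missed techniques) are assumptions for illustration, not a prescribed scoring method.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SimulatedStep:
    technique: str  # Hypothetical label, e.g. an ATT&CK technique name
    # Minutes until the Blue Team detected the step; None if never detected.
    minutes_to_detect: Optional[float]


def summarize(steps: List[SimulatedStep]) -> dict:
    """Summarize detection coverage and speed for one exercise."""
    detected = [s for s in steps if s.minutes_to_detect is not None]
    missed = [s.technique for s in steps if s.minutes_to_detect is None]
    rate = len(detected) / len(steps) if steps else 0.0
    mean = (sum(s.minutes_to_detect for s in detected) / len(detected)
            if detected else None)
    return {
        "detection_rate": rate,
        "mean_minutes_to_detect": mean,
        "missed": missed,
    }
```

The `missed` list is the most actionable output: each entry points to a monitoring rule or playbook that needs to be added or tuned before the next exercise.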