System events record and provide notifications about cloud resources, such as operations and maintenance (O&M) task executions, resource exceptions, and resource status changes. You can use system events to obtain information about risks and anomalies for your Elastic Compute Service (ECS) resources. For example, a system event is generated when an instance must be migrated because of an underlying upgrade, or is restarted for system maintenance. Respond to and handle system events promptly to prevent your business from being affected by reduced ECS resource availability or performance. This topic summarizes the system events that ECS supports, including scheduled O&M events, unexpected O&M events, instance billing events, and instance status change events. It also provides suggestions on how to handle each system event.
Formats of ECS event codes and CloudMonitor event names
ECS system events are synchronized to CloudMonitor. This lets you set up an automated O&M mechanism based on system events. ECS event codes and CloudMonitor event names follow specific naming conventions. The formats are as follows:
ECS event codes: Include information about the event cause and the impact on the resource. The format is <Event cause>.<Impact on resource>.
CloudMonitor event names: Include information about the resource type, event cause, impact on the resource, and event status. The format is <Resource type>:<Event cause>.<Impact on resource>:<Event status>.
Not all ECS event codes and CloudMonitor event names include all of this information. For example, the CloudMonitor event name Disk:ErrorDetected:Executing indicates that a disk is damaged. The name does not include information about the subsequent impact on the resource.
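The naming format can be illustrated with a small parser. This is a minimal sketch, not an official client: the function name parse_event_name and the CloudMonitorEventName type are hypothetical, and the sketch assumes that the input is the bare event name without any trailing description text. Parts that a name omits are returned as None.

```python
from typing import NamedTuple, Optional


class CloudMonitorEventName(NamedTuple):
    """Hypothetical container for the parts of a CloudMonitor event name."""
    resource_type: str
    event_cause: str
    impact: Optional[str]  # None when the name has no ".<Impact on resource>" part
    status: Optional[str]  # None when the name has no ":<Event status>" part


def parse_event_name(name: str) -> CloudMonitorEventName:
    """Split '<Resource type>:<Event cause>.<Impact on resource>:<Event status>'."""
    parts = name.split(":")
    if len(parts) not in (2, 3) or not all(parts):
        raise ValueError(f"unexpected CloudMonitor event name: {name!r}")
    resource_type, code = parts[0], parts[1]
    status = parts[2] if len(parts) == 3 else None
    # The ECS event code is '<Event cause>.<Impact on resource>'; the impact is optional.
    cause, _, impact = code.partition(".")
    return CloudMonitorEventName(resource_type, cause, impact or None, status)


if __name__ == "__main__":
    for sample in (
        "Instance:SystemMaintenance.Reboot:Inquiring",
        "Disk:ErrorDetected:Executing",
        "Snapshot:CreateSnapshotCompleted",
    ):
        print(parse_event_name(sample))
```

Returning None for a missing part keeps names such as Disk:ErrorDetected:Executing parseable without guessing at an impact that the name does not state.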
The following table describes some examples of ECS event codes and CloudMonitor event names.
If the sample ECS event code is Undefined, the system event is not displayed in the ECS console and cannot be handled in the ECS console or by calling an OpenAPI operation.
Category | Sample ECS event code | Sample CloudMonitor event name | Description |
Scheduled O&M events | SystemMaintenance.Reboot | Instance:SystemMaintenance.Reboot:Inquiring |
|
Unexpected O&M events | ErrorDetected | Disk:ErrorDetected:Executing |
|
Lifecycle change events | Snapshot:CreateSnapshotCompleted | Snapshot:CreateSnapshotCompleted |
|
Scheduled O&M events
Restarting an instance from within its operating system does not apply the maintenance action for the event. Therefore, the restart operations in this topic refer to restarts performed in the ECS console or by calling an OpenAPI operation. For more information, see Restart an instance or RebootInstance.
Event code | Event name | Event severity level | CloudMonitor event name | Event description and impact | Recommendations for users |
SystemMaintenance.Reboot | Instance restart because of system maintenance | Critical |
| Alibaba Cloud detects a potential risk of a software or hardware failure on the underlying host of an ECS instance. This risk can cause the ECS instance to restart. The risk has not yet become an actual failure. This system event is sent 24 to 48 hours before the scheduled system maintenance. Note Failure risks include the following:
| Select a response method as needed:
Note
|
SystemMaintenance.Stop | Instance stop because of system maintenance | Critical |
| This system event is sent 24 to 48 hours before the scheduled system maintenance when Alibaba Cloud detects a potential risk of a software or hardware failure on the underlying host of an ECS instance. This risk can cause the instance to be shut down and stopped. The risk has not yet become an actual failure. | Select a response method as needed:
Note You can modify the maintenance properties of the instance to specify the default action to take when an O&M event occurs on the instance. For more information, see Modify instance maintenance properties. |
SystemMaintenance.Redeploy | Instance redeployment because of system maintenance | Critical |
| This system event is sent 24 to 48 hours before the scheduled system maintenance when Alibaba Cloud detects a potential risk of a software or hardware failure on the underlying host of an ECS instance. This risk can cause the instance to be redeployed. The risk has not yet become an actual failure. Important For an instance that uses local SSDs or local HDDs, the data disks are re-initialized and the data on the local disks is cleared. | Make preparations, such as modifying the /etc/fstab configuration file and backing up data. Then, select a response method as needed:
Note
|
SystemMaintenance.IsolateErrorDisk | Damaged disk isolation because of system maintenance | Critical |
| This system event is sent immediately when Alibaba Cloud detects software or hardware damage on a local disk of an ECS instance. Important The procedure for handling a damaged local disk varies based on the instance type. For some instance types, the instance must be restarted to isolate the damaged disk. For other instance types, the damaged disk can be isolated and repaired online. | Make preparations, such as modifying the /etc/fstab configuration file and backing up data. Then, select an appropriate time to authorize the damaged disk to be isolated. The disk is isolated online without restarting the instance. Note For more information about the O&M process, see Scenario ③ for instances with local disks. |
SystemMaintenance.ReInitErrorDisk | Damaged disk re-initialization because of system maintenance | Critical |
| This system event is sent immediately after Alibaba Cloud detects software or hardware damage on a local disk of an ECS instance and replaces the damaged local disk on the host. This typically occurs within five business days after you authorize the disk isolation. Important The procedure for handling a damaged local disk varies based on the instance type. For some instance types, the instance must be restarted to isolate the damaged disk. For other instance types, the damaged disk can be isolated and repaired online. | Select an appropriate time to authorize the local disk to be restored. The disk is restored online without restarting the instance. Note For more information about the O&M process, see Scenario ③ for instances with local disks. |
SystemMaintenance.RebootAndIsolateErrorDisk | Instance restart and damaged disk isolation because of system maintenance | Critical |
| This system event is sent immediately when Alibaba Cloud detects software or hardware damage on a local disk of an ECS instance and fails to isolate the disk online. Important The procedure for handling a damaged local disk varies based on the instance type. For some instance types, the instance must be restarted to isolate the damaged disk. For other instance types, the damaged disk can be isolated and repaired online. | Select an appropriate time to authorize the damaged disk to be isolated, and restart the instance yourself. The disk is isolated offline, which requires an instance restart. Note For more information about the O&M process, see Scenario ③ for instances with local disks. |
SystemMaintenance.RebootAndReInitErrorDisk | Instance restart and damaged disk re-initialization because of system maintenance | Critical |
| This system event is sent immediately when Alibaba Cloud detects software or hardware damage on a local disk of an ECS instance and fails to restore the local disk online. Important The procedure for handling a damaged local disk varies based on the instance type. For some instance types, the instance must be restarted to isolate the damaged disk. For other instance types, the damaged disk can be isolated and repaired online. | Select an appropriate time to authorize the local disk to be restored, and restart the instance yourself. The disk is restored offline, which requires an instance restart. Note For more information about the O&M process, see Scenario ③ for instances with local disks. |
SystemMaintenance.StopAndRepair | In-place repair event for an instance with local disks | Critical |
| This system event is sent 48 to 168 hours before the scheduled system maintenance when Alibaba Cloud detects a risk of hardware failure on the underlying host of an ECS instance. | Select an appropriate time to authorize the repair or redeployment of the instance with local disks. Note For more information about the O&M process, see O&M scenarios and system events for instances with local disks. |
SystemMaintenance.CleanReleasedDisks | Cleanup event after EBS hot-plug failure | Warning |
| This system event is sent when Alibaba Cloud detects that the operating system of an ECS instance still contains configuration information for one or more cloud disks that were released because of overdue payments. | Select an appropriate time to authorize Alibaba Cloud to clear the configuration information of the released cloud disks. Important Alibaba Cloud shuts down the instance at the time you specify, cleans up the disks, and then starts the instance again. |
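The advance-notice periods in the preceding table can be turned into a rough planning estimate. The following sketch is illustrative only: the LEAD_TIME_HOURS mapping copies the lead times stated in the table, and the function name expected_maintenance_window is hypothetical. It estimates the period in which the maintenance is likely to be scheduled based on when the notification was received. The scheduled time shown in the event details remains the authoritative value.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

# Lead times (minimum, maximum) in hours, taken from the table above.
LEAD_TIME_HOURS = {
    "SystemMaintenance.Reboot": (24, 48),
    "SystemMaintenance.Stop": (24, 48),
    "SystemMaintenance.Redeploy": (24, 48),
    "SystemMaintenance.StopAndRepair": (48, 168),
}


def expected_maintenance_window(
    event_code: str, notified_at: datetime
) -> Tuple[datetime, datetime]:
    """Estimate the earliest and latest scheduled maintenance time.

    Raises KeyError for event codes without a documented lead time,
    such as events that are sent immediately.
    """
    min_hours, max_hours = LEAD_TIME_HOURS[event_code]
    return (
        notified_at + timedelta(hours=min_hours),
        notified_at + timedelta(hours=max_hours),
    )


if __name__ == "__main__":
    received = datetime(2024, 1, 1, tzinfo=timezone.utc)
    earliest, latest = expected_maintenance_window("SystemMaintenance.Reboot", received)
    print(earliest.isoformat(), latest.isoformat())
```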
Unexpected O&M events
Event code | Event name | Event severity level | Cloud Monitor event name | Event description and impact | Handling suggestion |
SystemFailure.Reboot | Instance restart due to system error | Critical |
| This system event is sent immediately when Alibaba Cloud detects that an ECS instance is restarted due to an unexpected software or hardware failure on the underlying host, such as CPU or memory hardware damage. | Wait for the instance to automatically restart, and then check whether the instance and its applications are running correctly. During the restart, Alibaba Cloud migrates the instance to a healthy host. Note You can modify the maintenance properties of the instance to specify the default action to take when an O&M event occurs on the instance. For more information, see Modify instance maintenance properties. |
InstanceFailure.Reboot | Instance restart required due to an operating system error | Critical |
| This system event is sent immediately when Alibaba Cloud detects that an ECS instance is down due to an internal operating system issue, such as an out-of-memory (OOM) error, blue screen, freeze, continuous printing of serial port logs, or kernel panic. | Wait for the instance to automatically restart, and then check whether the instance and its applications are running correctly. You can enable the Kdump service for the operating system to identify the cause of the crash and prevent similar issues from recurring. For more information, see Enable the Kdump service for a Linux instance or Enable the Kernel Memory Dump feature for a Windows instance. |
SystemFailure.Stop | Instance stop due to system error | Critical |
| This system event is sent immediately when Alibaba Cloud detects that an ECS instance is shut down due to a software or hardware failure on the underlying host, such as CPU or memory hardware damage. | Wait for the instance to be automatically stopped, and then start the instance. When you start the instance, Alibaba Cloud migrates the instance to a healthy host. Note You can modify the maintenance properties of the instance to specify the default action to take when an O&M event occurs on the instance. For more information, see Modify instance maintenance properties. |
SystemFailure.Redeploy | Instance redeployment due to system error | Critical |
| This system event is sent immediately when Alibaba Cloud detects that an instance with local disks must be redeployed due to a software or hardware failure on the underlying host. Note This type of event is supported only for instances that depend on host hardware, such as instances that have local disks attached or support SGX-based confidential computing. | Make preparations, such as modifying the /etc/fstab configuration file and backing up data. Then, select a response method as needed:
Note You can modify the maintenance properties of the instance to specify the default action to take when an O&M event occurs on the instance. For more information, see Modify instance maintenance properties. |
SystemFailure.Delete | Automatic bill cancellation due to instance creation failure | Critical |
| This system event is sent immediately when Alibaba Cloud detects that an order to create an ECS instance is successful but the instance fails to be created. | Wait for the system to automatically release the instance. The instance is typically released within five minutes after it fails to be created. Note If you have paid for the order, you receive a refund after the instance is released. To increase the success rate of instance creation:
|
ErrorDetected | Alert for local disk damage | Critical |
| This system event is sent immediately when Alibaba Cloud detects unexpected software or hardware damage on a local disk of an ECS instance that prevents the disk from being read from or written to. | Make preparations, such as modifying the /etc/fstab configuration file and backing up data. Then, select an appropriate time to isolate the damaged disk and restore the local disk. The supported operations vary based on the instance type. The following list describes the details:
Note For more information about the O&M process, see Scenario ③ for instances with local disks. |
Stalled | Disk performance is severely affected | Critical |
| This system event is sent immediately when Alibaba Cloud detects that an I/O hang occurs on a cloud disk attached to an ECS instance. This severely affects the disk performance and prevents the disk from being read from or written to. | Isolate read and write operations on the cloud disk at the application layer, or temporarily remove the ECS instance from its Server Load Balancer (SLB) instance. |
Instance migration events due to underlying upgrades
Event code | Event name | Event severity level | Cloud Monitor event name | Event description and impact | Handling suggestion |
SystemUpgrade.Migrate | Instance migration required due to underlying upgrades | Critical | Undefined | When Alibaba Cloud upgrades or rebuilds the physical infrastructure, instances in the corresponding region and zone may be affected. This system event is sent to you in advance. | Log on to the ECS console to view the details of the system event and migrate the instance as prompted. For more information, see Instance migration due to underlying upgrades. |
Burstable instance performance restriction events
Event code | Event name | Event severity level | Cloud Monitor event name | Event description and impact | Handling suggestion |
Instance:BurstablePerformanceRestricted | Burstable instance performance is restricted | Warning | Instance:BurstablePerformanceRestricted: Burstable instance performance is restricted | This system event is sent immediately when the accrued CPU credits of a burstable instance are depleted. | Select a response method as needed:
To customize the threshold for triggering notifications, for example, to receive a notification when the accrued CPU credits are less than 10 for 10 consecutive minutes, you can set a threshold-based alert rule in the Cloud Monitor console. For more information, see Monitor burstable instances. |
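The example threshold above (accrued CPU credits less than 10 for 10 consecutive minutes) can be expressed as a simple check. This is a minimal sketch of the condition only and does not call Cloud Monitor. The function name credits_below_threshold is hypothetical, and the sketch assumes that you already have one credit sample per minute, ordered from oldest to newest.

```python
from typing import Sequence


def credits_below_threshold(
    samples: Sequence[float], threshold: float = 10, minutes: int = 10
) -> bool:
    """Return True if the most recent `minutes` per-minute samples are all below `threshold`."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    if len(samples) < minutes:
        # Not enough data to cover the full period yet.
        return False
    return all(value < threshold for value in samples[-minutes:])


if __name__ == "__main__":
    history = [12.0] + [9.5] * 10  # one healthy minute, then ten low minutes
    print(credits_below_threshold(history))
```

A sample equal to the threshold does not count as below it, so a single minute at exactly 10 credits resets the condition.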
Status change events
Event code | Event name | Event severity level | Cloud Monitor event name | Event description and impact | Handling suggestion |
Instance:PreemptibleInstanceInterruption | Spot instance interruption notification | Warning | Instance:PreemptibleInstanceInterruption: Spot instance interruption notification | This system event is sent 5 minutes before a spot instance is reclaimed. | We recommend that you:
|
Instance:ModifyInstanceSpec.Reboot | Instance restart required for instance type change to take effect | Critical |
| After an instance type is changed, the instance must be restarted for the new configuration to take effect. If you do not restart the instance within seven days after the new order takes effect, the system forcibly restarts the instance to apply the new instance type. | We recommend that you:
|
Instance:PerformanceModeChange | Performance mode switchover of burstable instance | Warning | Instance:PerformanceModeChange: Performance mode switchover of burstable instance | This system event is generated when a burstable instance switches from unlimited mode to standard mode, or from standard mode to unlimited mode. | Decide whether to track this system event based on your needs. To track the event, configure an event notification in the Cloud Monitor console. For more information, see Subscribe to ECS system event notifications. |
Instance:StateChange | Instance status change notification | Information | Instance:StateChange: Instance status change notification | This system event is generated when the instance status changes, for example, from Running to Stopping, or from Stopping to Stopped. | Decide whether to track this system event based on your needs. To track the event, configure an event notification in the Cloud Monitor console. For more information, see Subscribe to ECS system event notifications. |
Instance:AutoReactivateCompleted | Automatic reactivation completed | Information | Instance:AutoReactivateCompleted: Automatic reactivation completed | This system event is generated when you have paid the overdue bill and the instance is automatically restarted. | Decide whether to track this system event based on your needs. To track the event, configure an event notification in the Cloud Monitor console. For more information, see Subscribe to ECS system event notifications. |
Instance:LiveMigrationAcrossDDH | Instance hot migration between dedicated hosts | Information | Instance:LiveMigrationAcrossDDH: Instance hot migration between dedicated hosts | This system event is generated when an instance is hot migrated. | Decide whether to track this system event based on your needs. To track the event, configure an event notification in the Cloud Monitor console. For more information, see Subscribe to ECS system event notifications. |
Disk:DiskOperationCompleted | Disk operation completed | Information | Disk:DiskOperationCompleted: Disk operation completed | This system event is generated when a pay-as-you-go disk is manually attached or detached. | Decide whether to track this system event based on your needs. To track the event, configure an event notification in the Cloud Monitor console. For more information, see Subscribe to ECS system event notifications. |
Disk:ConvertToPostpaidCompleted | Disk converted to pay-as-you-go | Information | Disk:ConvertToPostpaidCompleted: Disk converted to pay-as-you-go | This system event is generated when a subscription disk is converted to a pay-as-you-go disk. | Decide whether to track this system event based on your needs. To track the event, configure an event notification in the Cloud Monitor console. For more information, see Subscribe to ECS system event notifications. |
Snapshot:CreateSnapshotCompleted | Disk snapshot created | Information | Snapshot:CreateSnapshotCompleted: Disk snapshot created | This system event is generated when a snapshot for a disk is created. | Decide whether to track this system event based on your needs. To track the event, configure an event notification in the Cloud Monitor console. For more information, see Subscribe to ECS system event notifications. |
Snapshot:SnapshotDeleted | Snapshot deletion completed event | Information | Snapshot:SnapshotDeleted: Snapshot deletion completed event | This system event is generated when a manual snapshot or an automatic snapshot is deleted. | None |
Instance performance risk events
Event code | Event name | Event severity level | Cloud Monitor event name | Event description and impact | Handling suggestion |
Instance:CPUPerformanceReachLimit | Instance CPU performance reaches the upper limit of the instance type | Warning | Instance:CPUPerformanceReachLimit:Executed: Instance CPU performance reaches the upper limit of the instance type | Alibaba Cloud detects that the CPU utilization of the instance has reached 100% or the upper limit of its instance type. Note The event is sent if the CPU upper limit defined for the instance type is reached twice within the last three minutes. | Sustained CPU usage at the upper limit of the instance type may adversely affect your business. Adjust your configuration as needed. For more information, see Discover and troubleshoot instance issues. |
Instance:StoragePerformanceReachLimit | Instance storage performance reaches the upper limit of the instance type | Warning | Instance:StoragePerformanceReachLimit:Executed: Instance storage performance reaches the upper limit of the instance type | Alibaba Cloud detects that the disk bandwidth or IOPS of the instance has reached the upper limit of its instance type. Examples:
Note This event is not supported for ECS instances of generations earlier than Generation 6. The event is sent if the storage performance upper limit defined for the instance type is reached twice within the last three minutes. | Sustained storage performance at the upper limit of the instance type may adversely affect your business. Adjust your configuration as needed. For more information, see Discover and troubleshoot instance issues. |
Instance:NetworkPerformanceReachLimit | Instance network performance reaches the upper limit of the instance type | Warning | Instance:NetworkPerformanceReachLimit:Executed: Instance network performance reaches the upper limit of the instance type | Alibaba Cloud detects that the network performance of the instance has reached the upper limit of its instance type. Examples:
Note The event is sent if the network performance upper limit defined for the instance type is reached twice within the last three minutes. | Sustained network performance at the upper limit of the instance type may adversely affect your business. Adjust your configuration as needed. For more information, see Discover and troubleshoot instance issues. |
Instance:StatusCheckFailed | Instance status check failed | Warning |
| Alibaba Cloud detects a connectivity exception for the instance. Examples:
| Troubleshoot the connectivity exception promptly. For more information, see Diagnose network connectivity. |
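The notes for the performance risk events state that an event is sent if the upper limit is reached twice within the last three minutes. The following sketch shows one way to model that rule with a sliding time window. It is an assumption about how such a rule can be evaluated, not a description of how Alibaba Cloud implements detection, and the class name LimitHitWindow is hypothetical.

```python
from collections import deque
from typing import Deque


class LimitHitWindow:
    """Track limit hits and report when enough fall inside a sliding window."""

    def __init__(self, window_seconds: float = 180, required_hits: int = 2) -> None:
        self.window_seconds = window_seconds
        self.required_hits = required_hits
        self._hits: Deque[float] = deque()

    def record_hit(self, timestamp: float) -> bool:
        """Record a hit at `timestamp` (seconds) and return True if the rule is met.

        Timestamps are expected in non-decreasing order.
        """
        self._hits.append(timestamp)
        # Drop hits that are older than the window relative to the newest hit.
        while self._hits and timestamp - self._hits[0] > self.window_seconds:
            self._hits.popleft()
        return len(self._hits) >= self.required_hits


if __name__ == "__main__":
    window = LimitHitWindow()
    for ts in (0, 100, 400):
        print(ts, window.record_hit(ts))
```

With the default settings, a second hit within 180 seconds of the first satisfies the rule, while an isolated hit several minutes later does not.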