Alibaba Cloud is launching a new feature for Elastic Compute Service (ECS) called System Event. A system event is a scheduled and recorded maintenance event for the ECS service. System events occur when updates, invalid operations, unexpected system failures, or unexpected hardware or software failures are detected on your ECS instance. When an event occurs, you receive a notification in the console with the details of the event, including the event response plan and event cycle.
When an ECS user receives a notification from Alibaba Cloud, the system event lets the user acknowledge the planned underlying maintenance for the ECS instance. The user can then choose an appropriate time window to execute the system event, and plan related operations, according to individual business needs. This flexibility helps users reduce the impact on system reliability and business continuity.
Alibaba Cloud is dedicated to guaranteeing data reliability and high availability of cloud computing infrastructure and cloud servers for our customers. Compared with traditional IDC or on-premises environments, Alibaba Cloud adopts more stringent IDC standards, server access standards, and O&M standards. In addition, Alibaba Cloud provides multiple zones in various regions. When customers need higher availability, they can use these zones to build their own active/standby or active/active services.
For financial solutions, which may have higher requirements for business continuity, systems and services can be built across multiple regions and zones for better RTO/RPO and greater fault tolerance. For a single ECS instance, Alibaba Cloud uses commercially reasonable endeavors to provide a Monthly Uptime Percentage of no less than 99.95% each calendar month in connection with your use of the ECS instance. Moreover, Alibaba Cloud provides service availability of no less than 99.99% when multiple zones in a region are used.
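To make these percentages concrete, the following minimal sketch converts an uptime percentage into the downtime it still permits in a month. The function name and the 30-day month are assumptions chosen for illustration, and applying the same monthly basis to the 99.99% figure is also an assumption; the sketch is not the formal definition used in any service agreement.

```python
def allowed_downtime_minutes(uptime_percent, days_in_month=30):
    """Return the minutes of downtime still compatible with the given uptime.

    Hypothetical helper: assumes a month of `days_in_month` days and a simple
    time-based uptime ratio.
    """
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (100 - uptime_percent) / 100


if __name__ == "__main__":
    for target in (99.95, 99.99):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% uptime allows about {minutes:.2f} minutes of downtime")
```

Under these assumptions, 99.95% corresponds to roughly 21.6 minutes per month and 99.99% to roughly 4.3 minutes, which shows why an unplanned restart of a single instance matters for strict continuity targets.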
To ensure a high level of service availability, Alibaba Cloud performs proactive maintenance on the physical servers that host ECS instances and resolves potential hardware and software issues to continuously improve system reliability, performance, and security protection capabilities. Normally, when maintenance is planned on a physical server, the ECS instance is live migrated to another server to keep the instance healthy.
However, ECS customers may occasionally receive a notification that an ECS instance needs maintenance because of the risk of a physical server failure, and that Alibaba Cloud has set a scheduled system event to restart the instance and migrate it to a healthy physical server in a few days.
In fact, this is a maintenance notification triggered automatically by Alibaba Cloud's proactive maintenance. During the maintenance process, some software and hardware failures may cause live migration to fail. In this case, Alibaba Cloud sends the above notification to remind the user that the system is about to perform the migration by restarting the instance.
To improve the efficiency and experience of operating ECS instances, Alibaba Cloud is launching the system event feature for ECS instances. When customers receive a notification, they can check the scheduled system events in the ECS console or through OpenAPI, and select an appropriate time to execute the events according to the needs of the business (in some cases, customers can only wait for system events to execute in the scheduled time window). This eliminates the need for customers to request manual intervention through a work order, reduces the risk of human error, and makes automated failover based on system events possible.
If a system event is scheduled to restart an instance, an indication appears in the ECS console to remind the user to check it. On the Unsettled Events > System Scheduled Events page, the user can check instance-related information (instance ID, region, and status) and event-related information (event type, planned schedule, and the optional operation button). Alternatively, the ECS user can query the system events of an instance with the OpenAPI DescribeInstanceFullStatus.
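Once the events have been retrieved, a script can decide which ones still need attention. The sketch below is only an illustration: the record shape (`instance_id`, `event_type`, `not_before`, `status`), the status value `Scheduled`, and the timestamp format are hypothetical stand-ins, not the actual response format of the OpenAPI, and no network call is made.

```python
from datetime import datetime, timezone


def parse_utc(timestamp):
    """Parse a timestamp such as '2018-03-04T10:00:00Z' (assumed format)."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def pending_events(events, now):
    """Return events still scheduled in the future, soonest first.

    `events` is a list of dicts in a hypothetical shape; adapt the keys to
    whatever the real query returns.
    """
    upcoming = [
        event for event in events
        if event["status"] == "Scheduled" and parse_utc(event["not_before"]) > now
    ]
    return sorted(upcoming, key=lambda event: parse_utc(event["not_before"]))


if __name__ == "__main__":
    sample = [
        {"instance_id": "i-example-b", "event_type": "SystemMaintenance.Reboot",
         "not_before": "2018-03-06T02:00:00Z", "status": "Scheduled"},
        {"instance_id": "i-example-a", "event_type": "SystemMaintenance.Reboot",
         "not_before": "2018-03-04T10:00:00Z", "status": "Scheduled"},
        {"instance_id": "i-example-c", "event_type": "SystemMaintenance.Reboot",
         "not_before": "2018-02-20T10:00:00Z", "status": "Executed"},
    ]
    now = datetime(2018, 3, 1, 10, 0, tzinfo=timezone.utc)
    for event in pending_events(sample, now):
        print(event["instance_id"], event["not_before"])
```

Sorting by the planned time puts the instance with the least remaining preparation time first, which is usually the one to handle first.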
When mission-critical applications run on ECS instances, any unexpected restart of an instance may threaten or seriously affect system availability and business continuity. Therefore, we recommend that users build their applications on a fault-tolerant architecture and use capabilities such as multiple regions and zones and load balancers to enhance system reliability.
On this basis, for system events triggered by Alibaba Cloud's proactive maintenance, the notice is usually sent to users a few days in advance. Users can treat the period before the planned execution time as their operation window to prepare the failover and then restart the instance.
For example, users can transfer the workload in good time from the instance with the scheduled event to another instance in a cluster environment, or back up and move the data on the local disk before the instance is redeployed. They can also proactively modify the configuration of load balancing and elastic scaling, or stop and start instances in sequence based on the business logic, to minimize the impact of the instance restart on business continuity.
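Choosing when to act inside the operation window can be sketched as a small scheduling rule. In the hedged example below, the function name, the off-peak hour, and the safety margin are all illustrative assumptions; it picks the earliest off-peak slot that still leaves a margin before the planned execution time and returns `None` when no such slot exists.

```python
from datetime import datetime, timedelta, timezone


def choose_restart_time(now, planned, off_peak_hour=2, margin=timedelta(hours=6)):
    """Pick the earliest off-peak slot in the window [now, planned - margin].

    Hypothetical policy: `off_peak_hour` is the hour (in the timezone of `now`)
    when traffic is assumed to be lowest; `margin` keeps a buffer before the
    system would execute the event itself.
    """
    deadline = planned - margin
    candidate = now.replace(hour=off_peak_hour, minute=0, second=0, microsecond=0)
    if candidate < now:
        # Today's off-peak slot has already passed; use tomorrow's.
        candidate += timedelta(days=1)
    return candidate if candidate <= deadline else None


if __name__ == "__main__":
    now = datetime(2018, 3, 1, 10, 0, tzinfo=timezone.utc)
    planned = datetime(2018, 3, 4, 10, 0, tzinfo=timezone.utc)
    print(choose_restart_time(now, planned))
```

Taking the earliest suitable slot rather than the latest leaves the most time to retry if the failover preparation runs into problems; when the function returns `None`, the remaining window is too short for this policy and the restart has to be handled manually or left to the scheduled execution.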
Furthermore, Alibaba Cloud will continue to launch more types and scenarios of ECS system events. In this way, we hope to continuously improve the efficiency and experience of IT operations on Alibaba Cloud, and to deliver more interfaces and services that give users peace of mind in operations and continuity for their business.
As a leading and trusted cloud service provider, Alibaba Cloud provides and guarantees the availability, stability, and security of computing, storage, and network services and of the underlying infrastructure. According to their strategic targets and business needs, customers can design a highly available IT architecture on Alibaba Cloud and select suitable products and services to build a reliable and robust business system.
On this foundation, through Alibaba Cloud's OpenAPI, monitoring, orchestration, and other tools, customers can obtain a range of IT development and operation capabilities, such as rapid provisioning of resources, easy management of multiple environments, and agile deployment.
To learn more about the System Event feature for Alibaba Cloud ECS, visit https://www.alibabacloud.com/help/doc-detail/66574.htm
Raja_KT February 15, 2019 at 5:10 am
Good one. One part of BC solution is at least done.