Overview

Updated at: 2023-09-25 07:08

"Change" refers to any operation on the online system (such as release, addition, modification, or removal), or any operation that may affect production business. Based on Alibaba's historical experience, more than half of major failures are triggered by changes. Therefore, risk prevention during the change process is particularly important and directly affects the stability of the business.

"Change system" refers to any system or tool that carries out change operations on the online production environment. Examples include white-screen systems and tools (those with a console), load testing and drill platforms, black-screen scripts, and open APIs that can trigger change operations. Platforms and systems whose main functionality and goals are not changes, but which can still make changes to the production environment, also count: their change-related functionality is considered part of the change system.

Change risk control is first and foremost a business concept: a set of standards that guides change operations in the stability field and also governs how the capabilities of change systems are built. Second, change risk control is a technical system that intervenes in the entire change lifecycle through technical means. Before a change, it performs admission checks. During a change, it constrains execution to proceed progressively and verifies intermediate results through macroscopic observation, so that issues are identified promptly and the change is rolled back if necessary. After a change, it makes change data available, expanded along the topology of the impact surface, to assist with troubleshooting and problem diagnosis.

The main objectives of change risk control are:

  • Reduce the number of major failures caused by changes;

  • Standardize the change operations of business teams and build up common change capabilities and execution standards;

  • Help change systems build risk control capabilities to safeguard the execution of business changes.

A standard change process usually consists of three phases: planning, execution, and completion.

  • Planning phase: This phase covers change application and approval. The change application must clearly state the change plan, the change window, the potential impact, and the rollback plan.

  • Execution phase: First, the change is rechecked, for example by confirming that the change environment meets the requirements and that business traffic has been stopped as expected. It is recommended to verify the change in a test environment before changing production, and to use gray release with batched execution. An interval should be set between batches, and at least one indicator that reflects the health of the core business (such as a business monitoring item or a log file name) should be observed and recorded. Rollback capability should also be available.

  • Completion phase: Monitoring, logs, and other data should be used to verify that the business is running normally, and the relevant data should be recorded and reported.
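
The batched execution described in the execution phase can be sketched roughly as follows. This is a minimal illustration in Python, not the implementation of any particular change system: the function name run_batched_change and its parameters (apply, rollback, is_healthy, interval_seconds) are hypothetical, and the health check stands in for whatever core business indicator is being observed.

```python
import time
from typing import Callable, List, Sequence


def run_batched_change(
    targets: Sequence[str],
    batch_size: int,
    apply: Callable[[str], object],
    rollback: Callable[[str], object],
    is_healthy: Callable[[], bool],
    interval_seconds: float = 0.0,
) -> bool:
    """Apply a change batch by batch (all names here are illustrative).

    Returns True if every batch completed, or False if the health check
    failed and everything applied so far was rolled back.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")

    applied: List[str] = []
    for start in range(0, len(targets), batch_size):
        # Apply the change to one batch of targets.
        for target in targets[start:start + batch_size]:
            apply(target)
            applied.append(target)

        # Wait between batches so the indicator has time to reflect the change.
        time.sleep(interval_seconds)

        # Observe the health indicator before moving to the next batch.
        if not is_healthy():
            # Undo in reverse order of application.
            for target in reversed(applied):
                rollback(target)
            return False
    return True
```

A caller would pass its own apply and rollback functions plus a health check bound to a real monitoring item. The sketch does not handle an exception raised by apply itself; a real change system would need to treat that as a failure and roll back as well.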

Any operation on the online system (such as release, addition, modification, or removal), and any operation that may affect production business, must meet the three principles of change risk prevention: "Observable, Gray-released, and Rollbackable".
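
As a rough illustration, the three principles could be checked against a change plan before it is admitted. The sketch below is one possible shape in Python under stated assumptions: ChangePlan and its fields are hypothetical, and treating "two or more batches" as evidence of gray release is a simplification chosen for the example, not a rule taken from this document.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ChangePlan:
    """Hypothetical description of a planned change (field names are illustrative)."""
    description: str
    # Indicators that will be observed during the change.
    health_indicators: List[str] = field(default_factory=list)
    # Number of batches; 1 means the change is applied everywhere at once.
    batch_count: int = 1
    # Steps that undo the change.
    rollback_steps: List[str] = field(default_factory=list)


def check_three_principles(plan: ChangePlan) -> List[str]:
    """Return the principles the plan violates; an empty list means it passes."""
    violations = []
    if not plan.health_indicators:
        violations.append("observable")
    if plan.batch_count < 2:
        # Assumption for this sketch: a single batch is not a gray release.
        violations.append("gray-released")
    if not plan.rollback_steps:
        violations.append("rollbackable")
    return violations
```

A plan that names at least one indicator, runs in several batches, and lists its rollback steps yields an empty list. Any other plan yields the names of the principles it fails to meet.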
