More enterprises are choosing to build their systems on the cloud because of its low cost and stability, and the cloud has become mainstream IT infrastructure. In recent years, open-source and cloud technologies have developed rapidly, and a wide variety of products and services have emerged. Technical staff have more decision-making power, and architectures are changing faster. During such rapid growth, we must guard against failures caused by human error and also account for the impact of natural disasters. An unplanned business interruption can cause serious damage to the brand, the customer base, and revenue.
Cloud-based enterprises should treat building disaster recovery capability as a basic goal and commit investment to it. An enterprise can sustain long-term, stable, and rapid growth only if key data is not lost and system services resume as quickly as possible when a disaster occurs.
Failures are inevitable in production and affect system stability. Some are recovered quickly, and external users never notice them. Others cannot be recovered for a long time, causing problems such as negative public opinion and financial loss, and may even lead to the bankruptcy of the company. Failures generally fall into the following types:
When such disasters occur, enterprises often face interruptions of the public network, access gateways, data center facilities, and other infrastructure, which cause business problems such as traffic drops, website failures, and fault alarms. Enterprises then face two major problems: business recovery and fault recovery. The best approach is to decouple the two. When a failure occurs, switch traffic away from the faulty site quickly and prioritize business recovery; then locate and repair the fault once service has been restored.
Fault handling in the industry commonly covers four steps: finding the problem, locating the problem, fixing the problem, and recovering the business. This sequence does not decouple business recovery from fault recovery, because the business is restored only after the fault has been located and fixed. A better response is to replace these four steps with three: finding the problem, switching traffic, and recovering the business. Traffic switching ensures rapid business recovery and shortens the recovery time from hours or minutes to minutes or seconds, which improves the disaster recovery capability of the business.
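The three-step handling described above (find the problem, switch traffic, recover the business) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: `TrafficRouter`, `switch_away_from`, `handle_incident`, and the site names are hypothetical and do not correspond to any real product API. The point is only the ordering: traffic is moved first, and fault location is queued for later.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TrafficRouter:
    """Hypothetical router that holds the share of traffic sent to each site."""

    weights: dict[str, float]

    def switch_away_from(self, failed_site: str) -> None:
        """Set the failed site's weight to zero and rescale the remaining sites."""
        remaining = {s: w for s, w in self.weights.items() if s != failed_site}
        total = sum(remaining.values())
        if total <= 0:
            raise RuntimeError("no remaining site is carrying traffic")
        self.weights = {
            site: (remaining[site] / total if site in remaining else 0.0)
            for site in self.weights
        }


def handle_incident(
    router: TrafficRouter, failed_site: str, repair_queue: list[str]
) -> None:
    """Recover the business first; defer fault location and repair."""
    # Step 1 (finding the problem) is assumed to be done by monitoring,
    # which tells us which site is failing.
    # Step 2: switch traffic away from the failing site.
    router.switch_away_from(failed_site)
    # Step 3: the business is now served by the remaining sites.
    # Fault location and repair happen afterwards, off the critical path.
    repair_queue.append(failed_site)


# Illustrative usage with made-up sites and traffic shares.
router = TrafficRouter({"site-a": 0.5, "site-b": 0.3, "site-c": 0.2})
repair_queue: list[str] = []
handle_incident(router, "site-a", repair_queue)
```

After the call, `site-a` carries no traffic, the other two sites share it in their original proportion, and `site-a` waits in `repair_queue` for diagnosis.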
Ensuring fast and effective traffic switching in real scenarios requires a higher-level disaster recovery architecture. It also requires closer collaboration among infrastructure, business systems, assurance tools, production systems, and emergency response personnel. Multi-active disaster recovery capability is achieved through the combination of architecture and organizational collaboration.
This capability cannot be achieved overnight. It requires continuous optimization of both the architecture and organizational collaboration so that the multi-active disaster recovery capability of the business improves in an upward spiral.
Enterprises generally deploy in a single region in the initial stage. However, as the business grows, a single-region data center can no longer meet business needs. At the same time, as the number of connections to clustered components in a single region grows explosively, a single cluster can no longer be expanded, and the cluster urgently needs to be split.
However, cross-region cluster splitting must satisfy the principles of route consistency and data consistency. When it does, the service can break through regional restrictions, scale capacity horizontally across regions, and schedule traffic flexibly. This solves the capacity challenges of a single region, such as:
Application disaster recovery builds on data-level disaster recovery. A common implementation is to build an identical application system in a backup data center. When a disaster occurs, the backup system resumes operation within the Recovery Time Objective (RTO) to minimize the losses caused by the disaster. The following problems exist in actual implementation:
Application multi-active is an advanced form of application disaster recovery. It refers to establishing, in a data center in the same city or in a remote location, a production system that corresponds to some or all of the local production systems. The applications in all data centers serve external traffic at the same time. When a disaster occurs, the multi-active system can switch business traffic within minutes, and users may not even notice the failure.
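One way to picture several data centers serving traffic at once while keeping routing consistent is the sketch below. It uses rendezvous (highest-random-weight) hashing, which is a technique chosen here for illustration and not one named in this article. The function names and the data center names `dc-a`, `dc-b`, and `dc-c` are hypothetical. Each user is always routed to the same active data center, and when one data center fails, only the users it was serving are moved.

```python
from __future__ import annotations

import hashlib


def _score(site: str, user_id: str) -> int:
    """Stable score for a (site, user) pair; sha256 keeps it identical across processes."""
    digest = hashlib.sha256(f"{site}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def route(user_id: str, active_sites: set[str]) -> str:
    """Pick the active site with the highest score for this user."""
    if not active_sites:
        raise ValueError("no active site available")
    # sorted() makes tie-breaking deterministic regardless of set ordering.
    return max(sorted(active_sites), key=lambda site: _score(site, user_id))


# Illustrative usage: all three data centers are active at the same time.
sites = {"dc-a", "dc-b", "dc-c"}
users = [f"user-{i}" for i in range(200)]
before = {u: route(u, sites) for u in users}

# Simulate a disaster in dc-b by removing it from the active set.
after = {u: route(u, sites - {"dc-b"}) for u in users}
moved = [u for u in users if before[u] != after[u]]
```

In this sketch, every user in `moved` was previously served by `dc-b`; users of the surviving data centers keep their original route, so the switch does not disturb traffic that was already healthy.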
Common application multi-active architectures fall into three types: local region active, ultra region active, and hybrid cloud active. Compared with traditional disaster recovery, application multi-active has the following four advantages:
More than 50% of enterprises are expected to use the distributed cloud by 2025. Public cloud service capabilities will extend to edge computing and IDCs, and the distributed cloud will enable full scenario coverage. Multi-active scenarios and technologies that span clouds, platforms, and geographies will emerge. Application systems must be able to escape from disaster failures at any time, and smooth migration to the cloud is a key consideration for every decision-maker. As the business continues to develop and the architecture continues to evolve, disaster recovery governance must keep pace with the problems that arise. How to implement a multi-active application disaster recovery architecture and the organizational collaboration around it has become a concern for more and more enterprises.