Why Should Enterprises Have Multi-Active Applications?

This article explains why a multi-active application disaster recovery architecture, together with organizational collaboration, is becoming more important for cloud users.

More enterprises are choosing to build their systems on the cloud because of its low cost and stability, and the cloud has become mainstream IT infrastructure. In recent years, open-source and cloud technologies have developed rapidly, and a wide variety of products and services have emerged. Technical staff have more decision-making power, and architectures change faster than before. During this rapid growth, enterprises should guard against avoidable human-caused failures and also pay attention to the impact of natural disasters. An untimely business interruption may cause serious damage to the brand, loss of customers, and financial loss.

Cloud-based enterprises treat building disaster recovery capability as a basic goal and commit investment to it. An enterprise can sustain long-term, stable, high-speed growth only if key data is not lost and system services resume as quickly as possible when a disaster occurs.

Common Disaster Failures

Failures are inevitable in production and affect the stability of the system. Some failures are recovered so quickly that external users never notice them. Others cannot be recovered for a long time, causing problems such as negative publicity and financial loss, and may even lead to the bankruptcy of the company. Failures generally fall into the following types:

  • Human operation errors, such as common configuration errors and application release failures
  • Hardware failures, such as a network device failure that affects multiple servers in a data center or cluster
  • Network attacks, such as DDoS
  • Network or power outages, such as a severed optical cable
  • Natural disasters, such as power failures in data centers caused by lightning strikes

In these disasters, enterprises often face the interruption of the public network, access gateway, data center, and other facilities, which causes business problems such as traffic drops, website failures, and fault alarms. Enterprises then need to solve two major problems: business recovery and fault recovery. The best approach is to decouple the two. When a failure occurs, switch traffic away quickly and give priority to business recovery. Locate and repair the fault only after service has been restored.

The Growth of Fault Escape Ability

Fault handling in the industry commonly covers four steps: finding the problem, locating the problem, fixing the problem, and recovering the business. Because the business recovers only after the fault is fixed, this process does not decouple business recovery from fault recovery. A better response is to replace these four steps with three: finding the problem, switching traffic, and recovering the business. Traffic switching ensures rapid business recovery and shortens the recovery time from hours or minutes to minutes or seconds, which improves the disaster recovery capability of the business.
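
The three-step flow of finding the problem, switching traffic away, and recovering the business can be sketched as a weighted routing table that is rebalanced as soon as a data center is reported unhealthy. This is only a minimal illustration under stated assumptions, not a description of any particular product: the data center names, the `switch_traffic` helper, and the weights are all hypothetical.

```python
from typing import Dict


def switch_traffic(weights: Dict[str, float], failed: str) -> Dict[str, float]:
    """Return new routing weights with the failed data center drained.

    The failed data center's share is redistributed to the healthy data
    centers in proportion to their current weights. Locating and repairing
    the fault can then happen after the business has recovered.
    """
    if failed not in weights:
        raise KeyError(f"unknown data center: {failed}")
    healthy = {dc: w for dc, w in weights.items() if dc != failed and w > 0}
    total = sum(healthy.values())
    if total == 0:
        raise RuntimeError("no healthy data center left to take traffic")
    new_weights = {dc: w / total for dc, w in healthy.items()}
    new_weights[failed] = 0.0
    return new_weights


# Hypothetical starting point: two data centers share traffic evenly.
weights = {"dc-a": 0.5, "dc-b": 0.5}

# Step 1: find the problem (assume monitoring reports that dc-a is down).
# Step 2: switch traffic away from dc-a.
weights = switch_traffic(weights, failed="dc-a")
# Step 3: the business recovers on dc-b while dc-a is repaired offline.
```

The point of the sketch is that the switch itself needs no knowledge of the root cause, which is what allows recovery time to depend on detection and switching speed rather than on repair speed.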

To make traffic switching fast and effective in real scenarios, we need to build a higher-level disaster recovery architecture. We also need to strengthen the collaboration among infrastructure, business systems, assurance tools, production systems, and emergency personnel. Multi-active disaster recovery capability is achieved through the combination of architecture and organizational collaboration.

This capability cannot be achieved overnight. It requires continuous optimization of both the architecture and organizational collaboration so that the multi-active disaster recovery capability of the business improves step by step.

Breaking Through Region Limits

Enterprises generally deploy in a single region in the initial stage. However, as the business grows, a data center in a single region can no longer meet business needs. At the same time, the number of connections to clustered components in that region grows explosively, a single cluster can no longer be expanded, and splitting the cluster becomes urgent.

Splitting a cluster across regions requires that the principles of route consistency and data consistency be met. Once they are, the service can break through regional restrictions, scale capacity horizontally across regions, and schedule traffic flexibly. This solves the capacity challenges of a single region, such as:

  1. Machine Capacity: Multiple remote data centers are deployed as peers, so enterprises can flexibly deploy business applications across them.
  2. Connection Capacity: The clustered components in each data center are independent, and applications in a data center connect only to that data center's components, which avoids unbounded growth in the number of connections.
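
Route consistency can be illustrated with a deterministic mapping from a user to a data center, so that every gateway sends the same user to the same place. The following is a minimal sketch under stated assumptions, not the routing scheme of any specific product: the function name `route_to_data_center`, the user ID format, and the data center names are hypothetical.

```python
import hashlib
from typing import List


def route_to_data_center(user_id: str, data_centers: List[str]) -> str:
    """Map a user to one data center deterministically.

    A stable hash is used instead of Python's built-in hash(), which is
    randomized per process for strings, so every gateway instance computes
    the same answer for the same user.
    """
    if not data_centers:
        raise ValueError("at least one data center is required")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(data_centers)
    return data_centers[index]


# Hypothetical data center names; any two gateways get the same result.
centers = ["dc-east", "dc-west"]
first = route_to_data_center("user-1001", centers)
second = route_to_data_center("user-1001", centers)
```

A plain modulo mapping like this reassigns many users whenever the list of data centers changes, so a production system would more likely keep an explicit, versioned routing rule table. The sketch only shows the consistency property itself.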

Limitations of Disaster Recovery

Traditional disaster recovery builds on data-level disaster recovery. A common implementation is to build an identical application system in a backup data center. When a disaster occurs, the backup system resumes operation within the Recovery Time Objective (RTO) to minimize the losses caused by the disaster. The following problems exist in actual implementation:

  1. The disaster recovery center does not serve traffic in normal operation, so no one can be sure at the critical moment that a switchover to it will succeed.
  2. Because the disaster recovery center does not serve traffic, its resources sit idle, which results in high costs.
  3. The data center that provides services is still confined to a single region. Once business volume grows large enough, this mode cannot solve the resource bottleneck of a single region.

The Concept of Application Multi-Activity

Application Multi-Activity is an advanced form of application disaster recovery. It refers to building, in a data center in the same city or in a remote location, a production system that corresponds to some or all of the local production system. The applications in all data centers provide external services at the same time. When a disaster occurs, the multi-active system can switch business traffic within minutes, and users may not even notice the failure.

Common application multi-active architectures are divided into same-city multi-active, cross-region multi-active, and hybrid cloud multi-active. Compared with traditional disaster recovery, application multi-activity has the following four advantages:

  • Minute-Level RTO: Recovery is fast. The average recovery time for Alibaba's internal production systems is within 30 seconds, and the average recovery time for external customers' production systems is one minute.
  • Full Use of Resources: No resources sit idle. Every data center serves traffic, so resources are fully utilized and waste is avoided.
  • High Switching Success Rate: Thanks to a mature multi-active technical architecture and a visual operation and maintenance platform, the switching success rate is higher than with existing disaster recovery architectures. Across thousands of switchovers per year at Alibaba, the success rate is as high as 99.9%.
  • Precise Traffic Control: Application multi-activity keeps traffic self-contained within a data center from top to bottom and relies on precise traffic routing to direct specific business traffic to the corresponding data center. Enterprises can build on this capability to develop features such as canary release and guaranteed handling of key traffic.
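
As a rough illustration of how precise traffic control can support both canary release and key traffic protection, the sketch below pins a set of key users to a primary data center and sends a configurable percentage of the remaining users to a canary data center. Everything here is an assumption made for illustration: the `TrafficRule` class, its fields, and the data center names are hypothetical and do not describe a real product API.

```python
import zlib
from dataclasses import dataclass, field
from typing import Set


@dataclass
class TrafficRule:
    """Hypothetical routing rule evaluated at the access layer."""

    key_users: Set[str] = field(default_factory=set)  # always stay on primary
    canary_percent: int = 0  # 0-100: share of other users sent to the canary
    primary: str = "dc-primary"
    canary: str = "dc-canary"

    def choose(self, user_id: str) -> str:
        # Key traffic is never exposed to the canary data center.
        if user_id in self.key_users:
            return self.primary
        # CRC32 gives a stable bucket in [0, 100) for each user, so the
        # same user sees the same version on every request.
        bucket = zlib.crc32(user_id.encode("utf-8")) % 100
        return self.canary if bucket < self.canary_percent else self.primary


# Hypothetical rule: protect one key user, send 10% of the rest to canary.
rule = TrafficRule(key_users={"vip-1"}, canary_percent=10)
target = rule.choose("vip-1")
```

Raising `canary_percent` gradually widens the release, and setting it back to 0 returns all traffic to the primary data center without redeploying anything.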

More than 50% of enterprises are expected to use the distributed cloud by 2025. Public cloud service capabilities will be extended to edge computing and IDCs, and the distributed cloud will enable full scenario coverage. Multi-active scenarios and technologies for cross-cloud, cross-platform, and cross-region applications will emerge. Application systems must be able to escape from disaster failures at any time, and smooth migration to the cloud is a key consideration for every decision-maker. As the business continues to develop and the architecture continues to evolve, disaster recovery governance must keep pace with the problems that arise along the way. How to implement a multi-active application disaster recovery architecture and the organizational collaboration around it has become a concern for more and more enterprises.
