By Fanjun
In recent decades, data security has taken on unprecedented importance, and data protection has gained attention alongside it. The bottom line is that service interruption time must shrink, as downtime remains a constant plague for today's users.
In this article, Fanjun, a technical expert at Alibaba, covers everything you need to know about Continuous Data Protection (CDP) and presents a solution: the elastic assured CDP solution. He sheds light on the various aspects of CDP, including its scenarios and development, as well as today's data security challenges, by defining the problem and examining both traditional solutions and those adopted by today's cloud vendors.
So, what is there to know? Well, first, traditional CDP solutions obtain data-change logs at the Guest OS layer or private storage layer during the write operation. This significantly degrades the storage performance of production machines and can leave customers bearing rising computing and storage costs after cloud migration. Next, data protection in a hybrid architecture seriously lags behind public cloud services in terms of network bandwidth and implementation complexity, and fails to satisfy customers trying to reduce their recovery point objective (RPO) and recovery time objective (RTO). Finally, beyond snapshot implementation and data migration, CDP focuses squarely on the protection and recovery of data and on efficient service continuity, which is something snapshots and replication plans do not provide.
So, what's Alibaba Cloud's solution? Alibaba Cloud's Apsara Distributed File System version 2.0 provides a new block storage architecture that boosts CDP implementation in the cloud. Its core component, the Log Structured Block Device, supports a new data write method, a log storage method, and snapshots. As enterprises move forward with cloud migration, Apsara Distributed File System version 2.0 will not only ensure storage performance but will also satisfy traditional and advanced enterprise users that need data protection with low RTO and RPO requirements. And in terms of data backup and operability, the effectiveness of data protection ultimately depends on how fully data can be recovered.
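To illustrate why a log-structured design helps CDP, here is a minimal sketch (a toy model, not Alibaba's actual implementation; all names are invented): because every write is appended to a log, a snapshot is simply a position in that log, so taking one is instantaneous and earlier states remain readable.

```python
# Toy log-structured block device (illustrative only, not Apsara's design).
# Writes append to a log; a snapshot is just the current log offset.

class LogStructuredBlockDevice:
    def __init__(self):
        self.log = []    # append-only list of (block_id, data) records

    def write(self, block_id, data):
        self.log.append((block_id, data))

    def snapshot(self):
        """A snapshot is merely the current length of the log: O(1)."""
        return len(self.log)

    def read(self, block_id, snap=None):
        """Read a block, optionally as of an earlier snapshot."""
        end = snap if snap is not None else len(self.log)
        # Scan backwards for the newest record before the snapshot point.
        for off in range(end - 1, -1, -1):
            bid, data = self.log[off]
            if bid == block_id:
                return data
        return None  # block never written before this point

dev = LogStructuredBlockDevice()
dev.write(0, b"v1")
snap = dev.snapshot()     # cheap: no data is copied
dev.write(0, b"v2")
print(dev.read(0))        # b'v2'  (current state)
print(dev.read(0, snap))  # b'v1'  (state at snapshot time)
```

A real implementation would of course index the log for fast reads and compact it over time; the point is only that snapshots fall out of the write path for free.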
In recent years, data security has received an unprecedented level of importance in the industry, and data protection has likewise gained a new level of attention as service interruption time continues to hurt cloud customers and users of web services alike. It is therefore clear that cloud customers need a more effective data security and protection solution to defend against viruses, ransomware, frequent mis-delete operations on databases, and direct attacks against backup software.
It goes without saying that data plays an increasingly crucial role in today's enterprises and online services. The GitLab data deletion incident of January 2017 led the industry to pay much more attention to information security risks. During GitLab's recovery, only the db1.staging database could be used for recovery operations, whereas the other five backup mechanisms proved completely useless. However, db1.staging was about six hours behind production, and the recovery proceeded slowly because of the limited transmission rate. GitLab ultimately lost nearly six hours of data.
So, as you can see from the above incident, many users today urgently need a solution that mitigates the risk of data loss, shrinks the data protection window, significantly lowers losses, and provides an efficient recovery mechanism. A low recovery time objective (RTO) and assured recovery contribute a great deal to data protection. In today's ever-changing, highly technical world, data recoverability is critical and takes precedence over storage costs.
The Storage Networking Industry Association (SNIA) defines continuous data protection (CDP) as a set of methods used to capture or track data changes by storing them separately from production data. This ensures that data can be restored to any point in time. CDP can be implemented at the block, file, or application level and can provide a recovery granularity that supports an unlimited number of recovery points.
Gartner, in case you don't know, is an authoritative IT research and consulting firm. According to Gartner, CDP is a recovery method that captures or tracks changes in data files or data blocks in a continuous or near-continuous manner and stores these changes in logs.
This method provides fine-grained, real-time recovery points to reduce data loss and make data recovery possible at any point in time. Some CDP solutions capture data changes continuously and are referred to as true CDP; others capture data changes at specific times and are referred to as near CDP.
In terms of metrics, only recovery point objective (RPO) and recovery time objective (RTO) truly indicate the actual status of CDP. To be more precise:
- RPO measures how much data can be lost: the maximum acceptable interval between the last recoverable point and the failure.
- RTO measures how long the service can be down: the maximum acceptable time from failure to restored operation.
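To make the two metrics concrete, here is a minimal sketch computing the *achieved* RPO and RTO for a single incident (the timestamps are invented for illustration, loosely echoing a roughly six-hour data loss):

```python
from datetime import datetime

# Illustrative only: measuring achieved RPO and RTO for one incident.
# Achieved RPO = data lost, i.e. time between the last recoverable
# point and the failure. Achieved RTO = downtime until restoration.
last_recovery_point = datetime(2017, 1, 31, 17, 20)  # hypothetical times
failure_time        = datetime(2017, 1, 31, 23, 25)
service_restored    = datetime(2017, 2, 1, 17, 0)

rpo_achieved = failure_time - last_recovery_point  # data lost
rto_achieved = service_restored - failure_time     # downtime

print(rpo_achieved)  # 6:05:00  -> roughly six hours of data gone
print(rto_achieved)  # 17:35:00 -> over seventeen hours of downtime
```

A CDP system drives the first number toward zero by capturing every change; the second depends on how quickly a recovery point can be mounted and served.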
Traditional data protection solutions focus on periodic data backup, which brings issues such as backup windows, data consistency, and impact on production systems. CDP provides a new data protection method: system administrators can recover data simply by selecting the desired backup time point, without having to manage the backup process itself. The CDP system continuously monitors changes to important data and protects it automatically.
CDP has clear advantages over traditional disaster recovery technology, most notably the ability to recover to any point in time without a dedicated backup window.
CDP enables point-in-time data recovery by recording and storing data changes in one of three modes: reference data benchmarking, reference data replication, and reference data merging.
In the reference data benchmarking mode, CDP creates reference data replicas, logs the differences as production data changes, and recovers data by replaying those logged differences. This mode is easy to implement. However, recovery is slow because it always starts from the earliest reference data: the closer the recovery time point is to the current time, the longer the recovery takes.
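A minimal sketch of this mode (a toy model with invented data): keep a baseline replica plus a time-ordered redo log, and recover a point in time by replaying every logged change from the baseline forward. The cost of recovery grows with the distance between the baseline and the target time.

```python
# Toy model of the reference-data-benchmarking mode: a baseline replica
# plus a redo log of changes. Recovery replays log entries forward from
# the baseline, so recovery work grows with distance from the baseline.

baseline = {0: "A", 1: "B"}   # reference replica captured at t=0
redo_log = [                  # time-ordered (time, block, new_value)
    (1, 0, "A1"),
    (2, 1, "B1"),
    (3, 0, "A2"),
]

def recover(target_time):
    state = dict(baseline)    # always start from the baseline copy
    for t, block, value in redo_log:
        if t > target_time:
            break             # log is time-ordered; stop here
        state[block] = value  # replay this change
    return state

print(recover(2))  # {0: 'A1', 1: 'B1'} -- two entries replayed
print(recover(0))  # {0: 'A', 1: 'B'}   -- the baseline itself
```

Recovering a recent point (large `target_time`) replays almost the whole log, which is exactly the weakness the text describes.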
In the reference data replication mode, CDP synchronizes production data to a reference replica in real time, records undo logs or events during synchronization, and recovers data by applying the undo-log differences. This mode is the opposite of the reference data benchmarking mode: the closer the recovery time point is to the current time, the shorter the recovery time. However, data and logs must be written synchronously, which consumes significant system resources.
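The inverse cost profile can be sketched the same way (again a toy model with invented data): the replica mirrors the current state, each write records the value it overwrote, and recovery walks the undo log backwards from "now" until the target time is reached. Recent points are cheap; old points are expensive.

```python
# Toy model of the reference-data-replication mode: the replica mirrors
# production in real time, and each write leaves an undo record (the
# value it overwrote). Recovery rolls writes back from the current state,
# so recent recovery points need few steps -- the opposite of replaying
# forward from an old baseline.

replica = {0: "A2", 1: "B1"}  # synchronized current state (t=3)
undo_log = [                  # newest first: (time, block, old_value)
    (3, 0, "A1"),             # the write at t=3 overwrote "A1"
    (2, 1, "B"),
    (1, 0, "A"),
]

def recover(target_time):
    state = dict(replica)     # start from the live replica
    for t, block, old_value in undo_log:
        if t <= target_time:
            break             # rolled back far enough
        state[block] = old_value  # undo this write
    return state

print(recover(2))  # {0: 'A1', 1: 'B1'} -- one write undone
print(recover(0))  # {0: 'A', 1: 'B'}   -- everything undone
```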
The reference data merging mode is a compromise between the two preceding modes and strikes a better balance between resource consumption and RTO. However, it is harder to implement, as it requires complex software management and data processing. In practice, CDP technology and related solutions combine multiple modes.
The CDP model varies among traditional vendors. Based on SNIA's storage sharing model, CDP products and solutions are classified as application-based, file-based, or block-based. This article describes CDP implementation at the data block level. Block-based CDP may run on physical storage devices or logical volume managers, or even at the data transport layer.
When data blocks are written to the production storage device, the CDP system captures replicas of the data and stores them on another storage device. Block-based CDP can be implemented at the host layer, transport layer, or storage layer.
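The capture step above can be sketched as a write-splitting wrapper (a hedged illustration; all class and variable names here are invented, and real systems do this in the kernel or storage controller, not in application code): every write is forwarded to production storage and a timestamped copy is appended to a separate CDP journal.

```python
import time

# Illustrative write-splitting capture for block-based CDP (names
# invented): forward each write to production storage and append a
# timestamped replica to a journal held on separate storage.

class CdpCaptureDevice:
    def __init__(self, production):
        self.production = production  # stand-in for the real block store
        self.journal = []             # (timestamp, block_id, data), kept apart

    def write(self, block_id, data):
        self.production[block_id] = data                    # normal write path
        self.journal.append((time.time(), block_id, data))  # captured replica

prod = {}
dev = CdpCaptureDevice(prod)
dev.write(7, b"payload")
print(prod[7])            # b'payload' -- production sees the write
print(len(dev.journal))   # 1 -- and the CDP journal captured it too
```

Because the journal lives on separate storage and carries timestamps, any past state can later be reconstructed from it, which is the essence of block-level CDP.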
Three vendors are worth comparing: FalconStor, Veeam Software, and EMC (RecoverPoint). Of these, FalconStor is a representative vendor of dedicated CDP products.
The three vendors come from different backgrounds. The traditional storage vendor EMC acquired RecoverPoint to develop a CDP kit based on its own storage that protects data on physical machines and virtual machines (VMs). Veeam Software is a rising star in VM protection: it protects VMware and Hyper-V VMs and is expanding its business to the cloud. Its current solution depends on VMware VAIO, a virtualized data acquisition framework.
EMC RecoverPoint/SE provides a complete solution for the EMC CLARiiON array, and EMC RecoverPoint provides a complete solution for data centers. The two products support local replication and synchronization, point-in-time data recovery, and asynchronous continuous remote replication (CRR).
CDP and CRR can run together on a single EMC RecoverPoint appliance to provide concurrent local and remote (CLR) data protection, safeguarding the same set of data both locally and remotely with one solution. FalconStor CDP, for its part, integrates multiple functions such as data backup, system recovery, disaster recovery, local disaster recovery, and geo-disaster recovery.
FalconStor CDP is a disk-based backup and disaster recovery solution that supports real-time backup and instant recovery of files, databases, and operating systems by integrating local disaster recovery and geo-disaster recovery functions for verification and drills.
Amazon Web Services (AWS) provides native snapshot functions and cloud migration methods; for features such as data backup, it depends on traditional data protection vendors. Microsoft Azure provides basic VM-based backup and recovery but does not provide CDP or other advanced functions.
Gartner's description of an elastic cloud backup engine specifies the features a successful elastic backup must provide.
CDP is an advanced data protection solution implemented by cloud vendors. It provides an elasticity that is absent from traditional backup: traditional vendors must transmit data to the cloud over a WAN during cloud migration, which consumes CPU and I/O resources.
Traditional vendors may run scheduled tasks to avoid overconsuming resources, but this undermines the elasticity of backup and CDP. A CDP solution must also offer assured reliability and operability: inter-volume consistency groups and application consistency must be established to guarantee that data stays consistent across all volumes.
Data protection is a preventive measure. Traditional enterprises impose higher requirements on cloud data protection as they move forward with cloud migration. Users attach more importance to data and are more sensitive to data loss than ever before, which heightens the tension between cloud data protection capabilities and user requirements. Traditional block storage-based CDP depends on specific storage devices and is not elastic enough for off-premises implementation; moreover, it does not adapt to complex, off-premises, distributed environments.
CDP is an important supplement to traditional and hybrid cloud data protection solutions and will become a highly valuable new option for enterprise users. Apsara Distributed File System 2.0 provides a new block storage architecture that boosts off-premises CDP implementation.
As enterprises push forward with cloud migration, Apsara Distributed File System 2.0 will not only ensure storage performance but will also satisfy traditional and advanced enterprise users who require data protection with low RTO and RPO.