By Xuanling
Data migration with Apache Cassandra is difficult, especially seamless migration without downtime. Today, the community recommends the COPY command, sstableloader, and similar methods to migrate data between Cassandra databases. However, none of these methods migrates data efficiently, and each has its own disadvantages. For example, the COPY command starts a multi-threaded range scan on the cqlsh side, reads the data, converts it into CSV text, and then writes it to Cassandra in batches. This approach cannot keep up with large data volumes. With sstableloader, a single copy of the source data is migrated repeatedly into the multi-replica keyspace of the target database, which wastes both migration time and storage space.
BDS is a proprietary NoSQL data migration service developed by the Alibaba Cloud NoSQL Team. For many years, BDS has been tested and used in production within Alibaba Group and has served public cloud users on Alibaba Cloud. It now supports Cassandra data migration. BDS is designed to meet the demands of massive Cassandra data migration: high migration performance and very little service interruption.
The primary objectives of BDS migration are stability, high performance, short migration time, and low business impact. According to our analysis, the best way to minimize migration time is file-level cloning. Each node in a Cassandra cluster is pre-assigned a data range. If the data for each range can be migrated directly from the source side to the target side, migration can run at maximum speed.
Figure 1
As shown in Figure 1, the source cluster on the left consists of four nodes (W/X/Y/Z), each assigned a data range in advance. Each node is given a token that assigns it a specific data range: (10-40], (40-max], and (min-40].
A corresponding target cluster must be in place before BDS migration, such as the four-node cluster on the right in Figure 1. During the migration, BDS performs the following steps:
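The range ownership described above can be illustrated with a small Python sketch. It assumes the standard Cassandra convention that a node with token t owns the half-open range (previous token, t], with the lowest-token node covering the wrap-around range, and it assumes one token per node (no virtual nodes). The node names A to D and all token values are hypothetical and are not taken from Figure 1. The pairing logic only illustrates why matching ranges make node-to-node file copying possible; it is not BDS's actual implementation.

```python
from typing import Dict, List, Tuple

# A range (start, end] is written as a (start, end) tuple of tokens.
TokenRange = Tuple[int, int]


def owned_ranges(node_tokens: Dict[str, int]) -> Dict[str, TokenRange]:
    """Map each node to the (previous_token, own_token] range it owns.

    The node with the smallest token wraps around and owns the range that
    starts at the largest token in the ring.
    """
    ordered = sorted(node_tokens.items(), key=lambda item: item[1])
    ranges: Dict[str, TokenRange] = {}
    for index, (node, token) in enumerate(ordered):
        previous_token = ordered[index - 1][1]  # index -1 wraps to the last node
        ranges[node] = (previous_token, token)
    return ranges


def pair_nodes(source: Dict[str, int], target: Dict[str, int]) -> List[Tuple[str, str]]:
    """Pair each source node with the target node that owns the same range."""
    target_by_range = {rng: node for node, rng in owned_ranges(target).items()}
    pairs = []
    for node, rng in owned_ranges(source).items():
        if rng not in target_by_range:
            raise ValueError(f"no target node owns range {rng}")
        pairs.append((node, target_by_range[rng]))
    return pairs


# Hypothetical tokens for a four-node source ring and a matching target ring.
source_ring = {"W": 10, "X": 40, "Y": 70, "Z": 100}
target_ring = {"A": 10, "B": 40, "C": 70, "D": 100}

for src, dst in pair_nodes(source_ring, target_ring):
    print(f"{src} -> {dst}: range {owned_ranges(source_ring)[src]}")
```

When every source range has an identical target range, each source node's files can be sent to exactly one target node, which is the situation file-level cloning relies on.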
The preceding method ensures that full data and incremental data are migrated quickly and completely. According to testing and analysis, the speed of this BDS migration solution is close to that of remote file copying between cluster nodes. For example, with three nodes that each hold 1 TB of data and a migration bandwidth of 150 MB/s, the data migration takes only about 2 hours, which is much faster than the other solutions.
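The 2-hour figure can be checked with a short calculation. This Python sketch assumes decimal units (1 TB = 1,000,000 MB) and assumes that the 150 MB/s applies to each node and that the nodes copy in parallel, so the total time equals the time for one node. The function name is illustrative only.

```python
def migration_hours(data_per_node_tb: float, bandwidth_mb_per_s: float) -> float:
    """Estimate the hours needed to copy one node's data at a fixed bandwidth.

    Assumes decimal units (1 TB = 1,000,000 MB) and parallel copying, so the
    total migration time equals the time for a single node.
    """
    megabytes = data_per_node_tb * 1_000_000
    seconds = megabytes / bandwidth_mb_per_s
    return seconds / 3600


hours = migration_hours(data_per_node_tb=1.0, bandwidth_mb_per_s=150.0)
print(f"Estimated migration time: {hours:.2f} hours")  # prints 1.85 hours
```

Under these assumptions, the estimate is roughly 1.85 hours, consistent with the figure of about 2 hours quoted above.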
Solution | Details | Advantages | Disadvantages | Migration Complexity |
COPY FROM & COPY TO | cqlsh concurrently scans the data, converts it into CSV files, and writes the files to Cassandra. | Supported by cqlsh | Offline migration is required for large data volumes. | Medium: cqlsh imports or exports files directly from the command line. |
sstableloader | Streams sstable files to the target side. | Faster than COPY FROM/TO | With three copies (replicas) of a table, the migration may be very slow, and the target cluster may contain nine redundant copies of the data. | High: sstable files are loaded manually. |
BDS for Cassandra | BDS copies files directly and loads them. | Fast and stable; minimal downtime and less impact on services | The BDS service is required. | Low: after the addresses on both sides are configured, click the page to start the migration. |
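The "nine copies" figure in the sstableloader row follows from simple multiplication. This Python sketch assumes that the sstable files of all three source replicas are loaded and that each load is written to every replica on the target. The function name is hypothetical, and the result describes the state before any later clean-up such as compaction.

```python
def copies_written(source_replicas: int, target_replication_factor: int) -> int:
    """Copies of the data written to the target cluster.

    Assumes every source replica's sstable files are loaded and each load
    is written to all target replicas.
    """
    return source_replicas * target_replication_factor


print(copies_written(source_replicas=3, target_replication_factor=3))  # 9
```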
1. Purchase the BDS service on this page.
2. Prepare the target environment, for example by purchasing the cloud Cassandra service or building your own Cassandra service.
3. Start the sftp service on the source Cassandra cluster for the migration. You can search for specific steps online. For incremental migration, incremental backup must also be enabled on all nodes in the source cluster through nodetool (the nodetool enablebackup command).
4. Configure the endpoint addresses of the source and target clusters in BDS. The detailed configuration process is listed below:
4.1. For the purchased BDS service, configure the whitelists for all IP addresses of the source and target clusters. If you use the cloud Cassandra service, you also need to create a whitelist on the source cluster and add the corresponding IP address of BDS.
4.2. On the BDS page, click Basic Information > UI Access > Engine Software UI > BDS. Then, click "Advanced Access" and enter the account password. If you forget the account password, click "Reset UI Access Password" on the UI access page to reset it.
4.3. Add a data source on the data source management page that opens. Data sources must be added for both the source and target clusters. Enter an identifying cluster name in the "Cluster Name" field and select "Cassandra3X" for "Data Source Type."
The reference template for "Data Source Parameters" is listed below:
{
  "cassandraPassword":"Password to access Cassandra cluster",
  "cassandraUser":"Account to access Cassandra cluster",
  "confDir":"Cassandra profile directory",
  "dataDir":[
    "Cassandra data directory"
  ],
  "hosts":[
    {
      "ip":"The IP address of the Cassandra cluster. If there are multiple IP addresses, arrange them in the following way."
    },
    {
      "ip":"The second cluster IP, and so on"
    },
    {
      "ip":"The third cluster IP"
    }
  ],
  "nodetoolCmd":"The directory address of Cassandra nodetool, such as xxx/bin/nodetool",
  "sshPassword":"Password accessible to ssh",
  "sshUser":"Account accessible to ssh",
  // The following two lines should be configured in the data source template of target cluster. Absolute paths to start and stop Cassandra commands are required. Remove this note in the actual data source profile.
  "startCmd":"su cassandra -l -c 'for starting Cassandra command'",
  "stopCmd":"su cassandra -l -c 'for stopping Cassandra command'"
}
Perform the preceding configuration for the data sources of both the source and target clusters, using the template above.
5. Click "One-Click Migration" under "Cassandra Migration." Select the configured data sources of the source and target clusters, and then configure the tables to be migrated. Once this is done, click "Migration Service."
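The "Data Source Parameters" from step 4.3 can also be generated rather than edited by hand, which avoids leaving the explanatory note line in the final JSON. The following Python sketch builds one parameter set for the source cluster and one for the target cluster. The key names come from the template above. The helper name build_data_source, the IP addresses, the paths, and the start and stop commands are all hypothetical placeholders, not values that BDS prescribes.

```python
import json
from typing import Dict, List, Optional


def build_data_source(
    hosts: List[str],
    cassandra_user: str,
    cassandra_password: str,
    conf_dir: str,
    data_dirs: List[str],
    nodetool_cmd: str,
    ssh_user: str,
    ssh_password: str,
    start_cmd: Optional[str] = None,
    stop_cmd: Optional[str] = None,
) -> Dict[str, object]:
    """Return the data source parameters for one cluster as a dict."""
    if not hosts:
        raise ValueError("at least one host IP is required")
    if (start_cmd is None) != (stop_cmd is None):
        raise ValueError("startCmd and stopCmd must be provided together")

    params: Dict[str, object] = {
        "cassandraPassword": cassandra_password,
        "cassandraUser": cassandra_user,
        "confDir": conf_dir,
        "dataDir": list(data_dirs),
        "hosts": [{"ip": ip} for ip in hosts],
        "nodetoolCmd": nodetool_cmd,
        "sshPassword": ssh_password,
        "sshUser": ssh_user,
    }
    # The template asks for startCmd/stopCmd on the target cluster only.
    if start_cmd is not None:
        params["startCmd"] = start_cmd
        params["stopCmd"] = stop_cmd
    return params


# Hypothetical values for illustration; replace them with real ones.
common = dict(
    cassandra_user="cassandra",
    cassandra_password="<cassandra-password>",
    conf_dir="/etc/cassandra",
    data_dirs=["/var/lib/cassandra/data"],
    nodetool_cmd="/opt/cassandra/bin/nodetool",
    ssh_user="<ssh-user>",
    ssh_password="<ssh-password>",
)
source_params = build_data_source(
    hosts=["192.0.2.11", "192.0.2.12", "192.0.2.13"], **common
)
target_params = build_data_source(
    hosts=["198.51.100.21", "198.51.100.22", "198.51.100.23"],
    start_cmd="su cassandra -l -c '/opt/cassandra/bin/cassandra'",
    stop_cmd="su cassandra -l -c '/opt/cassandra/bin/stop-server'",
    **common,
)

# json.dumps emits plain JSON, so no comment line ends up in the output.
print(json.dumps(target_params, indent=2))
```

The printed JSON for each cluster can then be pasted into the "Data Source Parameters" field of the corresponding data source.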
Alibaba Cloud supports commercial Cassandra for Chinese and foreign customers. For more information, please see: the Chinese page and the English page.
Recently, Alibaba Cloud launched Lindorm, a cloud-native multi-mode database. For more information about Lindorm, check out the following articles: