Alibaba Cloud Object Storage Service (OSS) provides the Cross-Region Replication (CRR) feature. However, depending on the data size, the replication process may take several hours. For more information about CRR, please refer to the documentation here.
A new feature called Replication Time Control (RTC) has been released for CRR. When RTC is enabled, the SLA guarantees that 99.99% of the CRR data is replicated within 10 minutes. RTC also provides various metrics for monitoring the status of CRR. For more details about RTC, refer to the documentation here. This article verifies the effectiveness of this feature.
To confirm the effectiveness of RTC, we compare the time required for CRR with RTC enabled and with RTC disabled. Because the RTC feature is currently not available in the Japan (Tokyo) region, this cross-region replication verification is performed using the China (Hangzhou) and China (Shanghai) regions. To find out which regions support RTC, visit the RTC introduction page.
For this verification, we use two replication patterns: multiple small files and a single large file.
2-1. Create a source bucket in the China (Hangzhou) region.
・Click Create Bucket to display the creation screen.
・Configure the necessary settings, such as the bucket name and region.
・Click OK to create the bucket.
2-2. As in Step 2-1, create two destination buckets A and B in the China (Shanghai) region.
2-3. Verify that the buckets are created.
2-4. Create a cross-region replication job between the source bucket and the destination bucket A with RTC disabled.
・On the details page of the source bucket, click Cross-Region Replication to go to the creation page.
・Select the destination bucket A to be the replication destination.
・Configure the replication job:
Objects to Replicate: All Files in Source Bucket
Replication Policy: Add/Delete/Change
Replicate Historical Data: Yes
Replicate Objects Encrypted based on KMS: No
Replication Time Control (RTC): Disabled
・Click OK.
・Click Enable on the pop-up window to perform the operation.
2-5. Wait until the status of the created replication job becomes Enabled.
2-6. Create a cross-region replication job between the source bucket and the destination bucket B with RTC enabled.
・On the details page of the source bucket, click Cross-Region Replication to go to the creation page.
・Select the destination bucket B to be the replication destination.
・Configure the replication job:
Objects to Replicate: All Files in Source Bucket
Replication Policy: Add/Delete/Change
Replicate Historical Data: Yes
Replicate Objects Encrypted based on KMS: No
Replication Time Control (RTC): Enabled
・Click OK.
・Click Enable on the pop-up window to perform the operation.
2-7. Wait until the statuses of both the created replication job and the RTC feature become Enabled.
3-1. To measure the time required for cross-region replication, prepare a Python script that counts the number of files in the destination buckets A and B.
import datetime
import oss2

auth = oss2.Auth('yourAccessKeyId', 'yourAccessKeySecret')
# Both buckets are replication destinations: A has RTC disabled, B has RTC enabled.
bucket_a = oss2.Bucket(auth, 'yourEndpoint', 'destination bucket A name')
bucket_b = oss2.Bucket(auth, 'yourEndpoint', 'destination bucket B name')

print('Test start on {0}'.format(datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')))
# Poll until the script is interrupted manually.
while True:
    a_counts = 0
    b_counts = 0
    # Count the objects currently in destination bucket A (RTC disabled).
    for obj in oss2.ObjectIterator(bucket_a, prefix=''):
        a_counts += 1
    print('Get {0} files in noRTC bucket on {1}'.format(a_counts, datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')))
    # Count the objects currently in destination bucket B (RTC enabled).
    for obj in oss2.ObjectIterator(bucket_b, prefix=''):
        b_counts += 1
    print('Get {0} files in RTC bucket on {1}'.format(b_counts, datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')))
Output example
Test start on 2023-05-08 19:30:56
Get 0 files in noRTC bucket on 2023-05-08 19:30:57
Get 0 files in RTC bucket on 2023-05-08 19:30:57
Get 0 files in noRTC bucket on 2023-05-08 19:30:57
Get 0 files in RTC bucket on 2023-05-08 19:30:57
……
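The monitoring script above runs until it is interrupted, and the completion time is then read from the log by hand. As a minimal sketch of a variant that stops by itself, the helper below polls until every bucket has reached a target count and records when each one first did so. The names `watch_until_target`, `count_fns`, `target`, `interval`, `clock`, and `sleep` are hypothetical and not part of the original script; the counters are passed in as plain callables so the logic can be tried without an OSS connection.

```python
import time
from datetime import datetime


def watch_until_target(count_fns, target, interval=1.0,
                       clock=datetime.now, sleep=time.sleep):
    """Poll each counter until all of them have reached `target`.

    count_fns: dict mapping a label (for example 'noRTC') to a zero-argument
               callable that returns the current object count.
    Returns a dict mapping each label to the time it first reached `target`.
    """
    reached = {}
    while len(reached) < len(count_fns):
        for label, count_fn in count_fns.items():
            if label in reached:
                continue  # this bucket is done, no need to list it again
            count = count_fn()
            now = clock()
            print('Get {0} files in {1} bucket on {2}'.format(
                count, label, now.strftime('%Y-%m-%d %H:%M:%S')))
            if count >= target:
                reached[label] = now
        if len(reached) < len(count_fns):
            sleep(interval)  # wait before the next polling round
    return reached
```

With oss2, a counter could be written as `lambda: sum(1 for _ in oss2.ObjectIterator(bucket_a))`. Note that adding a polling interval limits the timing resolution to that interval, which the original script avoids by looping without a pause.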
3-2. Prepare a Python script to generate 2,000 small files and upload them to the source bucket.
import random

import oss2
from faker import Faker


def send_testing_data_file_to_oss(oss_bucket, file_counts, name_pattern):
    """Upload file_counts small CSV files of fake data to oss_bucket."""
    fake = Faker(locale='ja_JP')
    for i in range(file_counts):
        # Each file contains 1 to 30 lines of fake name, address, and text.
        data = []
        counts = random.randint(1, 30)
        print('Generate {0} lines as data.'.format(counts))
        for c in range(counts):
            data.append('{0},"{1}","{2}"'.format(fake.name(), fake.address(), fake.text()))
        tmp_file_name = name_pattern.format(i)
        oss_bucket.put_object(tmp_file_name, "\r\n".join(data))
        print('Upload testing data file: {0} successfully!'.format(tmp_file_name))


auth = oss2.Auth('yourAccessKeyId', 'yourAccessKeySecret')
# The source bucket in China (Hangzhou).
bucket = oss2.Bucket(auth, 'yourEndpoint', 'bucket name')
send_testing_data_file_to_oss(bucket, 2000, 'data_file_{0}_historical_update.csv')
3-3. While running the Python script from Step 3-1, simultaneously execute the Python script from Step 3-2 to upload 2,000 small files to the source bucket.
3-4. After the upload is complete, review the output log generated by the Python script.
Test start on 2023-05-08 20:00:06
Get 0 files in noRTC bucket on 2023-05-08 20:00:06
Get 0 files in RTC bucket on 2023-05-08 20:00:07
……
Get 1923 files in noRTC bucket on 2023-05-08 20:03:00
Get 1996 files in RTC bucket on 2023-05-08 20:03:10
Get 2000 files in noRTC bucket on 2023-05-08 20:03:18
Get 2000 files in RTC bucket on 2023-05-08 20:03:26
3-5. Calculate the replication time of the small files.
・RTC-disabled replication time (small files) = Time when the destination bucket A first obtains 2,000 files - Execution start time
・RTC-enabled replication time (small files) = Time when the destination bucket B first obtains 2,000 files - Execution start time
Verification pattern | RTC-disabled (s) | RTC-enabled (s) |
2,000 small file duplication | 192 | 200 |
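The calculation in Step 3-5 can also be done programmatically. The following is a minimal sketch that parses the log format printed by the monitoring script and returns, for each bucket label, the number of seconds between the test start and the first line that reports the target count. The function name `replication_seconds` and the regular expressions are illustrative assumptions; the sample lines are taken from the log in Step 3-4.

```python
import re
from datetime import datetime

TIME_FORMAT = '%Y-%m-%d %H:%M:%S'
START_RE = re.compile(r'^Test start on (.+)$')
COUNT_RE = re.compile(r'^Get (\d+) files in (\w+) bucket on (.+)$')


def replication_seconds(log_lines, target):
    """Return {label: seconds from test start until `target` files were first seen}."""
    start = None
    result = {}
    for line in log_lines:
        line = line.strip()
        m = START_RE.match(line)
        if m:
            start = datetime.strptime(m.group(1), TIME_FORMAT)
            continue
        m = COUNT_RE.match(line)
        if m and start is not None:
            count, label = int(m.group(1)), m.group(2)
            # Only the first line that reaches the target counts.
            if count >= target and label not in result:
                seen = datetime.strptime(m.group(3), TIME_FORMAT)
                result[label] = int((seen - start).total_seconds())
    return result


log = """Test start on 2023-05-08 20:00:06
Get 1923 files in noRTC bucket on 2023-05-08 20:03:00
Get 1996 files in RTC bucket on 2023-05-08 20:03:10
Get 2000 files in noRTC bucket on 2023-05-08 20:03:18
Get 2000 files in RTC bucket on 2023-05-08 20:03:26
""".splitlines()

print(replication_seconds(log, target=2000))  # {'noRTC': 192, 'RTC': 200}
```

The result matches the 192 seconds and 200 seconds shown in the table above.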
3-6. For multiple small files, the upload and the replication complete at approximately the same time. Because the upload speed is the bottleneck, this verification method cannot confirm the effect of the RTC feature.
3-7. Delete all files from the source bucket to prepare for the next verification. Because the replication policy configured in Steps 2-4 and 2-6 includes the Delete operation, the deletion is replicated to destination buckets A and B, so all files there are deleted as well.
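The article does not state how the files were deleted; the OSS console is one option. If you prefer to script the cleanup, the sketch below shows one possible approach under the assumption that keys are deleted in batches of at most 1,000, which is the limit for a single batch delete request in OSS. The helper name `delete_in_batches` and its parameters are hypothetical, and the delete operation is passed in as a callable so the chunking logic can be checked offline.

```python
from itertools import islice


def delete_in_batches(keys, delete_batch, batch_size=1000):
    """Call delete_batch(list_of_keys) for successive chunks of `keys`.

    Returns the total number of keys passed to delete_batch.
    """
    iterator = iter(keys)
    total = 0
    while True:
        chunk = list(islice(iterator, batch_size))
        if not chunk:
            break
        delete_batch(chunk)
        total += len(chunk)
    return total


# Possible wiring with oss2 (an assumption, not executed here):
#   keys = [obj.key for obj in oss2.ObjectIterator(bucket)]
#   delete_in_batches(keys, bucket.batch_delete_objects)
```

Run this only against the source bucket: with the replication policy used here, the deletions propagate to both destination buckets.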
4-1. To measure the upload speed with a large file, create an ECS instance with high network performance in the same region as the source bucket.
Region: China (Hangzhou)
Spec: ecs.c7.3xlarge
Disk size: 240 GB
OS: CentOS 7.9
4-3. After the ECS instance is started, create a 200 GB dummy file by running the following command:
# fallocate -l 214748364800 testfile
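The byte count passed to fallocate corresponds to 200 GiB, that is, 200 × 1024³ bytes. A quick check of that arithmetic (the variable names are illustrative only):

```python
GIB = 1024 ** 3  # bytes in one gibibyte

size_bytes = 200 * GIB
print(size_bytes)  # 214748364800
```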
4-4. Use ossutil, an official tool provided by Alibaba Cloud, to upload the large file. The tool package can be downloaded from the URL below and unzipped.
https://gosspublic.alicdn.com/ossutil/1.7.15/ossutil-v1.7.15-linux-amd64.zip
4-5. While running the Python script from Step 3-1, simultaneously execute the following command to upload the 200 GB file to the source bucket through the intranet:
# ./ossutil64 cp testfile oss://[bucketname]/ -r -f -e oss-cn-hangzhou-internal.aliyuncs.com -i [accessid] -k [accesskey]
4-6. On the details pages of destination buckets A and B in the OSS console, you can check the parts of the file that are being uploaded. Regardless of whether RTC is enabled, replication starts before the upload of the single file to the source bucket is complete.
4-7. When the upload is complete, the processing time (approximately 345 seconds) and the average speed (approximately 600 MB/s) are displayed.
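As a rough sanity check, the reported average speed is consistent with the file size and the processing time. The sketch below computes the throughput in both decimal megabytes and binary mebibytes, because which unit ossutil reports is not stated here and is treated as unknown; both values round to approximately 600.

```python
size_bytes = 200 * 1024 ** 3   # 214748364800 bytes, the file from Step 4-3
elapsed_s = 345                # approximate processing time reported by ossutil

mb_per_s = size_bytes / elapsed_s / 1000 ** 2    # decimal megabytes per second
mib_per_s = size_bytes / elapsed_s / 1024 ** 2   # binary mebibytes per second

print(round(mb_per_s), round(mib_per_s))  # 622 594
```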
4-8. Review the output log generated by the Python script.
Test start on 2023-05-08 20:31:39
Get 0 files in noRTC bucket on 2023-05-08 20:31:40
Get 0 files in RTC bucket on 2023-05-08 20:31:40
……
Get 0 files in RTC bucket on 2023-05-08 20:37:50
Get 0 files in noRTC bucket on 2023-05-08 20:37:50
Get 1 files in RTC bucket on 2023-05-08 20:37:50
Get 0 files in noRTC bucket on 2023-05-08 20:37:50
……
Get 0 files in noRTC bucket on 2023-05-08 20:39:35
Get 1 files in RTC bucket on 2023-05-08 20:39:35
Get 1 files in noRTC bucket on 2023-05-08 20:39:35
Get 1 files in RTC bucket on 2023-05-08 20:39:35
4-9. Calculate the replication time of the large file.
・RTC-disabled replication time (large file) = Time when the destination bucket A first obtains 1 file - Execution start time
・RTC-enabled replication time (large file) = Time when the destination bucket B first obtains 1 file - Execution start time
Verification pattern | RTC-disabled (s) | RTC-enabled (s) |
Large file (200 GB) duplication | 476 | 371 |
4-10. For a large file, a time lag exists between upload and replication completion, so the upload speed is not a bottleneck; therefore, the verification above successfully confirms the effect of RTC.
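The time lag mentioned above can be estimated from the figures already reported. The sketch below subtracts the approximate upload time from each replication time, under the assumption that the upload began at about the same moment as the monitoring script; the article does not record the exact upload start time, so these lags are approximations only.

```python
upload_s = 345  # approximate upload time reported by ossutil (Step 4-7)

# Replication times from Step 4-9, measured from the start of the monitoring script.
replication_s = {'RTC-disabled': 476, 'RTC-enabled': 371}

# Approximate lag between upload completion and replication completion,
# assuming the upload began when the monitoring script started.
lag_s = {name: t - upload_s for name, t in replication_s.items()}
saved_s = replication_s['RTC-disabled'] - replication_s['RTC-enabled']

print(lag_s)    # {'RTC-disabled': 131, 'RTC-enabled': 26}
print(saved_s)  # 105
```

Under that assumption, replication with RTC enabled finished roughly 26 seconds after the upload, compared with roughly 131 seconds with RTC disabled.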
As of the publication date, May 12, 2023, the RTC feature is free of charge, but Alibaba Cloud has confirmed that it plans to introduce charges in the near future. Check the official OSS price list page before use.
This article verified the new Replication Time Control feature in Alibaba Cloud OSS. For multiple small files, the effect of RTC could not be confirmed because the upload speed, rather than the replication speed, was the bottleneck. For a large file (200 GB), replication with RTC enabled finished 105 seconds earlier than with RTC disabled (371 seconds versus 476 seconds). If you need low-latency cross-region replication, consider trying the RTC feature.
This article is a translated piece of work from SoftBank: https://www.softbank.jp/biz/blog/cloud-technology/articles/202305/rtc/
Disclaimer: The views expressed herein are for reference only and don't necessarily represent the official views of Alibaba Cloud.