With the growth of workloads such as AI, data warehousing, and big data analysis, an increasing number of applications that run on Object Storage Service (OSS) require low data access latency and high throughput. OSS provides the accelerator feature, which allows you to create an accelerator that caches hot objects on high-performance NVMe SSDs to provide millisecond-level access latency and high throughput.
Metric description
| Metric | Description |
| --- | --- |
| Peak read bandwidth | Formula: MAX[600, 600 × Cache capacity (TB)] MB/s. For example, if an accelerator provides a cache capacity of 2,048 GB (2 TB), the peak read bandwidth is MAX[600, 600 × 2] = 1,200 MB/s. |
| Maximum read bandwidth | 40 GB/s. If your business requires a greater read bandwidth, submit a ticket. |
| Minimum latency for reading 128 KB in a single request | Less than 10 ms |
| Cache capacity | 50 GB to 100 TB. If your business requires a greater cache capacity, submit a ticket. |
| Scale-up or scale-down interval | Once per hour |
| Scale-up or scale-down method | Manual scale-up or scale-down in the OSS console |
| Cache deletion policy | Cached data is deleted based on the Least Recently Used (LRU) algorithm: frequently accessed data is retained, and data that has not been accessed for a long period of time is preferentially deleted. This ensures that the cache capacity is used efficiently. |
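The peak read bandwidth formula can be sketched as a quick calculation. The function name is illustrative and is not part of any OSS SDK:

```python
def peak_read_bandwidth_mbps(cache_capacity_tb: float) -> float:
    """Peak read bandwidth in MB/s: MAX[600, 600 x cache capacity (TB)]."""
    return max(600.0, 600.0 * cache_capacity_tb)

# A 2 TB (2,048 GB) accelerator: max(600, 600 * 2) = 1,200 MB/s.
print(peak_read_bandwidth_mbps(2))    # 1200.0
# Below 1 TB, the 600 MB/s floor applies.
print(peak_read_bandwidth_mbps(0.5))  # 600.0
```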
Prerequisites
Your bucket is located in one of the following regions in which the OSS accelerator feature is in public preview: China (Hangzhou), China (Shanghai), China (Beijing), China (Ulanqab), China (Shenzhen), and Singapore.
Usage notes
You can use the accelerated endpoint of an accelerator to access resources in the accelerator only over the internal network. If you want to cache OSS data that is accessed over the Internet, we recommend that you use Alibaba Cloud CDN.
The cached data on the OSS accelerator is a single copy of the objects in the bucket. If the cache hardware fails, data must be prefetched again from OSS. The access speed decreases before the prefetch is complete.
Billing rules
The OSS accelerator feature is in public preview. During the public preview, up to 100 GB of cache capacity is provided free of charge. After the public preview ends, you are charged for the actual cache capacity of the OSS accelerator on a pay-as-you-go basis.
When you use the accelerated endpoint of the accelerator to read and write OSS data, you are charged OSS API operation calling fees even if origin fetch requests are not sent.
Scenarios
The OSS accelerator feature is suitable for scenarios in which high bandwidth is required and data is repeatedly read.
Model inference
Background information
The inference server must pull and load model objects for AI-generated content (AIGC) model inference. During inference and debugging, the inference server also frequently switches between model objects. As model objects grow in size, the inference server requires more time to pull and load them.
Solution
Use the asynchronous warmup policy or the warmup during read policy. The asynchronous warmup policy is suitable for scenarios in which you can determine the list of hot model objects: configure an accelerator of a suitable cache capacity and use the accelerator SDK to store the objects in the accelerator in advance. The warmup during read policy is suitable for scenarios in which you cannot determine the list of hot model objects: configure an accelerator of a cache capacity based on previous experience, and the accelerator automatically caches model objects to its high-performance media the first time the data is read, which speeds up subsequent reads. In both cases, the cache capacity of the accelerator can be scaled at any time based on your acceleration requirements. If your inference server needs to access OSS from a local directory, deploy ossfs.
Low-latency data sharing
Background information
When a customer purchases goods from a vending machine, the customer uses a mobile app to scan the goods in the container, takes a picture, and uploads it. After the application backend receives the picture, the picture is stored in the OSS accelerator. A backend subsystem then performs content moderation and barcode recognition on the picture and returns the barcode recognition results to the application backend for fee deduction and other operations. The picture must be downloaded within milliseconds.
Solution
Use the synchronous warmup policy of the accelerator. The OSS accelerator can effectively reduce the latency of loading pictures to the analysis system and shorten transaction links. The OSS accelerator is suitable for business that is sensitive to latency and repeatedly reads data.
Big data analysis
Background information
Business data is partitioned by day and stored in OSS for long periods of time. Analysts use computing engines, such as Hive or Spark, to analyze the data, but the query range is uncertain. The analysts need to reduce the amount of time required for query and analysis.
Solution
Use the warmup during read policy of the OSS accelerator. This policy is suitable for offline query scenarios in which a large amount of data is stored, the data query range is uncertain, and the data cannot be accurately warmed up. For example, the data queried by Analyst A is cached in an acceleration cluster. If the data queried by Analyst B contains the data queried by Analyst A, data analysis is accelerated.
Multi-level acceleration
Background information
No conflict exists between client-side caching and server-side acceleration. You want to achieve multi-level acceleration based on your business requirements.
Solution
Use the OSS accelerator together with a client-side cache. We recommend that you deploy Alluxio alongside your computing clusters. If the data that you want to read misses the Alluxio cache, the data is read from the backend storage. The OSS accelerator uses the warmup during read policy and warms up data the first time the data is read. Because the cache capacity of the client host is limited, a time to live (TTL) is configured for each object and directory in Alluxio. When the TTL period ends, the cache entry is deleted to save space. Data in the OSS accelerator is not immediately deleted, and the accelerator's cache capacity can store hundreds of terabytes of data. When data that misses the Alluxio cache is read again, the data can be loaded directly from the OSS accelerator, which implements two-level acceleration.
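The two-level lookup described above can be sketched in Python. This is an illustrative model of a TTL-bounded client cache backed by a longer-lived accelerator cache, not the Alluxio implementation; all names are hypothetical:

```python
import time

class TtlCache:
    """Minimal client-side cache with a per-entry time to live (TTL).

    Illustrative only; this is not the Alluxio implementation.
    """
    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # TTL expired: delete to save space
            return None
        return value

    def put(self, key, value):
        self._store[key] = (value, time.monotonic() + self.ttl)

def read(key, client_cache, accelerator_cache, backend):
    """Two-level read: client cache first, then the accelerator, then OSS."""
    value = client_cache.get(key)
    if value is not None:
        return value                      # first-level hit
    value = accelerator_cache.get(key)
    if value is None:
        value = backend[key]              # origin fetch from the OSS bucket
        accelerator_cache[key] = value    # warmup during read
    client_cache.put(key, value)          # repopulate the client cache
    return value
```

After a client-side TTL expires, the next read is served from the accelerator cache instead of triggering another origin fetch.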
Benefits
Low latency
The NVMe SSD media of the OSS accelerator provide millisecond-level download latency, which is especially suitable for large object download scenarios. The OSS accelerator performs particularly well for hot data queries in data warehouses and for inference model downloads.
Increased throughput
The bandwidth of an accelerator increases linearly with its cache capacity, and the accelerator provides burst throughput of up to hundreds of GB/s.
Automatic scaling
In most cases, computing tasks are periodic tasks that have different requirements for the amount of required resources. You can scale up or scale down the cache capacity of the OSS accelerator based on your requirements without interrupting your business. This helps reduce resource waste and costs. The accelerator supports at least 50 GB of cache capacity and up to 100 TB of cache capacity. The OSS accelerator inherits the advantages of OSS massive data storage and can directly cache multiple tables or partitions in a data warehouse.
High throughput
The OSS accelerator can provide high throughput for a small amount of data and meet the burst read requirements for a small amount of hot data.
Decoupled storage and computing
Unlike the cache of a computing server, the OSS accelerator is independent of the computing server. You can adjust the cache capacity and performance of the OSS accelerator online without interrupting your business.
Data consistency
Compared with conventional cache solutions, the OSS accelerator feature ensures data consistency. When you update objects in OSS buckets, the accelerator automatically identifies and caches the latest versions of the objects to ensure that the computing engines can read the latest versions of the objects.
Multiple warmup policies
The OSS accelerator provides the following warmup policies:
Synchronous warmup: When data is written to OSS, data is synchronized and cached on the accelerator.
Warmup during read: If the data that you request does not hit the cache, the accelerator automatically caches the data when it is read.
Asynchronous warmup: You can run commands to batch cache data in OSS to the OSS accelerator.
How it works
After an accelerator is created, it has an internal accelerated endpoint that is dedicated to the region. For example, the accelerated endpoint for the China (Beijing) region is http://cn-beijing-internal.oss-data-acc.aliyuncs.com. If you are located in the same virtual private cloud (VPC) as the accelerator, you can use the accelerated endpoint to access the resources that are cached on the accelerator. The following figure shows how the accelerated endpoint is used to access these resources.
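Based on the China (Beijing) example, the internal accelerated endpoint appears to follow a region-based hostname pattern. The sketch below generalizes that pattern as an assumption; verify the exact hostname for your region in the OSS console:

```python
def accelerated_endpoint(region: str) -> str:
    """Build the internal accelerated endpoint for a region.

    The pattern is generalized from the China (Beijing) example
    (cn-beijing); confirm the hostname for your region in the OSS console.
    """
    return f"http://{region}-internal.oss-data-acc.aliyuncs.com"

print(accelerated_endpoint("cn-beijing"))
# http://cn-beijing-internal.oss-data-acc.aliyuncs.com
```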
The following items describe the workflow:
Write requests
If synchronous cache warmup is disabled, write requests that are sent from a client to the accelerated endpoint of the accelerator are forwarded to OSS buckets. This process is similar to the process in which the default domain names of OSS buckets are used.
If synchronous cache warmup is enabled, write requests that are sent from a client to the accelerated endpoint are forwarded to OSS buckets and the OSS accelerator.
Read requests
Read requests that are sent from a client to the accelerated endpoint are forwarded to the OSS accelerator.
When the OSS accelerator receives the read requests, the OSS accelerator searches for the requested objects in the cache.
If the requested objects are cached on the accelerator, the objects are returned to the client.
If the requested objects are not cached on the accelerator, the accelerator requests the objects from the OSS buckets that are mapped to the accelerator. After OSS receives the requests, OSS caches the requested objects in the accelerator. Then, the accelerator returns the objects to the client.
If the cache capacity of the OSS accelerator is exhausted, the accelerator deletes the least recently used objects and retains the cached objects that are accessed with relatively high frequency.
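The read workflow above can be sketched as follows. This is an illustrative model that counts objects rather than bytes and uses an in-memory dict to stand in for the mapped OSS bucket; it is not the service implementation:

```python
from collections import OrderedDict

class AcceleratorReadPath:
    """Sketch of the read workflow: serve cache hits, fetch misses from
    the mapped OSS bucket, and evict the least recently used objects when
    the cache is full. Counts objects instead of bytes for simplicity."""

    def __init__(self, bucket: dict, capacity: int):
        self.bucket = bucket        # stands in for the mapped OSS bucket
        self.capacity = capacity    # maximum number of cached objects
        self.cache = OrderedDict()  # key -> object, in LRU order

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)     # hit: mark as recently used
            return self.cache[key]
        obj = self.bucket[key]              # miss: request the object from OSS
        self.cache[key] = obj               # the object is cached on the accelerator
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used object
        return obj
```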
Performance comparison
Download response latency statistics
Use OSS and an OSS accelerator to download 10 MB objects multiple times for testing, and calculate the response latency in milliseconds. The results show that latency is reduced by a factor of 10 when an OSS accelerator is used.
In the following figure, P50 indicates the latency within which 50% of requests complete, and P999 indicates the latency within which 99.9% of requests complete.
Data lakes and data warehouses in the cloud
A user tests a local disk, OSS, and an OSS accelerator as storage media.
The following table lists the latency of each query scenario.

| Scenario | Local CacheFS (local disk) | OSS | OSS cache (accelerator) |
| --- | --- | --- | --- |
| Point queries | 382 ms | 2,451 ms | 1,160 ms |
| Random queries on 1,000 data items | 438 ms | 3,786 ms | 1,536 ms |
| Random queries on 10% of data | 130,564 ms | 345,707 ms | 134,659 ms |
| Full scan | 171,548 ms | 398,681 ms | 197,134 ms |
Performance
During full scans and random queries on 10% of data, the performance of the OSS accelerator is 2 to 2.5 times that of OSS and reaches approximately 85% of the performance of local ESSD CacheFS.
During online queries, the fixed latency of a single request to the OSS accelerator is 8 to 10 ms. During point queries and random queries on 1,000 data items, the performance of the OSS accelerator is 1.5 to 3 times that of OSS and approximately 30% of the performance of local ESSD CacheFS.
Simulation training for containers and autonomous driving
A large number of containers are started at the same time to obtain images, maps, and log data. The overall duration of simulation training is reduced by 60%.
| Type | Data volume | Peak bandwidth | Duration |
| --- | --- | --- | --- |
| OSS | 204 TB | 100 Gbit/s | 2.2 hours |
| OSS accelerator | 128 TB | 300 Gbit/s | 40 minutes |
Accelerator throughput
The accelerator provides throughput for cached data based on the configured cache capacity: up to 2.4 Gbit/s of throughput per 1 TB of cache capacity. This throughput is not limited by the standard throughput provided by OSS. For more information about the standard bandwidth limits of OSS, see Limits and performance metrics.
For example, OSS provides a standard bandwidth of 100 Gbit/s in the China (Shenzhen) region. After you enable the OSS accelerator feature and create an OSS accelerator that has a cache capacity of 10 TB, you can obtain an additional 24 Gbit/s low-latency throughput provided by the accelerator if you use the accelerated endpoint of the accelerator to access data cached on the accelerator. For batch offline computing applications, you can take advantage of the 100 Gbit/s standard throughput for large-scale concurrent block read if you use an OSS internal endpoint. For hot data query service, you can obtain an additional 24 Gbit/s low-latency throughput if you use the accelerated endpoint of the accelerator to access data cached on the NVMe SSD media of the accelerator.
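The throughput rule (up to 2.4 Gbit/s per TB of cache capacity) can be checked with a one-line calculation. The function name is illustrative:

```python
def accelerator_throughput_gbps(cache_capacity_tb: float) -> float:
    """Additional low-latency throughput: up to 2.4 Gbit/s per TB of cache."""
    return 2.4 * cache_capacity_tb

# The 10 TB example above: 24 Gbit/s on top of the regional OSS bandwidth.
print(accelerator_throughput_gbps(10))  # 24.0
```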
Create an accelerator
Log on to the OSS console.
In the left-side navigation pane, click Buckets. On the Buckets page, find and click the desired bucket.
In the left-side navigation tree, choose OSS Accelerator.
On the OSS Accelerator page, click Settings.
In the Note dialog box, select I have read the Trial Terms of Service and click Next.
In the Create Accelerator panel, configure the Capacity parameter and click Next.
Configure a policy for the accelerator.
In the Create Accelerator panel, configure the following parameters.
Parameter
Description
Acceleration Policy
Select one of the following policies for the accelerator:
Paths: Access to objects in the paths is accelerated. You can configure up to 10 paths. Only access to objects in the configured paths is accelerated. For example, if you want to accelerate access to objects in the example directory in the root directory, set the path to example/.
Entire Bucket: Access to all objects in the bucket is accelerated.
Synchronous Cache Warmup
We recommend that you enable synchronous cache warmup. After you enable it, when a client uses the accelerated endpoint of the accelerator to write data to OSS by calling PutObject or AppendObject, the data is written to the OSS bucket and the accelerator at the same time. Subsequent reads through the accelerator are then served with low latency.
Click Create. In the Confirm Billable Items message box, click OK.
Use the accelerator
The following section describes how to use an accelerator to accelerate access to all data in a bucket or data in specific paths.
Modify the cache capacity of an accelerator
You can scale up or scale down the cache capacity of an accelerator by performing the following steps:
In the Basic Information section of the OSS Accelerator page, click the icon in the upper-right corner.
In the Edit OSS Accelerator panel, change the value of the Capacity parameter.
Click OK.