To move data at high speed between a CPFS for Lingjun file system and an OSS bucket, create a data stream and then create a data stream task on it.
Features
CPFS for Lingjun supports the following data stream features:
Account-level data streams
You can move data between a CPFS for Lingjun file system and an OSS bucket that belongs to the same account or to a different account.
Directory-level data streams
You can create a data stream to map any subdirectory of a CPFS for Lingjun file system to any prefix in an OSS bucket. This provides more granular access control and flexible data transmission.
Data import and export
You can import and export data between a CPFS for Lingjun file system and OSS by creating batch or stream tasks. Batch tasks are suitable for preloading datasets before a computing task starts. Stream tasks are suitable for scenarios that require continuous write-back and preloading of multiple model checkpoint files during model training.
Warning: CPFS for Lingjun exports the modification time of each file to the custom metadata of the corresponding OSS object, under the name x-oss-meta-alihbr-sync-mtime. Do not delete or modify this metadata. Otherwise, the file modification time in the file system will be incorrect.
If a task fails, check the task report to determine the cause of the failure.
Important: The task report is for reference only. After a data stream task is complete, the data at the destination is the final source of truth. You must verify that the data at the source and the destination is consistent.
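If you maintain exported objects with your own tooling, one way to honor the metadata warning above is to carry x-oss-meta-alihbr-sync-mtime over unchanged whenever you rebuild an object's user metadata. The following is a minimal Python sketch under that assumption: merge_user_metadata is a hypothetical helper, not part of any OSS SDK, header names are assumed to be lowercase, and the header value is treated as opaque because its format is not documented here.

```python
SYNC_MTIME_KEY = "x-oss-meta-alihbr-sync-mtime"


def merge_user_metadata(existing: dict, updates: dict) -> dict:
    """Hypothetical helper: merge new user metadata without touching the sync mtime.

    Header names are assumed to be lowercase. The sync mtime value is treated
    as opaque and copied through as-is.
    """
    if SYNC_MTIME_KEY in updates and updates[SYNC_MTIME_KEY] != existing.get(SYNC_MTIME_KEY):
        raise ValueError(f"refusing to modify {SYNC_MTIME_KEY}")
    merged = dict(existing)  # start from the current metadata, including the sync mtime
    merged.update(updates)   # apply the caller's changes on top
    return merged


current = {SYNC_MTIME_KEY: "opaque-value", "x-oss-meta-owner": "team-a"}
print(merge_user_metadata(current, {"x-oss-meta-owner": "team-b"}))
```

The helper refuses any change to the sync mtime, including adding it where it did not exist, so that only CPFS for Lingjun writes that value.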
Limits
Data streams
CPFS for Lingjun 2.4.0 and later support data streams within the same account. CPFS for Lingjun 2.6.0 and later support data streams across different accounts.
A single CPFS for Lingjun file system supports a maximum of 10 data streams.
A file path in a CPFS for Lingjun file system can be linked to only one OSS bucket.
Data streams cannot be created between a CPFS for Lingjun file system and an OSS bucket in a different region.
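Because cross-account data streams and stream tasks depend on the file system version, a pre-flight check can compare versions before you create anything. The Python sketch below is a hypothetical helper: the feature keys are illustrative labels, and it does not show how to obtain the version of a file system. Versions are compared as zero-padded integer tuples so that, for example, 2.10.0 is correctly treated as newer than 2.6.0, which a plain string comparison would get wrong.

```python
# Minimum CPFS for Lingjun versions taken from the limits in this topic.
MIN_VERSION = {
    "same_account_data_stream": (2, 4, 0),
    "cross_account_data_stream": (2, 6, 0),
    "stream_task": (2, 6, 0),
}


def parse_version(text: str) -> tuple:
    """Turn '2.6' or '2.6.0' into a zero-padded integer tuple such as (2, 6, 0)."""
    parts = [int(p) for p in text.strip().split(".")]
    parts += [0] * (3 - len(parts))  # pad short versions so (2, 6) compares equal to (2, 6, 0)
    return tuple(parts)


def supports(fs_version: str, feature: str) -> bool:
    """Return True if a file system at fs_version meets the minimum for feature."""
    return parse_version(fs_version) >= MIN_VERSION[feature]


print(supports("2.4.0", "same_account_data_stream"))   # True
print(supports("2.4.0", "cross_account_data_stream"))  # False
```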
Restrictions on paths, file names, and directory names for data streams
In a file system path associated with a data stream, do not rename non-empty directories. Otherwise, a Permission Denied or directory not empty error occurs.
Use special characters in directory and file names with caution.
The following characters are supported: uppercase and lowercase letters, digits, exclamation points (!), hyphens (-), underscores (_), periods (.), asterisks (*), and parentheses (()).
The following are not supported. Using them may cause your tasks to fail or produce unexpected results.
Subdirectory or file names that consist of two periods (..).
Paths that contain backslashes (\), including consecutive backslashes (\\).
Subdirectory or file names that contain forward slashes (/).
Paths longer than 1,023 characters, which is the maximum path length supported by data streams.
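Before you create a data stream on an existing directory, you might scan its paths against these rules. The following Python sketch shows one hypothetical way to do that: check_data_stream_path reports hard violations (a .. component, a backslash, a path over 1,023 characters) as errors, and names with characters outside the supported list as warnings, because those are discouraged rather than forbidden. It assumes that "letters" means ASCII letters and that paths use forward slashes as separators, so a forward slash inside a single name cannot be detected here.

```python
import re

# Characters listed as supported in names (assumed to mean ASCII letters and digits).
_SUPPORTED_NAME = re.compile(r"^[A-Za-z0-9!\-_.*()]+$")
MAX_PATH_LENGTH = 1023


def check_data_stream_path(path: str) -> tuple:
    """Return (errors, warnings) for a path relative to a data stream's directory."""
    errors = []
    warnings = []
    if len(path) > MAX_PATH_LENGTH:
        errors.append(f"path is {len(path)} characters; limit is {MAX_PATH_LENGTH}")
    if "\\" in path:
        errors.append("path contains a backslash")
    # Forward slashes are treated as separators, so each component is one name.
    for name in filter(None, path.split("/")):
        if name == "..":
            errors.append("path contains a '..' component")
        elif not _SUPPORTED_NAME.match(name):
            warnings.append(f"name {name!r} contains characters outside the supported set")
    return errors, warnings


print(check_data_stream_path("datasets/train/part-001.bin"))  # ([], [])
```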
Data stream task limits
Stream tasks are supported only in CPFS for Lingjun 2.6.0 and later. They can only be used through OpenAPI.
A single data stream can run a maximum of four batch tasks concurrently. There is no limit on the number of stream tasks.
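If you script batch tasks, you can keep client-side submissions within the limit of four concurrent batch tasks per data stream. The sketch below uses asyncio.Semaphore for that. run_batch_task is a hypothetical placeholder that only sleeps; in practice it would create a batch task and wait for it to finish, which is not shown here. The limit itself applies on the service side; the semaphore only keeps your script from exceeding it.

```python
import asyncio

MAX_CONCURRENT_BATCH_TASKS = 4  # per data stream, from the limits above


async def run_batch_task(name: str, stats: dict) -> str:
    """Hypothetical stand-in for creating a batch task and waiting until it finishes."""
    stats["running"] += 1
    stats["peak"] = max(stats["peak"], stats["running"])
    await asyncio.sleep(0.01)  # simulate the task running
    stats["running"] -= 1
    return f"{name}: done"


async def run_all(task_names: list) -> tuple:
    gate = asyncio.Semaphore(MAX_CONCURRENT_BATCH_TASKS)
    stats = {"running": 0, "peak": 0}

    async def guarded(name: str) -> str:
        async with gate:  # at most four tasks in flight on this data stream
            return await run_batch_task(name, stats)

    # gather returns results in the same order as task_names.
    results = await asyncio.gather(*(guarded(n) for n in task_names))
    return list(results), stats["peak"]


results, peak = asyncio.run(run_all([f"import-{i}" for i in range(10)]))
print(peak)  # never exceeds 4
```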
Import limits
When a symlink file is imported to CPFS for Lingjun, it is converted into a regular file that contains data, and the symlink information is lost.
If an OSS bucket has multiple versions of an object, only the latest version is copied.
File names or subdirectory names longer than 255 bytes are not supported.
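The 255-byte limit is measured in bytes, not characters, so names with non-ASCII characters reach it sooner. OSS object names are UTF-8 encoded, and a typical CJK character takes 3 bytes in UTF-8, so a name of 86 such characters (258 bytes) already exceeds the limit. The following Python sketch, built around the hypothetical helper overlong_components, finds the components of an object key that are too long to import.

```python
MAX_NAME_BYTES = 255


def overlong_components(object_key: str) -> list:
    """Return the '/'-separated components of an OSS object key longer than 255 UTF-8 bytes."""
    return [
        part
        for part in object_key.split("/")
        if len(part.encode("utf-8")) > MAX_NAME_BYTES
    ]


print(overlong_components("datasets/" + "a" * 255))  # [] (255 bytes is within the limit)
print(len(overlong_components("datasets/" + "数" * 86)))  # 1 (86 x 3 = 258 bytes)
```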
Export limits
Symbolic link files become empty objects when synced to OSS. The files they point to are not synced.
Hard link files are synced to OSS as regular files.
Socket, Device, and Pipe files become empty objects when exported to an OSS bucket.
Directory paths longer than 1023 characters are not supported.
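To see ahead of time how these limits will affect a directory, you can classify its files with os.lstat and the stat module. The following Python sketch maps each file type to the outcome described in this section. predicted_export_result is a hypothetical helper, and the st_nlink check is only a heuristic for spotting hard links. Types that this section does not mention, such as directories, are reported as not covered.

```python
import os
import stat
import sys


def predicted_export_result(path: str) -> str:
    """Describe how a file is expected to appear in OSS after export."""
    st = os.lstat(path)  # lstat inspects the link itself rather than its target
    mode = st.st_mode
    if stat.S_ISLNK(mode):
        return "empty object (the link target is not exported)"
    if stat.S_ISSOCK(mode) or stat.S_ISCHR(mode) or stat.S_ISBLK(mode) or stat.S_ISFIFO(mode):
        return "empty object"  # socket, device, and pipe files
    if stat.S_ISREG(mode):
        if st.st_nlink > 1:  # more than one name points at this inode
            return "regular object (hard link exported as a regular file)"
        return "regular object"
    return "not covered by the export limits"


if __name__ == "__main__":
    # Usage: python predict_export.py PATH [PATH ...]
    for arg in sys.argv[1:]:
        print(arg, "->", predicted_export_result(arg))
```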
Performance metrics
Operation type | Metric | Description
Import data | Throughput for files larger than 1 GB |
Import data | MB-level files processed per second | Single-directory and multi-directory import: 1,000
Export data | Throughput for files larger than 1 GB |
Export data | MB-level files processed per second | Single-directory and multi-directory export: 1,200
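The files-per-second figures allow a rough estimate of task duration when you move many MB-level files. The sketch below assumes the documented rate holds for the whole task and ignores other bottlenecks, so treat the result as an estimate rather than a guarantee; estimate_seconds is a hypothetical helper. For example, 3,600,000 MB-level files take about 3,600 seconds (one hour) to import at 1,000 files per second and about 3,000 seconds to export at 1,200 files per second.

```python
# Files processed per second for MB-level files, from the table above.
FILES_PER_SECOND = {"import": 1_000, "export": 1_200}


def estimate_seconds(file_count: int, direction: str) -> float:
    """Rough duration estimate, assuming the documented rate holds for the whole task."""
    return file_count / FILES_PER_SECOND[direction]


print(estimate_seconds(3_600_000, "import"))  # 3600.0
print(estimate_seconds(3_600_000, "export"))  # 3000.0
```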
Pricing
The data stream feature for CPFS for Lingjun is in public preview and is free of charge.
Procedure
Create a data stream.
For data streams within the same account, see Create a data stream for the same account.
For data streams across different accounts, see Create a data stream for a different account.
Create a batch or stream task.
For stream tasks, see Best practices for data stream tasks.
For batch tasks, see Manage data stream tasks.
Verify the data.
After the data stream task is complete, verify the data at the destination to ensure its accuracy.
Warning: If you delete the source data before verifying that it has been correctly transferred to the destination, you are responsible for any resulting data loss.
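One way to verify the data is to compare the source and the destination file by file. The Python sketch below assumes that both sides are reachable as local directories, for example a CPFS for Lingjun mount point and a local copy of the exported objects; how you obtain that copy is not shown. compare_trees is a hypothetical helper that reports files missing on either side, then checks the sizes and SHA-256 digests of files present on both.

```python
import hashlib
import sys
from pathlib import Path


def _sha256(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def _files(root: Path) -> dict:
    """Map each regular file's path relative to root (with '/' separators) to its full path."""
    return {p.relative_to(root).as_posix(): p for p in root.rglob("*") if p.is_file()}


def compare_trees(source: Path, destination: Path) -> list:
    """Return a list of human-readable differences between two directory trees."""
    src, dst = _files(source), _files(destination)
    problems = [f"missing at destination: {k}" for k in sorted(src.keys() - dst.keys())]
    problems += [f"only at destination: {k}" for k in sorted(dst.keys() - src.keys())]
    for key in sorted(src.keys() & dst.keys()):
        if src[key].stat().st_size != dst[key].stat().st_size:
            problems.append(f"size differs: {key}")
        elif _sha256(src[key]) != _sha256(dst[key]):
            problems.append(f"content differs: {key}")
    return problems


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_trees.py SOURCE_DIR DESTINATION_DIR
    for line in compare_trees(Path(sys.argv[1]), Path(sys.argv[2])):
        print(line)
```

Files reported as only at the destination are not necessarily errors, for example when you import into a directory that already contains data. Comparing sizes first avoids hashing files that already differ.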