To transfer data between a CPFS Intelligent Computing Edition file system and an OSS Bucket, create a data flow and then a data flow task. Together they provide high-speed data transmission.
Feature description
CPFS Intelligent Computing Edition supports the following data flow features:
Account-level data flow
Supports data flow between a CPFS Intelligent Computing Edition file system and OSS Buckets in the same account or in a different account.
Directory-level data flow
You can create a data flow to establish a mapping from any subdirectory in the CPFS Intelligent Computing Edition file system to any prefix in the OSS Bucket, enabling finer-grained access control and more flexible data transmission.
Data import and export
Supports data import and export between the CPFS Intelligent Computing Edition file system and OSS through batch tasks or stream tasks. Batch tasks are suited to preloading datasets before computation tasks start. Stream tasks are suited to training scenarios in which multiple Checkpoint files of a model are continuously written back and preloaded. If a task fails, check the task report for the failure reason.
Warning: CPFS Intelligent Computing Edition exports the file modification timestamp property to the custom metadata of objects in the OSS Bucket, under the name x-oss-meta-alihbr-sync-mtime. Do not delete or modify this metadata; otherwise, the file modification timestamp property in the file system will be incorrect.
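As an illustration of the warning above, the following minimal sketch shows one way a client-side script could avoid touching the reserved key when it prepares a new set of custom metadata for an object. The helper name `merge_metadata` is hypothetical, and the sketch assumes that a metadata update replaces the whole custom metadata set, so the reserved entry has to be carried over explicitly. It only builds a dictionary and does not call any OSS API.

```python
# Reserved custom-metadata key named in the warning above.
RESERVED_KEY = "x-oss-meta-alihbr-sync-mtime"


def merge_metadata(current: dict, updates: dict) -> dict:
    """Build a new metadata dict that keeps the reserved key untouched.

    Hypothetical helper: `current` is the object's existing custom metadata,
    `updates` is the metadata the caller wants to set. Assumes the update
    replaces the full metadata set, so the reserved entry is copied over.
    """
    for key in updates:
        if key.lower() == RESERVED_KEY:
            raise ValueError(f"{RESERVED_KEY} must not be modified")

    # Keep the reserved entry exactly as it was exported by the data flow.
    reserved = {k: v for k, v in current.items() if k.lower() == RESERVED_KEY}

    merged = dict(updates)
    merged.update(reserved)
    return merged
```

A caller would pass the result of `merge_metadata` to whatever metadata update mechanism it already uses, instead of passing `updates` directly.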
Limits
Data flow
CPFS Intelligent Computing Edition version 2.4.0 and above supports same-account data flow, and version 2.6.0 and above supports cross-account data flow.
A single CPFS Intelligent Computing Edition file system supports up to 10 data flows.
A file path in the CPFS Intelligent Computing Edition file system can only be linked to one OSS Bucket.
A CPFS Intelligent Computing Edition file system cannot create data flows with OSS Buckets in other regions.
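The version requirements above can be expressed as a simple comparison. The sketch below is illustrative only: the function names are hypothetical, and it assumes the file system version is available as a dotted string such as "2.5.1".

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted version string into a tuple padded to three parts."""
    parts = [int(p) for p in version.split(".")]
    parts += [0] * (3 - len(parts))  # "2.4" is treated as 2.4.0
    return tuple(parts)


def supported_data_flows(version: str) -> dict:
    """Report which data flow types a given version supports.

    Thresholds come from the limits above: 2.4.0 for same-account
    data flow and 2.6.0 for cross-account data flow.
    """
    v = parse_version(version)
    return {
        "same_account": v >= (2, 4, 0),
        "cross_account": v >= (2, 6, 0),
    }
```

For example, `supported_data_flows("2.5.1")` reports same-account support but not cross-account support.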
Limits of data flows on file systems
Non-empty directories in file system paths associated with data flows cannot be renamed. Attempting to rename one returns a Permission Denied or directory not empty error.
Use special characters in directory and file names with caution. Only uppercase and lowercase letters, numbers, exclamation marks (!), hyphens (-), underscores (_), half-width periods (.), asterisks (*), and half-width parentheses (()) are supported. Double half-width periods (..), backslashes (\), and forward slashes (/) are not supported.
Overly long paths are not supported. The maximum path length supported by data flow is 1023 characters.
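The naming and path-length rules above can be checked before data is written to a directory linked to a data flow. The following is a minimal sketch, not an official validator. The function names are hypothetical, and two points are assumptions: any name containing ".." is treated as unsupported, and the 1023-character limit is applied to the path string exactly as passed in.

```python
import re

# Characters listed as supported: letters, numbers, ! - _ . * ( )
_ALLOWED_NAME = re.compile(r"[A-Za-z0-9!\-_.*()]+")
MAX_PATH_CHARS = 1023


def name_problems(name: str) -> list:
    """Return the reasons a single file or directory name is unsupported."""
    problems = []
    if not _ALLOWED_NAME.fullmatch(name):
        problems.append(f"unsupported character in {name!r}")
    if ".." in name:  # assumption: any occurrence of '..' is rejected
        problems.append(f"double period in {name!r}")
    return problems


def path_problems(path: str) -> list:
    """Return the reasons a slash-separated path is unsupported."""
    problems = []
    if len(path) > MAX_PATH_CHARS:
        problems.append(f"path longer than {MAX_PATH_CHARS} characters")
    for part in path.split("/"):
        if part:  # skip empty segments from leading or repeated slashes
            problems.extend(name_problems(part))
    return problems
```

An empty list from `path_problems` means the path passed these particular checks; it does not guarantee that the data flow accepts the path.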
Data flow task limits
Only CPFS Intelligent Computing Edition version 2.6.0 and above supports stream tasks, and they can only be used through OpenAPI.
A maximum of 4 batch tasks can run simultaneously under a single data flow, with no limit on stream tasks.
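A client that submits many batch tasks may want to respect the limit of 4 concurrently running batch tasks per data flow. The sketch below only shows the bookkeeping. The function name and the idea of tracking a running count on the client side are assumptions, and no OpenAPI call is made.

```python
MAX_RUNNING_BATCH_TASKS = 4  # per data flow, from the limit above


def tasks_to_start(pending: list, running_count: int) -> list:
    """Pick the pending batch tasks that can start right now.

    `pending` is an ordered list of task descriptions (hypothetical),
    `running_count` is how many batch tasks are already running
    under the same data flow.
    """
    free_slots = max(0, MAX_RUNNING_BATCH_TASKS - running_count)
    return pending[:free_slots]
```

Stream tasks are not counted here, because the limit above applies only to batch tasks.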
Import limits
Files of the Symlink type will be converted to regular files containing data when imported into CPFS Intelligent Computing Edition, losing the Symlink information.
If multiple versions exist in the OSS Bucket, only the latest version is copied.
File names or subdirectory names longer than 255 bytes are not supported.
Export limits
When a file of the Symlink type is synced to OSS, the file it points to is not synchronized. The Symlink becomes an ordinary blank object without data.
Files of the Hardlink type are synchronized to OSS as regular files.
Files of the Socket, Device, and Pipe types will become ordinary blank objects without data when exported to the OSS Bucket.
Directory paths longer than 1023 characters are not supported.
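Before an export, it can be useful to find files whose type changes the result described in the export limits above. The sketch below classifies a local path with the Python standard library. The function name and the returned labels are hypothetical, and the code assumes a POSIX-style file system where these file types can be inspected with `os.lstat`.

```python
import os
import stat


def export_outcome(path: str) -> str:
    """Describe what the export limits above imply for one local path."""
    mode = os.lstat(path).st_mode  # lstat does not follow symlinks

    if stat.S_ISLNK(mode):
        # The link target is not synchronized.
        return "empty object"
    if (stat.S_ISSOCK(mode) or stat.S_ISCHR(mode)
            or stat.S_ISBLK(mode) or stat.S_ISFIFO(mode)):
        # Socket, Device, and Pipe types carry no data after export.
        return "empty object"
    if stat.S_ISREG(mode):
        # Hard links are regular files and are exported as such.
        return "regular object"
    return "other"  # for example directories, not covered by these limits
```

Walking a directory tree with `os.walk` and calling `export_outcome` on each entry gives a list of files that would arrive in OSS without data.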
Usage flow
Create a data flow.
For specific operations, see Create same-account data flow or Create cross-account data flow.
Create batch tasks or stream tasks.
For specific operations, see Manage data flow tasks or Best practices for data flow stream tasks.
Performance metrics
Operation type | Metric | Description
Data import | Throughput for files larger than GB |
Data import | Number of MB-level files processed per second | Single-directory or multi-directory import: 1000
Data export | Throughput for files larger than GB |
Data export | Number of MB-level files processed per second | Single-directory or multi-directory export: 1200
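The files-per-second figures above allow a rough lower bound on task duration for datasets made of MB-level files. The sketch below is only an estimate: it assumes the listed rates are sustained for the whole task, and the function name is hypothetical. Actual durations depend on file sizes and load.

```python
# MB-level files processed per second, from the table above.
FILES_PER_SECOND = {"import": 1000, "export": 1200}


def estimated_seconds(file_count: int, operation: str) -> float:
    """Estimate the time to process `file_count` MB-level files."""
    return file_count / FILES_PER_SECOND[operation]
```

Under these assumptions, importing 3,600,000 MB-level files takes about 3,600 seconds, or one hour.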
Billing examples
The data flow feature of CPFS Intelligent Computing Edition is currently in public preview and is free to use.