
Cloud Parallel File Storage:Dataflow tasks

Last Updated:Dec 17, 2024

Cloud Parallel File Storage (CPFS) for Lingjun supports batch and streaming dataflow tasks. You can select a task type based on your business requirements.

Batch task

A batch task imports all the files in one directory to another directory in a single operation. This task type is suitable for preloading a dataset before a training task starts.

Streaming task

A streaming task imports files from one directory to another one by one. This task type is suitable for continuously reading and writing checkpoint files during model training tasks.


Task description

Based on the data operation, dataflow tasks are classified into the following types: Import, Export, StreamImport, and StreamExport.


Import

Imports data from a source Object Storage Service (OSS) bucket to a CPFS for Lingjun file system in a single operation.

  • The data blocks and metadata of an object can be imported.

  • The import path is the path of an object in the OSS bucket. A dataflow task imports data from the path of the object in the OSS bucket to the CPFS for Lingjun file system.
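As a rough illustration, a batch Import task could be assembled as an API request body. This is a minimal sketch only: the function and the parameter names (`FileSystemId`, `DataFlowId`, `TaskAction`, `DataType`, `Directory`) are assumptions for illustration, not the authoritative schema; consult the dataflow task API reference before use.

```python
# Hypothetical sketch of a batch Import dataflow task request.
# All field names below are illustrative assumptions, not the
# authoritative API schema.

def build_import_task(file_system_id: str, data_flow_id: str, oss_path: str) -> dict:
    """Build a request body for a one-time (batch) Import task."""
    return {
        "FileSystemId": file_system_id,  # CPFS for Lingjun file system
        "DataFlowId": data_flow_id,      # dataflow linking the source OSS bucket
        "TaskAction": "Import",          # batch import: all files in one operation
        "DataType": "MetaAndData",       # import both metadata and data blocks
        "Directory": oss_path,           # path of the objects in the OSS bucket
    }

request = build_import_task("cpfs-123456789", "df-123456789", "/dataset/train/")
print(request["TaskAction"])
```

A task built this way preloads the whole `/dataset/train/` prefix before training starts, matching the batch use case described above.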

Export

Exports the specified data from a CPFS for Lingjun file system to an OSS bucket in a single operation.

The export path is the path of a file or directory in the CPFS for Lingjun file system. A dataflow task exports data from the path of the file or directory in the CPFS for Lingjun file system to the OSS bucket.

Warning
  • CPFS Intelligent Computing Edition exports the file modification timestamp of each file to the custom metadata of the OSS object, in an entry named x-oss-meta-alihbr-sync-mtime. Do not delete or modify this entry; otherwise, the file modification timestamps in the file system will be incorrect.

  • While a dataflow is in use, do not disable versioning for the source OSS bucket; otherwise, an error is reported when you run an export task. For more information, see Versioning.

StreamImport

Imports the specified objects from a source OSS bucket to a CPFS for Lingjun file system one by one. You can use StreamImport tasks only by calling API operations.

  • The data blocks and metadata of an object can be imported.

  • The import path is the path of an object in the OSS bucket. A dataflow task imports data from the path of the object in the OSS bucket to the CPFS for Lingjun file system.

StreamExport

Exports the specified files from a CPFS for Lingjun file system to an OSS bucket one by one. You can use StreamExport tasks only by calling API operations.

The export path is the path of a file or directory in the CPFS for Lingjun file system. A dataflow task exports data from the path of the file or directory in the CPFS for Lingjun file system to the OSS bucket.
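Because StreamImport and StreamExport are available only through API operations, a streaming task would be driven programmatically, for example by listing the individual files to move one by one. The sketch below reuses the same hypothetical request shape; the field names (`TaskAction`, `Directory`, `EntryList`) are assumptions for illustration and must be verified against the API reference.

```python
# Hypothetical sketch of a StreamExport task that exports checkpoint
# files one by one. The request fields (TaskAction, Directory, EntryList)
# are illustrative assumptions, not the authoritative API schema.

def build_stream_export_task(file_system_id: str, data_flow_id: str,
                             directory: str, files: list[str]) -> dict:
    """Build a request body for a streaming export of individual files."""
    return {
        "FileSystemId": file_system_id,
        "DataFlowId": data_flow_id,
        "TaskAction": "StreamExport",   # export the listed files one by one
        "Directory": directory,         # source directory in the file system
        # Files are listed explicitly so each can be exported as soon as it
        # is written, e.g. checkpoints produced during a training run.
        "EntryList": [f"{directory.rstrip('/')}/{name}" for name in files],
    }

checkpoints = ["ckpt-0001.pt", "ckpt-0002.pt"]
request = build_stream_export_task("cpfs-123456789", "df-123456789",
                                   "/checkpoints", checkpoints)
for entry in request["EntryList"]:
    print(entry)
```

Listing entries per file is what distinguishes the streaming types from Import and Export, which operate on a whole directory in one operation.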