
Cloud Parallel File Storage:Dataflow tasks

Last Updated:Dec 17, 2024

Cloud Parallel File Storage (CPFS) for Lingjun supports batch and streaming dataflow tasks. You can select a task type based on your business requirements.

Batch task

A batch task imports all the files in one directory to another directory in a single operation. This task type is suitable for preloading a dataset before a training task starts.

Streaming task

A streaming task imports files from one directory to another one by one. This task type is suitable for continuously reading and writing checkpoint files during model training tasks.


Task description

Based on the data operation, dataflow tasks are classified into the following types: Import, Export, StreamImport, and StreamExport.


Import

Imports data from a source Object Storage Service (OSS) bucket to a CPFS for Lingjun file system in a single operation.

  • The data blocks and metadata of an object can be imported.

  • The import path is the path of an object in the OSS bucket. A dataflow task imports data from the path of the object in the OSS bucket to the CPFS for Lingjun file system.
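As a rough illustration, a batch Import task could be assembled as an API request body. This is a minimal sketch only: the function and the parameter names (`FileSystemId`, `DataFlowId`, `TaskAction`, `DataType`, `Directory`) are assumptions for illustration, not the authoritative schema; consult the dataflow task API reference before use.

```python
# Hypothetical sketch of a batch Import dataflow task request.
# All field names below are illustrative assumptions, not the
# authoritative API schema.

def build_import_task(file_system_id: str, data_flow_id: str, oss_path: str) -> dict:
    """Build a request body for a one-time (batch) Import task."""
    return {
        "FileSystemId": file_system_id,  # CPFS for Lingjun file system
        "DataFlowId": data_flow_id,      # dataflow linking the source OSS bucket
        "TaskAction": "Import",          # batch import: all files in one operation
        "DataType": "MetaAndData",       # import both metadata and data blocks
        "Directory": oss_path,           # path of the objects in the OSS bucket
    }

request = build_import_task("cpfs-123456789", "df-123456789", "/dataset/train/")
print(request["TaskAction"])
```

A task built this way preloads the whole `/dataset/train/` prefix before training starts, matching the batch use case described above.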

Export

Exports the specified data from a CPFS for Lingjun file system to an OSS bucket in a single operation.

The export path is the path of a file or directory in the CPFS for Lingjun file system. A dataflow task exports data from the path of the file or directory in the CPFS for Lingjun file system to the OSS bucket.

Warning
  • CPFS Intelligent Computing Edition exports the file modification timestamp of each file to the custom metadata of the OSS object, in an entry named x-oss-meta-alihbr-sync-mtime. Do not delete or modify this entry; otherwise, the file modification timestamps in the file system will be incorrect.

  • While a dataflow is in use, do not disable versioning for the source OSS bucket; otherwise, an error is reported when you run an export task. For more information, see Versioning.

StreamImport

Imports the specified objects from a source OSS bucket to a CPFS for Lingjun file system one by one. You can use StreamImport tasks only by calling API operations.

  • The data blocks and metadata of an object can be imported.

  • The import path is the path of an object in the OSS bucket. A dataflow task imports data from the path of the object in the OSS bucket to the CPFS for Lingjun file system.

StreamExport

Exports the specified files from a CPFS for Lingjun file system to an OSS bucket one by one. You can use StreamExport tasks only by calling API operations.

The export path is the path of a file or directory in the CPFS for Lingjun file system. A dataflow task exports data from the path of the file or directory in the CPFS for Lingjun file system to the OSS bucket.
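Because StreamImport and StreamExport are available only through API operations, a streaming task would be driven programmatically, for example by listing the individual files to move one by one. The sketch below reuses the same hypothetical request shape; the field names (`TaskAction`, `Directory`, `EntryList`) are assumptions for illustration and must be verified against the API reference.

```python
# Hypothetical sketch of a StreamExport task that exports checkpoint
# files one by one. The request fields (TaskAction, Directory, EntryList)
# are illustrative assumptions, not the authoritative API schema.

def build_stream_export_task(file_system_id: str, data_flow_id: str,
                             directory: str, files: list[str]) -> dict:
    """Build a request body for a streaming export of individual files."""
    return {
        "FileSystemId": file_system_id,
        "DataFlowId": data_flow_id,
        "TaskAction": "StreamExport",   # export the listed files one by one
        "Directory": directory,         # source directory in the file system
        # Files are listed explicitly so each can be exported as soon as it
        # is written, e.g. checkpoints produced during a training run.
        "EntryList": [f"{directory.rstrip('/')}/{name}" for name in files],
    }

checkpoints = ["ckpt-0001.pt", "ckpt-0002.pt"]
request = build_stream_export_task("cpfs-123456789", "df-123456789",
                                   "/checkpoints", checkpoints)
for entry in request["EntryList"]:
    print(entry)
```

Listing entries per file is what distinguishes the streaming types from Import and Export, which operate on a whole directory in one operation.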