Cloud Parallel File Storage (CPFS) for Lingjun supports batch and streaming dataflow tasks. You can select a task type based on your business requirements.
Batch task
A batch task imports all files in one directory to another directory at a time. This task type is suitable for preloading a dataset before a training task starts.
Streaming task
A streaming task imports files from one directory to another one by one. This task type is suitable for continuously reading and writing checkpoint files during model training tasks.
Only CPFS for Lingjun V2.6.0 and later support streaming tasks.
You can use streaming tasks only by calling API operations. For more information, see Best practice of streaming dataflow tasks.
Task description
Dataflow tasks are classified into the following types based on the data operations they perform: Import, Export, StreamImport, and StreamExport.
| Type | Description |
| --- | --- |
| Import | Imports data from a source Object Storage Service (OSS) bucket to a CPFS for Lingjun file system at a time. |
| Export | Exports the specified data from a CPFS for Lingjun file system to an OSS bucket at a time. The export path is the path of a file or directory in the CPFS for Lingjun file system from which data is exported to the OSS bucket. |
| StreamImport | Imports the specified objects from a source OSS bucket to a CPFS for Lingjun file system one by one. You can use StreamImport tasks only by calling API operations. |
| StreamExport | Exports the specified files from a CPFS for Lingjun file system to an OSS bucket one by one. You can use StreamExport tasks only by calling API operations. The export path is the path of a file or directory in the CPFS for Lingjun file system from which data is exported to the OSS bucket. |
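Because StreamImport and StreamExport tasks are available only through API operations, a request for one of these task types typically carries the file system ID, the dataflow ID, the task action, and the list of entries to transfer one by one. The sketch below builds such a request as a plain parameter dictionary; the parameter names (`FileSystemId`, `DataFlowId`, `TaskAction`, `EntryList`) are assumptions modeled on the dataflow task API and should be verified against the current API reference before use.

```python
# Hedged sketch: assemble request parameters for a StreamImport dataflow task.
# All parameter names below are assumptions based on the dataflow task API
# description; verify them against the current API reference.

def build_stream_import_params(file_system_id, data_flow_id, entry_paths):
    """Build request parameters for a StreamImport dataflow task.

    Each path in entry_paths is imported one by one, matching the
    streaming-task semantics described above.
    """
    if not entry_paths:
        raise ValueError("a streaming task needs at least one entry to import")
    return {
        "FileSystemId": file_system_id,   # CPFS for Lingjun file system ID
        "DataFlowId": data_flow_id,       # dataflow linking the OSS bucket and CPFS
        "TaskAction": "StreamImport",     # Import | Export | StreamImport | StreamExport
        "EntryList": list(entry_paths),   # objects to import, processed one by one
    }

params = build_stream_import_params(
    "cpfs-example-id",                    # hypothetical file system ID
    "df-example-id",                      # hypothetical dataflow ID
    ["/checkpoints/epoch-1.ckpt", "/checkpoints/epoch-2.ckpt"],
)
```

A StreamExport request would differ only in the `TaskAction` value and in listing CPFS file paths to export instead of objects to import.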