This topic describes how to create and manage the dataflow tasks of Cloud Parallel File Storage (CPFS) for Lingjun file systems and view the causes of task failures in the File Storage NAS (NAS) console.
Background information
The dataflow tasks that you create in the NAS console are batch tasks. A batch dataflow task imports or exports all files in a directory at a time; it cannot import or export individual files. To import or export individual files, use a streaming dataflow task by calling API operations. For more information, see Best practice of streaming dataflow tasks.
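For orientation, the following is a minimal sketch of what such an API call might look like in Python, using the generic OpenAPI request type from the aliyunsdkcore package. CreateDataFlowTask is the NAS operation for creating dataflow tasks; the TaskAction value and all placeholder IDs below are assumptions, so verify them against the streaming dataflow best practice before use.

```python
# A hedged sketch, not a definitive implementation: the TaskAction value
# and the placeholder IDs are assumptions to verify against the API reference.
from aliyunsdkcore.client import AcsClient
from aliyunsdkcore.request import CommonRequest

# Replace the credentials and region with your own.
client = AcsClient('<access-key-id>', '<access-key-secret>', 'cn-hangzhou')

request = CommonRequest()
request.set_domain('nas.cn-hangzhou.aliyuncs.com')
request.set_version('2017-06-26')                       # NAS API version
request.set_action_name('CreateDataFlowTask')
request.add_query_param('FileSystemId', 'bmcpfs-xxxx')  # placeholder file system ID
request.add_query_param('DataFlowId', 'df-xxxx')        # the dataflow created beforehand
request.add_query_param('TaskAction', 'StreamImport')   # assumption: streaming import action

print(client.do_action_with_exception(request))
```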
Prerequisites
A dataflow is created. For more information, see the Create a dataflow within the same account or Create a dataflow across accounts section of the "Manage dataflows" topic.
If you create a dataflow task to export data, versioning is enabled for the source Object Storage Service (OSS) bucket that is associated with your CPFS for Lingjun file system. Do not disable versioning while you use the dataflow feature. Otherwise, an error is reported when you run a dataflow task to export data. For more information, see Versioning.
Create tasks
Log on to the NAS console.
In the left-side navigation pane, choose File System > File System List.
In the top navigation bar, select a region.
On the File System List page, click the name of the CPFS for Lingjun file system that you want to manage.
On the details page of the file system, click Dataflow in the left-side pane.
On the Dataflow page, find the dataflow that you want to manage and click Task Management in the Actions column.
In the Task Management panel, click Create Job.
In the Create Job panel, create different types of tasks and configure the tasks.
Import data
After a symbolic link is imported to CPFS for Lingjun, the symbolic link is converted into a regular data file that contains no symbolic link information.
If an OSS bucket contains data of multiple versions, only data of the latest version is imported.
The name of a file or a subdirectory can be up to 255 bytes in length.
The name of a file or a directory cannot contain the following special characters. Otherwise, an unexpected result may occur or a task may fail. These rules are illustrated by the sketch after this list.
The name of a subdirectory or a file cannot contain a double period (..).
The path of a subdirectory or a file cannot contain a backslash (\) or consecutive backslashes (\\).
The name of a subdirectory or a file cannot contain a forward slash (/).
If a file and a subdirectory have the same name, an object conflict occurs in the CPFS for Lingjun file system. In this case, only one object with that name can be imported.
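To make the rules above concrete, the following sketch checks a candidate import path against them. It is only an illustration of the stated rules, not an official validator.

```python
MAX_NAME_BYTES = 255  # per-name limit stated above


def check_import_name_rules(path):
    """Return a list of violations of the naming rules listed above."""
    problems = []
    if '\\' in path:
        # Covers both a single backslash (\) and consecutive backslashes (\\).
        problems.append('path contains a backslash')
    for name in path.strip('/').split('/'):
        if '..' in name:
            problems.append('name "%s" contains a double period (..)' % name)
        if len(name.encode('utf-8')) > MAX_NAME_BYTES:
            problems.append('name "%s" exceeds %d bytes' % (name, MAX_NAME_BYTES))
    return problems


# Example: both the double-period rule and the backslash rule are violated here.
print(check_import_name_rules('data/..backup\\tmp/file.txt'))
```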
Parameter
Description
Data Type
The type of the data to be imported. Set the value to Data + Metadata. This value specifies that both the data blocks and metadata of an object are imported.
Specify OSS Object Prefix Subdirectory
The directory or list of files whose data you want to import. Select Import Objects from OSS and specify a path relative to the specified OSS object prefix. The OSS path that you specify must start and end with a forward slash (/).
Note: If the CPFS file system path that you specify for a dataflow does not exist, you can select If the CPFS directory you created does not exist, the system automatically creates a CPFS directory to prevent data import failures.
Conflict Resolution Policy
The policy used when the CPFS for Lingjun file system and the OSS bucket contain files with the same name. Valid values (illustrated by the sketch after this table):
Skip Files With The Same Name (default): ignores files with the same name and does not synchronize them.
Keep The Latest: compares the update time (mtime) of files with the same name and keeps the latest version. Both OSS and CPFS for Lingjun use the modification time for the comparison.
Overwrite Files With The Same Name: overwrites the file with the same name in the CPFS for Lingjun file system with the OSS version. Select Overwrite The Existing Files With The Same Name On The Target End With The Current Source Files. Make sure that you have backed up important data.
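The sketch below restates the three policies as a decision function for a single same-name file during an import. It mirrors the descriptions above; the policy identifiers are hypothetical, and this is not the service's actual logic.

```python
def resolve_import_conflict(policy, oss_mtime, cpfs_mtime):
    """Decide which copy of a same-name file wins during an import.

    `policy` is a hypothetical identifier for the console options above;
    returns 'oss', 'cpfs', or 'skip'.
    """
    if policy == 'skip':         # Skip Files With The Same Name (default)
        return 'skip'
    if policy == 'keep_latest':  # Keep The Latest: newer mtime wins
        return 'oss' if oss_mtime > cpfs_mtime else 'cpfs'
    if policy == 'overwrite':    # Overwrite Files With The Same Name
        return 'oss'             # the OSS version always replaces the CPFS file
    raise ValueError('unknown policy: %s' % policy)
```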
Export data
Make sure that versioning is enabled for the source OSS bucket that is associated with your CPFS for Lingjun file system. Do not disable versioning when you use the dataflow feature. Otherwise, an error is reported when you run a dataflow task to export data. For more information, see Versioning.
After a symbolic link is synchronized to OSS, the file to which the symbolic link points is not synchronized to OSS. In this case, the symbolic link is converted into a regular object that contains no data.
Hard links can be synchronized to OSS only as regular files that contain no link information.
The files of the Socket, Device, or Pipe type cannot be exported to an OSS bucket.
The path of a directory can be up to 1,023 characters in length.
The name of a file or a directory cannot contain the following special characters. Otherwise, an unexpected result may occur or a task may fail.
The name of a subdirectory or a file cannot contain a double period (..).
The path of a subdirectory or a file cannot contain a backslash (\) or consecutive backslashes (\\).
The name of a subdirectory or a file cannot contain a forward slash (/).
CPFS for Lingjun exports the File Modification timestamps attribute to the custom metadata of the exported OSS object. The metadata field is named x-oss-meta-alihbr-sync-mtime and cannot be deleted or modified. Otherwise, an error occurs when you access the File Modification timestamps attribute of the file system.
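If you need to read this attribute programmatically, a minimal sketch using the oss2 SDK might look as follows. The endpoint, bucket name, and object key are placeholders.

```python
import oss2

# Placeholder credentials, endpoint, and bucket name.
auth = oss2.Auth('<access-key-id>', '<access-key-secret>')
bucket = oss2.Bucket(auth, 'https://oss-cn-hangzhou.aliyuncs.com', '<bucket-name>')

# head_object returns the object's headers, including custom metadata.
meta = bucket.head_object('<exported/object/key>')

# Read-only: deleting or modifying this field breaks access to the
# File Modification timestamps attribute, as noted above.
print(meta.headers.get('x-oss-meta-alihbr-sync-mtime'))
```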
Parameter
Description
Export Data Type
The type of the data to be exported. Select Data + Metadata. This value specifies that both the data blocks and metadata of a file are exported.
Specify CPFS Subdirectory
The directory or list of files whose data you want to export. Select Export Files from CPFS and specify a subdirectory of the specified CPFS directory. The directory that you specify must start and end with a forward slash (/).
Conflict Resolution Policy
The policy used when the CPFS for Lingjun file system and the OSS bucket have objects with the same name. Valid values:
Skip Files with the Same Name (Default): ignores the objects with the same name and does not synchronize these objects.
Keep the Latest File: compares the update time (mtime) of the objects with the same name and keeps the latest object. Both OSS and CPFS for Lingjun use the modification time for the comparison.
Overwrite Files with the Same Name: replaces the object with the same name in the OSS bucket with the source file in the CPFS for Lingjun file system. Select Use the source file to overwrite the existing file with the same name on the destination. Make sure that you have backed up key data.
Click OK.
Cancel tasks
You can cancel a running dataflow task in the console.
On the Dataflow page, find the dataflow that you want to manage and click Task Management in the Actions column.
In the Task Management panel, find the task that you want to cancel, and click Cancel.
In the message that appears, click OK.
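The Cancel button corresponds to the CancelDataFlowTask API operation, so a running task can also be canceled programmatically. The following is a hedged sketch; the parameter names are assumptions modeled on the other dataflow operations and should be verified against the API reference.

```python
from aliyunsdkcore.client import AcsClient
from aliyunsdkcore.request import CommonRequest

client = AcsClient('<access-key-id>', '<access-key-secret>', 'cn-hangzhou')

request = CommonRequest()
request.set_domain('nas.cn-hangzhou.aliyuncs.com')
request.set_version('2017-06-26')
request.set_action_name('CancelDataFlowTask')
request.add_query_param('FileSystemId', 'bmcpfs-xxxx')  # placeholder IDs throughout
request.add_query_param('DataFlowId', 'df-xxxx')
request.add_query_param('TaskId', 'task-xxxx')

print(client.do_action_with_exception(request))
```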
Copy tasks
You can copy a dataflow task that has been run to run the task again.
On the Dataflow page, find the dataflow that you want to manage and click Task Management in the Actions column.
In the Task Management panel, find the task that you want to copy, move the pointer over the icon in the Actions column, and then select Copy.
In the message that appears, click OK.
View the cause of a task failure
If a dataflow task fails, the system displays the failure cause or generates a task report. You can view the failure cause or download the task report in the NAS console and troubleshoot the issue.
On the Dataflow page, find the dataflow that you want to manage and click Task Management in the Actions column.
In the Task Management panel, find the failed task and move the pointer over the icon next to Failed in the Status column to view the failure cause or download the task report.
Note: If no failure cause is displayed, no task report is generated, or you cannot troubleshoot the issue based on the failure cause or task report, submit a ticket for troubleshooting.
View task configuration information and running status
You can view the configuration information and running status of a batch task in the console. To view the configuration information and running status of a streaming task, call the DescribeDataFlowTasks operation, as in the sketch below.
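A minimal sketch of that call follows, again using the generic OpenAPI request type. The FileSystemId parameter and the placeholder values are assumptions to verify against the API reference.

```python
import json

from aliyunsdkcore.client import AcsClient
from aliyunsdkcore.request import CommonRequest

client = AcsClient('<access-key-id>', '<access-key-secret>', 'cn-hangzhou')

request = CommonRequest()
request.set_domain('nas.cn-hangzhou.aliyuncs.com')
request.set_version('2017-06-26')
request.set_action_name('DescribeDataFlowTasks')
request.add_query_param('FileSystemId', 'bmcpfs-xxxx')  # placeholder file system ID

# The response lists the tasks with their configuration and status fields.
tasks = json.loads(client.do_action_with_exception(request))
print(json.dumps(tasks, indent=2))
```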
On the Dataflow page, find the dataflow that you want to manage and click Task Management in the Actions column.
In the Task Management panel, view the configuration information and running status of the dataflow task.
Parameter
Description
Task ID
The unique identifier of the dataflow task.
Type
The type of the dataflow task: data import or data export.
Conflict Resolution Policy
The policy used when the CPFS for Lingjun file system and the OSS bucket have objects with the same name. Valid values:
Skip Files with the Same Name (Default)
Keep the Latest File
Overwrite Files with the Same Name
Data source address
The complete path of the source from which data is transmitted in the dataflow task.
Data destination address
The complete path of the destination to which data is transmitted in the dataflow task.
Data source directory
The source directory specified for the dataflow task.
Total amount of scanned data on source
The amount of data scanned on the source. Unit: bytes.
Amount of synchronized data sources
The amount of data (including skipped data) that the dataflow task has synchronized. Unit: bytes.
Actual amount of transmitted data
The actual amount of data transmitted in the dataflow task. Unit: bytes.
Average speed
The average speed of data transmission in the dataflow task. Unit: bytes/s.
Remaining duration
The estimated time required to complete the dataflow task based on the current speed.
Time period
The period of time from the start time to the end time of the dataflow task.
Progress
The progress of the current dataflow task, as a percentage.
Status
The execution status of the current dataflow task. Valid values:
Pending: The dataflow task has been created but has not started.
Executing: The dataflow task is being executed.
Failed: The dataflow task failed to be executed.
Canceled: The dataflow task was canceled and not completed.
Canceling: The dataflow task is being canceled.
Completed: The dataflow task is completed.
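Pending, Executing, and Canceling are transient states; Failed, Canceled, and Completed are terminal. The helper below polls a task until it reaches a terminal state. It assumes a fetch_task callable that returns the latest task record as a dict, for example built on the DescribeDataFlowTasks sketch above; the Status field name is an assumption about the response shape.

```python
import time

# Terminal states from the table above; the other states are transient.
TERMINAL_STATES = {'Failed', 'Canceled', 'Completed'}


def wait_until_finished(fetch_task, poll_seconds=30):
    """Poll a dataflow task until it reaches a terminal state.

    `fetch_task` is any callable returning the latest task record as a
    dict; the 'Status' key is an assumption about the response shape.
    """
    while True:
        status = fetch_task().get('Status')
        if status in TERMINAL_STATES:
            return status
        time.sleep(poll_seconds)
```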
View task reports
After a dataflow task is completed, the system generates a Skipped File Report, Failed File Report, or Successful File Report based on the task result. You can download the report from the console and view the details of the task.
On the Dataflow page, find the dataflow that you want to manage and click Task Management in the Actions column.
In the Task Management panel, find the completed task, and click Download Task Report.
Confirm the report that you want to download and click the download icon.
View performance monitoring or configure alert rules
To view the performance monitoring data of a task or configure alert rules for a task, make sure that your CPFS for Lingjun file system is V2.6.0 or later and that a dataflow task is created.
To view the performance details of a dataflow task's data import or export, such as read/write throughput, read/write IOPS, and metadata QPS, see View the performance monitoring data of a CPFS file system.
To configure alert rules for specified metrics of a dataflow task so that you can identify and handle exceptions at the earliest opportunity, see Configure a basic alert rule.