You can create an FTP Check node in DataStudio to periodically check, over FTP, whether a specific file exists. If the FTP Check node detects the file, the scheduling system runs the descendant node of the FTP Check node. Otherwise, the FTP Check node checks again at the configured detection interval and keeps retrying until the condition for stopping the detection is met. In most cases, FTP Check nodes are used for communication between the DataWorks scheduling system and external scheduling systems. This topic describes how to use an FTP Check node and the related precautions.
Prerequisites
An FTP data source is added.
A workflow is created. For more information, see Create an auto triggered workflow.
Background information
An FTP Check node is typically used in the following scenario: A task in the DataWorks scheduling system needs to access an external database that is managed by an external scheduling system, and the data write task for that database is not run by DataWorks. In this case, DataWorks does not know when the data write task finishes or when the database can be accessed. If the task accesses the database before the data write task is complete, the data that is read may be incomplete, or the read may fail. To ensure that the task can successfully read data from the external database, you can have the external scheduling system generate a mark that indicates the data write task is complete. For example, the external scheduling system can generate a marker file with the suffix .done in the file system to indicate that the data write task is complete. Then, you can create an FTP Check node in the DataWorks scheduling system to periodically detect whether the marker file with the suffix .done exists. If the file exists, the node that needs to access the external database can be scheduled.
You can specify the file system that can be used to store the marker files.
In this example, a marker file with the suffix .done is used. You can customize the information such as the format and name for your marker file.
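As a rough illustration of the producer side, the following sketch shows how an external job might publish a marker file only after its data write has finished. It writes to a local directory for simplicity. The function name publish_done_marker, the prefix argument, and the directory layout are assumptions made for this example and are not part of DataWorks.

```python
from datetime import date
from pathlib import Path


def publish_done_marker(marker_dir: Path, prefix: str, biz_date: date) -> Path:
    """Create an empty '<prefix><yyyy-mm-dd>.done' marker file (hypothetical helper).

    Call this only after the data write task has finished, so that the
    existence of the file really means the database is ready to be read.
    """
    marker_dir.mkdir(parents=True, exist_ok=True)
    marker = marker_dir / f"{prefix}{biz_date.isoformat()}.done"
    marker.touch()
    return marker
```

Creating the marker as the last action of the write job is the point of the design: a reader that sees the file can assume that every earlier step has completed.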
The following figure and descriptions provide detailed information.
After a data write task for an external database in an external scheduling system is complete and the database can be accessed, the scheduling system generates a marker file, such as xxxx2021-03-03.done, in the specified file system. In this example, a marker file with the suffix .done is used. You can customize the information for your marker file based on your business requirements.
An FTP data source reads the marker file in the file system.
The FTP Check node periodically detects whether the marker file exists in the FTP data source based on the configured detection policy.
If the FTP Check node detects that the marker file exists, the data write task for the external database is complete, and the database can be accessed. Then, the FTP Check node returns the detection result to its descendant node.
If the FTP Check node detects that the marker file does not exist, the data write task for the external database is not complete, and the database cannot be accessed. In this case, the FTP Check node returns the detection result to its descendant node, and the descendant node is not scheduled. Then, the FTP Check node continues the detection based on the configured detection policy until the specified condition for stopping the detection is met.
For more information about how to configure a detection policy for an FTP Check node, see Step 6 in the Create an FTP Check node section of this topic.
The descendant node of the FTP Check node determines whether to access the external database based on the detection result returned by the FTP Check node.
If the FTP Check node detects that the marker file exists, the descendant node accesses the external database.
If the FTP Check node detects that the marker file does not exist, the descendant node does not access the external database.
The descendant node accesses the external database.
External databases include but are not limited to Oracle, MySQL, and SQL Server.
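To make the detection step concrete, the following sketch shows one way a file existence check over plain FTP could be written with the ftplib module in the Python standard library. It illustrates the idea only and is not how the FTP Check node is implemented. The helper name marker_exists is hypothetical, and SFTP data sources are not covered because ftplib does not support SFTP.

```python
import posixpath
from ftplib import FTP, error_perm


def marker_exists(ftp: FTP, marker_path: str) -> bool:
    """Return True if marker_path appears in the listing of its parent directory."""
    directory, name = posixpath.split(marker_path)
    try:
        entries = ftp.nlst(directory or ".")
    except error_perm:
        # Some servers answer an empty or missing directory with a 550 error.
        return False
    # Servers may return bare names or full paths, so compare base names.
    return name in {posixpath.basename(entry) for entry in entries}


# Usage (host, credentials, and path are placeholders):
# with FTP("ftp.example.com") as ftp:
#     ftp.login("user", "password")
#     ready = marker_exists(ftp, "/markers/orders_2021-03-03.done")
```

A descendant task would read the external database only when the check returns True.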
Limits
Tasks on FTP Check nodes can be run on serverless resource groups or old-version exclusive resource groups for scheduling. We recommend that you run tasks on serverless resource groups. For more information about how to purchase a serverless resource group, see Create and use a serverless resource group.
If an FTP Check node is scheduled by minute or hour, you can set the Policy for Stopping Check parameter only to Checks Allowed Before Check Node Stops for the node.
This type of node is supported in the following regions: China (Hangzhou), China (Shanghai), China (Beijing), China (Shenzhen), China (Chengdu), China (Hong Kong), Japan (Tokyo), Singapore, Malaysia (Kuala Lumpur), Indonesia (Jakarta), Germany (Frankfurt), UK (London), US (Silicon Valley), and US (Virginia).
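The limit on stop policies can be expressed as a small rule. The following sketch models only the minute, hour, and day scheduling cycles that this topic mentions. The enum and function names are hypothetical and do not correspond to a DataWorks API.

```python
from enum import Enum


class Cycle(Enum):
    MINUTE = "minute"
    HOUR = "hour"
    DAY = "day"


class StopPolicy(Enum):
    TIME_FOR_STOPPING_CHECK = "Time for Stopping Check"
    CHECKS_ALLOWED = "Checks Allowed Before Check Node Stops"


def is_policy_allowed(cycle: Cycle, policy: StopPolicy) -> bool:
    """Minute- and hour-scheduled nodes may use only the check-count policy."""
    if cycle in (Cycle.MINUTE, Cycle.HOUR):
        return policy is StopPolicy.CHECKS_ALLOWED
    # Assumption: a day-scheduled node may use either policy.
    return True
```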
Create an FTP Check node
Log on to the DataWorks console.
On the DataStudio page, move the pointer over the icon and choose .
Alternatively, you can find the desired workflow, right-click the workflow name, and then choose .
In the Create Node dialog box, specify the Name and Path parameters.
Note: The node name must be 1 to 128 characters in length and can contain letters, digits, underscores (_), and periods (.).
Click Confirm.
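If you generate node names in a script before you create nodes, a pre-check along the following lines may help catch invalid names early. This is a sketch of the naming rule stated in the note, under the assumption that "letters" means ASCII letters. The console may accept a wider character set, and the function name is hypothetical.

```python
import re

# Assumption: "letters" is read as ASCII letters only.
NODE_NAME_PATTERN = re.compile(r"[A-Za-z0-9_.]{1,128}")


def is_valid_node_name(name: str) -> bool:
    """Check the 1-to-128-character rule for letters, digits, '_' and '.'."""
    return NODE_NAME_PATTERN.fullmatch(name) is not None
```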
Click the Properties tab in the right-side navigation pane and configure scheduling properties for the node.
The properties include basic properties, time properties, resource properties, and scheduling dependencies. For more information, see Configure basic properties, Configure time properties, Configure the resource property, and Configure same-cycle scheduling dependencies.
Configure a detection object and a detection policy.
Select the FTP data source that you want to detect from the Select FTP Data Source drop-down list.
You can select an FTP or SFTP data source. If no data source is available, you must add one. For more information, see Add an FTP data source.
Specify the path of the marker file in the File to Check field. If the file path that you specified is dynamic, you can use scheduling parameters to configure variable paths in the file path. For more information, see Supported formats of scheduling parameters.
Specify an interval at which the detection is performed in the Check Interval (Seconds) field.
Select a policy for Policy for Stopping Check. The following policies are available:
Time for Stopping Check: the point in time when the detection stops. Specify this parameter in the hh24:mi:ss format, which is based on the 24-hour clock. If a check does not detect the marker file, the check fails, and the system does not schedule the descendant node of the FTP Check node. The system schedules the descendant node only after a check succeeds. After a failed check, the node continues to check at the configured detection interval until the time that you specified for stopping the detection is reached. You can view the node logs to find the detailed cause of the failure.
Note: The scheduling cycle of the FTP Check node affects the stop policy of the node.
If the FTP Check node is scheduled by minute or hour, the Policy for Stopping Check parameter can be set only to Checks Allowed Before Check Node Stops for the node. For more information, see Configure a detection policy for an FTP Check node.
If Policy for Stopping Check is set to Time for Stopping Check for an FTP Check node and you change the scheduling cycle of the node from day to minute or hour, the Time for Stopping Check policy becomes invalid. In this case, you must set Policy for Stopping Check to Checks Allowed Before Check Node Stops. Otherwise, the FTP Check node cannot be committed.
Checks Allowed Before Check Node Stops: the maximum number of times that the detection can be performed. If a check does not detect the marker file, the check fails, and the system does not schedule the descendant node of the FTP Check node. The system schedules the descendant node only after a check succeeds. After a failed check, the node continues to check at the specified detection interval until the maximum number of checks is reached. You can view the node logs to find the detailed cause of the failure.
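The two stop policies can be pictured as one polling loop with different exit conditions. The following sketch assumes that the check starts and stops on the same calendar day. The function run_check and its parameters are hypothetical and only approximate the behavior described above. The sleep and now arguments exist so that the loop can be exercised without waiting.

```python
import time
from datetime import datetime, time as dtime
from typing import Callable, Optional


def run_check(
    file_exists: Callable[[], bool],
    interval_seconds: float,
    max_checks: Optional[int] = None,
    stop_time: Optional[dtime] = None,
    sleep: Callable[[float], None] = time.sleep,
    now: Callable[[], datetime] = datetime.now,
) -> bool:
    """Poll until the marker file appears or the stop condition is met.

    Pass exactly one of max_checks or stop_time, mirroring the two policies.
    Returns True if the file was found, which means the descendant may run.
    """
    if (max_checks is None) == (stop_time is None):
        raise ValueError("specify exactly one of max_checks or stop_time")
    checks = 0
    while True:
        checks += 1
        if file_exists():
            return True
        # Checks Allowed Before Check Node Stops
        if max_checks is not None and checks >= max_checks:
            return False
        # Time for Stopping Check (same-day assumption)
        if stop_time is not None and now().time() >= stop_time:
            return False
        sleep(interval_seconds)
```

For example, run_check(probe, 60, max_checks=10) would call the hypothetical probe function up to 10 times, 60 seconds apart, and return False if the marker file never appears.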
Save and commit the node.
Note: You must configure the Rerun and Parent Nodes parameters on the Properties tab before you commit the task.
Click the icon in the top toolbar to save the node.
Click the icon in the top toolbar to commit the node.
In the Submit dialog box, enter your comments in the Change description field.
Click OK.
If the workspace that you use is in standard mode, you must click Deploy in the upper-right corner of the configuration tab to deploy the node after you commit it. For more information, see Perform basic O&M operations on auto triggered nodes.
Perform O&M operations on the node. For more information, see Perform basic O&M operations on auto triggered nodes.