Step | Description | References |
Step 1: Create a workflow | Data development in DataWorks is performed based on workflows and code. Before you perform development operations, you must create a workflow. | Create a workflow |
Step 2: Create a table | DataWorks allows you to create tables in the DataWorks console, displays the tables in a directory structure, and lets you manage them in the console. Before you develop data in your workspace, you must create tables to store raw data and tables to receive data cleansing results in the compute engines that are associated with your workspace. The types of tables that you need depend on the compute engines that you use. | |
Step 3: (Optional) Create and upload resources | DataWorks allows you to upload resources of different types, such as text files and JAR packages, to specified compute engines and to use the resources when you develop data. If you need existing resources for data development, you can upload them in the DataWorks console and then manage them there. Note: In the DataWorks console, you can view the compute engines for which you can create resources and the resource types that each compute engine supports. | |
Step 4: Create a scheduling node | Data development in DataWorks is based on nodes. Tasks of different compute engine types are encapsulated into different node types, and you can select a node type based on your business requirements. You can also manage nodes easily. For example, you can use a node group to clone multiple nodes at a time, and you can restore deleted nodes from the recycle bin. | DataWorks supports the following types of compute engines: You can select different types of nodes for tasks of different types of compute engines. For information about different types of DataWorks nodes, see DataWorks nodes. For information about the management operations that you can perform on nodes, see the following topics: |
Step 5: (Optional) Reference resources in nodes | Before you can use resources in a DataWorks node, you must load the resources to the development environment of the node. | |
Step 6: (Optional) Register a function | Before you can use a function to develop data, you must register the function in the DataWorks console. Before you register a function, you must upload the resources that the function requires to DataWorks. Note: In the DataWorks console, you can view the compute engines for which you can register functions. | |
Step 7: Write the node code | On the node configuration tab, you can write the code for a node by using the syntax that is supported by the node's compute engine and the related database. The syntax varies based on the node type. Note: After you write the code, click the save icon promptly to prevent code loss. | For information about different types of DataWorks nodes, see DataWorks nodes. Usage notes of common compute engines: |
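
The ordering constraints in the table above can be sketched as a small dependency check. This is an illustrative model only: it does not call any DataWorks API, the step names and the `validate_plan` helper are hypothetical, and the prerequisite map is one reading of the "before you ... you must ..." statements in the table.

```python
# Hypothetical model of the step ordering described in the table.
# Nothing here talks to DataWorks; step names are illustrative labels.

# Each step maps to the steps that must be completed before it.
PREREQUISITES = {
    "create_workflow": [],
    "create_table": ["create_workflow"],
    "upload_resources": ["create_workflow"],
    "create_node": ["create_workflow"],
    "reference_resources": ["upload_resources", "create_node"],
    "register_function": ["upload_resources"],
    "write_node_code": ["create_table", "create_node"],
}

# Steps that the table marks as "(Optional)".
OPTIONAL_STEPS = {"upload_resources", "reference_resources", "register_function"}


def validate_plan(plan):
    """Return a list of problems found in an ordered list of step names."""
    problems = []
    done = set()
    for step in plan:
        if step not in PREREQUISITES:
            problems.append(f"unknown step: {step}")
            continue
        # Every prerequisite must appear earlier in the plan.
        for required in PREREQUISITES[step]:
            if required not in done:
                problems.append(f"{step} requires {required} first")
        done.add(step)
    # Required (non-optional) steps must appear somewhere in the plan.
    for step in PREREQUISITES:
        if step not in OPTIONAL_STEPS and step not in done:
            problems.append(f"missing required step: {step}")
    return problems


if __name__ == "__main__":
    # A plan that registers a function without uploading its resources.
    plan = ["create_workflow", "create_table", "create_node",
            "register_function", "write_node_code"]
    for problem in validate_plan(plan):
        print(problem)
```

A plan that skips all optional steps passes the check, while a plan that registers a function before its resources are uploaded is reported as a problem, which mirrors the note in Step 6.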