DataWorks: Create and use MaxCompute resources

Last updated: Nov 28, 2024

To use MaxCompute resources in your code or functions, you must create MaxCompute resources in a workspace or upload existing resources to the workspace before you reference the resources. You can run MaxCompute SQL commands to upload resources. You can also create resources or upload existing resources in the DataWorks console. This topic describes how to create MaxCompute resources and use the resources in nodes in the DataWorks console. This topic also describes how to register functions based on MaxCompute resources.

Background information

Resource is a concept specific to MaxCompute. To run jobs by using user-defined functions (UDFs) or MapReduce in MaxCompute, you must upload the required resources. For more information about resources, see Resource. You can upload resource packages that are developed on your on-premises machine and resource packages that are stored in Object Storage Service (OSS) to DataWorks or create resources in the DataWorks console. The resources can be read and used by UDFs and MapReduce jobs. The following table describes the resource types that are supported by DataWorks.

Resource type: Python

  • Description: The Python code that you write. You can use Python code to register Python UDFs. The name of a resource of this type must end with .py.

  • Creation method: Create a resource in the DataWorks console

Resource type: JAR

  • Description: A compiled JAR package that is used to run Java programs. The name of a resource of this type must end with .jar.

  • Creation methods:

    • Upload a resource from your on-premises machine in the DataWorks console

    • Upload a resource that is stored in OSS in the DataWorks console

Resource type: Archive

  • Description: A compressed package in one of the following formats: .zip, .tgz, .tar.gz, .tar, and .jar. You can determine the compression type based on the file name extension.

  • Creation methods:

    • Upload a resource from your on-premises machine in the DataWorks console

    • Upload a resource that is stored in OSS in the DataWorks console

Resource type: File

  • Description: A file in one of the following formats: .zip, .so, and .jar.

  • Creation methods:

    • Upload a resource from your on-premises machine in the DataWorks console

    • Upload a resource that is stored in OSS in the DataWorks console

    • Create a resource in the DataWorks console (with File Source set to Online Editing)
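The naming rules for the resource types can be sketched as a small check that you might run before you create or upload a resource. This is only an illustration: the function name candidate_resource_types is hypothetical, the extension lists are copied from the resource type descriptions in this topic, and the case-insensitive comparison is an assumption.

```python
# Extensions copied from the resource type descriptions in this topic.
ARCHIVE_EXTENSIONS = (".zip", ".tgz", ".tar.gz", ".tar", ".jar")
FILE_EXTENSIONS = (".zip", ".so", ".jar")


def candidate_resource_types(file_name):
    """Return the resource types whose naming rule matches file_name.

    Hypothetical helper: comparing in lowercase is an assumption,
    not a documented DataWorks behavior.
    """
    name = file_name.lower()
    candidates = []
    if name.endswith(".py"):
        candidates.append("Python")
    if name.endswith(".jar"):
        candidates.append("JAR")
    if name.endswith(ARCHIVE_EXTENSIONS):
        candidates.append("Archive")
    if name.endswith(FILE_EXTENSIONS):
        candidates.append("File")
    return candidates
```

A .jar file matches several types because the JAR, Archive, and File descriptions all list that extension. You choose the type that fits how the resource will be used.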

Procedure for creating and using a resource in the DataWorks console:

  1. Step 1: Create a resource or upload an existing resource

  2. Step 2: Commit and deploy the resource

  3. Step 3: Use the resource

For more information about how to manage resources and perform other operations on the resources, see Manage resources, Perform operations on resources in a data source by using commands, and Add data source resources to DataWorks for management.

Prerequisites

  • A MaxCompute data source is added to a workspace for data development. For more information, see Add a MaxCompute data source.

  • A workflow is created. DataWorks uses workflows to store resources. You must create a workflow before you create resources. For information about how to create a workflow, see Create a workflow.

  • A node is created. Created resources must be referenced by nodes. You must create a node based on your business requirements before you can reference resources. For information about how to create a node, see Compute engine nodes.

  • Optional. The following requirements must be met if you want to upload a file from OSS:

    • OSS is activated, an OSS bucket is created, and the file that you want to upload is stored in the OSS bucket. For more information, see Create a bucket and Simple upload.

    • The Alibaba Cloud account that you want to use to upload data is granted the permissions to access the OSS bucket. For information about how to grant permissions to an account, see Overview.

Limits

  • Resource size

    • You can directly create a Python resource whose size is a maximum of 200 MB in the DataWorks console. You can create a file resource whose size is a maximum of 500 KB if you select Online Editing for the resource.

    • You can upload a file whose size is a maximum of 200 MB from your on-premises machine to DataWorks as a resource.

    • You can upload an OSS object whose size is a maximum of 500 MB to DataWorks as a resource.

  • Resource deployment

    If you use a workspace in standard mode, you must deploy resources to the production environment. This way, the resources can be used by projects in the production environment.

    Note

    The information about a data source varies based on the environment of the workspace to which the data source is added. Before you query data, confirm the data source information for the environment that you want to query. This ensures that subsequent operations return valid table and resource data. For information about how to view the information about a data source in a specific environment, see Add a MaxCompute data source.

  • Resource management

    DataWorks allows you to view and manage resources that are uploaded by using the DataWorks console. If you add resources to MaxCompute by using other tools such as MaxCompute Studio, you must use the MaxCompute resource feature in DataWorks DataStudio to manually load the resources to DataWorks. Then, you can view and manage the resources in DataWorks. For more information, see Manage MaxCompute resources.

Billing

You are not charged for creating and uploading resources in DataWorks. However, you are charged for data storage and backup in MaxCompute. For more information, see Storage pricing.

Go to the entry point for creating a resource

  1. Go to the DataStudio page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and Governance > Data Development. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.

  2. Go to the entry point for creating a resource.

    On the DataStudio page, find the desired workflow, right-click the name of the workflow, select Create Resource, and then select a resource type under MaxCompute.

    In a specific workflow on the DataStudio page, you can create resources, upload existing resources from your on-premises machine, or upload existing resources from OSS. The methods that are available for each resource type are shown in the GUI.

    Note

    If no workflow is available, create one. For information about how to create a workflow, see Create a workflow.

Step 1: Create a resource or upload an existing resource

DataWorks allows you to upload resource packages that are developed on your on-premises machine and resource packages that are stored in OSS to DataWorks. For example, to upload UDFs that are developed on your on-premises machine to DataWorks, you must package the UDFs first. Then, you can register the UDFs in DataWorks. You can also create resources of specific types in the DataWorks console. For example, you can create a Python resource or a file resource whose size is no more than 500 KB.

Note

Take note of the following items:

  • If you create or upload a resource that has never been uploaded to MaxCompute, you must select Upload to MaxCompute. If the resource has already been uploaded to MaxCompute, clear Upload to MaxCompute. Otherwise, an error is reported when you upload the resource.

  • If you select Upload to MaxCompute when you create or upload a resource, the resource is stored in both DataWorks and MaxCompute after the resource is created or uploaded. If you later run a command to delete the resource from MaxCompute, the copy that is stored in DataWorks still exists and is displayed as usual.

  • The resource name can be different from the name of the uploaded file.
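The rule for the Upload to MaxCompute option can be sketched as a check against the MaxCompute project. The sketch assumes that `o` is a PyODPS entry object (as provided in a PyODPS node) that exposes exist_resource(). The function name should_select_upload_to_maxcompute is hypothetical.

```python
def should_select_upload_to_maxcompute(o, resource_name):
    """Return True if the resource is not yet in MaxCompute.

    `o` is assumed to be a PyODPS entry object. Only its
    exist_resource() method is used here.
    """
    return not o.exist_resource(resource_name)
```

If the function returns False, the resource already exists in MaxCompute, and selecting Upload to MaxCompute would cause the error that is described in the preceding note.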

Method 1: Create a resource in the DataWorks console

The following figure shows the configurations for creating a resource in the DataWorks console. You can configure information about different types of resources based on your business requirements.

Note

  • For a Python resource whose size is greater than 200 MB, use Method 3 to upload the resource from OSS to DataWorks.

  • For a file resource whose size is greater than 500 KB, use Method 2 or Method 3 to upload the file from your on-premises machine or OSS to DataWorks.

  • For information about how to create a Python resource in the DataWorks console and register functions by using the Python resource, see Use MaxCompute to query geolocations of IP addresses.

[Figure: Creating a resource visually in the DataWorks console]
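As an illustration of what a Python resource created with this method might contain, the following is a minimal sketch of a Python UDF. The class name MyPlus and its logic are hypothetical. The fallback definition of annotate is an assumption added only so that the file can be tried locally when the odps package is not installed.

```python
try:
    # Provided by the MaxCompute Python UDF runtime and by PyODPS.
    from odps.udf import annotate
except ImportError:
    def annotate(signature):
        """Local stand-in (assumption): returns the class unchanged."""
        def decorator(cls):
            return cls
        return decorator


@annotate("bigint,bigint->bigint")
class MyPlus(object):
    """Hypothetical UDF that adds two BIGINT values."""

    def evaluate(self, a, b):
        # Propagate NULL inputs as NULL.
        if a is None or b is None:
            return None
        return a + b
```

The resource that holds this code must have a name that ends with .py, for example my_plus.py (a hypothetical name).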

Method 2: Upload resources from your on-premises machine

The following figure shows the configurations for uploading a resource from your on-premises machine to DataWorks. You can configure information about different types of resources based on your business requirements.

Note

You can use this method to upload a resource whose size is no more than 200 MB from your on-premises machine. For a resource whose size is greater than 200 MB, use Method 3 to upload the resource from OSS to DataWorks.

[Figure: Uploading a resource from your on-premises machine]

Method 3: Upload resources from OSS

The following figure shows the configurations for uploading a resource from OSS to DataWorks. You can configure information about different types of resources based on your business requirements.

Note

  • You can use this method to upload a resource whose size is no more than 500 MB from OSS.

  • You must follow the on-screen instructions to assign the AliyunDataWorksAccessingOSSRole role to the Alibaba Cloud account that you want to use to upload data.

[Figure: Uploading a resource from OSS]
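The size limits that apply to the three methods can be summarized in a short sketch. The function name suggest_methods is hypothetical. Treating 1 MB as 1024 × 1024 bytes is an assumption. Excluding Method 2 for Python resources is also an assumption, based on the resource type descriptions in this topic, which list only console creation for Python.

```python
KB = 1024          # assumption: binary units
MB = 1024 * KB

# Limits quoted in this topic.
CONSOLE_CREATE_LIMITS = {"Python": 200 * MB, "File": 500 * KB}
LOCAL_UPLOAD_LIMIT = 200 * MB
OSS_UPLOAD_LIMIT = 500 * MB


def suggest_methods(resource_type, size_bytes):
    """Return the methods whose size limit admits the resource."""
    methods = []
    console_limit = CONSOLE_CREATE_LIMITS.get(resource_type)
    if console_limit is not None and size_bytes <= console_limit:
        methods.append("Method 1")
    # Assumption: Python resources are not uploaded with Method 2.
    if resource_type != "Python" and size_bytes <= LOCAL_UPLOAD_LIMIT:
        methods.append("Method 2")
    if size_bytes <= OSS_UPLOAD_LIMIT:
        methods.append("Method 3")
    return methods
```

An empty result means that the resource exceeds every limit listed in this topic.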

Step 2: Commit and deploy the resource

After you create a resource, you can click the Commit icon in the top toolbar on the configuration tab of the resource to commit the resource to the development environment.

Note

If nodes in the production environment need to use the resource, you must also deploy the resource to the production environment. For more information, see Deploy nodes.

Step 3: Use the resource

Scenario 1: Enable a node to use a resource

After you create a resource in the DataWorks console, the resource must be referenced by a node. After the resource is referenced, the code in the @resource_reference{"Resource name"} format is displayed. The display format of the code varies based on the type of the node that references the resource. For example, the code in the ##@resource_reference{"Resource name"} format is displayed if a PyODPS 2 node references the resource.

Note

  • If no node is available, create one. For information about how to create a node, see Compute engine nodes.

  • If your PyODPS code depends on third-party packages, you must use a custom image to install the required packages in the runtime environment, and then run the PyODPS code in that environment. For more information about custom images, see Manage images.

The following figure shows the reference steps.

[Figure: Loading a resource into a node]
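The following sketch shows how a PyODPS 2 node might import a referenced Python resource. The resource name my_helpers.py and the function load_referenced_module are hypothetical. The sketch also assumes that DataWorks makes the referenced file available in the working directory of the node at run time, which this topic does not state.

```python
##@resource_reference{"my_helpers.py"}
import importlib
import os
import sys


def load_referenced_module(resource_file):
    """Import a referenced .py resource as a module.

    Assumption: the resource file is available at the given path
    (for example, the node's working directory) when the node runs.
    """
    resource_dir = os.path.dirname(os.path.abspath(resource_file))
    if resource_dir not in sys.path:
        sys.path.append(resource_dir)
    module_name = os.path.splitext(os.path.basename(resource_file))[0]
    return importlib.import_module(module_name)


# Usage in a node (commented out so that the sketch runs without the file):
# helpers = load_referenced_module("my_helpers.py")
```

The first line uses the ##@resource_reference{"Resource name"} format that is displayed for PyODPS 2 nodes. Because it is a Python comment, it does not affect the rest of the code.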

Scenario 2: Use a resource to register functions

Before you can use a resource to register functions, you must create a MaxCompute function by following the instructions described in Create and use a MaxCompute UDF. On the function configuration tab, you must enter the name of the desired resource, as shown in the following figure.

Note

Before you use a resource to register functions, make sure that the resource is committed. For information about how to commit a resource, see the Step 2: Commit and deploy the resource section in this topic.

[Figure: Using a resource to register a function]
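When you register a function from a Python resource, the class name is commonly written as the resource name without the .py suffix, followed by a period and the class name. This convention comes from the MaxCompute Python UDF documentation, not from this topic, so treat the following sketch as an assumption and confirm the format in Create and use a MaxCompute UDF. The function name python_udf_class_name is hypothetical.

```python
def python_udf_class_name(resource_name, class_name):
    """Build '<module>.<class>' from a Python resource name.

    Assumed convention: the module name is the resource name
    without the .py suffix.
    """
    if not resource_name.endswith(".py"):
        raise ValueError("a Python resource name must end with .py")
    module_name = resource_name[: -len(".py")]
    return "{}.{}".format(module_name, class_name)
```

For a hypothetical resource my_plus.py that defines the class MyPlus, the helper returns my_plus.MyPlus.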

For information about the built-in functions provided by MaxCompute, see Use built-in functions.

For information about how to view the functions in a MaxCompute data source and the change history of the functions, and how to perform other operations on functions, see Manage MaxCompute functions.

Manage resources

In the Resource directory of the workflow to which a created resource belongs, you can right-click the resource name and select an option to perform the corresponding operation on the resource.

  • View Earlier Versions: You can view the saved or committed resource versions and compare the changes to the resource between different versions.

    Note

    When you compare resource versions, you must select at least two versions for comparison.

  • Delete: You can delete a resource. This operation deletes the resource only from the data source that is added to the workspace in the development environment. To delete the resource from the production environment as well, deploy the deletion operation from the development environment to the production environment. The deletion takes effect in the production environment after the operation is deployed. For more information, see Deploy nodes.

Appendix 1: Perform operations on resources in a data source by using commands

The following table describes common resource operations.

Operation: Add resources

  • Description: Adds resources to a MaxCompute project.

  • Performed by: Users who have the Write permission on resources

Operation: View resource information

  • Description: Views the detailed information about a resource.

  • Performed by: Users who have the Read permission on resources

Operation: View a list of resources

  • Description: Views all resources in the current project.

  • Performed by: Users who have the List permission on objects in a project

Operation: Create an alias for a resource

  • Description: Creates an alias for a resource.

  • Performed by: Users who have the Write permission on resources

Operation: Download resources

  • Description: Downloads resources in a MaxCompute project to your on-premises machine.

  • Performed by: Users who have the Write permission on resources

Operation: Delete resources

  • Description: Deletes existing resources from a MaxCompute project.

  • Performed by: Users who have the Delete permission on resources

Operation platform: You can run the commands that are described in this topic on the following platforms:

If you do not specify a project name when you view resources in DataWorks, you can view the resources in the current project in the development environment by default.

  • View all resources in the current project. By default, you can run the following command in DataStudio to view all resources in the current MaxCompute project in the development environment:

    list resources;

  • View all resources in a specific project. Replace <project_name> with the name of the MaxCompute project:

    use <project_name>;
    list resources;

For more information, see Resource operations.
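The same listing can be sketched in Python for use in a PyODPS node. The sketch assumes that `o` is a PyODPS entry object and uses only its list_resources() method. The function name resource_names is hypothetical.

```python
def resource_names(o, project=None):
    """Return the sorted names of the resources in a project.

    `o` is assumed to be a PyODPS entry object. If project is None,
    the current project of the entry object is used.
    """
    return sorted(r.name for r in o.list_resources(project=project))
```

Calling resource_names(o) corresponds to running list resources; in the current project. Passing a project name corresponds to switching to that project before listing.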

Appendix 2: Add data source resources to DataWorks for management

You can use the MaxCompute resource feature in DataStudio to load a MaxCompute data source resource whose size is no more than 200 MB to DataWorks for visualized management. For more information, see Manage MaxCompute resources.