DataWorks: Prepare the environment

Last Updated: Jan 21, 2026

Using user profiling as an example, this tutorial demonstrates the complete process of data synchronization, data processing, and quality monitoring with DataWorks in the Singapore region. Before you begin, you must prepare the required MaxCompute projects and a DataWorks workspace, and configure the data sources, computing resources, and storage.

Background

To formulate effective business strategies, enterprises need to obtain basic user profile data—such as geographical and social attributes—derived from website user behavior. This data supports scheduled, targeted profile analysis, enabling refined website traffic operations.

Prerequisites

Before you begin, read Experiment introduction to understand the overall workflow of the user profile analysis case.

Precautions

  • This tutorial provides sample user information and website access data for immediate use.

  • The data provided in this case is intended solely for hands-on practice with DataWorks. All data is artificially generated mock data.

  • This tutorial uses DataStudio (legacy version) for data processing.

Prepare the MaxCompute environment

Step 1: Activate MaxCompute

Ensure that MaxCompute is activated. If it is not, activate it with the following parameters:

  • Region: Singapore.

  • Specification Type: Standard.

Step 2: Create a MaxCompute project

In a DataWorks standard mode workspace, you must associate two MaxCompute projects (one for development and one for production). These serve as the computing resources for the DataWorks development and production environments, respectively.

  1. Go to the MaxCompute Console. In the left navigation pane, choose Manage Configurations > Projects.

  2. Click Create Project to create two MaxCompute projects. The following are key parameters required for this example. Retain the default values for any parameters not mentioned. For details, see Create a MaxCompute project.

    | Parameter | Description |
    | --- | --- |
    | Project Name | Custom string; must be globally unique. Example: workshop2024_01 (production environment), workshop2024_01_dev (development environment). |
    | Billing Method | Select Pay-as-you-go. |
    | Default Quota | Select Default post-paid Quota. |
    | Data Type Edition | Select MaxCompute V2.0 Data Type Edition (Recommended). |
    | Storage Encryption | Select No. |

For more instructions on creating a MaxCompute project, see Create a MaxCompute project.
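
The relationship between the two project names can be sketched in Python as follows. This is a minimal illustration, not part of the official procedure: the `_dev` suffix simply follows the example names above, the helper names (`project_pair`, `missing_projects`, `connect_from_env`) and the environment variable names are hypothetical, and `connect_from_env` assumes the PyODPS package (`odps`) is installed and that its `ODPS` entry object exposes `exist_project`.

```python
import os
from typing import List, Tuple


def project_pair(base_name: str) -> Tuple[str, str]:
    """Return (production, development) names using the example's `_dev` suffix."""
    if not base_name:
        raise ValueError("base_name must not be empty")
    return base_name, f"{base_name}_dev"


def missing_projects(client, names: List[str]) -> List[str]:
    """Return the names for which client.exist_project(name) is falsy.

    `client` is any object exposing exist_project(name) -> bool,
    for example a PyODPS `ODPS` entry object.
    """
    return [name for name in names if not client.exist_project(name)]


def connect_from_env(project: str):
    """Build a PyODPS client from environment variables (variable names are assumptions)."""
    from odps import ODPS  # imported lazily so the helpers above work without PyODPS

    return ODPS(
        os.environ["ALIBABA_CLOUD_ACCESS_KEY_ID"],
        os.environ["ALIBABA_CLOUD_ACCESS_KEY_SECRET"],
        project=project,
        endpoint=os.environ["ODPS_ENDPOINT"],  # MaxCompute endpoint for your region
    )
```

With the example names, `project_pair("workshop2024_01")` yields the production and development names, and passing both to `missing_projects` together with a client from `connect_from_env` would list any project that the account cannot yet see.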

Prepare the DataWorks environment

Before using DataWorks for development, ensure that the DataWorks service is activated. For details, see Purchase.

Step 1: Create a workspace

  1. Log on to the DataWorks Console. In the top navigation bar, switch the region to Singapore. Click Workspace in the left navigation pane to go to the workspace list page.

  2. Click Create Workspace to create a standard mode workspace (Isolate Development and Production Environments). Do not select Use Data Studio (New Version).

Note
  • Starting February 18, 2025, when an Alibaba Cloud account activates DataWorks and creates a workspace in the Singapore region for the first time, Data Studio (new version) is enabled by default and the Use Data Studio (New Version) parameter is not displayed. In that case, see Experience Data Studio (new version) for the corresponding tutorial.

For more instructions on creating a workspace, see Create a workspace.

Step 2: Create a serverless resource group

This tutorial synchronizes data from OSS and MySQL to MaxCompute. The synchronization and scheduling tasks run on a DataWorks serverless resource group, so you must purchase one and complete the preparations described below.

  1. Purchase a serverless resource group.

    1. Log on to the DataWorks - Resource Groups page. In the top navigation bar, switch the region to Singapore. Click Resource Group in the left navigation pane to go to the Resource Groups page.

    2. Click Create Resource Group. On the buy page, set Region and Zone to Singapore and set Resource Group Name. Configure other parameters as prompted. Complete the payment. For billing details of serverless resource groups, see Billing of serverless resource groups.

      Note

      If there are no available VPCs and vSwitches in the current region, click the console link in the parameter description to create them. For more information about VPCs and vSwitches, see What is VPC?.

  2. Associate the resource group with the DataWorks workspace.

    You must associate the newly purchased serverless resource group with a workspace before use.

    Log on to the DataWorks - Resource Groups page. In the top navigation bar, switch the region to Singapore. Find the purchased serverless resource group, click Associate Workspace in the Actions column, and then click Associate next to the target DataWorks workspace.

  3. Configure public network access for the resource group.

    The test data is retrieved over the Internet, but resource groups have no public network access by default. To enable it, configure an Internet NAT gateway for the VPC bound to the resource group and add an Elastic IP Address (EIP) to the gateway. The resource group can then reach the public network and retrieve the data.

    1. Log on to the VPC - Internet NAT Gateway Console. In the top menu bar, switch to the Singapore region.

    2. Click Create Internet NAT Gateway and configure the relevant parameters. The following are key parameters required for this example. Retain the default values for any parameters not mentioned.

      | Parameter | Value |
      | --- | --- |
      | Region | Singapore |
      | Network and Zone | Select the VPC and vSwitch that are bound to the resource group. Go to the DataWorks Console, switch the region, click Resource Group in the left navigation pane, find the created resource group, click Network Settings in the Actions column, and view the VPC and vSwitch in the Data Scheduling & Data Integration section. For more information about VPCs and vSwitches, see What is VPC?. |
      | Network Type | Select Internet NAT gateway. |
      | EIP | Select Purchase EIP. |
      | Service-linked Role | If you are creating a NAT gateway for the first time, you must create a service-linked role. Click Create Service-linked Role. |

    3. Click Buy Now, select the service agreement, and click Activate Now to complete the purchase.

For more instructions on adding and using serverless resource groups, see Use serverless resource groups.
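
After the NAT gateway and EIP are in place, outbound access can be spot-checked with a short script. The following is only a sketch under stated assumptions: it must run from an environment whose traffic leaves through the same VPC, the function name `can_reach` and its defaults are hypothetical, and a successful result shows only that a TCP handshake completed, not that any data source credentials are valid.

```python
import socket


def can_reach(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Call it with the host name and port of the public endpoint you expect to reach. A `False` result from inside the VPC suggests rechecking the NAT gateway and EIP settings described above.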

Step 3: Associate the MaxCompute project

Associate the MaxCompute projects that you created with the DataWorks workspace as computing resources. This allows you to process MaxCompute data within DataStudio.

  1. Go to the DataWorks - Workspace List page. In the top navigation bar, switch the region to Singapore. Find the created workspace and click the workspace name to enter the Workspace Details page.

  2. In the left navigation pane, click Computing Resources.

  3. Click Create Computing Resource, select the computing resource, and configure the relevant parameters.

    This tutorial uses MaxCompute as the computing and storage resource. Select MaxCompute as the resource type and configure the parameters. The following are key parameters required for this example. Retain the default values for any parameters not mentioned.

    | Parameter | Description |
    | --- | --- |
    | Data Source Name | Custom name that identifies the computing resource. When running a task, the computing resource instance name is used to select the computing resource for the task. |
    | Alibaba Cloud Account | Select Current Alibaba Cloud Account. |
    | Region | Select Singapore, which must match the current DataWorks workspace region. |
    | MaxCompute Project Name | Select the MaxCompute project to associate. In this tutorial, associate the production and development projects created in Step 2 of Prepare the MaxCompute environment with their respective environments. |
    | Default Access Identity | Defines the identity used to access the MaxCompute project within the current workspace. |
    | Connection Configuration | Establish a connection between the MaxCompute computing resource and the serverless resource group. This section displays the serverless resource groups bound to the current workspace. You must test connectivity for both the development and production environments. |

  4. Click Create and Associate Computing Resource with Datastudio.

    Refresh the DataStudio computing resource page as prompted to see the created MaxCompute computing resource.

    Note

    If the MaxCompute computing resource status is Not Associated, click Associate.
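
The consistency rules that run through this page (one region everywhere, and separate MaxCompute projects for development and production) can be captured as a small checklist. The Python below is a hedged sketch: the class, field, and function names are hypothetical, and it validates only values that you enter by hand, without calling any Alibaba Cloud API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TutorialEnvironment:
    """Settings chosen while following this tutorial (all names are hypothetical)."""
    workspace_region: str
    resource_group_region: str
    nat_gateway_region: str
    computing_resource_region: str
    prod_project: str
    dev_project: str


def check_environment(env: TutorialEnvironment) -> List[str]:
    """Return human-readable problems; an empty list means the settings are consistent."""
    problems = []
    regions = {
        "resource group": env.resource_group_region,
        "NAT gateway": env.nat_gateway_region,
        "computing resource": env.computing_resource_region,
    }
    # Every component must be in the same region as the workspace.
    for label, region in regions.items():
        if region != env.workspace_region:
            problems.append(
                f"{label} region {region!r} differs from workspace region "
                f"{env.workspace_region!r}"
            )
    # A standard mode workspace needs two distinct MaxCompute projects.
    if env.prod_project == env.dev_project:
        problems.append("production and development must use different MaxCompute projects")
    return problems
```

For the values used in this tutorial, every region field would be "Singapore" and the two projects would be workshop2024_01 and workshop2024_01_dev, for which `check_environment` returns an empty list.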

Next steps

Environment preparation is now complete. In the next tutorial, you will learn how to synchronize basic user information and website access logs to MaxCompute. For details, see Synchronize data.