All Products
Search
Document Center

DataWorks:Create tables and upload data

Last Updated:Nov 14, 2024

This topic describes how to create tables in the DataWorks console and upload data from your on-premises machine to the tables. The bank_data and result_table tables are used in the examples.

Prerequisites

A MaxCompute data source is added to a workspace and the data source is associated with the DataStudio service in the workspace. In this case, you can see the MaxCompute folder on the DataStudio page.

  • A MaxCompute data source is added to a workspace. For more information, see Add a MaxCompute data source.

  • The data source is associated with the DataStudio service in the workspace. You can click Data Source in the left-side navigation pane of the DataStudio page in a workspace and associate a data source with the DataStudio service in the workspace.

Background information

The bank_data table is used to store business data and the result_table table is used to store data analytics results.

Create the bank_data table

  1. Go to the DataStudio page.

    Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose Data Development and Governance > Data Development. On the page that appears, select the desired workspace from the drop-down list and click Go to Data Development.

  2. In the Scheduled Workflow pane of the DataStudio page, move the pointer over the 新建 icon and choose Create Table > MaxCompute > Table. 数据开发创建MaxCompute表

    Alternatively, you can click the desired workflow in the Business Flow section, right-click MaxCompute, and then select Create Table.

  3. In the Create Table dialog box, specify a path, set the Name to bank_data, and then click Create.

    Important

  4. On the table configuration tab, click DDL.

  5. In the DDL dialog box, enter the following statement and click Generate Table Schema:

    CREATE TABLE IF NOT EXISTS bank_data
    (
     age             BIGINT COMMENT 'Age',
     job             STRING COMMENT 'Job type',
     marital         STRING COMMENT 'Marital status',
     education       STRING COMMENT 'Education level',
     default         STRING COMMENT 'Credit card',
     housing         STRING COMMENT 'Mortgage',
     loan            STRING COMMENT 'Loan',
     contact         STRING COMMENT 'Contact information',
     month           STRING COMMENT 'Month',
     day_of_week     STRING COMMENT 'Day of the week',
     duration        STRING COMMENT 'Duration',
     campaign        BIGINT COMMENT 'Number of contacts during the campaign',
     pdays           DOUBLE COMMENT 'Interval from the last contact',
     previous        DOUBLE COMMENT 'Number of contacts with the customer',
     poutcome        STRING COMMENT 'Result of the previous marketing campaign',
     emp_var_rate    DOUBLE COMMENT 'Employment change rate',
     cons_price_idx  DOUBLE COMMENT 'Consumer price index',
     cons_conf_idx   DOUBLE COMMENT 'Consumer confidence index',
     euribor3m       DOUBLE COMMENT 'Euro deposit rate',
     nr_employed     DOUBLE COMMENT 'Number of employees',
     y               BIGINT COMMENT 'Time deposit available or not'
    );

    For more information about the SQL syntax for creating tables, see Create tables.

  6. In the Confirm message, click Confirmation.

  7. Configure the Display Name parameter in the General section and click Commit to Development Environment and Commit to Production Environment respectively.

    Note

    A workspace in standard mode is used in the example. If you are using a workspace in basic mode, you only need to click Commit to Production Environment.

  8. In the left-side navigation pane, click the Workspace Tables icon.

  9. In the Workspace Tables pane, double-click the name of the created table to view the table information.

Create the result_table table

  1. In the Scheduled Workflow pane of the DataStudio page, move the pointer over the 新建 icon and choose Create Table > MaxCompute.

    Alternatively, you can click the desired workflow in the Business Flow section, right-click MaxCompute, and then select Create Table.

  2. In the Create Table dialog box, specify a path, set the Name to result_table, and then click Create.

  3. In the DDL dialog box, enter the following statement and click Generate Table Schema:

    CREATE TABLE IF NOT EXISTS result_table
    (  
     education   STRING COMMENT 'Education level',
     num         BIGINT COMMENT 'Number of persons'
    );
  4. In the Confirm message, click Confirmation.

  5. Configure the Display Name parameter in the General section and click Commit to Development Environment and Commit to Production Environment respectively.

  6. In the left-side navigation pane, click the Workspace Tables icon.

  7. In the Workspace Tables pane, double-click the name of the created table to view the table information.

Upload a file from your on-premises machine to the bank_data table

DataWorks allows you to perform the following operations:

  • Upload a text file from your on-premises machine to a table in a workspace.

  • Use Data Integration to import business data from various data sources to a workspace.

Note

You must take note of the following items when you upload a file from your on-premises machine:

  • File format: The file must be in the .txt, .csv, or .log format.

  • File size: The size of the file cannot exceed 30 MB.

    If you need to upload a file that is greater than 30 MB in size, use one of the following methods:

    • Upload the file to Object Storage Service (OSS) and import the object data to a MaxCompute table by using an external table of MaxCompute. For more information about how to upload a file to OSS, see Upload objects. For more information about how to use external tables of MaxCompute, see External tables.

    • Upload the file to OSS and synchronize the object data to a MaxCompute table by using Data Integration. For more information about how to upload a file to OSS, see Upload objects. For more information about how to synchronize data from OSS to MaxCompute, see Configure a batch synchronization task by using the codeless UI.

    • Use the data upload feature provided by DataAnalysis.

  • Table type: You can import data from a file to a partitioned table or a non-partitioned table. If you import data from a file to a partitioned table, values in the partition columns cannot contain special characters such as ampersands (&) and asterisks (*).

In this example, the file banking.txt is uploaded from an on-premises machine to the bank_data table. To upload the file, perform the following steps:

  1. Click the 导入 icon on the DataStudio page.使用数据开发导入数据

  2. In the Import Local Data to Table in Development Environment dialog box, enter at least three letters in the search box to search for tables, select the bank_data from the search results, and then click Next.

    Note

    If you cannot find the table that you created, you can manually synchronize the table to Data Map in DataWorks. Then, you can search for the table by keyword in the dialog box again. For more information about how to manually synchronize a table, see Manually refresh table metadata.

  3. In the dialog box that appears, click Browse next to Select File to select the desired file. The value of the Select Data Import Method parameter is Upload Local File and cannot be changed. Then, configure the other parameters.

    bank

    Parameter

    Description

    Select Data Import Method

    The method that can be used to upload data. Default value: Upload Local File. The default value cannot be changed.

    File Format

    Valid values: CSV and Custom Text File.

    Select File

    The file that you want to upload. To upload a file, click Browse and select the file to upload.

    Select Delimiter

    The delimiter used in the file. Valid values: Comma (,), Tab, Semicolon (;), Space, |, #, and &. In this example, Comma (,) is selected.

    Original Character Set

    The character set of the file. Valid values: GBK, UTF-8, CP936, and ISO-8859. In this example, GBK is selected.

    Import First Row

    The row from which data is to be imported. In this example, 1 is selected.

    First Row as Field Names

    Specifies whether to use the first row as the header row. In this example, Yes is not selected.

    Preview

    The preview result of the data that you want to upload.

  4. Click Next.

  5. Select a matching mode for the fields in the source file and destination table. In this example, By Location is selected.

  6. Click Import Data.

What to do next

You have learned how to create tables and upload data from your on-premises machine to the created tables. You can proceed with the next tutorial. In the next tutorial, you will learn how to create, configure, and commit a workflow, and perform data computing and analytics in the workspace. For more information, see Create a workflow.