All Products
Search
Document Center

DataWorks:Getting started

Last Updated:Dec 06, 2024

DataWorks DataAnalysis provides comprehensive data analysis and service sharing capabilities for enterprises. You can use the SQL query and workbook features provided by DataAnalysis to retrieve and analyze data of various types of data sources on a daily basis. This topic provides an example on how to use DataAnalysis. In this example, the SQL query feature is enabled for a MaxCompute compute engine to query and analyze data in a public dataset.

Permission requirements

Background information

DataAnalysis provides public datasets in various business scenarios. You can use the public datasets to experience the features provided by DataAnalysis. In this topic, an Alibaba e-commerce dataset named commerce_ali_e_commerce is used.

  • You can use the dataset to count the number of Taobao orders in different time periods.

    • The dataset contains data on random actions of approximately 1 million users in the time period from November 25, 2017 to December 3, 2017. The actions include click, purchase, add to the cart, and add to favorites.

    • In the dataset, the number of users is 987,994, the number of commodities is 4,162,024, and the total number of actions is 100,150,807.

  • For more information about DataAnalysis, see Overview.

  • Public datasets are available in the following regions: China (Shanghai), China (Beijing), China (Shenzhen), China (Hangzhou), China (Chengdu), China (Zhangjiakou), and China (Ulanqab).

Prerequisites

A MaxCompute data source is added. For more information, see Add a MaxCompute data source.

Procedure

  1. Step 1: Query data

    Use the SQL query feature to write SQL statements to quickly query and analyze data in data sources on which you have query permissions.

  2. Step 2: Analyze data

    Use the enhanced analysis feature to analyze query results in a visualized manner based on your business requirements.

  3. Step 3: Share data

    Share the query and analysis results with other users to implement online dataflows.

Go to the SQL Query page in DataAnalysis

Log on to the DataWorks console. In the top navigation bar, select the desired region. Then, choose DataAnalysis > SQL Query in the left-side navigation pane. On the page that appears, select the desired workspace from the drop-down list and click Go to SQL Query.

Step 1: Query data

This topic provides an example on how to use the SQL query feature to count the number of Taobao orders in different time periods and sort the orders based on the Alibaba e-commerce dataset commerce_ali_e_commerce, and then analyze and share query results.

  1. Go to the SQL Query page.

    You can use one of the following methods to go to the SQL Query page:

    • On the homepage of DataAnalysis, click the SQL Query card in the Shortcuts section to go to the SQL Query page.

    • In the left-side navigation pane of the DataAnalysis page, click SQL Query to go to the SQL Query page.

  2. Create an ad hoc query file.

    In the left-side pane of the SQL Query page, move the pointer over the 添加 icon to the right of My Files and select Create File to create an SQL query file as prompted. For more information about how to create an SQL query file, see Create an SQL query file.

    Note
    • In this topic, a public dataset provided by DataWorks is used. The first time you go to the SQL Query page, you can choose Go to DataAnalysis > MaxCompute on the welcome tab of the SQL Query page. An ad hoc query file is generated for you to count the number of Taobao orders in different time periods based on the commerce_ali_e_commerce dataset.

    • You can view more public datasets in the Public Tables directory on the SQL Query page in DataAnalysis.

  3. Select a data source for the SQL query file.

    On the configuration tab of the ad hoc query file, click the image.png icon in the upper-right corner to select a workspace, a compute engine type, and a data source for the SQL query task. In this example, an existing MaxCompute data source is selected.

  4. Write and run task code.

    In the code editing section of the configuration tab of the ad hoc query file, write and run code that is shown in the following figure.编辑代码

    Count the number of Taobao orders in different time periods and sort the orders based on the Alibaba e-commerce dataset commerce_ali_e_commerce.

    SET odps.namespace.schema = true
    ;
    
    SELECT  CASE    WHEN CAST(SUBSTR(behavior_time,12) AS BIGINT) >= 0
                        AND CAST(SUBSTR(behavior_time,12) AS BIGINT) <= 3 THEN '00:00 to 03:00'
                    WHEN CAST(SUBSTR(behavior_time,12) AS BIGINT) >= 4
                        AND CAST(SUBSTR(behavior_time,12) AS BIGINT) <= 7 THEN '04:00 to 07:00'
                    WHEN CAST(SUBSTR(behavior_time,12) AS BIGINT) >= 8
                        AND CAST(SUBSTR(behavior_time,12) AS BIGINT) <= 11 THEN '08:00 to 11:00'
                    WHEN CAST(SUBSTR(behavior_time,12) AS BIGINT) >= 12
                        AND CAST(SUBSTR(behavior_time,12) AS BIGINT) <= 15 THEN '12:00 to 15:00'
                    WHEN CAST(SUBSTR(behavior_time,12) AS BIGINT) >= 16
                        AND CAST(SUBSTR(behavior_time,12) AS BIGINT) <= 19 THEN '16:00 to 19:00'
                    WHEN CAST(SUBSTR(behavior_time,12) AS BIGINT) >= 20
                        AND CAST(SUBSTR(behavior_time,12) AS BIGINT) <= 23 THEN '20:00 to 23:00'
            END AS Time at which an order is placed
            ,COUNT(*) AS Number of orders
    FROM    bigdata_public_dataset.commerce.commerce_ali_e_commerce
    GROUP BY Time at which an order is placed
    ORDER BY COUNT(*) DESC
    LIMIT   100
    ;
  5. View query results.

    结果数据

Step 2: Analyze data

In the Result1 section that appears after you perform Step 1, click the 展示 icon in the left-side navigation pane and then click the image icon to go to the chart editing page. You can modify the chart information based on your business requirements.

Step 3: Share data

You can synchronize SQL query results to a workbook and share the query results with specific users to implement online dataflows.

  1. Export query results.

    In the Result1 section that appears after you perform Step 1, click the image icon on the right side and select Synchronize to Workbook and Share.

  2. The workbook editing page appears.

    On the workbook editing page, you can synchronize query results. For more information about supported operations on the workbook editing page, see Workbook.

  3. Share the query results.

    Click Share in the upper-right corner of the workbook editing page to share the query results with specific users. The users with whom you share the query results can view the query results by using a URL or access code. You can grant different permissions on the query results to specific users based on your business requirements. For example, you can grant the edit or read-only permissions to specific users.

分享数据

References

  • For more information about SQL queries, see SQL query.

  • For more information about the supported operations on a workbook, see Workbook.