DataWorks DataAnalysis provides comprehensive data analysis and service sharing capabilities for enterprises. You can use the SQL query and workbook features provided by DataAnalysis to retrieve and analyze data of various types of data sources on a daily basis. This topic provides an example on how to use DataAnalysis. In this example, the SQL query feature is enabled for a MaxCompute compute engine to query and analyze data in a public dataset.
Permission requirements
Make sure that you have the required permissions on DataAnalysis. For more information about the permissions of different roles on DataAnalysis, see the "DataAnalysis" section in Permissions of built-in workspace-level roles.
For more information about how to assign roles to users, see Add a RAM user to a workspace as a member and assign roles to the member.
Background information
DataAnalysis provides public datasets in various business scenarios. You can use the public datasets to experience the features provided by DataAnalysis. In this topic, an Alibaba e-commerce dataset named commerce_ali_e_commerce
is used.
You can use the dataset to count the number of Taobao orders in different time periods.
The dataset contains data on random actions of approximately 1 million users in the time period from
November 25, 2017
toDecember 3, 2017
. The actions include click, purchase, add to the cart, and add to favorites.In the dataset, the number of users is
987,994
, the number of commodities is4,162,024
, and the total number of actions is100,150,807
.
For more information about DataAnalysis, see Overview.
Prerequisites
A MaxCompute data source is added. For more information, see Add a MaxCompute data source.
Procedure
Use the SQL query feature to write SQL statements to quickly query and analyze data in data sources on which you have query permissions.
Use the enhanced analysis feature to analyze query results in a visualized manner based on your business requirements.
Share the query and analysis results with other users to implement online dataflows.
Go to the SQL Query page in DataAnalysis
Log on to the DataWorks console. In the top navigation bar, select the desired region. In the left-side navigation pane, choose . On the page that appears, select the desired workspace from the drop-down list and click Go to SQL Query.
Step 1: Query data
This topic provides an example on how to use the SQL query feature to count the number of Taobao orders in different time periods and sort the orders based on the Alibaba e-commerce dataset commerce_ali_e_commerce
, and then analyze and share query results.
Go to the SQL Query page.
You can use one of the following methods to go to the SQL Query page:
On the homepage of DataAnalysis, click the SQL Query card in the Shortcuts section to go to the SQL Query page.
In the left-side navigation pane of the DataAnalysis page, click SQL Query to go to the SQL Query page.
Create an ad hoc query file.
In the left-side pane of the SQL Query page, move the pointer over the icon to the right of My Files and select Create File to create an SQL query file as prompted. For more information about how to create an SQL query file, see Create an SQL query file.
NoteIn this topic, a public dataset provided by DataWorks is used. The first time you go to the SQL Query page, you can choose
on the welcome tab of the SQL Query page. An ad hoc query file is generated for you to count the number of Taobao orders in different time periods based on thecommerce_ali_e_commerce
dataset.You can view more public datasets in the Public Tables directory on the SQL Query page in DataAnalysis.
Select a data source for the SQL query file.
On the configuration tab of the ad hoc query file, click the icon in the upper-right corner to select a workspace, a compute engine type, and a data source for the SQL query task. In this example, an existing MaxCompute data source is selected.
Write and run task code.
In the code editing section of the configuration tab of the ad hoc query file, write and run code that is shown in the following figure.
Count the number of Taobao orders in different time periods and sort the orders based on the Alibaba e-commerce dataset commerce_ali_e_commerce.
SET odps.namespace.schema = true ; SELECT CASE WHEN CAST(SUBSTR(behavior_time,12) AS BIGINT) >= 0 AND CAST(SUBSTR(behavior_time,12) AS BIGINT) <= 3 THEN '00:00 to 03:00' WHEN CAST(SUBSTR(behavior_time,12) AS BIGINT) >= 4 AND CAST(SUBSTR(behavior_time,12) AS BIGINT) <= 7 THEN '04:00 to 07:00' WHEN CAST(SUBSTR(behavior_time,12) AS BIGINT) >= 8 AND CAST(SUBSTR(behavior_time,12) AS BIGINT) <= 11 THEN '08:00 to 11:00' WHEN CAST(SUBSTR(behavior_time,12) AS BIGINT) >= 12 AND CAST(SUBSTR(behavior_time,12) AS BIGINT) <= 15 THEN '12:00 to 15:00' WHEN CAST(SUBSTR(behavior_time,12) AS BIGINT) >= 16 AND CAST(SUBSTR(behavior_time,12) AS BIGINT) <= 19 THEN '16:00 to 19:00' WHEN CAST(SUBSTR(behavior_time,12) AS BIGINT) >= 20 AND CAST(SUBSTR(behavior_time,12) AS BIGINT) <= 23 THEN '20:00 to 23:00' END AS Time at which an order is placed ,COUNT(*) AS Number of orders FROM bigdata_public_dataset.commerce.commerce_ali_e_commerce GROUP BY Time at which an order is placed ORDER BY COUNT(*) DESC LIMIT 100 ;
View query results.
Step 2: Analyze data
In the Result1 section that appears after you perform Step 1, click the icon in the left-side navigation pane and then click the icon to go to the chart editing page. You can modify the chart information based on your business requirements.
Step 3: Share data
You can synchronize SQL query results to a workbook and share the query results with specific users to implement online dataflows.
Export query results.
In the Result1 section that appears after you perform Step 1, click the icon on the right side and select Synchronize to Workbook and Share.
The workbook editing page appears.
On the workbook editing page, you can synchronize query results. For more information about supported operations on the workbook editing page, see Workbook.
Share the query results.
Click Share in the upper-right corner of the workbook editing page to share the query results with specific users. The users with whom you share the query results can view the query results by using a URL or access code. You can grant different permissions on the query results to specific users based on your business requirements. For example, you can grant the edit or read-only permissions to specific users.