If you have activated MaxCompute, you can use SQL analytics of MaxCompute to obtain and query tables from public datasets. This helps you quickly get started with MaxCompute. This topic describes the public datasets of MaxCompute and how to use SQL analytics of MaxCompute to query and analyze data in the public datasets.
The public datasets of MaxCompute include the datasets for click-through rate (CTR) prediction of advertisements displayed on Taobao.com. These datasets are provided by Alibaba Group. For more information about their fields, see Tianchi dataset. All public data is stored in the MAXCOMPUTE_PUBLIC_DATA project of MaxCompute.
Disclaimer
Data in the public datasets of MaxCompute is intended only for product testing. The data is not updated on a regular schedule, and its accuracy is not guaranteed. Do not use it in your production process.
Precautions
MaxCompute uses a special authorization mechanism that grants all MaxCompute users access to the public datasets. When you use the public datasets, take note of the following items:
All data is stored in the public MaxCompute project MAXCOMPUTE_PUBLIC_DATA, and no MaxCompute users are members of this project. Therefore, when you write an SQL statement that accesses the public datasets, you must prefix each table name with the project name, in the format MAXCOMPUTE_PUBLIC_DATA.table_name. Sample statement:
SELECT * FROM MAXCOMPUTE_PUBLIC_DATA.raw_sample LIMIT 10;
Note: Access to the public datasets is free of charge. However, you are charged for the computing resources that your query statements consume. For more information about billing rules, see Computing pricing.
You cannot find the tables in the public datasets on the Data Map page of DataWorks because cross-project access is required.
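Queries on the public datasets are billed as described in the note above, so you may want to estimate a query's cost before you run it. The following minimal sketch assumes that you use the MaxCompute client (odpscmd), which provides the COST SQL command. This topic does not confirm whether other SQL tools support the command. COST SQL reports estimates such as input data size and complexity instead of executing the query.

```sql
-- Estimate the cost of a query without executing it.
-- Assumption: run in the MaxCompute client (odpscmd).
COST SQL SELECT * FROM MAXCOMPUTE_PUBLIC_DATA.raw_sample LIMIT 10;
```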
Public datasets
The following sections describe each public dataset in the MAXCOMPUTE_PUBLIC_DATA project.
Stock
Project name: MAXCOMPUTE_PUBLIC_DATA
Tables:
ods_enterprise_share_basic (basic stock information)
ods_enterprise_share_quarter_cashflow (quarterly cash flow report)
ods_enterprise_share_quarter_growth (quarterly growth data)
ods_enterprise_share_quarter_operation (quarterly operational data)
ods_enterprise_share_quarter_profit (quarterly profit data)
ods_enterprise_share_quarter_report (quarterly report)
ods_enterprise_share_trade_h (stock trading data)
Update cycle: Data is stored in date-based partitions and is no longer updated incrementally.
Query a table schema (replace <table_name> with one of the tables above):
DESC MAXCOMPUTE_PUBLIC_DATA.<table_name>;
Query example:
SELECT * FROM MAXCOMPUTE_PUBLIC_DATA.ods_enterprise_share_basic WHERE ds = '20170114';
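The stock tables are partitioned by date, and the example above reads only the 20170114 partition. The following minimal sketch first lists the partitions that exist and then queries one of them. It makes two assumptions that this topic does not confirm: that ods_enterprise_share_trade_h is also partitioned by ds, and that SHOW PARTITIONS is permitted on the public project.

```sql
-- List the date partitions of the stock trading table.
SHOW PARTITIONS MAXCOMPUTE_PUBLIC_DATA.ods_enterprise_share_trade_h;

-- Query a single partition. Replace 20170114 with a value returned above;
-- it is only assumed to exist in this table.
SELECT *
FROM MAXCOMPUTE_PUBLIC_DATA.ods_enterprise_share_trade_h
WHERE ds = '20170114'
LIMIT 10;
```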
Second-hand property
Project name: MAXCOMPUTE_PUBLIC_DATA
Table: dwd_product_house_basic_info_out (second-hand property data)
Update cycle: Data is stored in date-based partitions and is no longer updated incrementally.
Query the table schema:
DESC MAXCOMPUTE_PUBLIC_DATA.dwd_product_house_basic_info_out;
Query example:
SELECT * FROM MAXCOMPUTE_PUBLIC_DATA.dwd_product_house_basic_info_out WHERE ds = '20170113';
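The property query above has no LIMIT clause, so it returns every row in the partition. To see how many rows that is before you fetch them, you can count the rows in the same partition. This sketch uses only the table and partition shown above.

```sql
-- Count the rows in the 20170113 partition before selecting all of them.
SELECT COUNT(*) AS row_cnt
FROM MAXCOMPUTE_PUBLIC_DATA.dwd_product_house_basic_info_out
WHERE ds = '20170113';
```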
Film and television box office data
Project name: MAXCOMPUTE_PUBLIC_DATA
Tables:
dwd_product_movie_basic_info (basic film information)
ods_product_movie_box (box office information)
Update cycle: Data is stored in date-based partitions and is no longer updated incrementally.
Query a table schema (replace <table_name> with one of the tables above):
DESC MAXCOMPUTE_PUBLIC_DATA.<table_name>;
Query example:
SELECT * FROM MAXCOMPUTE_PUBLIC_DATA.dwd_product_movie_basic_info WHERE ds = '20170112' LIMIT 10;
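The example above previews only the basic film table. The following sketch applies the same pattern to the box office table. This topic does not confirm that ods_product_movie_box is also partitioned by ds or that it has a 20170112 partition, so both are assumptions. Check the DESC output before you rely on the WHERE clause.

```sql
-- Inspect the schema (including partition columns) of the box office table.
DESC MAXCOMPUTE_PUBLIC_DATA.ods_product_movie_box;

-- Preview box office rows.
-- Assumption: the table is partitioned by ds and 20170112 exists.
SELECT *
FROM MAXCOMPUTE_PUBLIC_DATA.ods_product_movie_box
WHERE ds = '20170112'
LIMIT 10;
```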
Administrative and urban-rural area codes
Project name: MAXCOMPUTE_PUBLIC_DATA
Table: dwd_product_areacode_basic_info_2020 (basic information about administrative and urban-rural area codes in 2020)
Update cycle: The dataset is static and is no longer updated incrementally.
Query the table schema:
DESC MAXCOMPUTE_PUBLIC_DATA.dwd_product_areacode_basic_info_2020;
Query example:
SELECT * FROM MAXCOMPUTE_PUBLIC_DATA.dwd_product_areacode_basic_info_2020 LIMIT 10;
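Because this table is static, you could copy it into your own project once and then query the copy, for example to join it with your own data without the project prefix. This sketch assumes that you have permission to create tables in your current project. The name my_areacode_2020 is hypothetical. The copy is stored in your own project, so storage fees apply there.

```sql
-- Copy the static area code table into the current project.
-- my_areacode_2020 is a hypothetical table name.
CREATE TABLE IF NOT EXISTS my_areacode_2020
AS SELECT * FROM MAXCOMPUTE_PUBLIC_DATA.dwd_product_areacode_basic_info_2020;

-- Later queries can reference the copy without the project prefix.
SELECT * FROM my_areacode_2020 LIMIT 10;
```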
Mobile phone number attribution
Project name: MAXCOMPUTE_PUBLIC_DATA
Table: dwd_product_phoneno_basic_info_2020 (basic information about mobile phone number attribution in 2020)
Update cycle: The dataset is static and is no longer updated incrementally.
Query the table schema:
DESC MAXCOMPUTE_PUBLIC_DATA.dwd_product_phoneno_basic_info_2020;
Query example:
SELECT * FROM MAXCOMPUTE_PUBLIC_DATA.dwd_product_phoneno_basic_info_2020 LIMIT 10;
Raw samples
This dataset contains the ad click logs of one million randomly selected users on Taobao.com over a period of eight days.
Project name: MAXCOMPUTE_PUBLIC_DATA
Table: raw_sample
Update cycle: The dataset is static and is no longer updated incrementally.
Query the table schema:
DESC MAXCOMPUTE_PUBLIC_DATA.raw_sample;
Query example:
SELECT * FROM MAXCOMPUTE_PUBLIC_DATA.raw_sample LIMIT 10;
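The raw_sample table is the basis for CTR analysis. The following sketch computes the CTR for each ad position. The column names pid (ad position) and clk (1 if the ad was clicked) are assumptions based on the Tianchi field descriptions, so confirm them with DESC first. The query casts clk to BIGINT so that it works whether the column stores numbers or strings.

```sql
-- CTR per ad position.
-- Assumed columns (verify with DESC): pid = ad position, clk = 1 if clicked.
SELECT pid,
       COUNT(*)                                          AS impressions,
       SUM(CAST(clk AS BIGINT))                          AS clicks,
       CAST(SUM(CAST(clk AS BIGINT)) AS DOUBLE) / COUNT(*) AS ctr
FROM MAXCOMPUTE_PUBLIC_DATA.raw_sample
GROUP BY pid;
```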
Basic advertisement information
This dataset includes basic information about advertisements from the raw_sample table.
Project name: MAXCOMPUTE_PUBLIC_DATA
Table: ad_feature
Update cycle: The dataset is static and is no longer updated incrementally.
Query the table schema:
DESC MAXCOMPUTE_PUBLIC_DATA.ad_feature;
Query example:
SELECT * FROM MAXCOMPUTE_PUBLIC_DATA.ad_feature LIMIT 10;
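Because ad_feature describes the ads in raw_sample, you can join the two tables to break CTR down by ad attribute. The column names in this sketch are assumptions based on the Tianchi field descriptions: adgroup_id (the join key in both tables), cate_id (product category in ad_feature), and clk (click flag in raw_sample). Verify them with DESC before you run the query. The HAVING threshold of 1,000 impressions is an arbitrary choice that filters out noisy categories.

```sql
-- CTR per product category, joining impressions with ad metadata.
-- Assumed columns: adgroup_id (both tables), cate_id (ad_feature), clk (raw_sample).
SELECT a.cate_id,
       COUNT(*)                                            AS impressions,
       CAST(SUM(CAST(r.clk AS BIGINT)) AS DOUBLE) / COUNT(*) AS ctr
FROM MAXCOMPUTE_PUBLIC_DATA.raw_sample r
JOIN MAXCOMPUTE_PUBLIC_DATA.ad_feature a
  ON r.adgroup_id = a.adgroup_id
GROUP BY a.cate_id
HAVING COUNT(*) >= 1000  -- skip categories with few impressions
ORDER BY ctr DESC
LIMIT 20;                -- ORDER BY is paired with LIMIT; some settings reject it otherwise
```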
Basic user information
This dataset comprises the basic information of all users from the raw_sample table.
Project name: MAXCOMPUTE_PUBLIC_DATA
Table: user_profile
Update cycle: The dataset is static and is no longer updated incrementally.
Query the table schema:
DESC MAXCOMPUTE_PUBLIC_DATA.user_profile;
Query example:
SELECT * FROM MAXCOMPUTE_PUBLIC_DATA.user_profile LIMIT 10;
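A simple first analysis of user_profile is the distribution of users across one profile attribute. This sketch assumes a column named age_level, as in the Tianchi field descriptions. Verify the column name with DESC.

```sql
-- Number of users per age level.
-- Assumed column (verify with DESC): age_level.
SELECT age_level, COUNT(*) AS user_cnt
FROM MAXCOMPUTE_PUBLIC_DATA.user_profile
GROUP BY age_level;
```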
User behavior logs
This dataset comprises the shopping activities of all users from the raw_sample table over a 22-day span.
Project name: MAXCOMPUTE_PUBLIC_DATA
Table: behavior_log
Update cycle: The dataset is static and is no longer updated incrementally.
Query the table schema:
DESC MAXCOMPUTE_PUBLIC_DATA.behavior_log;
Query example:
SELECT * FROM MAXCOMPUTE_PUBLIC_DATA.behavior_log LIMIT 10;
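The behavior log covers 22 days of activity and is likely much larger than the other tables. An aggregate query over it therefore scans more data and incurs higher computing fees. The following sketch counts events per behavior type. The column name btag and values such as pv, cart, fav, and buy are assumptions based on the Tianchi field descriptions.

```sql
-- Number of events per behavior type.
-- Assumed column (verify with DESC): btag, for example pv, cart, fav, buy.
SELECT btag, COUNT(*) AS event_cnt
FROM MAXCOMPUTE_PUBLIC_DATA.behavior_log
GROUP BY btag;
```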