Public dataset reference

Last Updated: Nov 04, 2024

If you have activated MaxCompute, you can use the SQL analytics feature of MaxCompute to query tables in the public datasets, which helps you get started with MaxCompute quickly. This topic describes the public datasets of MaxCompute and how to query and analyze their data by using SQL analytics.

The public data of MaxCompute consists mainly of click-through rate prediction datasets for advertisements displayed on Taobao.com. The datasets are provided by Alibaba Group. For more information about the fields in the datasets, see Tianchi dataset. The data is stored in the MAXCOMPUTE_PUBLIC_DATA project of MaxCompute.

Disclaimer

Data in the public datasets of MaxCompute is only intended for product testing. The data is not periodically updated and its accuracy is not ensured. Therefore, do not use the data in your production process.

Precautions

All MaxCompute users can access the public datasets through a special authorization mechanism of MaxCompute. When you use the public datasets, take note of the following items:

  • All data is stored in the public MaxCompute project MAXCOMPUTE_PUBLIC_DATA. No MaxCompute user is a member of this project. Therefore, when you write an SQL statement to access the public datasets, you must specify the project name before the table name. Sample statement:

    SELECT * FROM MAXCOMPUTE_PUBLIC_DATA.raw_sample LIMIT 10;
    Note

    You can view the data in the public datasets free of charge. However, you are charged when you execute query statements. For more information about billing rules, see Computing pricing.

  • The tables in the public datasets do not appear on the Data Map page of DataWorks because accessing them requires cross-project access.
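
Because query statements are billed, it can help to estimate a statement before you run it. The following is a minimal sketch, assuming that the MaxCompute client you use supports the COST SQL command. The command only returns an estimate, and the fields in its output may differ by version.

```sql
-- Estimate the resource consumption of the query without running it.
-- Assumption: COST SQL is available in the client that you use.
COST SQL SELECT * FROM MAXCOMPUTE_PUBLIC_DATA.raw_sample LIMIT 10;

-- Run the query. The project name prefix is required because you are
-- not a member of the MAXCOMPUTE_PUBLIC_DATA project.
SELECT * FROM MAXCOMPUTE_PUBLIC_DATA.raw_sample LIMIT 10;
```

Keeping a LIMIT clause on exploratory queries keeps the returned result small while you get familiar with a table.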

Public datasets

The following tables describe the details of each public dataset in the MAXCOMPUTE_PUBLIC_DATA project.

  • Stock

    Project name

    MAXCOMPUTE_PUBLIC_DATA

    Table name

    Relevant tables:

    • ods_enterprise_share_basic (basic stock information)

    • ods_enterprise_share_quarter_cashflow (quarterly cash flow report)

    • ods_enterprise_share_quarter_growth (quarterly growth data)

    • ods_enterprise_share_quarter_operation (quarterly operational data)

    • ods_enterprise_share_quarter_profit (quarterly profit data)

    • ods_enterprise_share_quarter_report (quarterly report)

    • ods_enterprise_share_trade_h (stock trading data)

    Update cycle

    Data is provided in date-specific partitions and is no longer updated incrementally.

    Query table schemas

    DESC MAXCOMPUTE_PUBLIC_DATA.table_name;

    Query examples

    SELECT * FROM MAXCOMPUTE_PUBLIC_DATA.ods_enterprise_share_basic WHERE ds = '20170114';

  • Second-hand property

    Project name

    MAXCOMPUTE_PUBLIC_DATA

    Table name

    dwd_product_house_basic_info_out (second-hand property data)

    Update cycle

    Data is provided in date-specific partitions and is no longer updated incrementally.

    Query table schemas

    DESC MAXCOMPUTE_PUBLIC_DATA.table_name;

    Query examples

    SELECT * FROM MAXCOMPUTE_PUBLIC_DATA.dwd_product_house_basic_info_out WHERE ds = '20170113';

  • Film and television box office data

    Project name

    MAXCOMPUTE_PUBLIC_DATA

    Table name

    Relevant tables:

    • dwd_product_movie_basic_info (basic film information)

    • ods_product_movie_box (box office information)

    Update cycle

    Data is provided in date-specific partitions and is no longer updated incrementally.

    Query table schemas

    DESC MAXCOMPUTE_PUBLIC_DATA.table_name;

    Query examples

    SELECT * FROM MAXCOMPUTE_PUBLIC_DATA.dwd_product_movie_basic_info WHERE ds = '20170112' LIMIT 10;

  • Administrative and urban-rural area codes

    Project name

    MAXCOMPUTE_PUBLIC_DATA

    Table name

    dwd_product_areacode_basic_info_2020 (basic information table of administrative and urban-rural area codes in 2020)

    Update cycle

    The dataset is static and is no longer updated incrementally.

    Query table schemas

    DESC MAXCOMPUTE_PUBLIC_DATA.table_name;

    Query examples

    SELECT * FROM MAXCOMPUTE_PUBLIC_DATA.dwd_product_areacode_basic_info_2020 LIMIT 10;

  • Mobile phone number attribution

    Project name

    MAXCOMPUTE_PUBLIC_DATA

    Table name

    dwd_product_phoneno_basic_info_2020 (mobile phone number attribution basic information table in 2020)

    Update cycle

    The dataset is static and is no longer updated incrementally.

    Query table schemas

    DESC MAXCOMPUTE_PUBLIC_DATA.table_name;

    Query examples

    SELECT * FROM MAXCOMPUTE_PUBLIC_DATA.dwd_product_phoneno_basic_info_2020 LIMIT 10;

  • Raw samples

    This dataset contains advertisement click logs from Taobao.com for one million randomly selected users over eight days.

    Project name

    MAXCOMPUTE_PUBLIC_DATA

    Table name

    raw_sample

    Update cycle

    The dataset is static and is no longer updated incrementally.

    Query table schemas

    DESC MAXCOMPUTE_PUBLIC_DATA.table_name;

    Query examples

    SELECT * FROM MAXCOMPUTE_PUBLIC_DATA.raw_sample LIMIT 10;

  • Basic advertisement information

    This dataset contains basic information about the advertisements that appear in the raw_sample table.

    Project name

    MAXCOMPUTE_PUBLIC_DATA

    Table name

    ad_feature

    Update cycle

    The dataset is static and is no longer updated incrementally.

    Query table schemas

    DESC MAXCOMPUTE_PUBLIC_DATA.table_name;

    Query examples

    SELECT * FROM MAXCOMPUTE_PUBLIC_DATA.ad_feature LIMIT 10;

  • Basic user information

    This dataset comprises the basic information of all users from the raw_sample table.

    Project name

    MAXCOMPUTE_PUBLIC_DATA

    Table name

    user_profile

    Update cycle

    The dataset is static and is no longer updated incrementally.

    Query table schemas

    DESC MAXCOMPUTE_PUBLIC_DATA.table_name;

    Query examples

    SELECT * FROM MAXCOMPUTE_PUBLIC_DATA.user_profile LIMIT 10;

  • User behavior logs

    This dataset comprises the shopping activities of all users from the raw_sample table over a 22-day span.

    Project name

    MAXCOMPUTE_PUBLIC_DATA

    Table name

    behavior_log

    Update cycle

    The dataset is static and is no longer updated incrementally.

    Query table schemas

    DESC MAXCOMPUTE_PUBLIC_DATA.table_name;

    Query examples

    SELECT * FROM MAXCOMPUTE_PUBLIC_DATA.behavior_log LIMIT 10;
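
The DESC template and the partition filters shown for each dataset can be combined into a short exploration sequence. The following is a sketch, not a prescribed workflow. It substitutes a real table name for the table_name placeholder and, assuming that SHOW PARTITIONS accepts a project-qualified table name, lists the available ds values before filtering on one. The ds value is the one used in the stock query example.

```sql
-- Inspect the schema of a specific table (table_name replaced with a real table).
DESC MAXCOMPUTE_PUBLIC_DATA.ods_enterprise_share_basic;

-- List the partitions of the table to see which ds values exist.
-- Assumption: the project-qualified name is accepted here.
SHOW PARTITIONS MAXCOMPUTE_PUBLIC_DATA.ods_enterprise_share_basic;

-- Count the rows in one partition instead of selecting every column.
SELECT COUNT(*) AS row_count
FROM MAXCOMPUTE_PUBLIC_DATA.ods_enterprise_share_basic
WHERE ds = '20170114';
```

For the static datasets such as raw_sample, the source does not show a ds filter, so the SHOW PARTITIONS step may not apply to them. The DESC output indicates whether a table has partition columns.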