MaxCompute:Project-level Information Schema (To be phased out)

Last updated: Sep 20, 2024

Information Schema is the metadata service of MaxCompute. This topic describes the basic concepts, features, and limits of Information Schema.

MaxCompute Information Schema provides information such as project metadata and historical usage data. It extends the ANSI SQL-92 Information Schema with fields and views that are specific to MaxCompute. MaxCompute provides a public project named Information_Schema. You can query the metadata and historical usage data of your project by accessing the read-only views provided by this public project.

Limits

  • Information Schema provides metadata views of the current project only. You cannot access the metadata of other projects. To query and analyze the metadata of multiple projects, obtain the metadata of each project separately and then combine it.

  • Quasi-real-time views are provided for metadata system tables. For applications that require high metadata timeliness, we recommend that you use an SDK or CLI to obtain the metadata of a specified object.

  • Metadata and historical data of jobs are stored in the Information Schema project. To create a snapshot of the historical data or keep historical job data for longer than 14 days, back up Information Schema data to a specified project on a regular basis.

Obtain the Information Schema service

Since March 1, 2024, MaxCompute no longer automatically provides the project-level Information Schema service for new projects: projects created on or after that date do not include the project-level Information Schema package by default. To query metadata for such projects, use tenant-level Information Schema, which provides more comprehensive information. For more information, see Tenant-level Information Schema.

Before you use Information Schema in an existing project, you must obtain the permissions to access the project metadata. To obtain the permissions, install the Information Schema permission package as the project owner or a RAM user that is assigned the Super_Administrator role. For more information, see Assign a role to a user. You can use one of the following methods to install the permission package:

After the package is installed, you can use Information Schema to query the metadata of the current project. Data is stored in the Information Schema project. You do not need to pay for metadata storage.

You can run the following command to query the views provided by the Information Schema project:

odps@myproject1> describe package Information_Schema.systables;

The following figure shows the query result.

[Figure: screenshot of the returned result for information_schema]

Query a metadata view

To query a metadata view, you must prefix the view name with the project name Information_Schema. The format is Information_Schema.view_name.

If the project that you access is myproject1, you can run the following command to query the metadata of all tables in myproject1:

odps@myproject1> select * from Information_Schema.tables;

The Information Schema project also contains the job history view. This view allows you to query the job history of the current project. You can run the following command to query historical jobs by date:

odps@myproject1> select * from Information_Schema.tasks_history where ds='yyyymmdd' limit 100;

Access authorization

The views provided by Information Schema contain all the user data at the project level. By default, the owner of a project can view the user data of this project. Other users or roles in the project must be granted the required permissions to view the data. For more information, see Cross-project resource access based on packages.

Syntax of the statements that are used to grant permissions to users or roles:

grant <actions> on package Information_Schema.systables to user <user_name>;
grant <actions> on package Information_Schema.systables to role <role_name>;

  • actions: the permissions that you want to grant. Set the value to Read.

  • user_name: an Alibaba Cloud account or RAM user that is added to the project.

    You can run the list users; command on the MaxCompute client to obtain user accounts.

  • role_name: a role that is added to the project.

    You can run the list roles; command on the MaxCompute client to query the name of the role.

Example:

grant read on package Information_Schema.systables to user RAM$Bob@aliyun.com:user01;
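The grant syntax above can be generated programmatically when many users or roles need read access. The following Python helper is only a sketch: the function name and the role name metadata_reader are hypothetical, and the resulting statement still has to be run on the MaxCompute client by an authorized user.

```python
def build_grant_statement(principal_type: str, principal_name: str) -> str:
    """Return a grant statement for the Information_Schema.systables package."""
    if principal_type not in ("user", "role"):
        raise ValueError("principal_type must be 'user' or 'role'")
    # Read is the only action that the syntax above allows for this package.
    return (
        "grant read on package Information_Schema.systables "
        f"to {principal_type} {principal_name};"
    )


if __name__ == "__main__":
    print(build_grant_statement("user", "RAM$Bob@aliyun.com:user01"))
    # "metadata_reader" is a hypothetical role name used for illustration.
    print(build_grant_statement("role", "metadata_reader"))
```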

Metadata views

The metadata views of the Information Schema service allow you to browse and retrieve metadata.

The usage information views of the Information Schema service allow you to optimize jobs and plan resources. For example, you can analyze the metrics of a job, such as the resource consumption, running duration, and amount of processed data.

Different views have different validity periods or default retention periods. Data that exceeds the retention period is inaccessible. You can manually export data from Information Schema to a MaxCompute table to back up the data at a specified interval. This backup applies to historical data that requires a longer storage period.

The fees for Information Schema vary based on the following scenarios:

  • For projects that use pay-as-you-go computing resources, you are charged based on the SQL statements that you execute to query a view of Information Schema. Range-clustered tables are used in the views of Information Schema to reduce the amount of input query data and improve query performance. If you query the TASKS_HISTORY and TUNNELS_HISTORY views, query the data of the previous day after 6:00:00 every day to avoid querying the data of the current day. This minimizes the amount of input query data and reduces query costs.

  • For projects that use subscription computing resources, the compute units (CUs) that you purchased are consumed when you query views of Information Schema.

  • You do not need to pay storage fees for the views of Information Schema.
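The advice to query the previous day's data after 6:00:00 can be turned into a small date helper. The following Python sketch picks the ds partition to query for TASKS_HISTORY or TUNNELS_HISTORY. Falling back one more day before 6:00:00 is an assumption of this sketch, not documented behavior, and both function names are hypothetical.

```python
from datetime import datetime, timedelta


def latest_complete_ds(now: datetime) -> str:
    """Return the ds value (yyyymmdd) of the partition to query at `now`."""
    # From 06:00:00 onward, query the previous day, as recommended above.
    # Before 06:00:00, this sketch goes back one more day (an assumption).
    days_back = 1 if now.hour >= 6 else 2
    return (now - timedelta(days=days_back)).strftime("%Y%m%d")


def tasks_history_query(ds: str, limit: int = 100) -> str:
    """Build a query that reads a single ds partition of tasks_history."""
    return (
        "select * from Information_Schema.tasks_history "
        f"where ds='{ds}' limit {limit};"
    )


if __name__ == "__main__":
    print(tasks_history_query(latest_complete_ds(datetime.now())))
```

Filtering on a single ds value keeps the query away from the current day's data, which is what the cost recommendation above is aiming for.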

Note

When you export data, we recommend that you explicitly specify the field names of the view. If you rely on insert into select * from information_schema.*** instead, the backup may fail after new fields are added to the view, because the selected fields no longer match the fields of the MaxCompute backup table.
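A backup statement with explicit field names can be assembled as in the following Python sketch. The target table myproject1.tasks_history_backup and the chosen field subset are hypothetical, the target table is assumed to exist with matching columns, and the insert into table ... (columns) form is assumed to be accepted by your MaxCompute version.

```python
from typing import Sequence


def build_backup_sql(view: str, columns: Sequence[str], target_table: str, ds: str) -> str:
    """Build an insert statement that copies one ds partition of a view."""
    if not columns:
        raise ValueError("list the view fields explicitly instead of using select *")
    column_list = ", ".join(columns)
    # Naming the fields on both sides keeps the statement valid even if
    # new fields are later added to the Information Schema view.
    return (
        f"insert into table {target_table} ({column_list}) "
        f"select {column_list} from Information_Schema.{view} where ds='{ds}';"
    )


if __name__ == "__main__":
    print(
        build_backup_sql(
            view="tasks_history",
            columns=["inst_id", "owner_name", "start_time", "end_time", "ds"],
            target_table="myproject1.tasks_history_backup",  # hypothetical table
            ds="20190101",
        )
    )
```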

The following table describes the metadata views.

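| Type | View | Timeliness and retention period | Delay |
| --- | --- | --- | --- |
| Metadata information | TABLES | Quasi-real-time view | About 3 hours |
| Metadata information | PARTITIONS | Quasi-real-time view | About 3 hours |
| Metadata information | COLUMNS | Quasi-real-time view | About 3 hours |
| Metadata information | UDFS | Quasi-real-time view | About 3 hours |
| Metadata information | RESOURCES | Quasi-real-time view | About 3 hours |
| Metadata information | UDF_RESOURCES | Quasi-real-time view | About 3 hours |
| Metadata information | USERS | Quasi-real-time view | About 3 hours |
| Metadata information | ROLES | Quasi-real-time view | About 3 hours |
| Metadata information | USER_ROLES | Quasi-real-time view | About 3 hours |
| Metadata information | PACKAGE_OBJECTS | Quasi-real-time view | About 3 hours |
| Metadata information | INSTALLED_PACKAGES | Quasi-real-time view | About 3 hours |
| Metadata information | SCHEMA_PRIVILEGES | Quasi-real-time view | About 3 hours |
| Metadata information | TABLE_PRIVILEGES | Quasi-real-time view | About 3 hours |
| Metadata information | COLUMN_PRIVILEGES | Quasi-real-time view | About 3 hours |
| Metadata information | UDF_PRIVILEGES | Quasi-real-time view | About 3 hours |
| Metadata information | RESOURCE_PRIVILEGES | Quasi-real-time view | About 3 hours |
| Metadata information | TABLE_LABELS | Quasi-real-time view | About 3 hours |
| Metadata information | COLUMN_LABELS | Quasi-real-time view | About 3 hours |
| Metadata information | TABLE_LABEL_GRANTS | Quasi-real-time view | About 3 hours |
| Metadata information | COLUMN_LABEL_GRANTS | Quasi-real-time view | About 3 hours |
| Usage information | TASKS | Real-time snapshots of running jobs | A few seconds. This view is in public preview without SLA guarantee and will be available in the future. |
| Usage information | TASKS_HISTORY | Quasi-real-time view. Historical data is stored in a partitioned table, and data from the last 14 days is retained. | Approximately 3 hours |
| Usage information | TUNNELS_HISTORY | Quasi-real-time view. Historical data is stored in a partitioned table, and data from the last 14 days is retained. | Approximately 3 hours |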

TABLES

Displays the information about a table in a project.

| Field | Data type | Description |
| --- | --- | --- |
| table_catalog | STRING | The value is fixed to odps. |
| table_schema | STRING | The name of the project. |
| table_name | STRING | The name of the table. |
| table_type | STRING | The type of the table. Valid values: MANAGED_TABLE, VIRTUAL_VIEW, and EXTERNAL_TABLE. |
| is_partitioned | BOOLEAN | Specifies whether the table is a partitioned table. |
| owner_id | STRING | The ID of the table owner. |
| owner_name | STRING | Optional. The Alibaba Cloud account of the table owner. |
| create_time | DATETIME | The time when the table was created. |
| last_modified_time | DATETIME | The time when the table data was last modified. |
| data_length | BIGINT | The size of the table data. Unit: bytes. For a non-partitioned table, the value is the size of the table data. For a partitioned table, the system does not calculate the size of the table data and the value is NULL. The PARTITIONS view includes the data size of each partition in a partitioned table. |
| table_comment | STRING | The comments on the table. |
| life_cycle | BIGINT | Optional. The lifecycle of the table. |
| is_archived | BOOLEAN | A reserved field. |
| table_exstore_type | STRING | A reserved field. |
| cluster_type | STRING | The clustering type of the MaxCompute table. Valid values: HASH and RANGE. |
| number_buckets | BIGINT | Optional. The number of buckets in the clustered table. The value 0 indicates that the number of buckets dynamically changes during job execution. |
| view_original_text | STRING | The text that defines the view, for tables of the VIRTUAL_VIEW type. |
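Because data_length is NULL for partitioned tables, the total size of such a table has to be derived from the PARTITIONS view. The following Python sketch shows that rule on rows that have already been fetched into dictionaries. The row layout and the sample values are assumptions made for illustration.

```python
from typing import Iterable, Mapping


def table_size_bytes(table_row: Mapping, partition_rows: Iterable[Mapping]) -> int:
    """Return the data size of a table in bytes."""
    if not table_row["is_partitioned"]:
        # Non-partitioned table: TABLES.data_length holds the size directly.
        return table_row["data_length"] or 0
    # Partitioned table: TABLES.data_length is NULL, so add up the
    # data_length of the matching rows from the PARTITIONS view.
    return sum(
        row["data_length"] or 0
        for row in partition_rows
        if row["table_schema"] == table_row["table_schema"]
        and row["table_name"] == table_row["table_name"]
    )


if __name__ == "__main__":
    # Hypothetical rows, shaped like the TABLES and PARTITIONS views.
    table = {"table_schema": "myproject1", "table_name": "sales",
             "is_partitioned": True, "data_length": None}
    partitions = [
        {"table_schema": "myproject1", "table_name": "sales",
         "partition_name": "ds='20190130'", "data_length": 1024},
        {"table_schema": "myproject1", "table_name": "sales",
         "partition_name": "ds='20190131'", "data_length": 2048},
    ]
    print(table_size_bytes(table, partitions))
```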

PARTITIONS

Displays information about a table partition in a project.

| Field | Data type | Description |
| --- | --- | --- |
| table_catalog | STRING | The value is fixed to odps. |
| table_schema | STRING | The name of the project. |
| table_name | STRING | The name of the table. |
| partition_name | STRING | The name of the partition. Example: ds='20190130'. |
| create_time | DATETIME | The time when the partition was created. |
| last_modified_time | DATETIME | The time when the partition was last modified. |
| data_length | BIGINT | The size of the data in the partition. Unit: bytes. |
| is_archived | BOOLEAN | A reserved field. |
| is_exstore | BOOLEAN | A reserved field. |
| cluster_type | STRING | Optional. The clustering type of the MaxCompute table. Valid values: HASH and RANGE. |
| number_buckets | BIGINT | Optional. The number of buckets in the clustered table. The value 0 indicates that the number of buckets dynamically changes during job execution. |

COLUMNS

Displays information about a table column in a project.

| Field | Data type | Description |
| --- | --- | --- |
| table_catalog | STRING | The value is fixed to odps. |
| table_schema | STRING | The name of the project. |
| table_name | STRING | The name of the table. |
| column_name | STRING | The name of the column. |
| ordinal_position | BIGINT | The serial number of the column. |
| column_default | STRING | The default value of the column. |
| is_nullable | BOOLEAN | Optional. The value is fixed to YES. |
| data_type | STRING | The data type of the column. |
| column_comment | STRING | The comments on the column. |
| is_partition_key | BOOLEAN | Specifies whether the column is a partition key. |

UDFS

Displays information about a user-defined function (UDF) in a project.

| Field | Data type | Description |
| --- | --- | --- |
| udf_catalog | STRING | The value is fixed to odps. |
| udf_schema | STRING | The name of the project. |
| udf_name | STRING | The name of the UDF. |
| owner_id | STRING | The ID of the UDF owner. |
| owner_name | STRING | Optional. The Alibaba Cloud account of the UDF owner. |
| create_time | DATETIME | The time when the UDF was created. |
| last_modified_time | DATETIME | The time when the UDF was last modified. |

RESOURCES

Displays information about a resource in a project.

| Field | Data type | Description |
| --- | --- | --- |
| resource_catalog | STRING | The value is fixed to odps. |
| resource_schema | STRING | The name of the project. |
| resource_name | STRING | The name of the resource. |
| resource_type | STRING | The type of the resource. Valid values: Py and Jar. |
| owner_id | STRING | The ID of the resource owner. |
| owner_name | STRING | Optional. The Alibaba Cloud account of the resource owner. |
| create_time | DATETIME | The time when the resource was created. |
| last_modified_time | DATETIME | The time when the resource was last modified. |
| size | BIGINT | The storage space used by the resource. |
| comment | STRING | The comments on the resource. |
| is_temp_resource | BOOLEAN | Specifies whether the resource is a temporary resource. |

UDF_RESOURCES

Displays information about the dependent resource of a UDF in a project.

| Field | Data type | Description |
| --- | --- | --- |
| udf_catalog | STRING | The value is fixed to odps. |
| udf_schema | STRING | The name of the project. |
| udf_name | STRING | The name of the UDF. |
| resource_schema | STRING | The name of the project to which the resource belongs. |
| resource_name | STRING | The name of the resource. |

USERS

Displays the list of users in a project.

| Field | Data type | Description |
| --- | --- | --- |
| user_catalog | STRING | Valid values: ALIYUN and RAM. |
| user_schema | STRING | The name of the project. |
| user_name | STRING | Optional. The name of the user. |
| user_id | STRING | The ID of the Alibaba Cloud user. |
| user_label | STRING | The label of the user. |

ROLES

Displays the list of roles in a project.

| Field | Data type | Description |
| --- | --- | --- |
| role_catalog | STRING | The value is fixed to odps. |
| role_schema | STRING | The name of the project. |
| role_name | STRING | The name of the role. |
| role_label | STRING | The label of the role. |
| comment | STRING | The comments on the role. |

USER_ROLES

Displays information about a role that a user assumes in a project.

| Field | Data type | Description |
| --- | --- | --- |
| user_role_catalog | STRING | The value is fixed to odps. |
| user_role_schema | STRING | The name of the project. |
| role_name | STRING | The name of the role. |
| user_name | STRING | The name of the user. |
| user_id | STRING | The ID of the user. |

PACKAGE_OBJECTS

Displays the object information of a package in a project.

| Field | Data type | Description |
| --- | --- | --- |
| package_catalog | STRING | The value is fixed to odps. |
| package_schema | STRING | The name of the project. |
| package_name | STRING | The name of the package. |
| object_type | STRING | The type of the package object. |
| object_name | STRING | The name of the package object. |
| column_name | STRING | The name of the table column. |
| allowed_privileges | VECTOR<STRING> | The shared permissions. |
| allowed_label | STRING | The shared label. |

INSTALLED_PACKAGES

Displays information about an installed package in a project.

| Field | Data type | Description |
| --- | --- | --- |
| installed_package_catalog | STRING | The value is fixed to odps. |
| installed_package_schema | STRING | The name of the project. |
| package_project | STRING | The name of the project in which the package was created. |
| package_name | STRING | The name of the package. |
| installed_time | DATETIME | Reserved. The time when the package was installed. |
| allowed_label | STRING | The shared label. |

SCHEMA_PRIVILEGES

Displays information about a schema permission in a project.

| Field | Data type | Description |
| --- | --- | --- |
| user_catalog | STRING | The value is fixed to odps. |
| user_schema | STRING | The name of the project. |
| grantee | STRING | The name of the user. |
| user_id | STRING | The ID of the user. |
| grantor | STRING | The account that grants the permission. The current value is NULL. |
| privilege_type | STRING | The type of the permission. |

TABLE_PRIVILEGES

Displays information about a table permission in a project.

| Field | Data type | Description |
| --- | --- | --- |
| table_catalog | STRING | The value is fixed to odps. |
| table_schema | STRING | The name of the project to which the table belongs. |
| table_name | STRING | The name of the table. |
| grantee | STRING | The name of the user. |
| user_id | STRING | The ID of the user. |
| grantor | STRING | The account that grants the permission. The current value is NULL. |
| privilege_type | STRING | The type of the permission. |
| user_schema | STRING | The name of the project to which the user belongs. |

COLUMN_PRIVILEGES

Displays information about a column permission in a project.

| Field | Data type | Description |
| --- | --- | --- |
| table_catalog | STRING | The value is fixed to odps. |
| table_schema | STRING | The name of the project to which the table belongs. |
| table_name | STRING | The name of the table. |
| column_name | STRING | The name of the column. |
| grantee | STRING | The name of the user. |
| user_id | STRING | The ID of the user. |
| grantor | STRING | Optional. The current value is NULL. |
| privilege_type | STRING | The type of the permission. |
| user_schema | STRING | The name of the project to which the user belongs. |

UDF_PRIVILEGES

Displays information about a UDF permission in a project.

| Field | Data type | Description |
| --- | --- | --- |
| udf_catalog | STRING | The value is fixed to odps. |
| udf_schema | STRING | The name of the project. |
| udf_name | STRING | The name of the UDF. |
| user_schema | STRING | The name of the project to which the user belongs. |
| grantee | STRING | The name of the user. |
| user_id | STRING | The ID of the user. |
| grantor | STRING | The account that grants the permission. The current value is NULL. |
| privilege_type | STRING | The type of the permission. |

RESOURCE_PRIVILEGES

Displays information about a resource permission in a project.

| Field | Data type | Description |
| --- | --- | --- |
| resource_catalog | STRING | The value is fixed to odps. |
| resource_schema | STRING | The name of the project. |
| resource_name | STRING | The name of the resource. |
| user_schema | STRING | The name of the project to which the user belongs. |
| grantee | STRING | The name of the user. |
| user_id | STRING | The ID of the user. |
| grantor | STRING | The account that grants the permission. The current value is NULL. |
| privilege_type | STRING | The type of the permission. |

TABLE_LABELS

Displays information about a table label in a project.

| Field | Data type | Description |
| --- | --- | --- |
| table_catalog | STRING | The value is fixed to odps. |
| table_schema | STRING | The name of the project. |
| table_name | STRING | The name of the table. |
| label_type | STRING | The type of the label. The value is fixed to NULL. |
| label_level | STRING | The level of the label. |

COLUMN_LABELS

Displays information about a table column label in a project.

| Field | Data type | Description |
| --- | --- | --- |
| table_catalog | STRING | The value is fixed to odps. |
| table_schema | STRING | The name of the project. |
| table_name | STRING | The name of the table. |
| column_name | STRING | The name of the column. |
| label_type | STRING | The type of the label. The value is fixed to NULL. |
| label_level | STRING | The level of the label. |

TABLE_LABEL_GRANTS

Displays the authorization information of a table label in a project.

| Field | Data type | Description |
| --- | --- | --- |
| table_label_grant_catalog | STRING | The value is fixed to odps. |
| table_label_grant_schema | STRING | The name of the project to which the user belongs. |
| user | STRING | The name of the user. |
| user_id | STRING | The ID of the user. |
| table_schema | STRING | The name of the project to which the table belongs. |
| table_name | STRING | The name of the table. |
| grantor | STRING | The account that grants the permission. The current value is NULL. |
| label_level | STRING | The granted level of the label. |
| expired | DATETIME | The time when the authorization expires. |

COLUMN_LABEL_GRANTS

Displays the authorization information of a table column label in a project.

| Field | Data type | Description |
| --- | --- | --- |
| column_label_grant_catalog | STRING | The value is fixed to odps. |
| column_label_grant_schema | STRING | The name of the project to which the user belongs. |
| user | STRING | The name of the user. |
| user_id | STRING | The ID of the user. |
| table_schema | STRING | The name of the project to which the table belongs. |
| table_name | STRING | The name of the table. |
| column_name | STRING | The name of the column. |
| grantor | STRING | The account that grants the permission. The current value is NULL. |
| label_level | STRING | The granted level of the label. |
| expired | DATETIME | The time when the authorization expires. |

TASKS

Displays the real-time snapshots of jobs. This view is used to monitor jobs in real time.

Important

The TASKS view is in internal testing, and its fields and field contents may change. This view has no SLA guarantee. Use this view with caution. For more information about subsequent changes to the release status, see Service notices.

| Field | Data type | Description |
| --- | --- | --- |
| project_name | STRING | The name of the project. |
| task_name | STRING | The name of the job. |
| task_type | STRING | The type of the job. Valid values: SQL: SQL jobs; CUPID: Spark or Mars jobs; SQLCost: SQL cost estimation jobs; SQLRT: MaxCompute Query Acceleration (MCQA) SQL jobs; LOT: MapReduce jobs; PS: Parameter Server of PAI; AlgoTask: Machine Learning Platform for AI (PAI) jobs. |
| inst_id | STRING | The ID of the instance that is created for the job. |
| status | STRING | The status of the job when data is collected. Valid values: Running and Waiting. |
| owner_id | STRING | The ID of the Alibaba Cloud account that submits the job. |
| owner_name | STRING | The name of the Alibaba Cloud account that submits the job. |
| start_time | DATETIME | The time when the job starts. |
| priority | BIGINT | The priority of the job. This parameter is applicable only to jobs that use subscription resources. |
| signature | STRING | The job signature. |
| queue_name | STRING | The name of the compute queue. |
| cpu_usage | BIGINT | The current CPU utilization. The value of this field is calculated by using the following formula: Number of CPU cores × 100. |
| mem_usage | BIGINT | The current memory usage. Unit: MB. |
| gpu_usage | BIGINT | The current GPU usage. The value of this field is calculated by using the following formula: Number of GPUs × 100. |
| total_cpu_usage | BIGINT | The accumulated CPU utilization. The value of this field is calculated by using the following formula: Number of CPU cores × 100 × Running duration of the job (seconds). |
| total_mem_usage | BIGINT | The accumulated memory usage. The value of this field is calculated by using the following formula: Memory size (MB) × Running duration of the job (seconds). |
| total_gpu_usage | BIGINT | The accumulated GPU usage. The value of this field is calculated by using the following formula: Number of GPUs × 100 × Running duration of the job (seconds). |
| cpu_min_ratio | BIGINT | The ratio of the CPU utilization of the job to the total CPU utilization. This parameter is applicable only to jobs that use subscription resources. |
| mem_min_ratio | BIGINT | The ratio of the memory consumed by the job to the total memory usage. This parameter is applicable only to jobs that use subscription resources. |
| gpu_min_ratio | BIGINT | The ratio of the GPUs consumed by the job to the total GPU usage. This parameter is applicable only to jobs that use subscription resources. |
| cpu_max_ratio | BIGINT | The ratio of the CPU utilization of the job to the maximum CPU utilization. This parameter is applicable only to jobs that use subscription resources. |
| mem_max_ratio | BIGINT | The ratio of the memory consumed by the job to the maximum memory usage. This parameter is applicable only to jobs that use subscription resources. |
| gpu_max_ratio | BIGINT | The ratio of the GPUs consumed by the job to the maximum GPU usage. This parameter is applicable only to jobs that use subscription resources. |
| settings | STRING | The custom scheduling settings of an upper-layer application, such as DataWorks. |
| additional_info | STRING | The additional information. This is a reserved field. |
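The usage fields of the TASKS view are scaled values, so it can help to convert them back into cores and megabytes. The following Python sketch inverts the formulas given for cpu_usage, total_cpu_usage, and total_mem_usage. The function names and sample numbers are hypothetical, and the running duration is assumed to be known in seconds.

```python
def cores_from_cpu_usage(cpu_usage: int) -> float:
    """cpu_usage = number of CPU cores x 100, so divide by 100."""
    return cpu_usage / 100


def average_cores(total_cpu_usage: int, running_seconds: int) -> float:
    """total_cpu_usage = cores x 100 x running seconds."""
    return total_cpu_usage / (100 * running_seconds)


def average_memory_mb(total_mem_usage: int, running_seconds: int) -> float:
    """total_mem_usage = memory (MB) x running seconds."""
    return total_mem_usage / running_seconds


if __name__ == "__main__":
    # Hypothetical snapshot: 4 cores and 2048 MB held for 30 seconds.
    print(cores_from_cpu_usage(400))
    print(average_cores(12000, 30))
    print(average_memory_mb(61440, 30))
```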

TASKS_HISTORY

Displays the job execution history in a MaxCompute project. Data from the last 14 days is retained.

| Field | Data type | Description |
| --- | --- | --- |
| task_catalog | STRING | The value is fixed to odps. |
| task_schema | STRING | The name of the project. |
| task_name | STRING | The name of the job. |
| task_type | STRING | The type of the job. Valid values: SQL: SQL jobs; CUPID: Spark or Mars jobs; SQLCost: SQL cost estimation jobs; SQLRT: MaxCompute Query Acceleration (MCQA) SQL jobs; LOT: MapReduce jobs; PS: Parameter Server of PAI; AlgoTask: Machine Learning Platform for AI (PAI) jobs. |
| inst_id | STRING | The ID of the instance. |
| status | STRING | The status of the job when data is collected. This is not a real-time state. Valid values: Terminated (the job is complete), Failed (the job failed), and Cancelled (the job is canceled). |
| owner_id | STRING | The ID of the Alibaba Cloud account. |
| owner_name | STRING | The name of the Alibaba Cloud account. |
| result | STRING | The error information displayed if an error occurs in an SQL job. |
| start_time | DATETIME | The time when the job starts. |
| end_time | DATETIME | The end time of the job. If the job does not end on the current day, this value is NULL. |
| input_records | BIGINT | The number of records read by the job. |
| output_records | BIGINT | The number of records generated by the job. |
| input_bytes | BIGINT | The amount of scanned data, which is the same as that displayed on LogView. |
| output_bytes | BIGINT | The number of output bytes. |
| input_tables | STRING | The job input tables in the [project.table1,project.table2] format. Some jobs, such as SQL COST jobs, do not have such information. |
| output_tables | STRING | The job output tables in the [project.table1,project.table2] format. |
| operation_text | STRING | The source XML file of the query statement. If the size of the source XML file exceeds 256 KB, the value is NULL. |
| signature | STRING | Optional. The job signature. |
| complexity | DOUBLE | Optional. The job complexity. This parameter is available only for SQL jobs. |
| cost_cpu | DOUBLE | The CPU utilization of the job. A value of 100 corresponds to 1 CPU core running for 1 second. For example, if 10 CPU cores run for five seconds, cost_cpu is 5000, which is calculated by using the following formula: 10 × 100 × 5. |
| cost_mem | DOUBLE | The memory consumed by the job. The value of this field is calculated by using the following formula: Memory size (MB) × Running duration of the job (seconds). |
| settings | STRING | The information that is scheduled by the upper-layer application or specified by users. The information is saved in the JSON format and includes the following fields: USERAGENT, BIZID, SKYNET_ID, and SKYNET_NODENAME. |
| ds | STRING | The date when the data was collected. Example: 20190101. |
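The cost_cpu arithmetic and the bracketed table-list format can be checked with a few lines of Python. The following sketch reproduces the 10 × 100 × 5 example from the cost_cpu description and splits an input_tables or output_tables value into table names. The function names are hypothetical, and the parser assumes the list contains no spaces or nested brackets beyond what the format above shows.

```python
from typing import List, Optional


def cost_cpu(cores: float, seconds: float) -> float:
    """cost_cpu = number of CPU cores x 100 x running duration (seconds)."""
    return cores * 100 * seconds


def core_hours(cost_cpu_value: float) -> float:
    """Convert a cost_cpu value into core-hours (100 units = 1 core-second)."""
    return cost_cpu_value / 100 / 3600


def parse_table_list(value: Optional[str]) -> List[str]:
    """Split '[project.table1,project.table2]' into a list of table names."""
    if not value:
        # Some jobs, such as SQL cost estimation jobs, carry no table list.
        return []
    inner = value.strip().strip("[]")
    return [name.strip() for name in inner.split(",") if name.strip()]


if __name__ == "__main__":
    print(cost_cpu(10, 5))
    print(core_hours(360000))
    print(parse_table_list("[project.table1,project.table2]"))
```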

TUNNELS_HISTORY

Displays the history of data uploads and downloads that are performed over a data tunnel. Data of the previous 14 days is retained.

| Field | Data type | Description |
| --- | --- | --- |
| tunnel_catalog | STRING | The value is fixed to odps. |
| tunnel_schema | STRING | The name of the project. |
| session_id | STRING | The session ID, which is saved in the following format: TIMESTAMP (YYYYMMDDHHmmss, 14 characters) + IP address (8 characters) + numHex (8 characters). Example: 2013060414484474e5e60a00000002. |
| operate_type | STRING | The type of the operation. Valid values: UPLOADLOG, DOWNLOADLOG, and DOWNLOADINSTANCELOG. |
| tunnel_type | STRING | The type of the tunnel. Valid values: TUNNEL LOG and TUNNEL INSTANCE LOG. |
| request_id | STRING | The ID of the request. |
| object_type | STRING | The type of object on which the operation is performed. Valid values: TABLE and INSTANCE. |
| object_name | STRING | The table name or instance ID. |
| partition_spec | STRING | The partition information. Example: time=20130222,loc=beijing. |
| data_size | BIGINT | The size of data. Unit: bytes. |
| block_id | BIGINT | The ID of the block uploaded by using the tunnel. This parameter is available only if operate_type is set to UPLOADLOG. Otherwise, this parameter is left empty. |
| offset | BIGINT | The number of records to skip before data is downloaded. By default, the download starts from record 0. |
| length | BIGINT | The number of records to download or upload in the current session. The number of downloaded records is equal to the value of this parameter. |
| owner_id | STRING | The ID of the Alibaba Cloud account. |
| owner_name | STRING | The name of the Alibaba Cloud account. |
| start_time | DATETIME | The start time of the request. |
| end_time | DATETIME | The end time of the request. |
| client_ip | STRING | The IP address of the client that initiates the request. |
| user_agent | STRING | The information about the user agent, which is the client that initiates the request. The information may be the Java version or the operating system. |
| columns | STRING | The columns that are specified when the data is downloaded over a data tunnel. |
| ds | STRING | The date when the data was collected. Example: 20190101. |
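The session_id layout (14-character timestamp, 8-character IP address, 8-character numHex) can be decoded as in the following Python sketch. Two points are assumptions rather than documented behavior: the 8 IP characters are read as the four IPv4 octets in hexadecimal, in order, and numHex is read as a hexadecimal counter. The function name and the returned keys are hypothetical.

```python
import ipaddress
from datetime import datetime


def parse_session_id(session_id: str) -> dict:
    """Split a tunnel session ID into timestamp, IP address, and counter."""
    if len(session_id) != 30:
        raise ValueError("expected 14 + 8 + 8 = 30 characters")
    # First 14 characters: YYYYMMDDHHmmss timestamp.
    timestamp = datetime.strptime(session_id[:14], "%Y%m%d%H%M%S")
    # Next 8 characters: assumed to be the IPv4 octets in hexadecimal.
    ip = ipaddress.IPv4Address(int(session_id[14:22], 16))
    # Last 8 characters: numHex, assumed to be a hexadecimal counter.
    sequence = int(session_id[22:], 16)
    return {"timestamp": timestamp, "ip": str(ip), "sequence": sequence}


if __name__ == "__main__":
    print(parse_session_id("2013060414484474e5e60a00000002"))
```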