All Products
Search
Document Center

MaxCompute:Schema-related operations

Last Updated:Dec 17, 2024

MaxCompute provides the schema feature. You can use schemas to classify tables, resources, and functions in projects. This topic describes the concepts and permissions that are related to schemas. This topic also describes how to use schemas.

Background information

A MaxCompute project is a basic organizational unit of MaxCompute and is used for isolation and access control of multiple users. A project contains objects, such as tables, resources, and functions. Before the schema feature is introduced in MaxCompute, the objects are directly placed in projects. Projects serve as databases or schemas based on the hierarchy of traditional databases. This causes operation inconvenience for users, especially when a large number of tables or other objects exist. To resolve this issue, MaxCompute provides the schema feature. You can use schemas to classify tables, resources, and functions in projects. The following figure shows the hierarchy of schemas.层级图

Key terms

  • Schema

    A schema is created in a project. You can use schemas to classify objects in projects, such as tables, resources, and user-defined functions (UDFs). You can create multiple schemas in a project.

  • Schema syntax

    To use the schema feature, you must identify the semantics of the project.schema.table in terms of the syntax. You can perform the identification operation by enabling the schema syntax. If you enable the schema syntax, the system can identify statements in the a.b.c format as project.schema.table, and identify statements in the a.b format as schema.table. If you disable the schema syntax, the system cannot identify statements in the a.b.c format, and can identify statements in the a.b format as project.table. You can enable the schema syntax at the tenant or job level.

    • Tenant-level settings determine the default semantics of data access requests submitted by a tenant. You can log on to the MaxCompute console, and choose Tenants > Tenant Property to view the status of the schema syntax. If the current tenant does not have projects, or if the number of projects does not exceed 10 and all have already been upgraded to support Schema as well as assessed for impact, you can turn on the Tenant-level Information Schema Syntax switch on the Tenants page. If the current tenant has more than 10 projects, the switch cannot be turned on. We also recommend that you do not turn on the switch. For more information about how to enable the tenant-level schema syntax, see Tenant properties.

    • Job-level settings affect only the semantics of the current job and take precedence over tenant-level settings. You can run the set odps.namespace.schema=true | false; command to enable or disable the schema syntax at the job level.

  • Default schema

    A schema named DEFAULT exists in each project for which the schema feature is enabled. This schema cannot be modified or deleted.

Limits

The MaxCompute schema feature is continuously improved. Specific modules do not support this feature. The operations that are initiated from these modules on MaxCompute schemas can be performed only after you run the set odps.namespace.schema=false command. Take note of the following limits:

  • The schema feature is supported for Spark jobs in MaxCompute only if spark.hadoop.odps.spark.version is set to spark-3.1.1-odps0.35.0 or a later version. You must configure the following parameters for Spark jobs:

    spark.hadoop.odps.spark.version=spark-3.1.1-odps0.35.0
    spark.hadoop.odps.spark.default.enable=false
    spark.sql.catalog.odps.enableNamespaceSchema=true
  • Mars jobs and MapReduce jobs in MaxCompute do not support projects for which the schema feature is enabled. If you enable the schema feature for your project, you cannot run Mars jobs or MapReduce jobs in your project.

  • Alibaba Cloud services such as Platform for AI (PAI) and Quick BI do not support custom schemas.

  • The schema feature of MaxCompute is supported for MaxCompute Studio V4.0.0 or later.

  • Views and UDFs that are created based on the configuration odps.namespace.schema=false can be accessed only when the schema feature is disabled. Similarly, views and UDFs that are created based on the configuration odps.namespace.schema=true can be accessed only when the schema feature is enabled.

  • The schema feature of MaxCompute is supported for Hologres V1.3 or later. For more information about how to upgrade a Hologres instance, see Upgrade instances.

  • MaxCompute SDK 0.40.8 for Java or later, Java Database Connectivity (JDBC) 3.3.2 or later, MaxCompute client V0.40.8 or later, and PyODPS 0.11.3.1 or later support the schema feature of MaxCompute.

  • DataWorks displays schema-related interactions only when the schema syntax is enabled at the tenant level. If only specific projects use custom schemas for storage and the tenant-level schema syntax is disabled, you cannot manage custom schemas in the DataWorks console. In this case, you can use scripts on Data Integration to specify objects in the custom schemas. For more information about the scenarios in which DataWorks supports the schema feature, see Use the MaxCompute schema feature in DataWorks.

Enable the schema feature

If you want to use the schema feature of MaxCompute, you can use one of the following methods to enable the feature:

  • If you are a beginner of MaxCompute and no projects are available, you can log on to the MaxCompute console, choose Tenants > Tenant Property, and then turn on Tenant-level Information Schema Syntax. This way, all the projects that you create later support the schema feature. For all requests, odps.namespace.schema is set to true by default. This setting indicates that the schema feature is enabled.

  • If you have no than 10 existing projects and either no existing jobs or a few jobs that you want to use the schema syntax, you can first upgrade all existing projects to schema mode. Then, log on to the MaxCompute console and choose Tenants > Tenant Property, turn on Tenant-level Information Schema Syntax. After you enable Tenant-level Information Schema Syntax, all new projects support the schema feature. For all requests, odps.namespace.schema is set to true by default.

  • If you have existing projects and jobs, but a new business needs to use the schema feature, you need to upgrade one or more projects to support the schema feature. You can use one of the following upgrade methods based on regions. After the upgrade, a schema named DEFAULT is automatically created in the projects.

    • Alibaba Cloud regions in the Chinese mainland, including China (Hangzhou), China (Shanghai), China (Beijing), China (Zhangjiakou), China (Ulanqab), China (Shenzhen), and China (Chengdu). You can log on to the MaxCompute console and choose Workspace > Projects in the left-side navigation pane. If the project does not support the schema feature, the entries are available in the Actions column, as shown in the following figure. You can perform relevant upgrade operations by using the entries.image

    • Other regions. You can submit a ticket for the application.

    Take note that the tenant-level schema syntax is not enabled in this scenario. To prevent a negative impact on existing jobs, we recommend that you do not enable the tenant-level schema syntax. For all requests, odps.namespace.schema is set to false by default. This indicates that the tenant-level schema syntax is disabled. Therefore, if you want to access data in a custom schema that is created for the upgraded project, you need to enable the job-level schema syntax by running the set odps.namespace.schema=true; command. If multiple projects are associated and the other projects are not upgraded to support the schema feature:

    • If the schema syntax is enabled, the data path of a project that is not upgraded is specified in the projectname.default.tablename format.

    • If the schema syntax is disabled, data in the custom schema of the upgraded project cannot be accessed. In this case, only data in the DEFAULT schema is recognized in the projectname.tablename format.

Usage notes

This section describes how to use schemas.

  • Create a project.

    Before you can use schemas, you must create a project. For more information, see Create a MaxCompute project. If you set odps.namespace.schema to true when you create a project in the MaxCompute console, the project that you create supports the schema feature by default. If you set odps.namespace.schema to false, you must submit an application to enable your project to support the schema feature.

    Note

    After you submit an application, you need to wait for your project to be updated. Project updates are performed on every Monday and Thursday.

  • Manage schemas.

    Manage schemas by using SQL statements

    Use the following SQL statements to manage schemas.

    • View schemas.

      show schemas;
    • Create a schema.

      create schema <schema_name>;

      schema_name specifies the name of the custom schema.

    • View schema information.

      desc schema <schema_name>;

      schema_name specifies the name of the schema.

    • Drop a schema.

      drop schema <schema_name>;

      schema_name specifies the name of the schema.

    Manage schemas in the MaxCompute console

    1. Log on to the MaxCompute console. In the top navigation bar, select a region.

    2. In the left-side navigation pane, choose Workspace > Projects.

    3. On the Projects page, find the desired project and click Manage in the Actions column.

    4. On the Projects page, click Schema.

      Note

      The Schema tab appears only for the projects that support the schema feature.

    5. On the Schema tab, view schemas, create a schema, and drop a schema.

  • Perform operations on an object in a schema.

    If you want to perform operations on an object such as a table, view, resource, or function in a schema, specify the object in the project.schema.table format in SQL statements.

    • When you perform operations on a table in a schema, take note of the following items:

      Note

      All descriptions of tables in this topic also apply to views, resources, and functions.

      • If you want to perform operations on a table across projects, you must specify the table in the project.schema.table format in SQL statements.

      • If you want to perform operations on an object across schemas in a project, you can specify the object in the schema.table format in SQL statements. In this case, a in the a.b format is parsed as a schema and b in the a.b format is parsed as a table. The project is the current project.

      • In the same project, you can execute the use schema <schema_name> statement to specify the current schema. You can use an SQL statement that is similar to select * from a. In this case, a in the statement indicates a table and the current project and the specified schema are automatically obtained.

      • If no schema is specified in the context, a in the SQL statement that is similar to select * from a indicates a table and the schema named DEFAULT and the current project are automatically obtained.

    • Examples

      • Perform operations on an object in a specified schema in projectA.

        • Perform operations on an object in the DEFAULT schema.

          use projectA;
          set odps.namespace.schema=true;-- If you configure this setting at the tenant level, skip this configuration.
          -- Perform operations on the t_a table.
          create table t_a(c1 string,c2 bigint);
          insert into/overwrite table t_a values ('a',1),('b',2),('c',3);
          select * from t_a;
          show tables;
          desc t_a;
          
          tunnel upload <path> t_a[/<pt_spc>];
          tunnel download t_a[/pt_spc] <path>;
          
          -- Perform operations on the res_a.jar resource.
          add jar <path>/res_a.jar ;
          desc resource res_a.jar;
          list resources;
          get resource res_a.jar D:\;
          drop resource res_a.jar;
          
          -- Perform operations on the fun_a function.
           create function fun_a as 'xx' using 'res_a.jar';
           desc function fun_a;
           list functions;
           drop function fun_a;

          path: the path and name of the file. pt_spc: the partitions that are specified in a partitioned table. You must specify the lowest-level partition. The value of this parameter is in the pt_spcpartition_col1=col1_value1, partition_col2=col2_value1...pt_spc format.

        • Perform operations on objects in the custom schemas s_1 and s_2, including cross-schema operations.

          use projectA;
          set odps.namespace.schema=true;-- If you configure this setting at the tenant level, skip this configuration.
          
          -- Perform operations on the t_c table in the s_1 schema.
          use schema s_1;
          create table t_c(c1 string,c2 bigint);
          insert into/overwrite table t_c values ('a',1),('b',2),('c',3);
          select * from t_c;
          show tables;
          drop table t_c;
          
          
          tunnel upload <path> t_c[/<pt_spc>];
          tunnel download t_c[/pt_spc] <path>;
          
          -- Perform operations on the t_d table in the s_2 schema.
          create table s_2.t_d(c1 string,c2 bigint);
          insert into/overwrite table s_2.t_d values ('a',1),('b',2),('c',3);
          select * from s_2.t_d;
          show tables in s_2;
          drop table s_2.t_d;
          
          tunnel upload <path> s_2.t_d[/<pt_spc>];
          tunnel download s_2.t_d[/pt_spc] <path>;
          
          
          -- Perform operations on the res_b.jar resource in the s_1 schema.
          use schema s_1;
          add jar <path>/res_b.jar  ;
          desc resource res_b.jar;
          list resources;
          get resource res_b.jar D:\;
          drop resource res_b.jar;
          
          -- Perform operations on the res_c.jar resource in the s_2 schema.
          add jar xxx ;-- You can add resources only to the current schema or project. You cannot perform this operation across schemas or projects. To perform this operation, you must switch to the s_2 schema.
          -- When you perform operations on resources across schemas or projects, separate resource levels with colons (:).
          desc resource s_2:res_c.jar;
          list resources in s_2;
          get resource s_2:res_c.jar D:\;
          drop resource s_2:res_c.jar;
          
          -- Perform operations on the fun_b function in the s_1 schema.
           use schema s_1;
           create function fun_b as 'xx' using 'res_b.jar'
           desc function fun_b;
           list functions;
           drop function fun_b;
          
           -- Perform operations on the fun_c function in the s_2 schema.
           create function s_2.fun_c as 'xx' using 's_2/resources/res_c.jar'
           drop function s_2.fun_c;
           desc function s_2.fun_c;
           list functions in s_2;
           drop function s_2.fun_c;
      • Perform cross-project operations. For example, perform operations on objects of projectB in projectA.

        use projectA;
        set odps.namespace.schema=true;  -- If you configure this setting at the tenant level, skip this configuration.
        
        -- Perform operations on the t_f table of the s_3 schema in projectB.
        create table projectB.s_3.t_f(c1 string,c2 bigint);
        insert into/overwrite table projectB.s_3.t_f values ('a',1),('b',2),('c',3);
        select * from projectB.s_3.t_f;
        show tables in projectB.s_3;
        desc projectB.s_3.t_f;
        drop table projectB.s_3.t_f;
        
        tunnel upload <path> projectB.s_3.t_f[/<pt_spc>];
        tunnel download projectB.s_3.t_f[/pt_spc] <path>;
        
        -- Perform operations on the res_f.jar resource of the s_3 schema in projectB.
        add jar xxx ;-- You can add resources only to the current schema or project. You cannot perform this operation across schemas or projects. To perform this operation, you must switch to the s_3 schema in projectB.
        -- When you perform operations on resources across schemas or projects, separate resource levels with colons (:).
        desc resource projectB:s_3:res_f.jar;
        list resources in projectB.s_3;
        get resource projectB:s_3:res_f.jar D:\;
        drop resource projectB:s_3:res_f.jar;
        
        -- Perform operations on the fun_f function of the s_3 schema in projectB.
         create function projectB.s_3.fun_f as 'xx' using 'projectB/schemas/s_3/resources/res_f.jar'
         desc function projectB.s_3.fun_f;
         list functions in projectB.s_3;
         drop function projectB.s_3.fun_f;
                                        

Permissions

  • Permissions on schemas

    Some permissions, such as CreateTable, CreateResource, and CreateFunction, cannot be granted at the schema level. Instead, the permissions can be granted only at the project level. After you grant a user the CreateTable, CreateResource, or CreateFunction permission on a project, the user can perform the related operations in all schemas of the project. Schema-level management of these permissions will be available in the future.

    Note
    • By default, the owner of a schema has all access permissions on the schema and the objects in the schema and can manage the access permissions of other users on the schema and the objects in the schema.

    • If you have the CreateTable, CreateResource, and CreateFunction permissions on a project, you also have these permissions on the schemas of the project.

  • Permissions on objects in a schema

    When you perform ACL-based access control on an object in a schema, you must specify the object in the project.schema.table format in SQL statements. The following sample code shows the syntax for authorization. For more information about the permissions on objects such as tables, resources, and functions in a schema, see MaxCompute permissions. You can also grant permissions to users in the new MaxCompute console. For more information, see Manage user permissions in the MaxCompute console.

    -- Grant a role the permissions to manage all tables in a schema.
    GRANT schemaObjectPrivileges ON TABLE <project_name>.<schema_name>.* TO role {rolename};
    -- Revoke the permissions to manage all tables in a schema from a role.
    REVOKE schemaObjectPrivileges ON TABLE <project_name>.<schema_name>.* FROM role {rolename};
    -- Grant a role or a user the permissions to manage a table in a schema.
    GRANT schemaObjectPrivileges ON TABLE <project_name>.<schema_name>.<tablename> TO {role|user} {rolename | USER name};-- Grant a role or a user the permissions to manage a table in a schema.
    -- Revoke the permissions to manage a table in a schema from a role or a user.
    REVOKE schemaObjectPrivileges ON TABLE <project_name>.<schema_name>.<tablename> FROM {role|user} {rolename | USER name};
    -- View the permissions on a table.
    SHOW GRANTS ON TABLE <project_name>.<schema_name>.<tablename>;
    Note

    The GRANT schemaObjectPrivileges ON TABLE <project_name>.<schema_name>.xxx* TO role {rolename}; syntax is not supported.