All Products
Search
Document Center

Hologres:Auto Load

Last Updated:Sep 11, 2024

This topic describes how to use the Auto Load feature to automatically create foreign tables for MaxCompute tables and Object Storage Service (OSS) tables in on-demand load or full load mode based on your business requirements.

Scenarios

Hologres is deeply integrated with MaxCompute, Data Lake Formation (DLF), and OSS. You can query data in MaxCompute or OSS by using foreign tables in Hologres without the need to migrate the data from MaxCompute or OSS to Hologres. If you want to accelerate queries on a large number of tables, you can use the Auto Load feature to automatically create foreign tables in Hologres. MaxCompute and DLF metadata is automatically synchronized to the foreign tables. This eliminates the need to manually create foreign tables.

  • On-demand load: This mode is suitable for scenarios in which you want to accelerate queries on a small number of tables. When this mode is enabled and a query is initiated, Hologres automatically creates a foreign table for the MaxCompute or OSS table from which you want to query data to accelerate the query. The foreign table is named the same as the MaxCompute or OSS table.

    Note
    • If an internal table in Hologres has the same schema name and table name as the queried MaxCompute or OSS table, the Auto Load feature is not triggered and data is queried from the internal table.

    • To allow your queries to trigger the creation of foreign tables after the Auto Load feature is enabled, you must make sure that the account used to initiate the queries has the permissions to create and delete tables and schemas in the specific database. If a foreign table is created by using the Auto Load feature, you need to only make sure that the account has the query permission.

    • This mode is triggered only when you initiate queries. This mode does not periodically create foreign tables.

  • Full load: This mode is suitable for scenarios in which you want to accelerate queries on a large number of tables or tables from a large number of sources. After this mode is enabled, the system automatically creates foreign tables for all source tables. You can also configure a parameter to specify an interval at which new source tables are inspected. This way, the system automatically creates foreign tables for the new source tables when you initiate a query. This helps optimize the management of a large number of foreign tables and is suitable for BI-based query acceleration.

自动加载

Description

  • In Hologres V1.1.43 and later, the Auto Load feature supports MaxCompute projects that use the two-layer model.

  • In Hologres V1.3.28 and later, the default value of the hg_experimental_load_all_foreign_table_interval_time parameter is changed from 5 min to 30 min to reduce the consumption of system resources caused by automatic inspection tasks. This parameter specifies the interval at which new source tables are inspected when the full load mode is used.

  • In Hologres V2.2.1 and later, the following features are supported. If the version of your Hologres instance is V2.1 or earlier, contact Hologres technical support to upgrade your instance.

    • The hg_experimental_load_foreign_table_mode = ['query' | 'period'] configuration is supported to allow you to specify the mode in which the Auto Load feature is implemented.

    • The Auto Load feature supports MaxCompute projects that use the three-layer model. For more information about the three-layer model of MaxCompute, see Schema-related operations.

    • The Auto Load feature supports metadata updates based on schema evolution of MaxCompute tables. Schema evolution includes operations such as adding columns, deleting columns, changing the names of columns, and changing the order of columns.

      If a foreign table is created for a MaxCompute table, and schema evolution is performed on the MaxCompute table, you can configure the GUC parameter to enable the schema evolution check. After you configure the GUC parameter, the system automatically updates metadata in the foreign table that maps to the MaxCompute table when a query is initiated. We recommend that you configure the GUC parameter for sessions but not databases. Otherwise, a large number of tasks for checking table schema changes are generated. As a result, system workloads increase.

      set hg_experimental_enable_auto_load_check_schema_evolution = on;
      Note

      The Auto Load feature does not support metadata updates based on schema evolution of OSS tables.

    • The Auto Load feature supports automatic load of metadata in DLF to accelerate queries from tables in OSS. For more information, see Use DLF to read data from and write data to OSS.

Limits

  • Make sure that the parameters of the MaxCompute and OSS data sources do not start with hg_ or holo_. The prefixes are reserved keywords in Hologres.

  • Up to six foreign tables can be automatically created for a single query. Therefore, you can execute an SQL statement to query up to six tables. If more than six tables are specified in a query statement, the Auto Load feature fails. You must modify the query statement for the Auto Load feature to take effect.

  • If a MaxCompute table contains data whose types are not supported by Hologres, the system fails to create a foreign table for the MaxCompute table. You can only manually select supported fields to create a foreign table by executing the CREATE FOREIGN TABLE statement.

  • To use the Auto Load feature to automatically create foreign tables for OSS tables, you must execute the CREATE USER MAPPING statement to specify the AccessKey ID and AccessKey secret used to access DLF and OSS for your accounts, including your Alibaba Cloud account.

  • If multiple foreign tables that have the same name are created in the same schema in Hologres for tables in different data sources, only the latest table is retained. Therefore, we recommend that you associate different data sources with different schemas in Hologres.

  • If you use the Auto Load feature to automatically create foreign tables and the specified schema does not exist in Hologres, you must be granted the admin permissions on the database to allow the system to automatically create the schema.

  • When you use the full load mode, take note of the following items:

    • We recommend that you set the interval at which foreign tables are created to at least 5 minutes.

    • We recommend that you do not use this mode for a data source that contains more than 1,000 tables.

Procedure

Enable Auto Load

  • Syntax

    -- Enable the Auto Load feature.
    ALTER DATABASE <database name> SET hg_experimental_enable_auto_load_foreign_table = on;
  • Parameters

    database name: the name of the Hologres database.

(Optional) Configure the Auto Load mode

In Hologres V2.2.1 and later, you can configure the Auto Load mode based on your business requirements. If you do not specify a mode, the on-demand load mode is used. You do not need to specify the mode for versions earlier than Hologres V2.2.1.

  • Syntax

    SET hg_experimental_load_foreign_table_mode = ['query' | 'period']
  • Valid values

    • query: The on-demand load mode is used. This is the default value.

      -- Specify the on-demand load mode.
      SET hg_experimental_load_foreign_table_mode = 'query';
    • period: The full load mode is used.

      -- Specify the full load mode.
      SET hg_experimental_load_foreign_table_mode = 'period';

Configure mappings

  • Syntax

    • Configure mappings between MaxCompute and Hologres

      • Hologres V2.2.1 and later

        -- Syntax for MaxCompute projects that use the three-layer model
        ALTER DATABASE <database> SET hg_experimental_auto_load_foreign_schema_mapping = '<hologres_schema>:<odps_foreign_server>.<mc_project>.<mc_schema>, [...]';
        -- Syntax for MaxCompute projects that use the two-layer model
        ALTER DATABASE <database> SET hg_experimental_auto_load_foreign_schema_mapping = '<hologres_schema>:<odps_foreign_server>.<mc_project>, [...]';
      • Versions earlier than Hologres V2.2.1

        ALTER DATABASE <database> SET hg_experimental_default_odps_project_list='<odps_project_name_1>,<odps_project_name_2>...';
    • Configure mappings between DLF and Hologres

      --DLF Default Catalog
      ALTER DATABASE <database> SET hg_experimental_auto_load_foreign_schema_mapping = '<hologres_schema>:<dlf_foreign_server>.<dlf_database> [...]';
      -- Syntax for DLF custom catalogs
      ALTER DATABASE <database> SET hg_experimental_auto_load_foreign_schema_mapping = '<hologres_schema>:<dlf_foreign_server>.<dlf_catalog>.<dlf_database>, [...]';
  • Parameters

    Parameter

    Description

    database

    The name of the Hologres database.

    hologres_schema

    The name of the schema in the Hologres database.

    mc_project

    The name of the MaxCompute project for which you want to enable the Auto Load feature. You can log on to the MaxCompute console and choose Workspaces > Projects in the left-side navigation pane to obtain the name of the MaxCompute project.

    odps_foreign_server

    The server that is used to access the MaxCompute project. The value is fixed as odps_server.

    odps_project_name

    The name of the MaxCompute project for which you want to periodically create foreign tables. You can specify multiple projects. Separate multiple project names with commas (,).

    By default, this parameter is left empty, which indicates that foreign tables are not periodically created for a MaxCompute project.

    Note

    This parameter is valid only for versions earlier than Hologres V2.2.1.

    mc_schema

    The name of the schema in the MaxCompute database.

    Note

    This parameter is required only for projects that use the three-layer model. For more information about schemas, see Schema-related operations.

    dlf_foreign_server

    The foreign server that is created based on DLF_FDW. You can execute the following statement to obtain the name of the foreign server. For more information, see Use DLF to read data from and write data to OSS.

    SELECT * FROM pg_foreign_server;

    dlf_catalog

    The name of the catalog in which the DLF database resides. To obtain the catalog name, you can log on to the DLF console, choose Metadata > Metadata in the left-side navigation pane, and then click the Catalog List tab.

    dlf_database

    The name of the DLF database for which you want to enable the Auto Load feature. To obtain the database name, you can log on to the DLF console, choose Metadata > Metadata in the left-side navigation pane, and then click the Database tab.

Query data

You can query data from MaxCompute and DLF by using foreign tables.

-- Query data from MaxCompute.
SELECT * FROM <hologres_schema>.<mc_table>;
-- Query data from DLF.
SELECT * FROM <hologres_schema>.<dlf_table>;

Other operations

Disable Auto Load

  • Syntax

    ---- Disable the Auto Load feature.
    ALTER DATABASE <database name> SET hg_experimental_enable_auto_load_foreign_table = off;
    Note

    After you disable the Auto Load feature, the system no longer synchronizes metadata from data sources to create foreign tables. You can view existing foreign tables and metadata only in Hologres. If you want to create foreign tables in this case, you must manually execute the IMPORT FOREIGN SCHEMA statement. For more information, see IMPORT FOREIGN SCHEMA.

  • Parameters

    database name: the name of the Hologres database.

View data sources for which the Auto Load feature is enabled

-- View MaxCompute data sources for which the Auto Load feature is enabled in versions earlier than Hologres V2.2.
SHOW hg_experimental_default_odps_project_list;

-- View MaxCompute and DLF data sources for which the Auto Load feature is enabled in Hologres V2.2 and later.
SHOW hg_experimental_auto_load_foreign_schema_mapping;

Configure the interval for periodic inspections

If you configure the full load mode for a data source, the system inspects new tables at the specified interval. If new tables are created, the system automatically creates foreign tables for the new tables when you initiate a query. The default interval is 30 minutes. If new tables are created during the 30 minute-interval before you initiate a query, the system automatically creates foreign tables for the new tables.

  • Syntax

    -- Set the inspection interval to 600 seconds for a Hologres database.
    ALTER DATABASE <database name> SET hg_experimental_load_all_foreign_table_interval_time = 600;
  • Parameters

    database name: the name of the Hologres database.

Note

We recommend that you set the inspection interval to be greater than or equal to 600 seconds (10 minutes).

Drop foreign tables

If foreign tables are no longer required, you can execute the DROP statement to drop the foreign tables. For more information, see DROP FOREIGN TABLE.

Examples

Example 1: Use the on-demand load mode

Note

In this example, the Auto Load feature is enabled for a MaxCompute project that uses the two-layer model and a MaxCompute project that uses three-layer model.

  • Two-layer model: The project name is mc_project and the table name is mc_table.

  • Three-layer model: The project name is mc_3_layer_project, the schema name is mc_schema, and the table name is mc_table.

  • Two-layer model

    1. Enable the Auto Load feature.

      ALTER DATABASE holo_demo SET hg_experimental_enable_auto_load_foreign_table = on;
    2. Configure mappings.

      --- For Hologres V2.2.1 and later
      ALTER DATABASE holo_demo SET hg_experimental_auto_load_foreign_schema_mapping = 'hologres_schema:odps_server.mc_project';
      --- For versions earlier than Hologres V2.2.1
      ALTER DATABASE holo_demo SET hg_experimental_default_odps_project_list='mc_project';
    3. Query data by using a foreign table.

      SELECT * FROM hologres_schema.mc_table;
  • Three-layer model

    1. Enable the Auto Load feature.

      ALTER DATABASE holo_demo SET hg_experimental_enable_auto_load_foreign_table = on;
    2. Configure mappings.

      ALTER DATABASE holo_demo SET hg_experimental_auto_load_foreign_schema_mapping = 'holo_schema_3layer:odps_server.mc_3_layer_project.mc_schema';
    3. Query data by using a foreign table.

      SELECT * FROM holo_schema_3layer.mc_table;

Example 2: Use the full load mode

In this example, a DLF data source is used. You can enable the Auto Load feature for tables in the DLF default catalog or a custom catalog.

  • DLF default catalog

    This section describes how to use the full load mode for the dlf_db database in the DLF default catalog. The system automatically creates foreign tables in Hologres for all tables in the database. The system also periodically inspects new tables in the database at the interval specified by the hg_experimental_load_all_foreign_table_interval_time parameter and automatically creates foreign tables for new tables in the database.

    1. Create a foreign server named dlf_server and configure the endpoint. For more information, see Use DLF to read data from and write data to OSS.

      -- Create a foreign server.
      CREATE SERVER IF NOT EXISTS dlf_server FOREIGN data wrapper dlf_fdw options (
          dlf_endpoint 'dlf-share.cn-beijing.aliyuncs.com',
          oss_endpoint 'oss-cn-beijing-internal.aliyuncs.com'
      );
    2. Configure user mappings.

      To use the Auto Load feature, you must execute the CREATE USER MAPPING statement to specify the AccessKey ID and AccessKey secret used to access DLF and OSS. For more information, see Use DLF to read data from and write data to OSS.

      CREATE USER MAPPING FOR <User ID> server <server_name> options
      (
          dlf_access_id 'accessid', 
          dlf_access_key 'accesskey',
          oss_access_id 'accessid', 
          oss_access_key 'accesskey'
      );
    3. Enable the Auto Load feature.

      ALTER DATABASE holo_demo SET hg_experimental_enable_auto_load_foreign_table = on;
    4. Configure mappings and the Auto Load mode.

      SET hg_experimental_auto_load_foreign_schema_mapping = 'holo_schema:dlf_server.dlf_db';
      SET hg_experimental_load_foreign_table_mode = 'period'; 
      SET hg_experimental_load_all_foreign_table_interval_time = 10;
    5. Query data by using the foreign table.

      SELECT * FROM holo_schema.dlf_table;
  • DLF custom catalog

    1. Create a foreign server and configure the endpoint. For more information, see Use DLF to read data from and write data to OSS.

      CREATE SERVER IF NOT EXISTS DLF_server FOREIGN data wrapper dlf_fdw options (
          dlf_endpoint 'dlf-share.cn-beijing.aliyuncs.com',
          oss_endpoint 'oss-cn-beijing-internal.aliyuncs.com'
      );
    2. Configure user mappings.

      To use the Auto Load feature, you must execute the CREATE USER MAPPING statement to specify the AccessKey ID and AccessKey secret used to access DLF and OSS. For more information, see Use DLF to read data from and write data to OSS.

      CREATE USER mapping FOR <User ID> server <server_name> options
      (
          dlf_access_id 'accessid', 
          dlf_access_key 'accesskey',
          oss_access_id 'accessid', 
          oss_access_key 'accesskey'
      );
    3. Enable the Auto Load feature.

      ALTER DATABASE holo_demo SET hg_experimental_enable_auto_load_foreign_table = on;
    4. Configure mappings, the Auto Load mode, and the inspection interval.

      -- Configure mappings.
      SET hg_experimental_auto_load_foreign_schema_mapping = 'holo_schema:DLF_server.dlf_catalog.dlf_db';
      -- Configure the full load mode.
      SET hg_experimental_load_foreign_table_mode = 'period'; 
      -- Set the inspection interval to 600 seconds.
      SET hg_experimental_load_all_foreign_table_interval_time = 600;
    5. Query data by using the foreign table.

      SELECT * FROM holo_schema.dlf_table;