
Hologres: CREATE EXTERNAL TABLE

Last Updated: Dec 18, 2024

In Hologres V3.0 and later, you can create external tables in an external schema that is sourced from DLF 2.0. Only external tables in the Apache Paimon format can be created. This topic describes the usage and limits of the CREATE EXTERNAL TABLE statement.

Syntax

Table creation syntax

The following code shows the syntax of the CREATE EXTERNAL TABLE statement:

CREATE EXTERNAL TABLE [IF NOT EXISTS] <ext_db_name>.<ext_schema_name>.<ext_table_name>
(
  [{
    <col_name> <col_type> [<col_constraints> [, ...]] |
    <table_constraints> [, ...]
  }]
)
[LOGICAL PARTITION BY LIST(<col_name> [, ...])]
[
  WITH
  (
    "<property>" = '<value>' [, ...]
  )
];
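For reference, the following statement is a minimal sketch that instantiates this syntax. The names (ext_db_dlf, ext_schema_dlf, orders) and columns are placeholders, and the sketch assumes that the external schema already exists and is sourced from DLF 2.0. The supported properties are described in the next section.

CREATE EXTERNAL TABLE IF NOT EXISTS ext_db_dlf.ext_schema_dlf.orders
(
  order_id TEXT,
  amount BIGINT
)
WITH (
  "table_format" = 'paimon'
);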

Parameters in the WITH clause

| Parameter | Description | Example |
| --- | --- | --- |
| table_format | The table format of the external table that you want to create. Only Apache Paimon tables are supported. | "table_format" = 'paimon' |
| file_format | The file format of the external table that you want to create. Only the ORC and Parquet formats are supported. | "file_format" = 'orc' |
| bucket | Data is read from and written to Apache Paimon tables by bucket. After you configure this parameter, all data in a non-partitioned table, or the data in each partition of a partitioned table, is distributed to different buckets. This allows concurrent data reads from or concurrent data writes to Apache Paimon tables in a job, which improves processing efficiency. For more information, see Data Distribution. | "bucket" = '1' |
| bucket-key | The key based on which data is distributed to buckets. | "bucket-key" = 'id' |
| changelog-producer | The method in which changelogs are generated for Apache Paimon tables. To allow downstream consumption in streaming mode, complete changelogs for INSERT, DELETE, and UPDATE operations on Apache Paimon tables are required. A changelog is similar to a binary log in a database. For more information, see Changelog Producer. | "changelog-producer" = 'input' |

Other operations

Query the table creation statement

SELECT * FROM hologres.hg_dump_script_external('<ext_db_name>.<ext_schema_name>.<ext_table_name>'); 

Query the column and partition properties of a table

SELECT * FROM hologres.hg_external_columns('<ext_db_name>', '<ext_schema_name>', '<ext_table_name>'); 

Query all tables in an external schema

SELECT * FROM hologres.hg_external_tables('<ext_db_name>', '<ext_schema_name>');

Refresh the metadata of an external table

REFRESH CACHE FOR EXTERNAL TABLE <ext_db_name>.<ext_schema_name>.<ext_table_name> WITH (cache_level = 'metadata');
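As a usage sketch, the following calls run these operations against the gh_event_ods table that is created in the Examples section below; the database, schema, and table names are taken from that example.

-- Query the table creation statement of the example table.
SELECT * FROM hologres.hg_dump_script_external('openlake_win.github_events.gh_event_ods');

-- Query the column and partition properties of the example table.
SELECT * FROM hologres.hg_external_columns('openlake_win', 'github_events', 'gh_event_ods');

-- Refresh the metadata cache of the example table.
REFRESH CACHE FOR EXTERNAL TABLE openlake_win.github_events.gh_event_ods WITH (cache_level = 'metadata');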

Examples

Create a partitioned table in DLF 2.0

In this example, an Apache Paimon append-only table is created.

CREATE EXTERNAL TABLE ext_db_dlf.ext_schema_dlf.ext_par_table_dlf (
    id TEXT,
    created_at BIGINT,
    type TEXT,
    actor_id TEXT,
    actor_login TEXT,
    repo_id TEXT,
    repo_name TEXT,
    org TEXT,
    org_login TEXT
)
LOGICAL PARTITION BY LIST(created_at)
WITH (
  "file_format" = 'orc',
  "bucket" = '6',
  "bucket-key" = 'id'
);
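The bucket parameter description above mentions concurrent writes to Apache Paimon tables. Assuming that your environment permits writing to this external table, a minimal write sketch might look like the following; the values are illustrative only.

INSERT INTO ext_db_dlf.ext_schema_dlf.ext_par_table_dlf
    (id, created_at, type, actor_id, actor_login, repo_id, repo_name, org, org_login)
VALUES
    ('1', 1700000000, 'PushEvent', 'a1', 'alice', 'r1', 'alice/repo', 'o1', 'octo-org');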

Create a non-partitioned table in DLF 2.0

In this example, an Apache Paimon primary key table is created.

CREATE EXTERNAL TABLE openlake_win.github_events.gh_event_ods (
    id TEXT,
    created_at BIGINT,
    type TEXT,
    actor_id TEXT,
    actor_login TEXT,
    repo_id TEXT,
    repo_name TEXT,
    org TEXT,
    org_login TEXT,
    PRIMARY KEY (id)
)
WITH (
  "changelog-producer" = 'input',
  "bucket" = '6',
  "bucket-key" = 'id'
);
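After the external table is created, it can be queried by its fully qualified three-part name, in the same way as the metadata operations shown earlier. The following sketch assumes that the underlying Apache Paimon table already contains data.

SELECT type, count(*) AS events
FROM openlake_win.github_events.gh_event_ods
GROUP BY type
ORDER BY events DESC
LIMIT 10;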