
DataWorks: HBase data source

Last Updated: Feb 06, 2026

The HBase data source supports both reading from and writing to HBase. This topic describes the synchronization capabilities of the DataWorks HBase data source.

Supported versions

DataWorks provides two types of HBase plugins: the standard HBase plugin and the HBase{xx}xsql plugin. The HBase{xx}xsql plugin requires both HBase and Phoenix.

  1. HBase plugin:

    This plugin supports HBase 0.94.x, HBase 1.1.x, and HBase 2.x. It supports both wizard mode and script mode. Use the hbaseVersion parameter to specify the version.

    • If you are using HBase 0.94.x, set hbaseVersion to 094x for both the Reader and Writer plugins.

      "reader": {
              "hbaseVersion": "094x"
          }
      "writer": {
              "hbaseVersion": "094x"
          }
    • If you are using HBase 1.1.x or HBase 2.x, set hbaseVersion to 11x for both the Reader and Writer plugins.

      "reader": {
              "hbaseVersion": "11x"
          }
      "writer": {
              "hbaseVersion": "11x"
          }
      The HBase 1.1.x plugin is compatible with HBase 2.0.
  2. HBase{xx}xsql plugin:

    1. The plugin comes in two variants:

      • HBase20xsql plugin: Supports HBase 2.x with Phoenix 5.x. Supports script mode only.

      • HBase11xsql plugin: Supports HBase 1.1.x with Phoenix 4.x. Supports script mode only.

    2. The HBase{xx}xsql Writer plugin lets you bulk-import data into a Phoenix SQL table in HBase. Phoenix encodes data in rowkeys, so writing data directly through the HBase API requires manual data conversion, which is complex and error-prone. The HBase{xx}xsql Writer plugin simplifies this process and provides a straightforward way to import data into an SQL table.

      Note

      The plugin uses the Phoenix JDBC driver to execute UPSERT statements and write data to the table in batches. For example, each write takes the general form UPSERT INTO table (col1, col2) VALUES (?, ?). Because the plugin operates through this high-level interface, it also synchronously updates the corresponding index tables.

Limitations

HBase20xsql reader

  • You can split a table on one column only: its primary key.

  • When you split a table evenly based on job concurrency, the split column must be an integer or a string.

  • Table names, schema names, and column names are case-sensitive; their casing must match that of the Phoenix table.

  • Because this plugin reads data exclusively through Phoenix QueryServer, you must enable the QueryServer service in Phoenix.

HBase11xsql writer

  • You can use this writer only with serverless resource groups for Data Integration (recommended) and exclusive resource groups for Data Integration.

  • The writer does not support importing data with timestamps.

  • The writer supports only tables created with Phoenix. It does not support native HBase tables.

  • The order of columns in the writer must match the order in the reader. The reader's column order determines the sequence of fields in the output, and the writer's column order determines which field each destination column receives. For example:

    • The reader's column order is c1, c2, c3, c4.

    • The writer's column order is x1, x2, x3, x4.

    In this case, data from the reader's c1 column is written to the writer's x1 column, and so on. If the writer's column order is instead x1, x2, x4, x3, then data from the reader's c3 column is written to x4, and data from c4 is written to x3. See the configuration sketch after this list.

  • The writer supports importing data into indexed tables and automatically updates all related indexes.
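The following sketch shows how this positional mapping looks in a script configuration. It is illustrative only: the odps reader and all table and column names are placeholders, not values required by the plugin.

    "reader": {
        "plugin": "odps",
        "parameter": {
            "column": ["c1", "c2", "c3", "c4"]// The reader outputs fields in this order.
        }
    },
    "plugin": "hbase11xsql",
    "parameter": {
        "column": ["X1", "X2", "X3", "X4"]// X1 receives c1, X2 receives c2, and so on.
    }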

Supported features

HBase Reader

HBase Reader supports normal mode and multiVersionFixedColumn mode.

  • normal mode: Treats an HBase table as a standard two-dimensional table and retrieves the latest version of data.

    hbase(main):017:0> scan 'users'
    ROW                                   COLUMN+CELL
    lisi                                 column=address:city, timestamp=1457101972764, value=beijing
    lisi                                 column=address:contry, timestamp=1457102773908, value=china
    lisi                                 column=address:province, timestamp=1457101972736, value=beijing
    lisi                                 column=info:age, timestamp=1457101972548, value=27
    lisi                                 column=info:birthday, timestamp=1457101972604, value=1987-06-17
    lisi                                 column=info:company, timestamp=1457101972653, value=baidu
    xiaoming                             column=address:city, timestamp=1457082196082, value=hangzhou
    xiaoming                             column=address:contry, timestamp=1457082195729, value=china
    xiaoming                             column=address:province, timestamp=1457082195773, value=zhejiang
    xiaoming                             column=info:age, timestamp=1457082218735, value=29
    xiaoming                             column=info:birthday, timestamp=1457082186830, value=1987-06-17
    xiaoming                             column=info:company, timestamp=1457082189826, value=alibaba
    2 row(s) in 0.0580 seconds

    The following table displays the output.

    rowKey    address:city  address:contry  address:province  info:age  info:birthday  info:company
    lisi      beijing       china           beijing           27        1987-06-17     baidu
    xiaoming  hangzhou      china           zhejiang          29        1987-06-17     alibaba

  • multiVersionFixedColumn mode: Treats the HBase table as a vertical table. Each record consists of four columns: rowKey, family:qualifier, timestamp, and value. You must specify the columns to read. This mode treats each cell value as a separate record. If a cell has multiple versions, the mode generates a separate record for each version.

    hbase(main):018:0> scan 'users',{VERSIONS=>5}
    ROW                                   COLUMN+CELL
    lisi                                 column=address:city, timestamp=1457101972764, value=beijing
    lisi                                 column=address:contry, timestamp=1457102773908, value=china
    lisi                                 column=address:province, timestamp=1457101972736, value=beijing
    lisi                                 column=info:age, timestamp=1457101972548, value=27
    lisi                                 column=info:birthday, timestamp=1457101972604, value=1987-06-17
    lisi                                 column=info:company, timestamp=1457101972653, value=baidu
    xiaoming                             column=address:city, timestamp=1457082196082, value=hangzhou
    xiaoming                             column=address:contry, timestamp=1457082195729, value=china
    xiaoming                             column=address:province, timestamp=1457082195773, value=zhejiang
    xiaoming                             column=info:age, timestamp=1457082218735, value=29
    xiaoming                             column=info:age, timestamp=1457082178630, value=24
    xiaoming                             column=info:birthday, timestamp=1457082186830, value=1987-06-17
    xiaoming                             column=info:company, timestamp=1457082189826, value=alibaba
    2 row(s) in 0.0260 seconds

    The following table displays the output.

    rowKey    family:qualifier  timestamp      value
    lisi      address:city      1457101972764  beijing
    lisi      address:contry    1457102773908  china
    lisi      address:province  1457101972736  beijing
    lisi      info:age          1457101972548  27
    lisi      info:birthday     1457101972604  1987-06-17
    lisi      info:company      1457101972653  baidu
    xiaoming  address:city      1457082196082  hangzhou
    xiaoming  address:contry    1457082195729  china
    xiaoming  address:province  1457082195773  zhejiang
    xiaoming  info:age          1457082218735  29
    xiaoming  info:age          1457082178630  24
    xiaoming  info:birthday     1457082186830  1987-06-17
    xiaoming  info:company      1457082189826  alibaba
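As a sketch, assuming the users table from the scan output above, a multiVersionFixedColumn read could use the following reader parameters. The parameter names and values follow the HBase Reader script demo and parameter descriptions later in this topic; the column selection is a placeholder.

    "parameter": {
        "mode": "multiVersionFixedColumn",// Output one record per cell version.
        "maxVersion": "-1",// -1 reads all versions.
        "table": "users",
        "column": [
            {
                "name": "rowkey",
                "type": "string"
            },
            {
                "name": "info:age",
                "type": "string"
            }
        ]
    }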

HBase Writer

  • The HBase Writer can generate a rowKey by concatenating multiple columns from the source.

  • The HBase Writer can set the version (timestamp) of the data in the following ways:

    • Using the current time.

    • Using a value from a source column.

    • Using a user-specified time.

Supported field types

Batch read

  • The following table lists the data type mappings for HBase Reader.

    Type            Data Integration column type  Database data type
    Integer         long                          short, int, and long
    Floating-point  double                        float and double
    String          string                        binary_string and string
    Date and Time   date                          date
    Byte            bytes                         bytes
    Boolean         boolean                       boolean

  • HBase20xsql Reader supports most, but not all, Phoenix data types. Verify that your data types are supported before use.

  • The following table lists the type mappings used by HBase20xsql Reader for Phoenix data types.

    DataX internal type  Phoenix data type
    long                 INTEGER, TINYINT, SMALLINT, and BIGINT
    double               FLOAT, DECIMAL, and DOUBLE
    string               CHAR and VARCHAR
    date                 DATE, TIME, and TIMESTAMP
    bytes                BINARY and VARBINARY
    boolean              BOOLEAN

Batch write

The following table lists the data type mappings for HBase Writer.

Note
  • Ensure that the column configuration matches the corresponding column types in the HBase table.

  • Only the data types listed in the following table are supported.

Type            Database data type
Integer         INT, LONG, and SHORT
Floating-point  FLOAT and DOUBLE
Boolean         BOOLEAN
String          STRING

Considerations

If you receive the error message "tried to access method com.google.common.base.Stopwatch" when you test connectivity, add the hbaseVersion property to the data source configuration and specify the HBase version.
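A hedged sketch of this fix follows. The exact set of data source configuration fields depends on your environment, and hostname is a placeholder:

    {
        "hbaseConfig": {
            "hbase.zookeeper.quorum": "hostname"
        },
        "hbaseVersion": "11x"// Use 094x for HBase 0.94.x, or 11x for HBase 1.1.x and 2.x.
    }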

Add a data source

Before you develop a synchronization task in DataWorks, add the required data source to DataWorks by following the instructions in Data Source Management. When you add the data source, you can view the parameter descriptions in the DataWorks console.

Data synchronization tasks

For information about where to configure a synchronization task and the configuration procedure, see the following configuration guides.

Single-table offline synchronization task

  • For instructions, see Configure a task in the codeless UI and Configure a task in the code editor.

    By default, Wizard Mode does not display the Field Mapping section because HBase is a schemaless data source. You must configure the field mapping manually:

    • When HBase is the data source, configure Source Field in the following format: data_type|column_family:column_name.

    • When HBase is the data destination, configure both Destination Field and rowkey. For Destination Field, use the format source_field_index|data_type|column_family:column_name. For rowkey, use the format source_primary_key_index|data_type.

    Note

    Each field must be on a separate line. For examples of these formats, see the illustration after this list.

  • For a complete list of parameters and script examples in Script Mode, see Appendix: Script demos and parameters.
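The following lines illustrate the field-mapping formats. The column family cf, the column name name, and the indexes are hypothetical values used only for illustration:

  • Source Field (HBase as the source): string|cf:name

  • Destination Field (HBase as the destination): 0|string|cf:name

  • rowkey (HBase as the destination): 0|string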

FAQ

  • Q: What is the recommended concurrency setting? Does increasing it help if the import is slow?

    A: The default JVM heap size for the data import process is 2 GB. Concurrency (the number of channels) is implemented with multithreading, but creating too many threads does not always improve import speed and can degrade performance because of frequent garbage collection (GC). As a best practice, set the concurrency to a value from 5 to 10.

  • Q: What is the recommended batchSize setting?

    A: The default value is 256. Calculate the optimal batchSize from your average row size: aim for a total of 2 MB to 4 MB of data per batch, and divide that target by the average row size. For example, if rows average 1 KB, a batchSize of 2,048 to 4,096 keeps each batch in that range, as shown in the following sketch.
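A sketch that combines both recommendations. The values are illustrative and assume rows that average about 1 KB.

In the job settings:

    "speed": {
        "throttle": true,
        "concurrent": 8,// Within the recommended range of 5 to 10 channels.
        "mbps": "12"
    }

In the writer parameters:

    "batchSize": 2048// Targets roughly 2 MB per batch at an average row size of 1 KB.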

Appendix: Script demos and parameters

Configure a batch synchronization task by using the code editor

If you configure a batch synchronization task by using the code editor, you must set the related parameters in the script based on the unified script format requirements. For more information, see Configure a task in the code editor. The following sections describe the parameters that you must configure for the data source.

HBase Reader script demo

{
    "type":"job",
    "version":"2.0",// The version number.
    "steps":[
        {
            "stepType":"hbase",// The plugin name.
            "parameter":{
                "mode":"normal",// The mode for reading data from HBase. Valid values: normal and multiVersionFixedColumn.
                "scanCacheSize":"256",// The number of rows the client reads from the server per RPC.
                "scanBatchSize":"100",// The number of columns the client reads from the server per RPC.
                "hbaseVersion":"094x/11x",// The HBase version.
                "column":[// The columns to read.
                    {
                        "name":"rowkey",// The column name.
                        "type":"string"// The data type.
                    },
                    {
                        "name":"columnFamilyName1:columnName1",
                        "type":"string"
                    },
                    {
                        "name":"columnFamilyName2:columnName2",
                        "format":"yyyy-MM-dd",
                        "type":"date"
                    },
                    {
                        "name":"columnFamilyName3:columnName3",
                        "type":"long"
                    }
                ],
                "range":{// The rowkey range for the HBase Reader.
                    "endRowkey":"",// The end rowkey.
                    "isBinaryRowkey":true,// Specifies how to convert startRowkey and endRowkey to byte arrays. The default value is false.
                    "startRowkey":""// The start rowkey.
                },
                "maxVersion":"",// The number of versions to read in multi-version mode.
                "encoding":"UTF-8",// The encoding format.
                "table":"",// The table name.
                "hbaseConfig":{// Connection configuration for the HBase cluster, in JSON format.
                    "hbase.zookeeper.quorum":"hostname",
                    "hbase.rootdir":"hdfs://ip:port/database",
                    "hbase.cluster.distributed":"true"
                }
            },
            "name":"Reader",
            "category":"reader"
        },
        {
            "stepType":"stream",
            "parameter":{},
            "name":"Writer",
            "category":"writer"
        }
    ],
    "setting":{
        "errorLimit":{
            "record":"0"// The maximum number of error records allowed.
        },
        "speed":{
            "throttle":true,// Enables throttling. If true, throttling is enabled based on the mbps value. If false, the mbps parameter is ignored.
            "concurrent":1,// The number of concurrent tasks for the job.
            "mbps":"12"// The throttling rate. In this example, 1 mbps is equal to 1 MB/s.
        }
    },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
            }
        ]
    }
}

HBase Reader parameters

The following entries describe each parameter, whether it is required, and its default value.

haveKerberos

Specifies whether the HBase cluster requires Kerberos authentication. If set to true, Kerberos authentication is enabled.

Note
  • If this parameter is set to true, you must also configure the following Kerberos authentication parameters:

    • kerberosKeytabFilePath

    • kerberosPrincipal

    • hbaseMasterKerberosPrincipal

    • hbaseRegionserverKerberosPrincipal

    • hbaseRpcProtection

  • If your HBase cluster does not use Kerberos authentication, you do not need to configure these parameters.

Required: No. Default: false.

hbaseConfig

The connection configuration for the HBase cluster, in JSON format. The hbase.zookeeper.quorum property, which specifies the ZooKeeper (ZK) address of the HBase cluster, is required. You can add other HBase client configurations, such as scan cache and batch settings, to optimize server interaction.

Note

If you are connecting to an ApsaraDB for HBase database, use its private address.

Required: Yes. Default: None.

mode

The mode for reading data from HBase. Valid values: normal and multiVersionFixedColumn.

Required: Yes. Default: None.

table

The name of the HBase table to read from. This parameter is case-sensitive.

Required: Yes. Default: None.

encoding

The encoding format used to convert the binary HBase byte[] array to a string. Valid values: UTF-8 and GBK.

Required: No. Default: UTF-8.

column

The columns to read from HBase. This parameter is required in both normal and multiVersionFixedColumn modes.

  • In normal mode:

    The name property specifies the HBase column to read. Except for rowkey, the format must be <Column Family>:<Column name>. The type property specifies the source data type. The format property specifies the pattern for date types. The value property defines a constant column. The plugin can also generate a constant value for a column instead of reading it from HBase. The configuration is as follows:

    "column": 
    [
    {
      "name": "rowkey",
      "type": "string"
    },
    {
      "value": "test",
      "type": "string"
    }
    ]

    In normal mode, you must specify the type property for each column. You must also include either the name or the value property.

  • In multiVersionFixedColumn mode:

    The name property specifies the HBase column to read. Except for rowkey, the format must be <Column Family>:<Column name>. The type property specifies the source data type, and the format property specifies the pattern for date types. Constant columns are not supported in multiVersionFixedColumn mode. The configuration is as follows:

    "column": 
    [
    {
      "name": "rowkey",
      "type": "string"
    },
    {
      "name": "info:age",
      "type": "string"
    }
    ]

Required: Yes. Default: None.

maxVersion

The number of versions the HBase Reader reads in multi-version mode. Set to -1 to read all versions, or to an integer greater than 1.

Required: In multiVersionFixedColumn mode. Default: None.

range

Specifies the rowkey range for the HBase Reader.

  • startRowkey: The start of the rowkey range.

  • endRowkey: The end of the rowkey range.

  • isBinaryRowkey: Specifies how to convert startRowkey and endRowkey to byte arrays. The default value is false. If set to true, the plugin uses the Bytes.toBytesBinary(rowkey) method. If set to false, it uses the Bytes.toBytes(rowkey) method. The configuration is as follows:

    "range": {
    "startRowkey": "aaa",
    "endRowkey": "ccc",
    "isBinaryRowkey":false
    }

Required: No. Default: None.

scanCacheSize

The number of rows the HBase Reader fetches from the server per RPC.

Required: No. Default: 256.

scanBatchSize

The number of columns the HBase Reader fetches from the server per RPC. A value of -1 indicates that all columns are returned.

Note

To avoid potential data quality issues, set scanBatchSize to a value greater than the actual number of columns.

Required: No. Default: 100.

HBase Writer script demo

{
    "type":"job",
    "version":"2.0",// The version number.
    "steps":[
        {
            "stepType":"stream",
            "parameter":{},
            "name":"Reader",
            "category":"reader"
        },
        {
            "stepType":"hbase",// The plugin name.
            "parameter":{
                "mode":"normal",// The mode for writing data to HBase.
                "walFlag":"false",// Specifies whether to write to the Write-Ahead Log (WAL). A value of false disables it.
                "hbaseVersion":"094x",// The HBase version.
                "rowkeyColumn":[// The columns to use for the rowkey.
                    {
                        "index":"0",// The serial number.
                        "type":"string"// The data type.
                    },
                    {
                        "index":"-1",
                        "type":"string",
                        "value":"_"
                    }
                ],
                "nullMode":"skip",// Specifies how to handle null values.
                "column":[// The HBase columns to write to.
                    {
                        "name":"columnFamilyName1:columnName1",// The column name.
                        "index":"0",// The index number.
                        "type":"string"// The data type.
                    },
                    {
                        "name":"columnFamilyName2:columnName2",
                        "index":"1",
                        "type":"string"
                    },
                    {
                        "name":"columnFamilyName3:columnName3",
                        "index":"2",
                        "type":"string"
                    }
                ],
                "encoding":"utf-8",// The encoding format.
                "table":"",// The table name.
                "hbaseConfig":{// Connection configuration for the HBase cluster, in JSON format.
                    "hbase.zookeeper.quorum":"hostname",
                    "hbase.rootdir":"hdfs://ip:port/database",
                    "hbase.cluster.distributed":"true"
                }
            },
            "name":"Writer",
            "category":"writer"
        }
    ],
    "setting":{
        "errorLimit":{
            "record":"0"// The maximum number of error records allowed.
        },
        "speed":{
            "throttle":true,// Enables throttling. If true, throttling is enabled based on the mbps value. If false, the mbps parameter is ignored.
            "concurrent":1, // The number of concurrent tasks for the job.
            "mbps":"12"// The throttling rate.
        }
    },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
            }
        ]
    }
}

HBase Writer parameters

The following entries describe each parameter, whether it is required, and its default value.

haveKerberos

Specifies whether the HBase cluster requires Kerberos authentication. If set to true, Kerberos authentication is enabled.

Note
  • If this parameter is set to true, you must also configure the following Kerberos authentication parameters:

    • kerberosKeytabFilePath

    • kerberosPrincipal

    • hbaseMasterKerberosPrincipal

    • hbaseRegionserverKerberosPrincipal

    • hbaseRpcProtection

  • If your HBase cluster does not use Kerberos authentication, you do not need to configure these parameters.

Required: No. Default: false.

hbaseConfig

The connection configuration for the HBase cluster, in JSON format. The hbase.zookeeper.quorum property, which specifies the ZooKeeper (ZK) address of the HBase cluster, is required. You can add other HBase client configurations, such as scan cache and batch settings, to optimize server interaction.

Note

If you are connecting to an ApsaraDB for HBase database, use its private address.

Required: Yes. Default: None.

mode

The mode for writing data to HBase. Currently, only normal mode is supported.

Required: Yes. Default: None.

table

The name of the target HBase table. This parameter is case-sensitive.

Required: Yes. Default: None.

encoding

The encoding format used to convert a STRING to an HBase byte[] array. Valid values: UTF-8 and GBK.

Required: No. Default: UTF-8.

column

The HBase columns to write to:

  • index: The index of the corresponding column from the Reader, starting from 0.

  • name: The name of the column in the HBase table. The format must be <Column Family>:<Column name>.

  • type: The data type to write, used for converting the source data to an HBase byte array.

Required: Yes. Default: None.

rowkeyColumn

Specifies the columns used to construct the rowkey for writing data to HBase:

  • index: The index of the corresponding column from the Reader, starting from 0. For a constant, set the index to -1.

  • type: The data type to write, used for converting the source data to an HBase byte array.

  • value: Defines a constant, often used as a separator when concatenating multiple columns. The HBase Writer concatenates all columns in the rowkeyColumn array in the specified order to create the final rowkey. The columns cannot all be constants; at least one must come from a source column.

The configuration is as follows:

"rowkeyColumn": [
          {
            "index":0,
            "type":"string"
          },
          {
            "index":-1,
            "type":"string",
            "value":"_"
          }
      ]

Required: Yes. Default: None.

versionColumn

Specifies the Timestamp for the data written to HBase. You can use the current system time, a value from a source column, or a fixed value. If this parameter is not configured, the current time is used by default.

  • index: The index of the corresponding time column from the Reader, starting from 0. The value must be convertible to a LONG type.

  • type: If the source column is of the Date type, the system attempts to parse it by using the yyyy-MM-dd HH:mm:ss and yyyy-MM-dd HH:mm:ss SSS formats. To use a fixed time, set index to -1.

  • value: A fixed time value of the LONG type.

The configuration is as follows:

  • "versionColumn":{
    "index":1
    }
  • "versionColumn":{
    "index":-1,
    "value":123456789
    }

Required: No. Default: None.

nullMode

Specifies how to handle null values from the source data:

  • skip: Does not write the column to HBase.

  • empty: Writes an empty byte array (HConstants.EMPTY_BYTE_ARRAY, which is new byte [0]).

Required: No. Default: skip.

walFlag

When a client sends a Put or Delete operation, the data is first written to the Write-Ahead Log (WAL) before being stored in the MemStore. This process ensures data durability.

Setting this parameter to false disables writing to the WAL, which can improve write performance.

Required: No. Default: false.

writeBufferSize

The size of the HBase client's write buffer in bytes. This buffer works with the client's autoflush setting, which is disabled by default.

autoflush (disabled by default):

  • true: The HBase client sends an update for every single put operation.

  • false: The HBase client sends a write request to the HBase server only when the client-side write buffer is full.

Required: No. Default: 8 MB.
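A sketch of setting this parameter in the HBase Writer parameters; the value shown is the 8 MB default expressed in bytes:

    "parameter": {
        "writeBufferSize": 8388608// 8 MB, in bytes.
    }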

fileSystemUsername

If a data synchronization task fails due to Ranger permission issues, switch to Script Mode and set this parameter to a username with the required HBase access permissions. DataWorks then uses this username for the connection.

Required: No. Default: None.

HBase20xsql Reader script demo

{
    "type":"job",
    "version":"2.0",// The version number.
    "steps":[
        {
            "stepType":"hbase20xsql",// The plugin name.
            "parameter":{
                "queryServerAddress": "http://127.0.0.1:8765",  // The Phoenix QueryServer address.
                "serialization": "PROTOBUF",  // The QueryServer serialization format.
                "table": "TEST",    // The table to read.
                "column": ["ID", "NAME"],   // The columns to read.
                "splitKey": "ID"    // The split column, which must be the primary key of the table.
            },
            "name":"Reader",
            "category":"reader"
        },
        {
            "stepType":"stream",
            "parameter":{},
            "name":"Writer",
            "category":"writer"
        }
    ],
    "setting":{
        "errorLimit":{
            "record":"0"// The maximum number of error records allowed.
        },
        "speed":{
            "throttle":true,// Enables throttling. If true, throttling is enabled based on the mbps value. If false, the mbps parameter is ignored.
            "concurrent":1,// The number of concurrent tasks for the job.
            "mbps":"12"// The throttling rate. In this example, 1 mbps is equal to 1 MB/s.
        }
    },
    "order":{
        "hops":[
            {
                "from":"Reader",
                "to":"Writer"
            }
        ]
    }
}

HBase20xsql Reader parameters

The following entries describe each parameter, whether it is required, and its default value.

queryServerAddress

The HBase20xsql Reader uses a Phoenix lightweight client to connect to the Phoenix QueryServer. Specify the QueryServer address here. For ApsaraDB for HBase enhanced edition (Lindorm) users, you can pass user and password parameters as optional attributes in the queryServerAddress string. Format: http://127.0.0.1:8765;user=root;password=root.

Required: Yes. Default: None.

serialization

The serialization protocol used by the QueryServer.

Required: No. Default: PROTOBUF.

table

The name of the table to read. This parameter is case-sensitive.

Required: Yes. Default: None.

schema

The schema that contains the table.

Required: No. Default: None.

column

A JSON array that contains the names of the columns to synchronize. If left empty, all columns are read.

Required: No. Default: All columns.

splitKey

Specifies a column to use for data sharding. Providing a splitKey enables parallel data synchronization, which improves performance. Two splitting methods are available. If splitPoints is empty, the data is automatically split based on Method 1:

  • Method 1: The plugin finds the minimum and maximum values of the splitKey column and creates even splits based on the specified concurrent value.

    Note

    Only integer and string data types are supported for the split key.

  • Method 2: The data is split according to the specified splitPoints. The synchronization is then performed based on the configured concurrent value.

Required: Yes. Default: None.

splitPoints

Splitting a column based on its minimum and maximum values can create data hotspots. Therefore, we recommend setting split points based on the startkey and endkey of the regions. This approach aligns each query with a single region, preventing hotspots.

Required: No. Default: None.
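A hedged sketch of configuring split points follows. The boundary values are placeholders, and the exact value format assumed here (an array of boundary values) should be verified against your environment:

    "parameter": {
        "table": "TEST",
        "splitKey": "ID",
        "splitPoints": ["CCC", "FFF"]// Assumed format: boundary values aligned with region boundaries.
    }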

where

The filter condition. You can add a filter to the table query. The HBase20xsql Reader constructs an SQL query based on the specified column, table, and where conditions, and then extracts data based on that query.

Required: No. Default: None.

querySql

In some use cases, the where parameter cannot express the desired filter conditions. You can use this parameter to define a custom filter SQL query. When querySql is configured, the queryServerAddress parameter is still required, but the HBase20xsql Reader ignores the column, table, where, and splitKey parameters and uses this query to fetch the data.

Required: No. Default: None.
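A sketch of a querySql configuration; the SQL statement and the address are placeholders:

    "parameter": {
        "queryServerAddress": "http://127.0.0.1:8765",
        "querySql": "SELECT ID, NAME FROM TEST WHERE ID > 1000"// column, table, where, and splitKey are ignored when querySql is set.
    }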

HBase11xsql Writer script demo

{
  "type": "job",
  "version": "1.0",
  "configuration": {
    "setting": {
      "errorLimit": {
        "record": "0"
      },
      "speed": {
            "throttle":true,// Enables throttling. If true, throttling is enabled based on the mbps value. If false, the mbps parameter is ignored.
            "concurrent":1, // The number of concurrent tasks for the job.
            "mbps":"1"// The throttling rate. In this example, 1 mbps is equal to 1 MB/s.
      }
    },
    "reader": {
      "plugin": "odps",
      "parameter": {
        "datasource": "",
        "table": "",
        "column": [],
        "partition": ""
      }
    },
    "plugin": "hbase11xsql",
    "parameter": {
      "table": "The target HBase table name, which is case-sensitive.",
      "hbaseConfig": {
        "hbase.zookeeper.quorum": "The ZooKeeper server address of the target HBase cluster.",
        "zookeeper.znode.parent": "The znode of the target HBase cluster."
      },
      "column": [
        "columnName"
      ],
      "batchSize": 256,
      "nullMode": "skip"
    }
  }
}

HBase11xsql Writer parameters

The following entries describe each parameter, whether it is required, and its default value.

plugin

The name of the plugin. Must be hbase11xsql.

Required: Yes. Default: None.

table

The name of the target Phoenix table for the data import. This parameter is case-sensitive. Phoenix table names are typically in uppercase.

Required: Yes. Default: None.

column

The column names. This parameter is case-sensitive. Phoenix column names are typically in uppercase.

Note
  • The column order must exactly match the output column order from the Reader.

  • You do not need to specify data types. The plugin automatically retrieves the column metadata from Phoenix.

Required: Yes. Default: None.

hbaseConfig

The address of the HBase cluster. The ZooKeeper quorum is required. Format: ip1,ip2,ip3.

Note
  • Use commas to separate multiple IP addresses.

  • The znode is optional. The default value is /hbase.

Required: Yes. Default: None.

batchSize

The maximum number of rows in a batch write operation.

Required: No. Default: 256.

nullMode

Specifies how to handle null values from the source data:

  • skip: Does not write the column. If a value for this column already exists in the target table, it is deleted.

  • empty: Inserts a null value. For numeric types, this is 0; for varchar types, this is an empty string.

Required: No. Default: skip.