The HBase data source supports both reading from and writing to HBase. This topic describes the synchronization capabilities of the DataWorks HBase data source.
Supported versions
The HBase plugin comes in two types: the standard HBase plugin and the HBase{xx}xsql plugin. The HBase{xx}xsql plugin requires both HBase and Phoenix.
HBase plugin:
This plugin supports HBase 0.94.x, HBase 1.1.x, and HBase 2.x, in both wizard mode and script mode. Use the hbaseVersion parameter to specify the version.
If you are using HBase 0.94.x, set hbaseVersion to 094x for both the Reader and Writer plugins:
"reader": { "hbaseVersion": "094x" }
"writer": { "hbaseVersion": "094x" }
If you are using HBase 1.1.x or HBase 2.x, set hbaseVersion to 11x for both the Reader and Writer plugins:
"reader": { "hbaseVersion": "11x" }
"writer": { "hbaseVersion": "11x" }
The HBase 1.1.x plugin is compatible with HBase 2.0.
HBase{xx}xsql plugin:
HBase20xsql plugin: Supports HBase 2.x and Phoenix 5.x. Supports script mode only.
HBase11xsql plugin: Supports HBase 1.1.x and Phoenix 5.x. Supports script mode only.
The HBase{xx}xsql Writer plugin lets you import data in bulk into an SQL table (Phoenix) in HBase. Phoenix applies data encoding to rowkeys, so writing data directly with the HBase API requires manual data conversion, a complex and error-prone process. The HBase{xx}xsql Writer plugin simplifies this process and offers a straightforward way to import data into an SQL table.
Note: The plugin uses the Phoenix JDBC driver to execute UPSERT statements and write data to the table in batches. Because it operates through this high-level interface, it also synchronously updates the corresponding index table.
Limitations
| HBase Reader | HBase20xsql Reader | HBase11xsql Writer |
| --- | --- | --- |
Supported features
HBase Reader
HBase Reader supports normal mode and multiVersionFixedColumn mode.
normal mode: Treats an HBase table as a standard two-dimensional table and retrieves the latest version of data. For example:
```
hbase(main):017:0> scan 'users'
ROW        COLUMN+CELL
 lisi      column=address:city, timestamp=1457101972764, value=beijing
 lisi      column=address:contry, timestamp=1457102773908, value=china
 lisi      column=address:province, timestamp=1457101972736, value=beijing
 lisi      column=info:age, timestamp=1457101972548, value=27
 lisi      column=info:birthday, timestamp=1457101972604, value=1987-06-17
 lisi      column=info:company, timestamp=1457101972653, value=baidu
 xiaoming  column=address:city, timestamp=1457082196082, value=hangzhou
 xiaoming  column=address:contry, timestamp=1457082195729, value=china
 xiaoming  column=address:province, timestamp=1457082195773, value=zhejiang
 xiaoming  column=info:age, timestamp=1457082218735, value=29
 xiaoming  column=info:birthday, timestamp=1457082186830, value=1987-06-17
 xiaoming  column=info:company, timestamp=1457082189826, value=alibaba
2 row(s) in 0.0580 seconds
```
The following table shows the output; a matching reader column sketch follows it.
| rowKey | address:city | address:contry | address:province | info:age | info:birthday | info:company |
| --- | --- | --- | --- | --- | --- | --- |
| lisi | beijing | china | beijing | 27 | 1987-06-17 | baidu |
| xiaoming | hangzhou | china | zhejiang | 29 | 1987-06-17 | alibaba |
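To read this table in normal mode, list the rowkey and each family:qualifier column in the reader's column parameter. The following fragment is a minimal sketch using the column format from the script demo later in this topic; the names come from the sample users table above:
```json
"parameter": {
  "mode": "normal",                 // read the latest version of each cell
  "table": "users",
  "column": [
    { "name": "rowkey",        "type": "string" },
    { "name": "address:city",  "type": "string" },
    { "name": "info:age",      "type": "long" },
    { "name": "info:birthday", "type": "date", "format": "yyyy-MM-dd" }
  ]
}
```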
multiVersionFixedColumn mode: Treats the HBase table as a vertical table. Each record consists of four columns: rowKey, family:qualifier, timestamp, and value. You must specify the columns to read. This mode treats each cell value as a separate record. If a cell has multiple versions, the mode generates a separate record for each version. For example:
```
hbase(main):018:0> scan 'users',{VERSIONS=>5}
ROW        COLUMN+CELL
 lisi      column=address:city, timestamp=1457101972764, value=beijing
 lisi      column=address:contry, timestamp=1457102773908, value=china
 lisi      column=address:province, timestamp=1457101972736, value=beijing
 lisi      column=info:age, timestamp=1457101972548, value=27
 lisi      column=info:birthday, timestamp=1457101972604, value=1987-06-17
 lisi      column=info:company, timestamp=1457101972653, value=baidu
 xiaoming  column=address:city, timestamp=1457082196082, value=hangzhou
 xiaoming  column=address:contry, timestamp=1457082195729, value=china
 xiaoming  column=address:province, timestamp=1457082195773, value=zhejiang
 xiaoming  column=info:age, timestamp=1457082218735, value=29
 xiaoming  column=info:age, timestamp=1457082178630, value=24
 xiaoming  column=info:birthday, timestamp=1457082186830, value=1987-06-17
 xiaoming  column=info:company, timestamp=1457082189826, value=alibaba
2 row(s) in 0.0260 seconds
```
The following table shows the output; a matching reader configuration sketch follows it.
| rowKey | family:qualifier | timestamp | value |
| --- | --- | --- | --- |
| lisi | address:city | 1457101972764 | beijing |
| lisi | address:contry | 1457102773908 | china |
| lisi | address:province | 1457101972736 | beijing |
| lisi | info:age | 1457101972548 | 27 |
| lisi | info:birthday | 1457101972604 | 1987-06-17 |
| lisi | info:company | 1457101972653 | baidu |
| xiaoming | address:city | 1457082196082 | hangzhou |
| xiaoming | address:contry | 1457082195729 | china |
| xiaoming | address:province | 1457082195773 | zhejiang |
| xiaoming | info:age | 1457082218735 | 29 |
| xiaoming | info:age | 1457082178630 | 24 |
| xiaoming | info:birthday | 1457082186830 | 1987-06-17 |
| xiaoming | info:company | 1457082189826 | alibaba |
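A minimal reader sketch for multiVersionFixedColumn mode, assembled from the mode, maxVersion, and column parameters in the script demo later in this topic. The table and columns come from the sample users table above; whether this mode's column list uses exactly this shape is an assumption:
```json
"parameter": {
  "mode": "multiVersionFixedColumn",  // one output record per cell version
  "table": "users",
  "maxVersion": "-1",                 // -1 reads all stored versions
  "column": [
    { "name": "rowkey",   "type": "string" },
    { "name": "info:age", "type": "long" }
  ]
}
```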
HBase Writer
The HBase Writer can generate a rowKey by concatenating multiple columns from the source (a rowkeyColumn sketch follows this list).
The HBase Writer can set the version (timestamp) of the data in the following ways:
Using the current time.
Using a value from a source column.
Using a user-specified time.
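The following fragment is a minimal rowkeyColumn sketch, using the same shape as the writer script demo later in this topic: each entry references a source column by index, and an index of -1 with a value inserts a fixed string between columns:
```json
"rowkeyColumn": [
  { "index": "0",  "type": "string" },               // first source column
  { "index": "-1", "type": "string", "value": "_" }, // fixed separator string
  { "index": "1",  "type": "string" }                // second source column
]
```
With this configuration, the resulting rowkey would be the first source column, an underscore, then the second source column.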
Supported field types
Batch read
The following table lists the data type mappings for HBase Reader.
| Type | Data Integration column type | Database data type |
| --- | --- | --- |
| Integer | long | short, int, and long |
| Floating-point | double | float and double |
| String | string | binary_string and string |
| Date and Time | date | date |
| Byte | bytes | bytes |
| Boolean | boolean | boolean |
HBase20xsql Reader supports most, but not all, Phoenix data types. Verify that your data types are supported before use.
The following table lists the type mappings used by HBase20xsql Reader for Phoenix data types.
| DataX internal type | Phoenix data type |
| --- | --- |
| long | INTEGER, TINYINT, SMALLINT, and BIGINT |
| double | FLOAT, DECIMAL, and DOUBLE |
| string | CHAR and VARCHAR |
| date | DATE, TIME, and TIMESTAMP |
| bytes | BINARY and VARBINARY |
| boolean | BOOLEAN |
Batch write
The following table lists the data type mappings for HBase Writer.
Ensure that the column configuration matches the corresponding column types in the HBase table. Only the data types listed in the following table are supported.
| Type | Database data type |
| --- | --- |
| Integer | INT, LONG, and SHORT |
| Floating-point | FLOAT and DOUBLE |
| Boolean | BOOLEAN |
| String | STRING |
Considerations
If you receive the error message "tried to access method com.google.common.base.Stopwatch" when testing connectivity, add the hbaseVersion property to the Data Source configuration and specify the HBase version.
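A minimal sketch of the workaround, assuming the data source's connection configuration accepts plugin properties as a JSON fragment; the value shown is an example for HBase 1.1.x or 2.x:
```json
{
  "hbaseVersion": "11x"  // use 094x for HBase 0.94.x
}
```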
Add a data source
Before you develop a synchronization task in DataWorks, you must add the required data source to DataWorks by following the instructions in Data Source Management. You can view parameter descriptions in the DataWorks console to understand the meanings of the parameters when you add a data source.
Data synchronization tasks
For information about where to configure a synchronization task and how to configure it, see the following configuration guides.
Single-table offline synchronization task
For instructions, see Configure a task in the codeless UI and Configure a task in the code editor.
By default, Wizard Mode does not display the Field Mapping section because HBase is a schemaless data source. You must configure the field mapping manually:
When HBase is the data source, configure Source Field in the following format: data_type|column_family:column_name.
When HBase is the data destination, configure both Destination Field and rowkey. For Destination Field, use the format source_field_index|data_type|column_family:column_name. For rowkey, use the format source_primary_key_index|data_type.
Note: Each field must be on a separate line. Examples follow this note.
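For example, with the sample users table used earlier in this topic (the indexes and column names are illustrative), the field mappings might look like the following:
```
Source Field (HBase as source):
string|info:company
long|info:age

Destination Field (HBase as destination):
0|string|info:company
1|long|info:age

rowkey:
0|string
```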
For a complete list of parameters and script examples in Script Mode, see Appendix: Script demos and parameters.
FAQ
Q: What is the recommended concurrency setting? Does increasing it help if the import is slow?
A: The default JVM Heap Size for the Data Import process is 2 GB. Concurrency (the number of channels) is implemented using multi-threading. However, creating excessive threads does not always improve import speed and can degrade performance due to frequent Garbage Collection (GC). As a best practice, we recommend setting Concurrency (the number of channels) to 5 to 10.
Q: What is the recommended batchSize setting?
A: The default value is 256, but you should calculate the optimal batchSize based on your average row size. As a best practice, aim for a total data size of 2 MB to 4 MB per batch: divide this target size by your average row size to determine the appropriate batchSize. For example, with an average row size of 1 KB, a batchSize between 2,048 and 4,096 keeps each batch in that range.
Appendix: Script demos and parameters
Configure a batch synchronization task by using the code editor
If you want to configure a batch synchronization task by using the code editor, you must configure the related parameters in the script based on the unified script format requirements. For more information, see Configure a task in the code editor. The following information describes the parameters that you must configure for data sources when you configure a batch synchronization task by using the code editor.
HBase Reader script demo
{
"type":"job",
"version":"2.0",// The version number.
"steps":[
{
"stepType":"hbase",// The plugin name.
"parameter":{
"mode":"normal",// The mode for reading data from HBase. Valid values: normal and multiVersionFixedColumn.
"scanCacheSize":"256",// The number of rows the client reads from the server per RPC.
"scanBatchSize":"100",// The number of columns the client reads from the server per RPC.
"hbaseVersion":"094x/11x",// The HBase version.
"column":[// The columns to read.
{
"name":"rowkey",// The column name.
"type":"string"// The data type.
},
{
"name":"columnFamilyName1:columnName1",
"type":"string"
},
{
"name":"columnFamilyName2:columnName2",
"format":"yyyy-MM-dd",
"type":"date"
},
{
"name":"columnFamilyName3:columnName3",
"type":"long"
}
],
"range":{// The rowkey range for the HBase Reader.
"endRowkey":"",// The end rowkey.
"isBinaryRowkey":true,// Specifies how to convert startRowkey and endRowkey to byte arrays. The default value is false.
"startRowkey":""// The start rowkey.
},
"maxVersion":"",// The number of versions to read in multi-version mode.
"encoding":"UTF-8",// The encoding format.
"table":"",// The table name.
"hbaseConfig":{// Connection configuration for the HBase cluster, in JSON format.
"hbase.zookeeper.quorum":"hostname",
"hbase.rootdir":"hdfs://ip:port/database",
"hbase.cluster.distributed":"true"
}
},
"name":"Reader",
"category":"reader"
},
{
"stepType":"stream",
"parameter":{},
"name":"Writer",
"category":"writer"
}
],
"setting":{
"errorLimit":{
"record":"0"// The maximum number of error records allowed.
},
"speed":{
"throttle":true,// Enables throttling. If true, throttling is enabled based on the mbps value. If false, the mbps parameter is ignored.
"concurrent":1,// The number of concurrent tasks for the job.
"mbps":"12"// The throttling rate. In this example, 1 mbps is equal to 1 MB/s.
}
},
"order":{
"hops":[
{
"from":"Reader",
"to":"Writer"
}
]
}
}
HBase Reader parameters
| Parameter | Description | Required | Default |
| --- | --- | --- | --- |
| haveKerberos | Specifies whether the HBase cluster requires Kerberos authentication. If set to true, Kerberos authentication is used when connecting to the cluster. | No | false |
| hbaseConfig | The connection configuration for the HBase cluster, in JSON format. The hbase.zookeeper.quorum property, which specifies the ZooKeeper (ZK) address of the HBase cluster, is required. You can add other HBase client configurations, such as scan cache and batch settings, to optimize server interaction. Note: If you are connecting to an ApsaraDB for HBase database, use its private address. | Yes | None |
| mode | The mode for reading data from HBase. Valid values: normal and multiVersionFixedColumn. | Yes | None |
| table | The name of the HBase table to read from. This parameter is case-sensitive. | Yes | None |
| encoding | The encoding format used to convert the binary HBase byte[] array to a string. Valid values: UTF-8 and GBK. | No | UTF-8 |
| column | The columns to read from HBase. This parameter is required in both normal and multiVersionFixedColumn modes. | Yes | None |
| maxVersion | The number of versions the HBase Reader reads in multi-version mode. Set to -1 to read all versions, or to an integer greater than 1. | Required in multiVersionFixedColumn mode | None |
| range | Specifies the rowkey range for the HBase Reader: startRowkey, endRowkey, and isBinaryRowkey (see the range sketch after this table). | No | None |
| scanCacheSize | The number of rows the HBase Reader fetches from the server per RPC. | No | 256 |
| scanBatchSize | The number of columns the HBase Reader fetches from the server per RPC. A value of -1 indicates that all columns are returned. Note: To avoid potential data quality issues, set scanBatchSize to a value greater than the actual number of columns. | No | 100 |
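A minimal range sketch, mirroring the range block in the reader script demo above; the rowkey bounds are placeholders:
```json
"range": {
  "startRowkey": "row_000",  // hypothetical start rowkey
  "endRowkey": "row_999",    // hypothetical end rowkey
  "isBinaryRowkey": false    // how the bounds are converted to byte arrays; default false
}
```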
HBase Writer script demo
{
"type":"job",
"version":"2.0",// The version number.
"steps":[
{
"stepType":"stream",
"parameter":{},
"name":"Reader",
"category":"reader"
},
{
"stepType":"hbase",// The plugin name.
"parameter":{
"mode":"normal",// The mode for writing data to HBase.
"walFlag":"false",// Specifies whether to write to the Write-Ahead Log (WAL). A value of false disables it.
"hbaseVersion":"094x",// The HBase version.
"rowkeyColumn":[// The columns to use for the rowkey.
{
"index":"0",// The serial number.
"type":"string"// The data type.
},
{
"index":"-1",
"type":"string",
"value":"_"
}
],
"nullMode":"skip",// Specifies how to handle null values.
"column":[// The HBase columns to write to.
{
"name":"columnFamilyName1:columnName1",// The column name.
"index":"0",// The index number.
"type":"string"// The data type.
},
{
"name":"columnFamilyName2:columnName2",
"index":"1",
"type":"string"
},
{
"name":"columnFamilyName3:columnName3",
"index":"2",
"type":"string"
}
],
"encoding":"utf-8",// The encoding format.
"table":"",// The table name.
"hbaseConfig":{// Connection configuration for the HBase cluster, in JSON format.
"hbase.zookeeper.quorum":"hostname",
"hbase.rootdir":"hdfs://ip:port/database",
"hbase.cluster.distributed":"true"
}
},
"name":"Writer",
"category":"writer"
}
],
"setting":{
"errorLimit":{
"record":"0"// The maximum number of error records allowed.
},
"speed":{
"throttle":true,// Enables throttling. If true, throttling is enabled based on the mbps value. If false, the mbps parameter is ignored.
"concurrent":1, // The number of concurrent tasks for the job.
"mbps":"12"// The throttling rate.
}
},
"order":{
"hops":[
{
"from":"Reader",
"to":"Writer"
}
]
}
}
HBase Writer parameters
| Parameter | Description | Required | Default |
| --- | --- | --- | --- |
| haveKerberos | Specifies whether the HBase cluster requires Kerberos authentication. If set to true, Kerberos authentication is used when connecting to the cluster. | No | false |
| hbaseConfig | The connection configuration for the HBase cluster, in JSON format. The hbase.zookeeper.quorum property, which specifies the ZooKeeper (ZK) address of the HBase cluster, is required. You can add other HBase client configurations, such as scan cache and batch settings, to optimize server interaction. Note: If you are connecting to an ApsaraDB for HBase database, use its private address. | Yes | None |
| mode | The mode for writing data to HBase. Currently, only normal mode is supported. | Yes | None |
| table | The name of the target HBase table. This parameter is case-sensitive. | Yes | None |
| encoding | The encoding format used to convert a STRING to an HBase byte[] array. Valid values: UTF-8 and GBK. | No | UTF-8 |
| column | The HBase columns to write to. For each column, specify the source field index (index), the HBase column name in column_family:column_name format (name), and the data type (type), as shown in the writer script demo above. | Yes | None |
| rowkeyColumn | Specifies the columns used to construct the rowkey for writing data to HBase. Each entry specifies a source field index and type; an index of -1 with a value inserts a fixed string, as shown in the writer script demo above. | Yes | None |
| versionColumn | Specifies the timestamp for the data written to HBase. You can use the current system time, a value from a source column, or a fixed value. If this parameter is not configured, the current time is used by default. A configuration sketch follows this table. | No | None |
| nullMode | Specifies how to handle null values from the source data. | No | skip |
| walFlag | When a client sends a Put or Delete operation, the data is first written to the Write-Ahead Log (WAL) before being stored in the MemStore. This process ensures data durability. Setting this parameter to false disables the WAL for faster writes at the cost of durability. | No | false |
| writeBufferSize | The size of the HBase client's write buffer, in bytes. This buffer works with the client's autoflush setting. | No | 8 MB |
| fileSystemUsername | If a data synchronization task fails due to Ranger permission issues, switch to Script Mode and set this parameter to a username with the required HBase access permissions. DataWorks then uses this username for the connection. | No | None |
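A hedged versionColumn sketch; the index/value shape is an assumption based on typical DataX writer configurations and is not confirmed by this topic:
```json
"versionColumn": {
  "index": "-1",            // -1: do not take the timestamp from a source column
  "value": "1457101972764"  // hypothetical fixed timestamp in milliseconds
}
```
To take the timestamp from a source column instead, index would presumably point to that column's position, with value omitted.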
HBase20xsql Reader script demo
{
"type":"job",
"version":"2.0",// The version number.
"steps":[
{
"stepType":"hbase20xsql",// The plugin name.
"parameter":{
"queryServerAddress": "http://127.0.0.1:8765", // The Phoenix QueryServer address.
"serialization": "PROTOBUF", // The QueryServer serialization format.
"table": "TEST", // The table to read.
"column": ["ID", "NAME"], // The columns to read.
"splitKey": "ID" // The split column, which must be the primary key of the table.
},
"name":"Reader",
"category":"reader"
},
{
"stepType":"stream",
"parameter":{},
"name":"Writer",
"category":"writer"
}
],
"setting":{
"errorLimit":{
"record":"0"// The maximum number of error records allowed.
},
"speed":{
"throttle":true,// Enables throttling. If true, throttling is enabled based on the mbps value. If false, the mbps parameter is ignored.
"concurrent":1,// The number of concurrent tasks for the job.
"mbps":"12"// The throttling rate. In this example, 1 mbps is equal to 1 MB/s.
}
},
"order":{
"hops":[
{
"from":"Reader",
"to":"Writer"
}
]
}
}
HBase20xsql Reader parameters
| Parameter | Description | Required | Default |
| --- | --- | --- | --- |
| queryServerAddress | The HBase20xsql Reader uses a Phoenix lightweight client to connect to the Phoenix QueryServer. Specify the QueryServer address here. For ApsaraDB for HBase enhanced edition (Lindorm), additional connection parameters can be appended to this address. | Yes | None |
| serialization | The serialization protocol used by the QueryServer. | No | PROTOBUF |
| table | The name of the table to read. This parameter is case-sensitive. | Yes | None |
| schema | The schema that contains the table. | No | None |
| column | A JSON array that contains the names of the columns to synchronize. If left empty, all columns are read. | No | All columns |
| splitKey | Specifies a column to use for data sharding. Providing a splitKey enables parallel data synchronization, which improves performance. Two splitting methods are available: if splitPoints is empty, the data is split automatically based on the minimum and maximum values of the split column; otherwise, the data is split at the specified split points (see the sketch after this table). | Yes | None |
| splitPoints | Splitting a column based on its minimum and maximum values can create data hotspots. Therefore, we recommend setting split points based on the startkey and endkey of the regions. This approach aligns each query with a single region, preventing hotspots. | No | None |
| where | The filter condition. You can add a filter to the table query. The HBase20xsql Reader constructs an SQL query based on the specified column, table, and where conditions, and then extracts data based on that query. | No | None |
| querySql | In some use cases, the where parameter may not be sufficient to describe the desired filter conditions. You can use this parameter to define a custom filter SQL query. When querySql is configured, the HBase20xsql Reader ignores the table, column, where, and splitKey parameters. | No | None |
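A hedged sketch of a sharded read configuration, extending the HBase20xsql Reader demo above; the splitPoints values are placeholder region boundaries, and the array form of splitPoints is an assumption not confirmed by this topic:
```json
"parameter": {
  "queryServerAddress": "http://127.0.0.1:8765",
  "table": "TEST",
  "column": ["ID", "NAME"],
  "splitKey": "ID",                              // primary-key column used for sharding
  "splitPoints": ["region1end", "region2end"],   // hypothetical region startkey/endkey boundaries
  "where": "ID > 0"                              // optional filter added to the generated SQL
}
```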
HBase11xsql Writer script demo
{
"type": "job",
"version": "1.0",
"configuration": {
"setting": {
"errorLimit": {
"record": "0"
},
"speed": {
"throttle":true,// Enables throttling. If true, throttling is enabled based on the mbps value. If false, the mbps parameter is ignored.
"concurrent":1, // The number of concurrent tasks for the job.
"mbps":"1"// The throttling rate. In this example, 1 mbps is equal to 1 MB/s.
}
},
"reader": {
"plugin": "odps",
"parameter": {
"datasource": "",
"table": "",
"column": [],
"partition": ""
}
},
"plugin": "hbase11xsql",
"parameter": {
"table": "The target HBase table name, which is case-sensitive.",
"hbaseConfig": {
"hbase.zookeeper.quorum": "The ZooKeeper server address of the target HBase cluster.",
"zookeeper.znode.parent": "The znode of the target HBase cluster."
},
"column": [
"columnName"
],
"batchSize": 256,
"nullMode": "skip"
}
}
}
HBase11xsql Writer parameters
| Parameter | Description | Required | Default |
| --- | --- | --- | --- |
| plugin | The name of the plugin. Must be set to hbase11xsql. | Yes | None |
| table | The name of the target Phoenix table for the data import. This parameter is case-sensitive. Phoenix table names are typically in uppercase. | Yes | None |
| column | The column names. This parameter is case-sensitive. Phoenix column names are typically in uppercase. | Yes | None |
| hbaseConfig | The address of the HBase cluster. The ZooKeeper quorum is required. For the format, see the hbaseConfig block in the script demo above. | Yes | None |
| batchSize | The maximum number of rows in a batch write operation. | No | 256 |
| nullMode | Specifies how to handle null values from the source data. | No | skip |