Use SQL statements to access an HBase table - Lindorm - Alibaba Cloud Documentation Center

This topic describes how to use SQL statements to access an HBase table.

Prerequisites

The version of LindormTable is 2.6.4 or later. For more information about how to view or upgrade the version of LindormTable, see Release notes of LindormTable and Upgrade the minor engine version of a Lindorm instance.

Background information

LindormTable allows you to directly access the tables that are created by using HBase Shell or ApsaraDB for HBase API for Java. An HBase table is a schema-free table. Columns in an HBase table are dynamic columns of the VARBINARY data type, which represents a byte array. For more information about dynamic columns, see Dynamic columns. ApsaraDB for HBase provides the column mapping feature and supports HBase-compatible data types for HBase tables. This way, you can use Lindorm SQL statements for columns that are written based on ApsaraDB for HBase API for Java and use a wide range of data types and secondary indexes.

Syntax

In Lindorm SQL, you can add mappings for the qualifiers in a custom column family in an HBase table. This way, you can use the SQL statements for subsequent queries.

You can use the following syntax to add and remove mappings:

dynamic_column_mapping_statement   := ALTER TABLE table_name MAP DYNAMIC COLUMN
                                      qualifer_definition hbase_type;
dynamic_column_unmapping_statement := ALTER TABLE table_name UNMAP DYNAMIC COLUMN
                                      qualifer_definition_list;
qualifer_definition_list           := qualifer_definition
                                      (',' qualifer_definition)*
qualifer_definition                := [ family_name ':' ] qualifier_name
hbase_type                         := HLONG | HINTEGER | HSHORT | HFLOAT |
                                      HDOUBLE | HSTRING | HBOOLEAN

The following table describes the mapping data types that can be specified by the hbase_type parameter.

Data type	Java data type	Description
HLONG	java.lang.Long	Data is written to the columns of the HBase table by using the Bytes.toBytes(long) function.
HINTEGER	java.lang.Integer	Data is written to the columns of the HBase table by using the Bytes.toBytes(int) function.
HSHORT	java.lang.Short	Data is written to the columns of the HBase table by using the Bytes.toBytes(short) function.
HFLOAT	java.lang.Float	Data is written to the columns of the HBase table by using the Bytes.toBytes(float) function.
HDOUBLE	java.lang.Double	Data is written to the columns of the HBase table by using the Bytes.toBytes(double) function.
HSTRING	java.lang.String	Data is written to the columns of the HBase table by using the Bytes.toBytes(String) function.
HBOOLEAN	java.lang.Boolean	Data is written to the columns of the HBase table by using the Bytes.toBytes(boolean) function.

Note

LindormTable of version 2.5.1 and later versions support mappings of rowkeys. For the mapping method of the rowkeys, see the mapping method of the qualifiers. When you map the rowkeys, make sure that the rowkeys are enclosed by using backticks (` `).
In other programming languages, you can use the toBytes method in the org.apache.hadoop.hbase.util.Bytes Java class to encode and write data.
In Java, the Bytes.toBytes(String) function uses UTF-8 for data encoding. In other programming languages, UTF-8 is also used for data encoding when you use the toBytes method to convert strings into bytes.

Data preparations

In this topic, ApsaraDB for HBase API for Java is used to create a sample HBase table and write data to the table. For more information, see Use ApsaraDB for HBase API for Java to develop applications.

Note

For more information about how to create an HBase table and write data to the table by using other methods, see Use Lindorm Shell to connect to LindormTable.

// Create a sample HBase table named dt and specify f1 as the column family name.           
try (Admin admin = connection.getAdmin()) {
            Table table = connection.getTable(TableName.valueOf("dt"));
            HTableDescriptor htd = new HTableDescriptor(TableName.valueOf("dt"));
            htd.addFamily(new HColumnDescriptor(Bytes.toBytes("f1")));
            admin.createTable(htd);
            }
    
// Write data to the table.
try (Table table = connection.getTable(TableName.valueOf("dt"))) {
    byte[] rowkey = Bytes.toBytes("row1");
    byte[] family = Bytes.toBytes("f1");
    Put put = new Put(rowkey);
    // Write data of the STRING type and specify name as the column name.
    String name = "Some one";
    put.addColumn(family, Bytes.toBytes("name"), Bytes.toBytes(name));
    // Write data of the INT type and specify age as the column name.
    int age = 25;
    put.addColumn(family, Bytes.toBytes("age"), Bytes.toBytes(age));
    // Write data of the LONG type and specify time as the column name.
    long timestamp = 1656675491000L;
    put.addColumn(family, Bytes.toBytes("time"), Bytes.toBytes(timestamp));
    // Write data of the SHORT type and specify buycode as the column name.
    short buycode = 123;
    put.addColumn(family, Bytes.toBytes("buycode"), Bytes.toBytes(buycode));
    // Write data of the FLOAT type and specify price as the column name.
    float price = 12.3f;
    put.addColumn(family, Bytes.toBytes("price"), Bytes.toBytes(price));
    // Write data of the DOUBLE type and specify price2 as the column name.
    double price2 = 12.33333;
    put.addColumn(family, Bytes.toBytes("price2"), Bytes.toBytes(price2));
    // Write data of the BOOLEAN type and specify isMale as the column name.
    boolean isMale = true;
    put.addColumn(family, Bytes.toBytes("isMale"), Bytes.toBytes(isMale));

    // Write null values for all types of data.
    //put.addColumn(family, qualifier, null);

    table.put(put);
    }

Procedure

The following example shows how to create an HBase table named dt and use SQL statements to access the HBase table.

Use Lindorm-cli to connect to and use LindormTable. For more information about how to connect to and use LindormTable, see Use Lindorm-cli to connect to and use LindormTable.
Note
If you use SQL statements to access an HBase table in ApsaraDB for HBase Performance-enhanced Edition, you must change the format of the API URL for Java to jdbc:lindorm:table:url=http://API URL for Java, and change the port number from 30020 to 30060. The API URL for Java is obtained in the ApsaraDB for HBase console.
For example, if you obtain the ld-bp1ietqp4fby3****-proxy-hbaseue.hbaseue.rds.aliyuncs.com:30020 URL in the ApsaraDB for HBase console, change the URL to jdbc:lindorm:table:url=http://ld-bp1ietqp4fby3****-proxy-hbaseue.hbaseue.rds.aliyuncs.com:30060.

Execute the ALTER TABLE statement to add mappings for the data that is written to the columns of the dt table.

ALTER TABLE dt MAP DYNAMIC COLUMN `ROW` HSTRING, f1:name HSTRING, f1:age HINTEGER, f1:time HLONG, f1:buycode HSHORT, f1:price HFLOAT, f1:price2 HDOUBLE, f1:isMale HBOOLEAN;

Note

Adding column mappings refers to specifying data types for the columns regardless of whether to write data to the columns.
The system converts the bytes back to the original data based on schemas. You must specify a correct data type mapped to a SQL data type in the Lindorm SQL statement.

In the following example, if you specify HINTEGER as the data type for the column named f:age2, the system may retrieve incorrect data by invoking the Bytes.toInt() function.

int age = 25;
byte[] ageValue = Bytes.toBytes(age);
put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("age"), ageValue);//The data type for the column named f:age is INT, and is mapped to the HINTEGER data type in the Lindorm SQL statement. 
String age2 = "25";
byte[] age2Value = Bytes.toBytes(age2);
put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("age2"), age2Value);// The data type of the column named f:age2 is STRING, and is mapped to the HSTRING data type in the Lindorm SQL statement.

Execute the DESCRIBE statement to view the mappings of the current schema.
```
DESCRIBE dt;
```
Note
For more information about the syntax of the DESCRIBE statement, see DESCRIBE/SHOW/USE.

Query data in the dt table by using the following SQL statements:

SELECT * FROM dt LIMIT 1;
SELECT * FROM dt WHERE f1:isMale=true LIMIT 1;
SELECT * FROM dt WHERE f1:name='Some one' LIMIT 1;
SELECT * FROM dt WHERE f1:time>1656675490000 and f1:time<1656675492000 LIMIT 1;

Optional. Create a secondary index.
Secondary indexes are used to reduce data query time at the cost of storage space. Secondary indexes allow you to query data based on non-primary key columns. This improves query efficiency but requires more storage space. For more information about syntax limits on secondary indexes, see CREATE INDEX and Secondary indexes.
1. Modify the attribute of the primary table named dt.
```
ALTER TABLE dt SET 'MUTABILITY' = 'MUTABLE_LATEST';
```
  Note
  If you use a custom timestamp, you must set the attribute of the primary table to MUTABLE_ALL.
2. Create a secondary index for the table.
```
CREATE INDEX idx ON dt(f1:age) WITH (INDEX_COVERED_TYPE ='COVERED_DYNAMIC_COLUMNS');
```
3. Optional. If you specify the async parameter when you create the secondary index and the version of LindormTable is earlier than 2.6.3, build historical data from the primary table to an index table. Then, you can query historical data by using the secondary index. Skip this step if you do not specify the async parameter when you create the secondary index. Then, you can query historical data by using the secondary index. Skip this step if you do not specify the async parameter when you create the secondary index.
```
BUILD INDEX idx ON dt;
```
4. View the index.
```
SHOW INDEX FROM dt;
```
  Sample result:
```
+---------------+----------- -+-------------+--------------+------------------+---------------+-----------------+----------------+-------------+
| TABLE_SCHEMA  | DATA_TABLE  | INDEX_NAME  | INDEX_STATE  |  INDEX_PROGRESS  |  INDEX_TYPE   |  INDEX_COVERED  |  INDEX_COLUMN  |  INDEX_TTL  |
+---------------+-------------+-------------+--------------+------------------+---------------+-----------------+----------------+-------------+
| default       | dt          | idx         | ACTIVE       | 100%             | SECONDARY     |  TRUE           |  f1:age,ROW    |             |
+---------------+-------------+-------------+--------------+------------------+---------------+-----------------+----------------+-------------+
```
  Note
  If the value of INDEX_STATE in the returned result is Active, the index is built.
  The value of INDEX_PROGRESS in the returned result specifies the progress of index building.
5. Optional. Execute the EXPLAIN statement to view the execution plan. You can check whether the secondary index is hit.
```
EXPLAIN SELECT * FROM dt WHERE f1:age=23 LIMIT 1;
```

Optional. Create a search index.

Create a search index.
```
CREATE INDEX search_idx USING SEARCH ON dt(f1:age,f1:name);
```
Note
If you create a search index for an HBase table by executing a SQL statement, take note of the following limits on the columns in the search index:
- All columns in the search index must be defined in the column mappings created for the table.
- The data types supported by the columns in the search index must be the same as those that can be specified in the mappings. For more information, see Data types in mappings.
- The column mappings configured between the HBase table and search index cannot be removed. Otherwise, incorrect query results are returned.
- If you use a custom timestamp to write data to an HBase table and need to create a search index, you must set the MUTABILITY attribute of the table to MUTABLE_ALL.

Check whether an index is created.

SHOW INDEX FROM dt;

Sample result:

+--------------+------------+------------+-------------+----------------+------------+---------------+----------------+-----------+-------------------+
| TABLE_SCHEMA | DATA_TABLE | INDEX_NAME | INDEX_STATE | INDEX_PROGRESS | INDEX_TYPE | INDEX_COVERED |  INDEX_COLUMN  | INDEX_TTL | INDEX_DESCRIPTION |
+--------------+------------+------------+-------------+----------------+------------+---------------+----------------+-----------+-------------------+
| default      | dt         | idx        | ACTIVE      | DONE           | SECONDARY  | DYNAMIC       | f1:age,ROW     |           |                   |
| default      | dt         | search_idx | BUILDING    | N/A            | SEARCH     | NA            | f1:age,f1:name | 0         |                   |
+--------------+------------+------------+-------------+----------------+------------+---------------+----------------+-----------+-------------------+

Optional. Remove one or more column mappings.
- Remove one column mapping. Sample code:
```
ALTER TABLE dt UNMAP DYNAMIC COLUMN f1:isMale;
```
- Remove multiple column mappings. Sample code:
```
ALTER TABLE dt UNMAP DYNAMIC COLUMN f1:price2, f1:price2;
```