ApsaraDB for HBase Publishes Full-Text Indexing Service to Handle Complex Queries

ApsaraDB for HBase full-text indexing service enhances query capabilities and automatically synchronizes data, allowing users to focus on enriching their service architecture.

By Tian Si

ApsaraDB for HBase has released its full-text indexing service. For ApsaraDB for HBase instances created after January 25, 2019, the service can be enabled free of charge on the console. With it, users can build more feature-rich search services on HBase without being limited to simple KV queries, agonizing over row key design, or struggling to keep up with ever-changing complex query requirements. The "full-text indexing service" is designed to enhance the query capabilities of ApsaraDB for HBase and to synchronize data automatically, so that users can focus on enriching their service architecture with powerful retrieval functions.

Reasons for Enhancing HBase's Retrieval Ability

Anyone who has used HBase has struggled with row key design. However skilled the engineers are, and however carefully they list every retrieval requirement and trim or compromise individual services, they still cannot design one row key that satisfies every kind of service query.

For example, a logistics management system needs to query on any combination of the following conditions: recipient name/mobile phone/address, sender name/mobile phone/address, waybill number/start time/end time, courier name/mobile phone, and so on. HBase's native KV query cannot meet such requirements: no matter how the row key is designed, it cannot cover arbitrary combinations of query conditions. In addition, such queries may involve fuzzy matching on names, addresses, mobile phone numbers, and other fields, which the HBase row key handles poorly.

As another example, a new retail service needs keyword queries on the product title or description. In HBase, this can only be implemented as a fuzzy query, which is not well optimized. A word-breaking (tokenized) query is a better fit for keyword search on titles and descriptions, but HBase does not provide one. To improve the user experience, a new retail query service also often needs to classify query results. For example, when a user searches an e-commerce website for the keyword "fashion", the matching products are grouped into categories such as clothing, electronics, and daily use, so that the user can pick a category for a secondary query and quickly find the desired product. HBase cannot meet this requirement either.

In the end, the service is often compromised to fit the query characteristics of HBase: only some KV queries are retained, and all other queries that could improve the user experience are cut.

In summary, we have listed several pain points encountered when using HBase for query service design:

  1. Queries based on arbitrary combinations of conditions cannot be satisfied.
  2. Fuzzy queries cannot be efficiently supported.
  3. Word-breaking queries are not supported.
  4. Multi-dimensional sorting/paging cannot be efficiently supported.
  5. The query result set cannot be classified.

Full-Text Indexing Service of ApsaraDB for HBase

The full-text indexing service is designed to enhance HBase's query capability. It keeps HBase's strong KV capability while adding rich query capability under complex conditions. Specifically, it covers the following scenarios:

  1. Arbitrary query with complex conditions
  2. Multi-dimensional sorting
  3. Complex conditional paging
  4. Word-breaking keyword query
  5. Classification of matching result sets
  6. Common stats, such as min, max, avg, and sum
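
As a rough illustration, the scenarios above map onto standard Solr request parameters (q, fq, sort, start, rows, facet, stats). The sketch below builds such a query string with the Python standard library only. The helper name build_select_query and the fields title_t, price_d, sales_i, and category_s are hypothetical and do not come from this article; title_t is assumed to be a tokenized text field in the collection schema.

```python
from urllib.parse import urlencode, parse_qs

def build_select_query(keyword, page=0, page_size=10):
    """Return a URL-encoded query string for Solr's /select handler.

    All field names below are hypothetical examples.
    """
    params = [
        ("q", f"title_t:{keyword}"),           # word-breaking keyword query
        ("fq", "price_d:[10 TO 100]"),         # extra filter condition
        ("sort", "sales_i desc,price_d asc"),  # multi-dimensional sorting
        ("start", page * page_size),           # paging: offset of first row
        ("rows", page_size),                   # paging: rows per page
        ("facet", "true"),                     # classify the result set ...
        ("facet.field", "category_s"),         # ... by category
        ("stats", "true"),                     # min, max, sum, mean, ...
        ("stats.field", "price_d"),
        ("wt", "json"),
    ]
    return urlencode(params)

query_string = build_select_query("fashion", page=2, page_size=20)
print(query_string)
```

The resulting string can be appended to a collection's /select URL; whether each parameter behaves as intended depends on the schema of the target collection.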

The full-text indexing service of ApsaraDB for HBase is easy to use. You only need to create an index in the DDL phase; data and index are then synchronized automatically. The architecture is as follows:

[Figure 1: Architecture of the full-text indexing service]

Differences from Self-Built Architecture

[Figure 2: Comparison with the self-built architecture]

In addition, the self-built HBase + indexer + Solr architecture has several bugs, and many users have reported data loss with it. ApsaraDB for HBase has fixed many of these bugs and made further improvements.

How to Use the Full-Text Indexing Service

To use the full-text indexing service of ApsaraDB for HBase, enable the service and create a simple index using DDL. Data written to HBase is then synchronized to the index automatically, with no further management needed. You only need to focus on the queries themselves, building feature-rich service queries with the HBase API and the Solr API. Let's briefly go through the entire process.

Enable Service

The "full-text indexing service" is a free extension service of ApsaraDB for HBase. For ApsaraDB for HBase instances created after January 25, 2019, open the "full-text indexing service" details page from the left side of the console to enable the service, as follows:

[Figure 3: Enabling the full-text indexing service on the console]

After the service is enabled, the Solr access address and the WebUI link are displayed as follows:

[Figure 4: Solr access address and WebUI link]

The Solr zk address can be used to construct a cloud Solr client, which comes with built-in load balancing. The Solr WebUI is accessed in the same way as the ApsaraDB for HBase WebUI: first set the user password and whitelist, then click the link above to open the Solr WebUI.
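
A ZooKeeper connection string is a comma-separated list of host:port pairs followed by an optional chroot path, and cloud Solr clients typically want these two parts separately. The following minimal sketch splits such a string; parse_zk_host is a hypothetical helper name, and the sample address is the placeholder used later in this article, not a real endpoint.

```python
def parse_zk_host(zk_host):
    """Split 'h1:2181,h2:2181/chroot' into (['h1:2181', 'h2:2181'], '/chroot').

    Returns None for the chroot when the string has no path component.
    """
    hosts_part, sep, chroot = zk_host.partition("/")
    hosts = [h.strip() for h in hosts_part.split(",") if h.strip()]
    return hosts, (sep + chroot) if sep else None

# Placeholder address, as used in the ZK_HOST setting of this article.
hosts, chroot = parse_zk_host("zk1:2181,zk2:2181,zk3:2181/solr")
print(hosts, chroot)
```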

Create an Index

1.  Download the client tool for index management

wget http://public-hbase.oss-cn-hangzhou.aliyuncs.com/installpackage/solr-7.3.1-ali-1.0.tgz
tar zxvf solr-7.3.1-ali-1.0.tgz

2.  Modify ZK_HOST in the solr-7.3.1-ali-1.0/bin/solr.in.sh file, as follows:

ZK_HOST=zk1:2181,zk2:2181,zk3:2181/solr

Use the Solr zk access address shown on the console after the full-text indexing service is enabled (see the preceding figure).

3.  Create an HBase table and enable the replication synchronization

create 'solrdemo', {NAME => 'info', REPLICATION_SCOPE => '1'}

4.  Create a Solr table "democollection"

Step 1: Modify and upload solrconfig.xml/schema. If no modification is needed, the demo config can be used for uploading, as follows:

solr-7.3.1-ali-1.0/bin/solr zk upconfig -d _democonfig  -n democollection_config -z zk1:2181/solr

Step 2: Use the uploaded configuration to create the "democollection", as follows:

curl "http://hostname:8983/solr/admin/collections?action=CREATE&name=democollection&numShards=1&replicationFactor=1&collection.configName=democollection_config"

Replace hostname with the zk hostname that contains the master3-1 infix.
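
When collection creation is scripted rather than typed by hand, the same Collections API request can be assembled programmatically. The sketch below only builds the URL and does not send it; create_collection_url is a hypothetical helper, and "hostname" remains a placeholder exactly as in the curl command above.

```python
from urllib.parse import urlencode

def create_collection_url(host, name, config_name,
                          num_shards=1, replication_factor=1, port=8983):
    """Build the Solr Collections API CREATE URL used in this walkthrough."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        "collection.configName": config_name,
    }
    return f"http://{host}:{port}/solr/admin/collections?{urlencode(params)}"

# "hostname" is a placeholder; substitute the real Solr host.
url = create_collection_url("hostname", "democollection", "democollection_config")
print(url)
```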

5.  Configure the index field mapping from the HBase "solrdemo" table to the Solr "democollection" table

Step 1: Edit index_conf.xml to configure the mapping relationship. For example:

<?xml version="1.0"?>
<indexer table="solrdemo">
<field name="name_s" value="info:q2" type="string"/>
<field name="age_i" value="info:q3" type="int"/>
<param name="update_version_l" value="true"/>
</indexer>

This configuration maps the info:q2 and info:q3 columns of the HBase "solrdemo" table to the name_s and age_i fields of the Solr "democollection" table, respectively. The info:q2 column is parsed as a string and saved to the name_s field, and the info:q3 column is parsed as an int and saved to the age_i field. The types of name_s and age_i on the Solr side are determined by the Solr collection configuration. By default, the type is inferred dynamically from the suffix of the field name: the common suffixes _i, _s, _l, _b, _f, and _d correspond to int, string, long, boolean, float, and double, respectively. Users can also specify the field type directly. The final update_version_l parameter is a fixed setting; it saves the latest update time at the document level.
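
To make the suffix convention concrete, here is a small sketch that parses a mapping file like the one above and compares each declared type with the type implied by the Solr field name's suffix. It is only a sanity check written for illustration, not part of the indexer tooling; infer_type and check_mapping are hypothetical names, and the suffix table simply restates the list given in the text.

```python
import xml.etree.ElementTree as ET

# Suffix-to-type table as described in the article.
SUFFIX_TYPES = {"_i": "int", "_s": "string", "_l": "long",
                "_b": "boolean", "_f": "float", "_d": "double"}

def infer_type(field_name):
    """Return the type implied by the field-name suffix, or None if unknown."""
    for suffix, type_name in SUFFIX_TYPES.items():
        if field_name.endswith(suffix):
            return type_name
    return None

def check_mapping(xml_text):
    """Return (solr_field, hbase_column, declared_type, inferred_type) per <field>."""
    root = ET.fromstring(xml_text)
    return [(f.get("name"), f.get("value"), f.get("type"), infer_type(f.get("name")))
            for f in root.findall("field")]

SAMPLE = """<indexer table="solrdemo">
<field name="name_s" value="info:q2" type="string"/>
<field name="age_i" value="info:q3" type="int"/>
<param name="update_version_l" value="true"/>
</indexer>"""

for row in check_mapping(SAMPLE):
    print(row)
```

A row whose declared and inferred types differ is worth a second look before the mapping is registered.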

Step 2: Use the tool to register index_conf.xml, which associates the HBase table "solrdemo" with the Solr table "democollection" according to the index mapping. The command is as follows:

solr-7.3.1-ali-1.0/bin/solr-indexer add  \
     -n demoindex  \
     -f index_conf.xml  \
     -c democollection

At this point, the index mapping is complete, and you can write to HBase as usual without worrying about index synchronization. The mapped fields of the HBase "solrdemo" table are automatically synchronized to the corresponding fields of the Solr "democollection" table. The example above maps as follows:

[Figure 5: Field mapping from the HBase "solrdemo" table to the Solr "democollection" table]

The row key of the HBase table is mapped to the id field in the Solr table.
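
The mapping can be pictured as a simple renaming step. The sketch below shows what a Solr document for one HBase row might look like under the demo configuration. It only illustrates the outcome of the mapping, not how the synchronization is implemented: to_solr_doc is a hypothetical function, the cell values are assumed to be already decoded from bytes, and the row key is assumed to be a plain string.

```python
# HBase column -> Solr field, taken from the demo index configuration.
FIELD_MAPPING = {"info:q2": "name_s", "info:q3": "age_i"}

def to_solr_doc(row_key, cells):
    """Illustrate the Solr document that corresponds to one HBase row.

    cells maps 'family:qualifier' to an already-decoded value (assumption);
    columns without a mapping are not indexed.
    """
    doc = {"id": row_key}  # the row key becomes the Solr id
    for column, field in FIELD_MAPPING.items():
        if column in cells:
            doc[field] = cells[column]
    return doc

doc = to_solr_doc("row1", {"info:q2": "Alice", "info:q3": 30, "info:q9": "unmapped"})
print(doc)
```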

Querying and Retrieval

Querying is simple and fully compatible with the open-source HBase API and Solr API. Use Solr to run the condition query that the service requires. In the result set, the id field of each document holds the row key of a matching HBase row. Convert each id back into a row key and use the HBase API to read the original data for that row. The flow is roughly as follows:

[Figure 6: Query flow, from the Solr condition query to the HBase read by row key]
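
The two-step flow can be sketched as follows, using a canned Solr JSON response instead of a live cluster. The function names are hypothetical, the id is assumed to be the row key as a plain string, and get_row is a caller-supplied stand-in for an HBase Get rather than a real client call.

```python
import json

def row_keys_from_response(body):
    """Extract the id of every matching document from a Solr JSON response."""
    docs = json.loads(body)["response"]["docs"]
    return [doc["id"] for doc in docs]

def fetch_rows(body, get_row):
    """Step 2: read each matching row by key.

    get_row stands in for an HBase Get (assumption); any callable that takes
    a row key and returns the row works here.
    """
    return {key: get_row(key) for key in row_keys_from_response(body)}

# Step 1 result: a canned response in Solr's standard JSON layout.
SAMPLE_RESPONSE = json.dumps({
    "response": {"numFound": 2, "start": 0,
                 "docs": [{"id": "row1"}, {"id": "row2"}]}
})

# An in-memory dict plays the role of the HBase table in this sketch.
fake_table = {"row1": {"info:q2": "Alice"}, "row2": {"info:q2": "Bob"}}
rows = fetch_rows(SAMPLE_RESPONSE, fake_table.get)
print(rows)
```

Requesting only the id field from Solr (fl=id) keeps the first step light, since the full row is read from HBase anyway.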

Conclusion

At the time of writing, ApsaraDB for HBase is available only for Mainland China accounts. To learn more about the product, visit https://cn.aliyun.com/product/hbase

With the full-text indexing service, ApsaraDB for HBase users can experience the following benefits:

  • The tool for index management is simpler and easier to use.
  • SQL portal provides access to the full-text indexing service.
  • The next-generation replication mechanism of the full-text engine is more efficient.
  • In addition to asynchronous indexes, synchronous indexes will also be supported in the future.