By Tian Si
ApsaraDB for HBase has launched a full-text indexing service. For ApsaraDB for HBase instances created after January 25, 2019, the service can be enabled free of charge in the console. With it, users can build richer search services on top of HBase without being limited to simple key-value (KV) queries, agonizing over row key design, or struggling to keep up with ever-changing complex query requirements. The "full-text indexing service" enhances the query capabilities of ApsaraDB for HBase and synchronizes data automatically, so users can focus on enriching their service architecture with powerful retrieval functions.
Anyone who has used HBase has struggled with row key design. No matter how skilled our engineers are, and no matter how carefully they list every retrieval requirement and trim or compromise on individual services, they still cannot design a single row key that satisfies every query requirement.
For example, a logistics management system needs to support queries on any combination of the following conditions: recipient name/mobile phone/address, sender name/mobile phone/address, waybill number/start time/end time, courier name/mobile phone, and so on. HBase's native KV query cannot handle such complex queries: no matter how the row key is designed, it cannot accommodate arbitrary combinations of query conditions. In addition, these queries may require fuzzy matching on names, addresses, mobile phone numbers, and other conditions, which row keys do not support well.
As another example, a new retail service needs keyword queries on product titles or descriptions. In HBase, this can only be implemented as a fuzzy query, which performs poorly. Keyword queries on titles or descriptions are better served by tokenized (word-segmentation) search, which HBase cannot provide. In addition, to improve the user experience, new retail queries often need to group results into categories. For example, when a user searches an e-commerce website for the keyword "fashion", the matching products are grouped into clothing, electronics, daily necessities, and other categories, so the user can pick a category, refine the query, and quickly find the desired product. HBase cannot meet this requirement either.
In the end, services are compromised to fit HBase's query characteristics: only some KV queries are kept, and all other queries that could improve the user experience are cut.
In summary, we have listed several pain points encountered when using HBase for query service design:
The full-text indexing service is designed to enhance HBase's query capability. It retains HBase's powerful KV capability while adding rich query capability under complex conditions. Specifically, it targets the following scenarios:
The full-text indexing service of ApsaraDB for HBase is easy to use: you only need to create an index in the DDL phase, after which data and index are synchronized automatically. The architecture is as follows:
In addition, a self-built HBase + Indexer + Solr stack has several bugs, and many users have reported data loss with it. ApsaraDB for HBase includes many bug fixes and improvements that address these issues.
To use the full-text indexing service of ApsaraDB for HBase, just enable the service and create a simple index with DDL. Data you insert is then synchronized automatically, with no management required. From there, you only need to focus on queries, building feature-rich query services with the HBase API and Solr API. Let's briefly walk through the entire process.
The "full-text indexing service" is a free extension service of ApsaraDB for HBase. For ApsaraDB for HBase instances created after January 25, 2019, open the "full-text indexing service" page from the navigation on the left side of the instance details page in the console and enable the service, as follows:
After the service is enabled, the Solr access address and WebUI link are shown as follows:
The Solr zk address can be used to construct a cloud Solr client, which provides built-in load balancing. The Solr WebUI is accessed in the same way as the ApsaraDB for HBase WebUI: first set the user password and whitelist, then click the link above to open the Solr WebUI.
1. Download the client tool for index management
wget http://public-hbase.oss-cn-hangzhou.aliyuncs.com/installpackage/solr-7.3.1-ali-1.0.tgz
tar zxvf solr-7.3.1-ali-1.0.tgz
2. Modify ZK_HOST in the solr-7.3.1-ali-1.0/bin/solr.in.sh file, as follows:
ZK_HOST=zk1:2181,zk2:2181,zk3:2181/solr
The zk address is the Solr zk access address after the full-text indexing service is enabled on the console in the preceding figure.
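ZK_HOST packs a comma-separated ZooKeeper ensemble and a chroot path (/solr) into one string. ZooKeeper applies the chroot to the whole ensemble, not just to the last host. The following Python sketch splits such a string into its two parts, assuming the host:port,...,host:port/chroot format shown above. split_zk_host is a hypothetical helper, and zk1, zk2 and zk3 are placeholders, not real addresses.

```python
from typing import List, Tuple


def split_zk_host(zk_host: str) -> Tuple[List[str], str]:
    """Split 'h1:2181,h2:2181/solr' into (['h1:2181', 'h2:2181'], '/solr')."""
    # The chroot, if present, starts at the first '/' and applies to every host.
    slash = zk_host.find("/")
    if slash == -1:
        hosts_part, chroot = zk_host, "/"
    else:
        hosts_part, chroot = zk_host[:slash], zk_host[slash:]
    hosts = [h.strip() for h in hosts_part.split(",") if h.strip()]
    return hosts, chroot


hosts, chroot = split_zk_host("zk1:2181,zk2:2181,zk3:2181/solr")
print(hosts)   # ['zk1:2181', 'zk2:2181', 'zk3:2181']
print(chroot)  # /solr
```

Forgetting the /solr suffix is a common cause of "collection not found" errors. Without it, a client looks for Solr's cluster state at the ZooKeeper root instead of under the chroot.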
3. Create an HBase table with replication enabled
create 'solrdemo',{NAME=>'info', REPLICATION_SCOPE=> '1'}
4. Create a Solr table "democollection"
Step 1: Modify solrconfig.xml and the schema as needed, then upload them. If no changes are needed, upload the demo configuration directly, as follows:
solr-7.3.1-ali-1.0/bin/solr zk upconfig -d _democonfig -n democollection_config -z zk1:2181/solr
Step 2: Use the uploaded configuration to create the "democollection", as follows:
curl "http://hostname:8983/solr/admin/collections?action=CREATE&name=democollection&numShards=1&replicationFactor=1&collection.configName=democollection_config"
Replace hostname with the zk host whose name contains master3-1.
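The curl call above is a plain HTTP request to the Solr Collections API. The Python sketch below assembles the same URL from its parameters. This makes it easier to see what each parameter controls and to change numShards or replicationFactor later. create_collection_url is a hypothetical helper, and hostname remains a placeholder to be replaced as described above. Sending the request (for example, with curl or urllib.request.urlopen) is left out.

```python
from urllib.parse import urlencode


def create_collection_url(host: str, name: str, config_name: str,
                          num_shards: int = 1, replication_factor: int = 1,
                          port: int = 8983) -> str:
    """Build a Collections API CREATE request equivalent to the curl call."""
    params = {
        "action": "CREATE",
        "name": name,                               # new collection name
        "numShards": num_shards,                    # number of shards
        "replicationFactor": replication_factor,    # replicas per shard
        "collection.configName": config_name,       # config uploaded via upconfig
    }
    # urlencode keeps insertion order and percent-encodes values when needed.
    return f"http://{host}:{port}/solr/admin/collections?{urlencode(params)}"


url = create_collection_url("hostname", "democollection", "democollection_config")
print(url)
# http://hostname:8983/solr/admin/collections?action=CREATE&name=democollection&numShards=1&replicationFactor=1&collection.configName=democollection_config
```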
5. Configure the field mapping index relationship from the HBase "solrdemo" table to the Solr "democollection" table
Step 1: Edit index_conf.xml to configure the mapping relationship. For example:
<?xml version="1.0"?>
<indexer table="solrdemo">
<field name="name_s" value="info:q2" type="string"/>
<field name="age_i" value="info:q3" type="int"/>
<param name="update_version_l" value="true"/>
</indexer>
This configuration maps the info:q2 and info:q3 columns of the HBase "solrdemo" table to the name_s and age_i fields of the Solr "democollection" table, respectively. It also specifies that info:q2 is parsed as a string and stored in name_s, and info:q3 is parsed as an int and stored in age_i. The types of the name_s and age_i fields in the Solr collection are determined by the collection configuration. By default, they are inferred dynamically from the field name suffix: the common suffixes _i, _s, _l, _b, _f, and _d correspond to int, string, long, boolean, float, and double, respectively. Users can also specify field types explicitly. The final update_version_l parameter must be written exactly as shown; it records the latest update time at the document level.
Step 2: Use the solr-indexer tool to register index_conf.xml, which associates the HBase table "solrdemo" with the Solr table "democollection" through the index mapping. The command is as follows:
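The sketch below makes the column-to-field relationship in index_conf.xml explicit. It reads the mapping with Python's standard xml.etree.ElementTree and checks each declared type against the suffix convention described above. It only illustrates the file's structure and assumes the exact XML shown in step 1. It is not how the indexing service itself parses the file, and parse_mappings and inferred_type are hypothetical names.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple

# index_conf.xml from step 1 (assumed to be exactly this content).
INDEX_CONF = """<?xml version="1.0"?>
<indexer table="solrdemo">
  <field name="name_s" value="info:q2" type="string"/>
  <field name="age_i" value="info:q3" type="int"/>
  <param name="update_version_l" value="true"/>
</indexer>"""

# Dynamic-field suffix convention described above.
SUFFIX_TYPES = {"_i": "int", "_s": "string", "_l": "long",
                "_b": "boolean", "_f": "float", "_d": "double"}


def inferred_type(field_name: str) -> Optional[str]:
    """Return the type implied by a field-name suffix, or None if none matches."""
    for suffix, solr_type in SUFFIX_TYPES.items():
        if field_name.endswith(suffix):
            return solr_type
    return None


def parse_mappings(xml_text: str) -> Dict[str, Tuple[str, str]]:
    """Map each HBase 'family:qualifier' column to (Solr field, declared type)."""
    root = ET.fromstring(xml_text)
    # Only <field> children describe column mappings; <param> is ignored here.
    return {f.get("value"): (f.get("name"), f.get("type"))
            for f in root.findall("field")}


for column, (field, declared) in parse_mappings(INDEX_CONF).items():
    print(f"{column} -> {field} ({declared}; suffix implies {inferred_type(field)})")
# info:q2 -> name_s (string; suffix implies string)
# info:q3 -> age_i (int; suffix implies int)
```

Keeping the declared type and the suffix in agreement avoids surprises. A value parsed as int but stored in a field whose suffix implies string would be indexed as text and would not support numeric range queries.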
solr-7.3.1-ali-1.0/bin/solr-indexer add \
-n demoindex \
-f index_conf.xml \
-c democollection
At this point, the index mapping is complete. From now on, simply write to HBase as usual without worrying about index synchronization: the mapped columns of the HBase "solrdemo" table are automatically synchronized to the corresponding fields of the Solr "democollection" table. In the example above, the mapping is as follows:
The row key of the HBase table is mapped to the id field in the Solr table.
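Conceptually, the synchronization turns each HBase row into a Solr document. The following Python sketch models that for the example mapping, using plain dicts in place of HBase and Solr. Two points are assumptions rather than documented behavior of the service. First, cell values are taken to be UTF-8 text, which is what an HBase shell put writes. Second, update_version_l is modeled as the newest cell timestamp of the row. row_to_doc is a hypothetical name.

```python
from typing import Callable, Dict, Tuple

# One HBase row as column -> (value bytes, cell timestamp in ms). Hypothetical model.
Row = Dict[str, Tuple[bytes, int]]

# Column -> (Solr field, type), as configured in index_conf.xml.
MAPPING = {"info:q2": ("name_s", "string"), "info:q3": ("age_i", "int")}

# Assumption: cell values are UTF-8 text, as written by the HBase shell `put`.
DECODERS: Dict[str, Callable[[bytes], object]] = {
    "string": lambda b: b.decode("utf-8"),
    "int": lambda b: int(b.decode("utf-8")),
}


def row_to_doc(row_key: bytes, row: Row) -> dict:
    """Row key -> id; each mapped column -> its typed Solr field."""
    doc: dict = {"id": row_key.decode("utf-8")}
    for column, (value, _ts) in row.items():
        if column in MAPPING:  # unmapped columns are not indexed
            field, solr_type = MAPPING[column]
            doc[field] = DECODERS[solr_type](value)
    # Assumption: update_version_l holds the newest cell timestamp in the row.
    doc["update_version_l"] = max((ts for _v, ts in row.values()), default=0)
    return doc


row = {"info:q2": (b"Alice", 1700000000000),
       "info:q3": (b"30", 1700000000500),
       "info:q9": (b"not indexed", 1700000000100)}
print(row_to_doc(b"row1", row))
# {'id': 'row1', 'name_s': 'Alice', 'age_i': 30, 'update_version_l': 1700000000500}
```

An application might instead write integers with the Java client's Bytes.toBytes(int). In that case the cells hold 4-byte big-endian values, and the int decoder in this sketch would need int.from_bytes(b, "big", signed=True). How the service itself handles that case is not covered here.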
Querying is simple and fully compatible with the open-source HBase API and Solr API. Use Solr to run the condition queries your service needs; in the result set, the id field contains the row keys of all matching HBase rows. Convert each id back into a row key and use the HBase API to read that row's original data. The flow is roughly as follows:
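The two-step flow (query Solr for ids, then read full rows from HBase) can be sketched in Python with in-memory stand-ins. SOLR_DOCS and HBASE_ROWS are hypothetical data that mimic the democollection index and the solrdemo table. In a real service, step 1 would be a Solr query such as q=name_s:Alice AND age_i:[30 TO *] with fl=id, and step 2 would be an HBase Get per row key. search_ids and fetch_rows are hypothetical helpers.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, bytes]

# Hypothetical index content: only the mapped fields plus the id (row key).
SOLR_DOCS = [
    {"id": "row1", "name_s": "Alice", "age_i": 30},
    {"id": "row2", "name_s": "Bob", "age_i": 25},
    {"id": "row3", "name_s": "Alice", "age_i": 41},
]

# Hypothetical HBase table content: full rows, including unindexed columns.
HBASE_ROWS: Dict[bytes, Row] = {
    b"row1": {"info:q2": b"Alice", "info:q3": b"30", "info:addr": b"Hangzhou"},
    b"row2": {"info:q2": b"Bob", "info:q3": b"25", "info:addr": b"Beijing"},
    b"row3": {"info:q2": b"Alice", "info:q3": b"41", "info:addr": b"Shanghai"},
}


def search_ids(predicate: Callable[[dict], bool]) -> List[str]:
    """Step 1: run the condition query against the index and keep only the ids."""
    return [doc["id"] for doc in SOLR_DOCS if predicate(doc)]


def fetch_rows(ids: List[str],
               get_row: Callable[[bytes], Optional[Row]]) -> Dict[bytes, Row]:
    """Step 2: convert each id back into a row key and read the full row."""
    rows: Dict[bytes, Row] = {}
    for doc_id in ids:
        row_key = doc_id.encode("utf-8")
        row = get_row(row_key)
        if row is not None:  # skip ids whose row no longer exists
            rows[row_key] = row
    return rows


# Mirrors the Solr query q=name_s:Alice AND age_i:[30 TO *], fl=id
ids = search_ids(lambda d: d["name_s"] == "Alice" and d["age_i"] >= 30)
rows = fetch_rows(ids, HBASE_ROWS.get)
print(ids)                         # ['row1', 'row3']
print(rows[b"row3"]["info:addr"])  # b'Shanghai'
```

The info:addr column shows why the second step matters. It is not mapped in index_conf.xml, so it can only be read from HBase. With many hits, issuing the reads as one batch (the HBase Java client's Table.get(List<Get>)) is usually preferable to one round trip per row.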
At the time of writing, ApsaraDB for HBase is available only for Mainland China accounts. To learn more about the product, visit https://cn.aliyun.com/product/hbase
With the full-text indexing service, ApsaraDB for HBase users can experience the following benefits: