Why are unexpected query results returned - Lindorm - Alibaba Cloud Documentation Center

If you use LindormTable in an improper way, query results that do not meet your expectations may be returned. This topic describes the common causes for unexpected query results to help you troubleshoot query issues and optimize query conditions.

Issue

LindormTable is a NoSQL data engine that is compatible with Apache HBase. The storage model of LindormTable is implemented based on Log-Structured Merge Tree (LSM-Tree). When you perform write operations on LindormTable, data is first written into Write-Ahead Log (WAL) logs before being committed to the database. If no errors are reported during the write operation, the data is successfully written. In this case, data can be restored based on WAL logs in scenarios such as server failures. This ensures the durability of data and prevents the written data from being unexpectedly invisible. However, LindormTable provides various complex features, such as the versions, timestamps, and TTLs of data. If these features are not properly configured, expected issues may occur. For example, written data cannot be overwritten as expected, data is deleted due to expiration, written data cannot be queried, or the returned query results do not meet your expectations. These issues occur due to the following common causes:

Data is not successfully written or the query is performed before the data is written
The STRING field contains stop characters or invisible characters
Incorrect column names are specified in the query conditions
The queried data has expired based on the TTL configured for the table
The queried data has expired based on the configured cell TTL
Improper timestamps are specified in the deletion request
Data is deleted immediately after being written due to link issues
Data is deleted because the VERSIONS attribute of the table is set to 0
Data in an immutable table is updated

Common causes

Data is not successfully written or the query is performed before the data is written

In LindormTable, data is invisible immediately after it is written.

Lindorm wide tables are commonly used in data links involved in big data scenarios. If the data write link fails, data may fail to be written to the wide table or the write operation may be delayed. In this case, the data that you want to query may not be found because the data has not been written to the table.

If data cannot be queried until it has been written to the table for a period of time, we recommend that you add hints in the query condition to obtain the timestamps of the queried data in the result. Then, you can check whether the query is performed before the data is written based on the timestamps. For more information about how to obtain timestamps in query results, see Use hints to implement data versioning.

Note

If no timestamps are specified for a row of data when the data is written to the table, the returned timestamp indicates the time when the row of data is written to the table.

The STRING field contains stop characters or invisible characters

When you query fields of the STRING data type, if the field contains invisible characters, unexpected query results may be returned. For example, the orderID column of a table contains data of the STRING type and an invisible character is accidentally appended to the value 1000 of a field in the column due to program bugs. The actual value of the field becomes 1000 (invisible character).

In this case, if you specify the where oderID="1000" condition in a query, the field cannot be queried because its value does not match the condition. You can specify the where orderID > "1000" limit 1 condition in the query to check whether this kind of issues occur.

In addition, Lindorm does not support stop characters in the middle of a STRING value. If you write a string "1000\stop character\1000" to a table, encoding exceptions may occur and the string cannot be queried.

Incorrect column names are specified in the query conditions

Unexpected query results may be returned if the case of column names is incorrect or the column family is not specified in the query conditions.

The case of the specified column names is incorrect: Column names in Lindorm are case-sensitive. Therefore, make sure that the column specified in the query conditions are consistent with the actual column names.
The column family is not specified: Lindorm wide tables support multiple column families. If you do not specify a column family when you create a table, the columns of the table are automatically added to a column family named f by default. In this case, you do not need to specify the column family in the query conditions. However, if you configure multiple column families for a table, you must specify column families in the query conditions when you query data in the table. If you do not specify a column family in the conditions, only data in the default column family f is queried. In this case, the returned result may do not meet your expectations. For example, you create a column family named meta and add a column named column1 to the column family. When you query data in column1, you must specify the column family meta in the query conditions in the following format: where meta:column1=xxx. If you do not specify the column family meta in the query condition (such as where column1=xxx), Lindorm searches for column1 only in the default column family f. In this case, the returned result may do not meet your expectations.

The queried data has expired based on the TTL configured for the table

LindormTable allows you to specify a data time-to-live (TTL) in seconds for a table. In addition, you can specify a timestamp in milliseconds when you write the data to a table.

If you do not specify a timestamp when you write data to a table, the data expires when it has been stored for a period longer than the specified TTL. For example, if you set the TTL of a table to 86400 seconds (which equals to 1 day), the data that is written to the table today expires tomorrow and cannot be queried.

If you specify an earlier timestamp when you write data and the difference between the timestamp and the current time is greater than the specified TTL parameter, the data may be deleted immediately after it is written to the table. As a result, the data cannot be queried.

Important

In LindormTable, the version number of a key-value pair is equivalent to a timestamp. If you specify small values (such as 1, 2, and 3) as custom version numbers, data is prone to be cleared based on the specified TTL. Similarly, if you specify a large value, such as a timestamp in microseconds or nanoseconds instead of in milliseconds, as the custom timestamp or version number, the data cannot be cleared as expected based on the specified TTL.

You can check the TTL of a table in the cluster management system by performing the following steps: Log on to the cluster management system. On the Overview page, click the name of the table whose TTL you want to check. In the Current table details section, click View table properties. On the page that appears, check the value of TTL. For more information about how to log on to the cluster management system, see Log on to the cluster management system.

You can use Lindorm Shell or execute the ALTER TABLE statement to modify the TTL value. For more information, see the following documentations:

The queried data has expired based on the configured cell TTL

LindormTable allows you to configure a TTL in milliseconds for a key-value pair. The TTL is also referred as cell TTL.

If you configure a cell TTL for a key-value pair, the time when the key-value pair expires can be calculated based on the following formula: min{Expiration time based on cell TTL, Expiration time based on table TTL}. The expiration time of a key-value pair is calculated based on the timestamp or version number of the key-value pair. If the timestamp or version number is too small or too large, the key-value-pair may be cleared earlier than the expected time or cannot be cleared as expected. In this case, the returned query results may do not meet your expectations.

For example, a column contains two key-value pairs named KV1 and KV2. A cell TTL is configured for KV1 and no cell TTL is configured for KV2. If you query data in the column before KV1 and KV2 expire, KV1 whose timestamp is updated may be returned. If you query data in the column after KV1 expires based on its cell TTL, KV1 is cleared and KV2 is returned, which does not meet your expectations.

Note

Whether KV2 can be queried depends on the VERSIONS attribute of the table, whether KV1 and KV2 are stored in different files, and whether KV1 and KV2 are merged by major compaction operations. For example, if you set the VERSIONS attribute of a table to 1, KV2 is deleted after the major compaction operation because only one version of data is retained based on the VERSIONS attribute. In this case, KV2 cannot be queried even if KV1 is cleared after expiration.

Improper timestamps are specified in the deletion request

LindormTable allows you to configure a timestamp or version number in a deletion request. Data that is written to the column or row earlier than the time specified by the timestamp or version number is deleted. If you do not configure a timestamp or version number in the request, data that is written to the column or row earlier than the current time or version is deleted. For example, the current time is 16:00:00, January 16, 2024. If you execute the DELETE FROM sensor WHERE p1 = 10; statement to delete a row that is written before the current time and whose value in the p1 column is 10.

If the time when a row of data is written to the table is later than the time specified by the timestamp in the deletion request, the row is not deleted. In this case, this row is contained in the query results when you query the data in the table.

If the timestamp or version number specified in the deletion request is large, the timestamp of data that is written after the deletion operation is still smaller than the timestamp or version number in the request. In this case, the data is deleted immediately after it is written and cannot be queried.

Note

Timestamps cannot be specified by using SQL statements.

Data is deleted immediately after being written due to link issues

In data links involved in some big data scenarios, if the write and delete operations are performed in different programs or processes, data may be deleted immediately after it is written.

In addition, if you use an earlier version of Lindorm connector provided by Realtime Compute for Apache Flink to update a Lindorm wide table, note the issue that a write operation may be overwritten by a delete operation that is performed if the two operations are performed at almost the same time. You can resolve this issue by configuring ignoreDelete=true in Flink.

For more information, see Lindorm connector.

Data is deleted because the VERSIONS attribute of the table is set to 0

If the value of the VERSIONS attribute of a wide table is set to 0, data in the table is not retained. Any data written to the table is immediately deleted and cannot be queried. If you do not configure the VERSIONS attribute for a table when you create the table, the value of this attribute is 1 by default, which indicates that only one version of data is retained in the table. For more information, see Use hints to implement data versioning.

If you set the VERSIONS attribute to 0 by mistake, we recommend that you delete the current table and create a new table, or change the VERSIONS attribute to a value greater than or equal to 1.

You can check the value of the VERSIONS attribute of a table in the cluster management system by performing the following steps: Log on to the cluster management system. On the Overview page, click the name of the table that you want to check. In the Current table details section, click View table properties. On the page that appears, check the value of VERSIONS. For more information about how to log on to the cluster management system, see Log on to the cluster management system.

You can use Lindorm Shell or execute the ALTER TABLE statement to modify the value of VERSIONS attribute. For more information, see the following documentations:

Note

The MIN_VERSIONS attribute specifies the minimum number of versions that can be retained in a table and is set to 0 by default. This attribute does not cause the issue that data is deleted and cannot be queried after it is written to the table.

Data in an immutable table is updated

If the MUTABILITY attribute of a table is set to IMMUTABLE, you can write a row of data to the table only by executing a single UPSERT statement. Data written to the table is not allowed to be updated or deleted. However, Lindorm does not strictly prohibit the update and delete operations performed on an immutable table. Therefore, such operations may cause data inconsistency between the index table and the base table. As a result, the data that hits the query conditions is different in the index table and base table.

In this case, we recommend that you rebuild the index table and stop updating or deleting data in the immutable table.