TairCpc is a data structure developed based on the compressed probabilistic counting (CPC) sketch. It supports high-performance computation on sampled data, such as distinct counting, with a small memory footprint.
Background information
In real-time decision-making scenarios that involve big data, a real-time computing system processes incoming business logs, an online storage system stores the processing results, and a real-time rule engine or decision-making system then makes decisions based on those results. Sample scenarios:
Prevention and control of credit card fraud: In this scenario, your systems must determine whether a credit card is used in a safe environment and stop suspicious transactions at the earliest opportunity.
Prevention and control of ticket scalping: In this scenario, your systems must identify and stop, in real time, activities that use virtual devices and fake IP addresses to undermine platform interests.
In these scenarios, you can use TairCpc to deduplicate real-time data by dimension, such as the distinct devices used with each credit card, and store the results in a structured form in Tair. This provides fast access to the data and combines storage with computing. TairCpc also supports multiple aggregation operations that aggregate data within nanoseconds to support real-time risk control.
Overview
CPC is a high-performance cardinality estimation algorithm that counts the distinct values in a data stream. Sketches built over separate blocks of data can be merged, so that values appearing in several blocks are deduplicated and a total distinct count is obtained. For more information about CPC, see the paper Back to the Future: an Even More Nearly Optimal Cardinality Estimation Algorithm. For the same level of accuracy, CPC uses about 40% less memory than HyperLogLog (HLL).
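CPC belongs to a family of mergeable cardinality sketches: each block of a stream gets its own small summary, and the summaries can be combined so that values seen in several blocks are counted only once. The sketch below illustrates that merge property with K-minimum values (KMV), a much simpler estimator swapped in for readability. It is not CPC and not how TairCpc is implemented; the class and method names are hypothetical.

```python
import hashlib


class KMVSketch:
    """K-minimum-values (KMV) distinct-count sketch.

    A simple mergeable cardinality estimator used only to illustrate merging;
    it is NOT the CPC algorithm that TairCpc is built on.
    """

    def __init__(self, k=256):
        self.k = k
        self.mins = set()  # the k smallest hash values seen so far, each in [0, 1)

    @staticmethod
    def _hash(item):
        # Map an item to a pseudo-uniform value in [0, 1) using 53 bits of SHA-1.
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        return (int.from_bytes(digest[:8], "big") >> 11) / 2**53

    def update(self, item):
        h = self._hash(item)
        if h in self.mins:
            return  # duplicate item: same hash, nothing changes
        if len(self.mins) < self.k:
            self.mins.add(h)
        else:
            largest = max(self.mins)
            if h < largest:
                self.mins.remove(largest)
                self.mins.add(h)

    def merge(self, other):
        # The k smallest hashes of a union are among the k smallest of each part,
        # so merging two sketches deduplicates items that appear in both blocks.
        merged = KMVSketch(self.k)
        merged.mins = set(sorted(self.mins | other.mins)[: self.k])
        return merged

    def estimate(self):
        if len(self.mins) < self.k:
            # Fewer than k distinct hashes: exact count (barring hash collisions).
            return float(len(self.mins))
        return (self.k - 1) / max(self.mins)


# Two "data blocks" that share 2,000 users (IDs 4000-5999).
block_a, block_b = KMVSketch(), KMVSketch()
for i in range(6000):
    block_a.update(f"user{i}")
for i in range(4000, 10000):
    block_b.update(f"user{i}")

total = block_a.merge(block_b)
print(round(total.estimate()))  # close to 10,000 distinct users, not 12,000
```

Adding the two block counts would give about 12,000; merging the summaries first is what removes the 2,000 shared users. CPC provides the same merge property with much better accuracy per byte of memory.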
TairCpc is built on open source CPC and reduces the error rate to 0.008%, compared with 0.67% for open source CPC and 1.95% for HLL.
Main features
Low memory usage, incremental reads and writes, and minimal I/O
High-performance and ultra-high-accuracy deduplication
Low and stable error rate
Typical scenarios
Security systems for banks
Flash sales
Prevention and control of ticket scalping
Prerequisites
The instance is of one of the following Tair series types:
DRAM-based instance whose minor version is 1.7.20 or later
Persistent memory-optimized instance whose minor version is 1.2.3.3 or later
The latest minor version provides more features and higher stability. We recommend that you update the instance to the latest minor version. For more information, see Update the minor version of an instance. If your instance is a cluster instance or read/write splitting instance, we recommend that you update the proxy nodes in the instance to the latest minor version to ensure that all commands can be run as expected.
Precautions
The TairCpc data that you want to manage is stored on a Tair instance.
Supported commands
Table 1. TairCpc commands
Command | Description |
CPC.UPDATE | Adds an item to the specified TairCpc key. If the key does not exist, the key is created. If the item already exists in the key, the item is not added. |
CPC.ESTIMATE | Retrieves the cardinality estimate of the specified TairCpc key after deduplication. The return value is of the DOUBLE type; you can round it to the nearest integer. |
CPC.UPDATE2EST | Adds an item to the specified TairCpc key and returns the new cardinality estimate of the key after the update. If the key does not exist, the key is created. |
CPC.UPDATE2JUD | Adds an item to the specified TairCpc key and returns the new cardinality estimate of the key after the update, together with the difference between the new and original estimates. If the item did not already exist and is added, a difference of 1 is returned. If the item already exists, a difference of 0 is returned. If the key does not exist, the key is created. |
CPC.ARRAY.UPDATE | Adds an item to the time window to which the specified timestamp belongs in the specified TairCpc key. If the key does not exist, the key is created. Note: For example, if you want to calculate the amount of data in the key that was generated per minute during the last 10 minutes, you can set |
CPC.ARRAY.ESTIMATE | Retrieves the cardinality estimate of the specified TairCpc key within the time window to which the specified timestamp belongs. |
CPC.ARRAY.ESTIMATE.RANGE | Retrieves the cardinality estimates of the specified TairCpc key for the time windows within the specified time range. The time range is a closed interval. |
CPC.ARRAY.ESTIMATE.RANGE.MERGE | Retrieves the cardinality estimate of the specified TairCpc key after the time windows from a specific point in time back to the Nth time window are merged and deduplicated. N is the value of the range parameter. |
CPC.ARRAY.UPDATE2EST | Adds an item to the time window to which the specified timestamp belongs in the specified TairCpc key and returns the new cardinality estimate of the key within that time window after the update. If the key does not exist, the key is created with the same parameters as those of the CPC.ARRAY.UPDATE command. |
CPC.ARRAY.UPDATE2JUD | Adds an item to the time window to which the specified timestamp belongs in the specified TairCpc key and returns the new cardinality estimate of the key within that time window after the update, together with the difference between the new and original estimates. If the item did not already exist and is added, a difference of 1 is returned. If the item already exists, a difference of 0 is returned. If the key does not exist, the key is created with the same parameters as those of the CPC.ARRAY.UPDATE command. |
DEL | Deletes one or more TairCpc keys. |
The following list describes the conventions for the command syntax used in this topic:
Uppercase keyword: indicates the command keyword.
Italic text: indicates variables.
[options]: indicates that the enclosed parameters are optional. Parameters that are not enclosed by brackets must be specified.
A|B: indicates that the parameters separated by the vertical bar (|) are mutually exclusive. Only one of the parameters can be specified.
...: indicates that the parameter preceding this symbol can be repeatedly specified.
CPC.UPDATE
Item | Description |
Syntax | |
Time complexity | O(1) |
Command description | Adds an item to the specified TairCpc key. If the key does not exist, the key is created. If the item already exists in the key, the item is not added. |
Parameter | |
Output | |
Example | Sample command: Sample output: |
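A common way to feed many items into one key is to batch the CPC.UPDATE calls in a pipeline, so that a burst of log events costs one network round trip. The sketch below assumes a redis-py style client (`pipeline()` and `execute_command()` exist in redis-py). The argument order `CPC.UPDATE <key> <item>` is an assumption, because the syntax is not shown above; the function name and key name are hypothetical.

```python
def record_items(client, key, items):
    """Send one CPC.UPDATE per item in a single pipelined round trip.

    `client` is expected to behave like a redis-py Redis client. The argument
    order CPC.UPDATE <key> <item> is an assumption; check it against the syntax
    for your instance's minor version.
    """
    pipe = client.pipeline(transaction=False)  # plain pipeline, no MULTI/EXEC
    for item in items:
        pipe.execute_command("CPC.UPDATE", key, item)
    return pipe.execute()  # one reply per CPC.UPDATE, in order
```

With redis-py installed, a call might look like `record_items(redis.Redis(host="<endpoint>", port=6379), "card:1001:devices", ["dev-a", "dev-b", "dev-a"])`. Because an item that already exists is not added again, the repeated `dev-a` does not raise the key's cardinality estimate.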
CPC.ESTIMATE
Item | Description |
Syntax | |
Time complexity | O(1) |
Command description | Retrieves the cardinality estimate of the specified TairCpc key after deduplication. The return value is of the DOUBLE type; you can round it to the nearest integer. |
Parameter | |
Output | |
Example | Sample command: Sample output: |
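Because CPC.ESTIMATE returns a DOUBLE, client code typically converts the reply to an integer count. A minimal sketch follows. It assumes that the reply arrives as bytes or str (redis-py returns bulk-string replies as bytes unless `decode_responses=True` is set) or already as a float; the function name is hypothetical.

```python
def parse_estimate(reply):
    """Convert a CPC.ESTIMATE reply (a DOUBLE) into an integer count.

    Accepts bytes, str, or float, since the reply type depends on client
    settings (an assumption about how the DOUBLE is delivered). Python's
    round() sends exact .5 ties to the even neighbor, which is harmless here.
    """
    if isinstance(reply, bytes):
        reply = reply.decode("ascii")
    return round(float(reply))
```

For example, `parse_estimate(r.execute_command("CPC.ESTIMATE", "card:1001:devices"))` would turn an estimate slightly above or below a whole number into that whole number; the key name is hypothetical.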
CPC.UPDATE2EST
Item | Description |
Syntax | |
Time complexity | O(1) |
Command description | Adds an item to the specified TairCpc key and returns the new cardinality estimate of the key after the update. If the key does not exist, the key is created. |
Parameter | |
Output | |
Example | Sample command: Sample output: |
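Because CPC.UPDATE2EST writes and reads in one command, a risk rule such as "flag a credit card that is used from more than N distinct devices" needs one round trip instead of a CPC.UPDATE followed by a CPC.ESTIMATE. The sketch below assumes the argument order `CPC.UPDATE2EST <key> <item>` and a reply that is a single DOUBLE; the function name and limit are hypothetical.

```python
def _to_float(value):
    # Replies may arrive as bytes, str, or float depending on client settings.
    if isinstance(value, bytes):
        value = value.decode("ascii")
    return float(value)


def add_and_check(client, key, item, limit):
    """Add `item` with CPC.UPDATE2EST and report whether the count exceeds `limit`.

    Assumptions: the argument order is CPC.UPDATE2EST <key> <item>, and the
    reply is a single DOUBLE holding the new estimate. Returns a tuple of
    (rounded estimate, whether it exceeds the limit).
    """
    reply = client.execute_command("CPC.UPDATE2EST", key, item)
    estimate = round(_to_float(reply))
    return estimate, estimate > limit
```

Rounding before the comparison keeps an estimate such as 5.0000001 from tripping a limit of 5.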
CPC.UPDATE2JUD
Item | Description |
Syntax | |
Time complexity | O(1) |
Command description | Adds an item to the specified TairCpc key and returns the new cardinality estimate of the key after the update, together with the difference between the new and original estimates. If the item did not already exist and is added, a difference of 1 is returned. If the item already exists, a difference of 0 is returned. If the key does not exist, the key is created. |
Parameter | |
Output | |
Example | Sample command: Sample output: |
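The difference returned by CPC.UPDATE2JUD acts as a "first seen" flag, for example to detect the first time a card is used from a new device. The sketch below unpacks the reply. It assumes the argument order `CPC.UPDATE2JUD <key> <item>` and a two-element reply of [new estimate, difference]. Both the order and the element types are assumptions, and the helper names are hypothetical.

```python
from typing import NamedTuple


class JudgeResult(NamedTuple):
    estimate: int  # cardinality estimate after the update
    is_new: bool   # True when the reported difference rounds to 1


def _to_float(value):
    # Replies may arrive as bytes, str, or float depending on client settings.
    if isinstance(value, bytes):
        value = value.decode("ascii")
    return float(value)


def update_and_judge(client, key, item):
    """Run CPC.UPDATE2JUD and unpack its reply.

    Assumptions: the argument order is CPC.UPDATE2JUD <key> <item>, and the
    reply is a two-element array [new estimate, difference]. The difference is
    rounded because the estimates are DOUBLE values.
    """
    estimate_raw, diff_raw = client.execute_command("CPC.UPDATE2JUD", key, item)
    return JudgeResult(round(_to_float(estimate_raw)), round(_to_float(diff_raw)) == 1)
```

Rounding the difference guards against values such as 1.0000001 that can arise from subtracting two floating-point estimates.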
CPC.ARRAY.UPDATE
Item | Description |
Syntax | |
Time complexity | O(1) |
Command description | Adds an item to the time window to which the specified timestamp belongs in the specified TairCpc key. If the key does not exist, the key is created. Note: For example, if you want to calculate the amount of data in the key that was generated per minute during the last 10 minutes, you can set |
Parameter | |
Output | |
Example | Sample command: Sample output: |
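The per-minute, last-10-minutes example in the description can be pictured with an exact in-memory model: each timestamp falls into one window, each window keeps its own set of distinct items, and only the most recent windows are retained. This is an illustration only. The parameter names `window_ms` and `size`, millisecond timestamps, and windows aligned to multiples of the window length are all hypothetical, and TairCpc stores a CPC sketch per window rather than an exact set.

```python
class WindowedDistinctModel:
    """Exact in-memory model of per-window distinct counting (illustration only).

    Hypothetical parameters: window_ms is the window length in milliseconds and
    size is the number of most recent windows retained. Windows are assumed to
    start at multiples of window_ms. TairCpc keeps a CPC sketch per window
    instead of an exact set, so its counts are estimates.
    """

    def __init__(self, window_ms, size):
        self.window_ms = window_ms
        self.size = size
        self.windows = {}  # window start (ms) -> set of items seen in that window

    def window_start(self, ts_ms):
        return ts_ms - ts_ms % self.window_ms

    def update(self, ts_ms, item):
        self.windows.setdefault(self.window_start(ts_ms), set()).add(item)
        # Evict windows that fall outside the `size` most recent window slots.
        oldest_kept = max(self.windows) - (self.size - 1) * self.window_ms
        for start in [s for s in self.windows if s < oldest_kept]:
            del self.windows[start]

    def estimate(self, ts_ms):
        # Distinct count for the window that contains ts_ms (0 if not retained).
        return len(self.windows.get(self.window_start(ts_ms), ()))


# Per-minute windows, 10 of them: "distinct items per minute over the last 10 minutes".
model = WindowedDistinctModel(window_ms=60_000, size=10)
for ts, item in [(0, "ip-1"), (59_999, "ip-1"), (59_999, "ip-2"), (60_000, "ip-1")]:
    model.update(ts, item)
print(model.estimate(30_000), model.estimate(60_000))  # 2 1
```

The `estimate` method plays the role of CPC.ARRAY.ESTIMATE in this model: it looks up the window that contains the given timestamp.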
CPC.ARRAY.ESTIMATE
Item | Description |
Syntax | |
Time complexity | O(1) |
Command description | Retrieves the cardinality estimate of the specified TairCpc key within the time window to which the specified timestamp belongs. |
Parameter | |
Output | |
Example | Sample command: Sample output: |
CPC.ARRAY.ESTIMATE.RANGE
Item | Description |
Syntax | |
Time complexity | O(1) |
Command description | Retrieves the cardinality estimates of the specified TairCpc key for the time windows within the specified time range. The time range is a closed interval. |
Parameter | |
Output | |
Example | Sample command: Sample output: |
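Because the time range is a closed interval, both endpoints count, and a window that begins exactly at the end time is included. The helper below lists the windows that a range touches, which can help when you match returned estimates to time windows. It assumes hypothetical windows aligned to multiples of a window length in milliseconds, and that a window partly covered by the range is included; neither assumption is confirmed above.

```python
def windows_in_range(start_ms, end_ms, window_ms):
    """Start times of the windows touched by the closed interval [start_ms, end_ms].

    Assumes windows start at multiples of window_ms (a hypothetical alignment).
    Both endpoints are inclusive, so a window that begins exactly at end_ms counts.
    """
    if end_ms < start_ms:
        return []
    first = start_ms - start_ms % window_ms
    last = end_ms - end_ms % window_ms
    return list(range(first, last + 1, window_ms))


print(windows_in_range(30_000, 180_000, 60_000))  # [0, 60000, 120000, 180000]
```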
CPC.ARRAY.ESTIMATE.RANGE.MERGE
Item | Description |
Syntax | |
Time complexity | O(1) |
Command description | Retrieves the cardinality estimate of the specified TairCpc key after the time windows from a specific point in time back to the Nth time window are merged and deduplicated. N is the value of the range parameter. |
Parameter | |
Output | |
Example | Sample command: Sample output: |
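The difference between CPC.ARRAY.ESTIMATE.RANGE and CPC.ARRAY.ESTIMATE.RANGE.MERGE is the difference between adding per-window counts and counting the union: an item that appears in several windows is counted once by the merged query. The sketch below shows this with exact sets standing in for the per-window CPC sketches; the function name and data are hypothetical.

```python
def summed_and_merged(windows):
    """Return (sum of per-window distinct counts, distinct count of the union).

    Exact sets stand in for the per-window CPC sketches; the merged value is
    what a merge-and-deduplicate query is meant to estimate.
    """
    summed = sum(len(w) for w in windows)
    merged = len(set().union(*windows))
    return summed, merged


# Three one-minute windows; dev-a, dev-b, and dev-c each appear in two windows.
minutes = [{"dev-a", "dev-b"}, {"dev-b", "dev-c"}, {"dev-a", "dev-c", "dev-d"}]
print(summed_and_merged(minutes))  # (7, 4)
```

Summing the per-window estimates would report 7 devices over these three minutes, while only 4 distinct devices were seen. This is why a merged query is needed for "distinct over the last N windows" rules.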
CPC.ARRAY.UPDATE2EST
Item | Description |
Syntax | |
Time complexity | O(1) |
Command description | Adds an item to the time window to which the specified timestamp belongs in the specified TairCpc key and returns the new cardinality estimate of the key within that time window after the update. If the key does not exist, the key is created with the same parameters as those of the CPC.ARRAY.UPDATE command. |
Parameter | |
Output | |
Example | Sample command: Sample output: |
CPC.ARRAY.UPDATE2JUD
Item | Description |
Syntax | |
Time complexity | O(1) |
Command description | Adds an item to the time window to which the specified timestamp belongs in the specified TairCpc key and returns the new cardinality estimate of the key within that time window after the update, together with the difference between the new and original estimates. If the item did not already exist and is added, a difference of 1 is returned. If the item already exists, a difference of 0 is returned. If the key does not exist, the key is created with the same parameters as those of the CPC.ARRAY.UPDATE command. |
Parameter | |
Output | |
Example | Sample command: Sample output: |
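In the ticket-scalping scenario, CPC.ARRAY.UPDATE2JUD can back a per-window rule such as "once an event has seen too many distinct IP addresses in the current minute, reject requests from new IP addresses". The sketch below is a hypothetical rule, not a recommended policy. It assumes the argument order `CPC.ARRAY.UPDATE2JUD <key> <timestamp> <item>`, millisecond timestamps, and a reply of [estimate for the window, difference].

```python
def _to_float(value):
    # Replies may arrive as bytes, str, or float depending on client settings.
    if isinstance(value, bytes):
        value = value.decode("ascii")
    return float(value)


def allow_request(client, key, ts_ms, ip, max_distinct_ips):
    """Hypothetical anti-scalping rule built on CPC.ARRAY.UPDATE2JUD.

    Assumptions: the argument order is CPC.ARRAY.UPDATE2JUD <key> <timestamp> <item>,
    and the reply is [estimate for the window, difference]. An IP already seen in
    the current window is always allowed; a new IP is allowed only while the
    window's distinct-IP estimate stays within max_distinct_ips.
    """
    estimate_raw, diff_raw = client.execute_command(
        "CPC.ARRAY.UPDATE2JUD", key, ts_ms, ip
    )
    is_new = round(_to_float(diff_raw)) == 1
    return (not is_new) or round(_to_float(estimate_raw)) <= max_distinct_ips
```

The IP is recorded in the window even when the request is rejected, because the update happens before the check. Whether that is desirable depends on the rule you want to enforce.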