By Zhu Zhao
redis-full-check is a tool from the Alibaba Cloud Redis & MongoDB team that checks data consistency between two Redis databases. It is typically used to verify data correctness after a Redis data migration, for example one performed with redis-shake.
redis-full-check verifies data by performing a full comparison between the source side and the target side in Redis. It uses a multi-round comparison method: in each round, data is fetched from both sides and compared, and any inconsistent data is recorded in a sqlite3 db for the next round. Because incremental synchronization may still be running, differences caused only by synchronization lag disappear in later rounds, so the set of inconsistencies converges over multiple rounds. The data left in sqlite after the last round is the final set of differences.
The comparison conducted by redis-full-check is unidirectional: it fetches data from source database A and checks whether that data is also present in target database B. It does not check in the reverse direction. In other words, it checks whether the source database is a subset of the target database. For a bidirectional comparison, run the comparison twice: first with A as the source database and B as the target database, then with B as the source database and A as the target database.
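The one-way nature of the check can be illustrated with a minimal Go sketch. It uses plain in-memory maps in place of Redis databases, and the function name missingInTarget is hypothetical; this is not how redis-full-check is implemented, only a model of what a unidirectional check can and cannot see.

```go
package main

import (
	"fmt"
	"sort"
)

// missingInTarget returns the keys that exist in source but not in target.
// Keys that exist only in target are never reported, which is why a
// bidirectional check needs a second run with the arguments swapped.
func missingInTarget(source, target map[string]string) []string {
	var missing []string
	for k := range source {
		if _, ok := target[k]; !ok {
			missing = append(missing, k)
		}
	}
	sort.Strings(missing) // map iteration order is random; sort for stable output
	return missing
}

func main() {
	a := map[string]string{"k1": "v1", "k2": "v2"}
	b := map[string]string{"k2": "v2", "k3": "v3"}

	fmt.Println("A -> B:", missingInTarget(a, b)) // reports k1, cannot see k3
	fmt.Println("B -> A:", missingInTarget(b, a)) // reports k3
}
```

Running both directions is the in-memory equivalent of invoking the tool twice with the source and target addresses exchanged.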
The following is the basic data flow diagram. redis-full-check uses multi-round comparison, as shown in the yellow box. In each round, keys are fetched first: the first round fetches keys from the source database, and subsequent rounds fetch keys from the sqlite3 db. After the keys are fetched, the corresponding fields and values of each key are fetched for comparison. Inconsistent data is stored in the sqlite3 db for the next round of comparison.
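The round structure described above can be sketched in Go as follows. This is a simplified model under stated assumptions: in-memory maps stand in for the two Redis databases, a slice stands in for the sqlite3 db, and the names compareRound, multiRound, and betweenRounds are hypothetical rather than taken from the tool.

```go
package main

import (
	"fmt"
	"sort"
)

// compareRound checks the given keys on both sides and returns the keys
// that are still inconsistent (present on one side only, or different value).
func compareRound(keys []string, source, target map[string]string) []string {
	var diff []string
	for _, k := range keys {
		sv, inSource := source[k]
		tv, inTarget := target[k]
		if inSource != inTarget || sv != tv {
			diff = append(diff, k)
		}
	}
	return diff
}

// multiRound runs up to `rounds` comparison rounds. Round 1 starts from all
// source keys; each later round rechecks only the previous round's leftovers.
// betweenRounds models the wait between rounds, during which incremental
// synchronization may catch up.
func multiRound(source, target map[string]string, rounds int, betweenRounds func()) []string {
	keys := make([]string, 0, len(source))
	for k := range source {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	for r := 1; r <= rounds; r++ {
		keys = compareRound(keys, source, target)
		if len(keys) == 0 {
			break // everything converged
		}
		if r < rounds && betweenRounds != nil {
			betweenRounds()
		}
	}
	return keys
}

func main() {
	source := map[string]string{"a": "1", "b": "2", "c": "3"}
	target := map[string]string{"a": "1", "c": "stale"}

	// Simulated synchronization lag: key b arrives after the first round.
	catchUp := func() { target["b"] = source["b"] }

	fmt.Println("still inconsistent:", multiRound(source, target, 3, catchUp))
}
```

In this run, key b is inconsistent only in the first round and drops out once the simulated synchronization catches up, while key c stays in the result because its value never matches.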
Redis-full-check divides data inconsistency into two types: key inconsistency and value inconsistency.
Key inconsistency falls into the following subtypes:
lack_target: A key exists in the source database but does not exist in the target database.
type: A key exists in both the source database and the target database, but the type is inconsistent.
value: A key exists in both the source database and the target database and is of the same type, but the value is inconsistent.
Different data types have different comparison criteria:
The field conflict type falls into the following cases (only applicable to keys of types hash, set, zset, and list):
lack_source: A field exists in a source-side key but not in a target-side key.
lack_target: A field does not exist in a source-side key, but the field exists in a target-side key.
value: A field exists in both a source-side key and a target-side key, but the values of the two fields are different.
Three compare modes (comparemode) are available:
The number of comparison rounds is determined by comparetimes (comparetimes is set to 3 by default):
For keys with key-level inconsistencies (lack_source, lack_target, and type), re-fetch keys and values from the source and the target databases for comparison.
For keys of type string that have inconsistent values, compare these keys again: fetch keys and values from the source and target databases.
For keys of type hash, set, and zset that have inconsistent values, only re-compare the inconsistent fields. Fields that have already been compared and found to be consistent are not compared again. This prevents frequently updated big keys from failing the verification indefinitely.
For keys of type list that have inconsistent values, re-compare the keys: fetch keys and values from the source and target databases.
The tool waits for the configured interval between two rounds of comparison.
For big keys of hash, set, zset, and list, follow these rules:
Below the big-key length threshold, fetch all fields and values at once with hgetall, smembers, zrange 0 -1 withscores, and lrange 0 -1.
Above the threshold, use hscan, sscan, zscan, and lrange to batch-fetch fields and values.
The following are the main parameters in redis-full-check:
-s, --source=SOURCE              the source Redis database address (ip:port)
-p, --sourcepassword=Password    the password of the source Redis database
    --sourceauthtype=AUTH-TYPE   the management permission of the source database (This parameter is not required in open-source Redis.)
-t, --target=TARGET              the target Redis database address (ip:port)
-a, --targetpassword=Password    the password of the target Redis database
    --targetauthtype=AUTH-TYPE   the management permission of the target database (This parameter is not required in open-source Redis.)
-d, --db=Sqlite3-DB-FILE         the location in sqlite3 db where inconsistent keys are stored (result.db by default)
    --comparetimes=COUNT         comparison rounds
-m, --comparemode=               comparison mode
    --id=                        used for identifying metrics
    --jobid=                     used for identifying metrics
    --taskid=                    used for identifying metrics
-q, --qps=                       QPS speed threshold
    --interval=Second            time interval between two comparison rounds
    --batchcount=COUNT           the amount of batch-aggregated data
    --parallel=COUNT             the number of parallel coroutines (5 by default)
    --log=FILE                   log file
    --result=FILE                inconsistent results are recorded in the result file in this format: "db diff-type key field"
    --metric=FILE                metric file
-v, --version
For example, if the source Redis database is 10.1.1.1:1234 and the target database is 10.2.2.2:5678, run:
./redis-full-check -s 10.1.1.1:1234 -t 10.2.2.2:5678 -p mock_source_password -a mock_target_password --metric metric --log log --result result
The metric information uses the following format:
type Metric struct {
    DateTime           string       `json:"datetime"`             // time format: 2018-01-09T15:30:03Z
    Timestamp          int64        `json:"timestamp"`            // unix timestamp in seconds
    Id                 string       `json:"id"`                   // run id
    CompareTimes       int          `json:"comparetimes"`         // comparison rounds
    Db                 int32        `json:"db"`                   // db id
    DbKeys             int64        `json:"dbkeys"`               // the total number of keys in the db
    Process            int64        `json:"process"`              // progress percentage
    OneCompareFinished bool         `json:"has_finished"`         // indicates if this comparison round has finished
    AllFinished        bool         `json:"all_finished"`         // indicates if all comparison rounds have finished
    KeyScan            *CounterStat `json:"key_scan"`             // the number of scanned keys
    TotalConflict      int64        `json:"total_conflict"`       // total conflicts, including keys + fields
    TotalKeyConflict   int64        `json:"total_key_conflict"`   // total key conflicts
    TotalFieldConflict int64        `json:"total_field_conflict"` // total field conflicts
    // For the two following maps, the first-level key is the data type
    // (string, hash, list, set, or zset) and the second-level key is the
    // conflict type (type, value, lack source, lack target, or equal).
    KeyMetric   map[string]map[string]*CounterStat `json:"key_stat"`   // key metric
    FieldMetric map[string]map[string]*CounterStat `json:"field_stat"` // field metric
}

type CounterStat struct {
    Total int64 `json:"total"` // total
    Speed int64 `json:"speed"` // speed
}
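As a rough illustration of how a record in this format could be consumed, the sketch below decodes one JSON object into a trimmed-down copy of the struct. It assumes the metric file holds JSON that matches the json tags above, which the tags suggest but the text does not state. The sample record is hand-written for illustration and is not real tool output, and decodeMetric is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CounterStat mirrors the struct shown above.
type CounterStat struct {
	Total int64 `json:"total"`
	Speed int64 `json:"speed"`
}

// Metric repeats only a subset of the fields listed above; JSON keys that
// have no matching field are ignored by encoding/json.
type Metric struct {
	CompareTimes  int                                `json:"comparetimes"`
	Db            int32                              `json:"db"`
	Process       int64                              `json:"process"`
	AllFinished   bool                               `json:"all_finished"`
	TotalConflict int64                              `json:"total_conflict"`
	KeyMetric     map[string]map[string]*CounterStat `json:"key_stat"`
}

// decodeMetric parses one JSON-encoded metric record.
func decodeMetric(raw string) (Metric, error) {
	var m Metric
	err := json.Unmarshal([]byte(raw), &m)
	return m, err
}

func main() {
	// Hypothetical record, written by hand for this example.
	raw := `{"comparetimes":1,"db":0,"process":100,"all_finished":false,` +
		`"total_conflict":2,"key_stat":{"string":{"value":{"total":2,"speed":0}}}}`

	m, err := decodeMetric(raw)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("round %d, db %d: %d%% done, %d conflicts\n",
		m.CompareTimes, m.Db, m.Process, m.TotalConflict)

	// Nested lookup: data type first, conflict type second.
	if stat := m.KeyMetric["string"]["value"]; stat != nil {
		fmt.Println("string keys with value conflicts:", stat.Total)
	}
}
```

The nested map lookup is safe even when a data type is absent, because indexing a nil map in Go yields the zero value instead of panicking.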
Results are saved in the sqlite3 db file. If no file is specified, the result.db file in the current directory is used. If a third comparison round exists, the following three files are present: result.db.1, result.db.2, and result.db.3.
The db contains the following tables:
key: saves inconsistent keys.
field: saves inconsistent fields of hash, set, zset, and list keys. For list keys, the subscript (index) is saved.
The key_id field in the table field is associated with the id field in the table key.
key_<N> and field_<N>: save the results after the Nth comparison round (that is, the intermediate results).
Example:
$ sqlite3 result.db
sqlite> select * from key;
id          key              type        conflict_type  db          source_len  target_len
----------  ---------------  ----------  -------------  ----------  ----------  ----------
1           keydiff1_string  string      value          1           6           6
2           keydiff_hash     hash        value          0           2           1
3           keydiff_string   string      value          0           6           6
4           key_string_diff  string      value          0           6           6
5           keylack_string   string      lack_target    0           6           0
sqlite>
sqlite> select * from field;
id          field       conflict_type  key_id
----------  ----------  -------------  ----------
1           k1          lack_source    2
2           k2          value          2
3           k3          lack_target    2
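The relationship between the two tables (field.key_id refers to key.id) can be sketched in Go using the rows from the example above. The struct and function names here are hypothetical, and the rows are copied in by hand instead of being read from sqlite, so that the sketch needs only the standard library.

```go
package main

import "fmt"

// KeyRow and FieldRow hold a subset of the columns of the key and field tables.
type KeyRow struct {
	ID           int
	Key          string
	Type         string
	ConflictType string
}

type FieldRow struct {
	ID           int
	Field        string
	ConflictType string
	KeyID        int // refers to KeyRow.ID
}

// fieldsByKey groups field rows under the name of the key they belong to.
// Field rows whose KeyID matches no key row are skipped.
func fieldsByKey(keys []KeyRow, fields []FieldRow) map[string][]FieldRow {
	nameByID := make(map[int]string, len(keys))
	for _, k := range keys {
		nameByID[k.ID] = k.Key
	}
	grouped := make(map[string][]FieldRow)
	for _, f := range fields {
		if name, ok := nameByID[f.KeyID]; ok {
			grouped[name] = append(grouped[name], f)
		}
	}
	return grouped
}

func main() {
	keys := []KeyRow{
		{ID: 2, Key: "keydiff_hash", Type: "hash", ConflictType: "value"},
		{ID: 5, Key: "keylack_string", Type: "string", ConflictType: "lack_target"},
	}
	fields := []FieldRow{
		{ID: 1, Field: "k1", ConflictType: "lack_source", KeyID: 2},
		{ID: 2, Field: "k2", ConflictType: "value", KeyID: 2},
		{ID: 3, Field: "k3", ConflictType: "lack_target", KeyID: 2},
	}

	for _, f := range fieldsByKey(keys, fields)["keydiff_hash"] {
		fmt.Printf("keydiff_hash.%s: %s\n", f.Field, f.ConflictType)
	}
}
```

All three field rows attach to keydiff_hash, while keylack_string has no field rows because the whole key is missing on the target side.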
Here are some reference materials for the open-source project:
Data migration tool redis-shake
Feel free to post your problems or suggestions in Issues on GitHub. You are welcome to join our open-source project development.