Tair provides observability across more dimensions and categories, and with more advanced features, than open source Redis.
Background information
Observability is the ability to access monitoring data, analyze issues, and perform systematic diagnostics based on three pillars of data: metrics, traces, and logs.
Metrics: A metric is a numeric value of a dimension that is measured over a period of time to display specific states and trends of a system.
Logs: A log is a record of discrete events that happened during the runtime of an application.
Traces: A trace records the end-to-end lifecycle of a request.
Tair integrates metrics, traces, and logs to provide data analytics. The following table compares the observability of Tair, ApsaraDB for Redis, and open source Redis. The following list describes the symbols that are used in the table.
The ✔️ symbol indicates that the feature is supported.
The ❌ symbol indicates that the feature is not supported.
The ➖ symbol indicates that no features are involved.
Observability | Feature | Open source Redis | ApsaraDB for Redis | Tair
Metric | | ✔️ | ✔️ (fine-grained) | ✔️ (fine-grained)
Log | Run logs | ✔️ | ✔️ | ✔️
Log | Slow logs | ✔️ | ✔️ | ✔️
Log | Audit logs | ❌ | ✔️ | ✔️
Log | Latency insights | ❌ | ✔️ | ✔️
Trace | ➖ | ➖ | ➖ | ➖
Analytics | | ❌ | ✔️ | ✔️
Analytics | | ❌ | ✔️ | ✔️
Analytics | | ❌ | ✔️ | ✔️
Analytics | | ❌ | ✔️ | ✔️
Typically, tracing analysis requires middleware or specific code modifications on your client.
Metrics
Open source Redis provides a variety of metrics, including memory-related metrics (such as memory distribution, memory usage, and memory fragmentation ratio), statistics-related metrics (such as the number of connections and commands, network traffic, and synchronization status), CPU utilization, and keyspace information. In addition to the metrics supported by open source Redis, Tair provides more fine-grained metrics, including read queries per second (QPS) and write QPS. For more information about these metrics, see Query monitoring data.
The fine-grained metrics provided by Tair also have the following benefits in implementing observability:
Real-time performance: displays metrics in real time.
Session management: displays sessions between an instance and clients in real time.
Performance trends: displays performance trends over a specific period of time.
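As a rough illustration of the open source Redis metrics mentioned above, the following sketch parses text in the `field:value` format returned by the open source Redis `INFO` command and derives a few values from it. The sample values are invented for illustration, and the helper names `parse_info` and `summarize` are hypothetical. The sketch does not cover the fine-grained Tair metrics such as read QPS and write QPS.

```python
# Sample text in the "field:value" format of the open source Redis INFO command.
# All values below are made up for illustration.
SAMPLE_INFO = """\
# Memory
used_memory:1048576
used_memory_rss:1572864
# Clients
connected_clients:12
# Stats
total_commands_processed:50000
instantaneous_ops_per_sec:250
"""


def parse_info(text):
    """Parse 'field:value' lines into a dict, skipping blank and '#' section lines."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        field, _, value = line.partition(":")
        metrics[field] = value
    return metrics


def summarize(metrics):
    """Derive a few memory- and statistics-related values from parsed fields."""
    used = int(metrics["used_memory"])
    rss = int(metrics["used_memory_rss"])
    return {
        "memory_used_mib": used / (1024 * 1024),
        # Fragmentation ratio: resident memory divided by memory used by data.
        "fragmentation_ratio": rss / used,
        "connected_clients": int(metrics["connected_clients"]),
        "ops_per_sec": int(metrics["instantaneous_ops_per_sec"]),
    }


summary = summarize(parse_info(SAMPLE_INFO))
print(summary)
```

In practice the text would come from a client library or `redis-cli` rather than a hard-coded string.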
Logs
Tair allows you to view the run logs, slow logs, audit logs, and latency insights of an instance.
Run logs
Run logs of a Tair instance record, row by row, the persistence, synchronous replication, and debugging operations that take place while the instance is running, along with any error messages that are returned.
You can go to the details page of an instance in the Tair console and use the left-side navigation pane to view the run logs of the instance. For more information, see View active logs.
Slow logs
Slow logs record requests that take longer to execute than the threshold specified in Tair. The execution duration of a request does not include the amount of time that the request spends in queue or in transmission. Slow log statistics include execution timestamps, execution durations, command parameters, and client information. You can view slow logs of an instance, identify commands in the instance that take longer than required to run, and optimize these commands to prevent congestion.
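The threshold rule above can be sketched as follows. This is a minimal illustration, not Tair's implementation: the `Request` record, its field names, and the 10,000-microsecond threshold are all assumptions made for the example. The point it shows is that only execution time is compared with the threshold, so a request that waits a long time in queue but executes quickly is not recorded as slow.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """Hypothetical record of one request; the field names are illustrative."""
    command: str
    client: str
    queue_us: int  # time spent waiting in queue (not counted toward the threshold)
    exec_us: int   # time spent executing the command


def slow_requests(requests, threshold_us):
    """Return the requests whose execution time alone exceeds the threshold."""
    return [r for r in requests if r.exec_us > threshold_us]


requests = [
    # Long wait in queue but fast execution: not a slow request.
    Request("GET user:1", "10.0.0.5:50312", queue_us=50_000, exec_us=200),
    # Short wait but long execution: a slow request.
    Request("KEYS *", "10.0.0.7:41822", queue_us=100, exec_us=45_000),
]

for r in slow_requests(requests, threshold_us=10_000):
    print(f"{r.command!r} from {r.client} took {r.exec_us} us to execute")
```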
You can go to the details page of an instance in the Tair console and use the left-side navigation pane to view the slow logs of the instance. For more information, see Query slow logs.
Audit logs
Tair provides audit logs based on Log Service. For more information about Log Service, see What is Log Service? Audit logs include statistics such as log types, execution durations, database numbers, client IP addresses, account names, command details, and extension information. Audit logs allow you to search and analyze online operation logs (including logs about sensitive operations related to the FLUSHALL, FLUSHDB, and DEL commands), slow logs, and run logs, and export these logs.
You can go to the details page of an instance in the Tair console and use the left-side navigation pane to view the audit logs of the instance. For more information, see Enable the new audit log feature.
Latency insights
Tair provides the advanced latency insights feature. This feature can record up to 27 events and execution durations of all Tair commands, and save all latency statistics within the last three days.
You can go to the details page of an instance in the Tair console and use the left-side navigation pane to view the latency insights of the instance. For more information, see Latency insights.
Analytics
Tair integrates metrics, traces, and logs to provide data analytics, which is a critical feature of Tair.
Hotkey and large key analysis
If a key receives significantly more requests than other keys, the key is considered a hotkey. If a hotkey is not handled in a timely manner, it may result in skewed requests or even cache breakdowns. If a key contains a large number of members or occupies a large amount of memory, the key is considered a large key. If a large key is not handled in a timely manner, commands that involve the key take longer to run and an out-of-memory (OOM) error may occur for the key.
You can use the Real-time Key Statistics feature to identify hotkeys and large keys. The feature displays hotkeys and large keys in real time and allows you to view hotkeys and large keys that were generated within the last four days. It offers high precision and has minimal impact on performance. You can view the amount of memory that a key occupies and the frequency at which a key is requested, and then troubleshoot hotkeys and large keys to optimize instances.
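The two definitions above can be made concrete with a small sketch. The source does not state the criteria that Tair uses, so everything here is an assumption chosen for illustration: a key counts as a hotkey when its request count exceeds `factor` times the median per-key count, and as a large key when it exceeds a member-count or memory-size limit. The function names, limits, and sample data are hypothetical.

```python
from collections import Counter
from statistics import median


def find_hotkeys(accessed_keys, factor=10.0):
    """Flag keys requested more than `factor` times the median per-key count.

    The median-based rule is an illustrative assumption, not Tair's criterion.
    """
    counts = Counter(accessed_keys)
    if not counts:
        return []
    baseline = median(counts.values())
    return sorted(k for k, n in counts.items() if n > factor * baseline)


def find_large_keys(key_stats, max_members=5000, max_bytes=10 * 1024 * 1024):
    """Flag keys with too many members or too much memory.

    `key_stats` maps key -> (member_count, size_in_bytes); the limits are
    hypothetical defaults.
    """
    return sorted(
        key
        for key, (members, size) in key_stats.items()
        if members > max_members or size > max_bytes
    )


# Sample access log: "user:1" is requested far more often than the other keys.
accesses = ["user:1"] * 100 + ["a", "b", "c"] * 2

# Sample per-key statistics: (member count, size in bytes).
stats = {
    "user:1": (1, 64),
    "feed:all": (20_000, 2_000_000),  # many members
    "blob": (1, 50 * 1024 * 1024),    # large memory footprint
}

print(find_hotkeys(accesses))
print(find_large_keys(stats))
```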
You can go to the details page of an instance in the Tair console and use the left-side navigation pane to view statistics about hotkeys and large keys of the instance. For more information, see Use the real-time key statistics feature.
Offline key analysis
The Offline Key Analysis feature supports the processing of offline Redis Database (RDB) files of all data structures and from all instance architectures and Tair versions and does not affect online services provided by Tair. The Offline Key Analysis feature can process a combination of 10% large keys and 90% small keys four times faster than redis-rdb-tools, and a combination of medium keys and large keys 20 times faster than redis-rdb-tools. During the process, memory usage is kept within 1 GB to prevent OOM errors that may occur due to large key processing. The Offline Key Analysis feature also allows you to search for the longest subelement to troubleshoot issues.
You can go to the details page of an instance in the Tair console and use the left-side navigation pane to view the offline key analysis of the instance. For more information, see Use the offline key analysis feature.
Instance diagnostics
Tair integrates statistics such as performance metrics, slow logs, and key analysis to provide the diagnostic reports feature. This feature performs one-stop diagnostics to evaluate the health of instances based on multiple metrics (such as performance metrics, skewed request statistics, and slow logs) and puts forward suggestions. This feature improves the automatic O&M capabilities of Tair instances and reduces instance usage costs.
You can go to the details page of an instance in the Tair console and use the left-side navigation pane to perform diagnostics on the instance. For more information, see Create a diagnostic report.