A trace records the interactions between services in a distributed system. Trace data is generated when one service invokes another. It shows the hierarchy and order of the service calls, which helps developers follow how a request is executed.
The role of traces
In distributed systems, a single request may involve multiple service calls. When issues such as request timeouts, errors, or exceptions arise, pinpointing the cause can be challenging. Traces offer several advantages for operations and maintenance (O&M) personnel:
- Troubleshooting: Traces provide a complete view of the request path and the status of each service involved, enabling O&M teams to quickly identify and resolve issues.
- Performance Optimization: By analyzing the running times of requests, traces help identify system bottlenecks and guide performance enhancements.
- System Monitoring: Traces facilitate real-time monitoring and analysis, offering insights into system health and resource usage.
Common terms
Trace
A trace is the record of an entire request or transaction from start to finish, for example the full cycle of a client request from the moment it is received until processing completes. A trace is structured as a tree of spans. Each trace has a unique identifier, the trace ID, which stays the same throughout the request's lifecycle and is shared by all spans in the trace. The trace ID is the key for tracing and debugging, because it lets you retrieve all information related to the request.
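As a rough illustration of why a consistent trace ID is useful, the sketch below groups records emitted by different services under their trace ID, so that everything belonging to one request can be looked up at once. The record layout, the field names (`trace_id`, `service`, `message`), and the helper `index_by_trace_id` are assumptions made for this example and do not come from any particular tracing product.

```python
from collections import defaultdict

# Hypothetical records emitted by different services; field names are assumed.
records = [
    {"trace_id": "t1", "service": "gateway", "message": "request received"},
    {"trace_id": "t2", "service": "gateway", "message": "request received"},
    {"trace_id": "t1", "service": "orders", "message": "query timed out"},
]


def index_by_trace_id(records):
    """Group records by trace ID, preserving their original order."""
    index = defaultdict(list)
    for record in records:
        index[record["trace_id"]].append(record)
    return index


index = index_by_trace_id(records)

# All services that took part in request "t1", in the order they reported.
services_for_t1 = [record["service"] for record in index["t1"]]
print(services_for_t1)
```

A real tracing backend would typically run this kind of lookup as a query against stored trace data, but the idea is the same: the trace ID is the join key across services.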
Span
A span is the fundamental unit of a distributed trace. It represents a single operation within a trace, such as a method invocation, the execution of a code block, or a database access. Each span has its own span ID and records the start and end times of the operation. Every span except the root span also records a parent span ID, which identifies the span that directly invoked it. These parent-child links connect the spans of a trace into a tree.
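The relationship between spans and a trace can be sketched as a small data model. This is a minimal illustration, not the schema of any specific tracing system: the `Span` class, its field names, and the helpers `build_trace_tree` and `render` are hypothetical, and the sample spans are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    span_id: str
    trace_id: str          # shared by every span in the same trace
    name: str
    start_ms: int
    end_ms: int
    parent_span_id: Optional[str] = None  # None marks the root span
    children: list["Span"] = field(default_factory=list)

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def build_trace_tree(spans: list[Span]) -> Span:
    """Link each span to its parent and return the root span."""
    by_id = {span.span_id: span for span in spans}
    root = None
    for span in spans:
        if span.parent_span_id is None:
            root = span
        else:
            by_id[span.parent_span_id].children.append(span)
    if root is None:
        raise ValueError("trace has no root span")
    return root


def render(span: Span, depth: int = 0) -> list[str]:
    """Return one indented line per span, children ordered by start time."""
    lines = [f"{'  ' * depth}{span.name} ({span.duration_ms} ms)"]
    for child in sorted(span.children, key=lambda c: c.start_ms):
        lines.extend(render(child, depth + 1))
    return lines


# Invented sample data: one request passing through three operations.
spans = [
    Span("a", "t1", "GET /orders", 0, 120),
    Span("b", "t1", "OrderService.list", 5, 110, parent_span_id="a"),
    Span("c", "t1", "SELECT orders", 20, 90, parent_span_id="b"),
]
root = build_trace_tree(spans)
tree_lines = render(root)
print("\n".join(tree_lines))
```

Printing the tree with durations shows both the call hierarchy and where the time was spent, which is the view typically used when looking for a bottleneck.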