Distributed Tracing

With distributed tracing, you can track software performance, measure throughput and latency, and see the impact of errors across multiple systems.

Distributed tracing provides a connected view of your application from frontend to backend. This makes Sentry a more complete performance monitoring solution, helping you diagnose problems and measure your application's overall health.

Tracing in Sentry provides insights such as:

  • What occurred for a specific error event or issue
  • The conditions causing bottlenecks or latency issues
  • The endpoints or operations consuming the most time

A tracing tool focuses on what happened (and when), logging events during a program's execution across multiple systems. Traces often include timestamps, allowing durations to be calculated, but their purpose is broader, showing how interconnected systems interact and how problems in one can affect another. While tracing can be useful if instrumented in just the frontend or backend, it's most powerful when set up for your full stack (distributed tracing).

Tracing is not profiling. A profiler measures various aspects of an application's operation and produces a statistical summary. Both help diagnose application problems, but they differ in what they measure and how the data is recorded.

Learn more in the "Tracing: Frontend issues with backend solutions" workshop.

Ultimately, any data structure is defined by the kind of data it contains, and relationships between data structures are defined by how links between them are recorded. Traces, transactions, and spans are no different.

Traces are defined as the collection of all transactions that share a trace_id value.
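As a rough illustration of that definition, the sketch below groups transaction records into traces by their shared trace_id. The record shape, the sample values, and the group_by_trace helper are hypothetical and are not part of any Sentry SDK; only the trace_id and transaction_name field names come from this page.

```python
from collections import defaultdict

# Hypothetical transaction records; the values are made up for illustration.
transactions = [
    {"trace_id": "a1", "transaction_name": "/checkout"},
    {"trace_id": "a1", "transaction_name": "POST /api/orders"},
    {"trace_id": "b2", "transaction_name": "send_receipt_email"},
]


def group_by_trace(items):
    """Collect transactions into traces keyed by their shared trace_id."""
    traces = defaultdict(list)
    for tx in items:
        traces[tx["trace_id"]].append(tx)
    return dict(traces)


traces = group_by_trace(transactions)
```

Here the two transactions carrying trace_id "a1" end up in the same trace, while the third forms a trace of its own.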

Transactions share most properties (start and end time, tags, and so on) with their root spans. They also have a transaction_name property, used in the UI to identify the transaction. Common examples include endpoint paths for backend request transactions, task names for cron job transactions, and URLs for page-load transactions.

Before the transaction is sent, its tags and data properties are merged with data from the global scope. Global scope data is set in Sentry.init() or by using Sentry.configureScope(), Sentry.setTag(), Sentry.setUser(), or Sentry.setExtra().
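The merge step can be pictured with a minimal sketch like the one below. It is not the SDK's implementation: merge_with_scope and the dictionary layout are hypothetical, and the precedence rule (the transaction's own values win when a key appears in both places) is an assumption made for this example, not something this page specifies.

```python
def merge_with_scope(transaction, scope):
    """Return a copy of the transaction with scope tags and data merged in.

    Assumption for this sketch: on a key conflict, the transaction's
    own value overrides the value from the global scope.
    """
    merged = dict(transaction)
    for key in ("tags", "data"):
        merged[key] = {**scope.get(key, {}), **transaction.get(key, {})}
    return merged


# Hypothetical scope and transaction contents.
global_scope = {"tags": {"environment": "production", "release": "1.0.0"}}
transaction = {
    "transaction_name": "/checkout",
    "tags": {"release": "1.0.1"},
    "data": {"cart_size": 3},
}

merged = merge_with_scope(transaction, global_scope)
```

The original transaction and scope dictionaries are left unmodified; only the returned copy carries the combined tags and data.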

Span data includes:

  • parent_span_id: ties the span to its parent span
  • op: short string identifying the type or category of operation the span is measuring
  • start_timestamp: when the span was opened
  • end_timestamp: when the span was closed
  • description: longer description of the span's operation
  • status: short code indicating the operation's status
  • tags: key-value pairs holding additional data about the span
  • data: arbitrarily-structured additional data about the span

The op and description properties work together: for example, op: db.query with description: SELECT * FROM users WHERE last_active < %s. The status property indicates whether the span's operation succeeded or failed, or holds a response code for HTTP requests. Tags and data attach further context to the span, such as function: middleware.auth.is_authenticated for a function call or request: {url: ..., headers: ... , body: ...} for an HTTP request. To search span data, see Searchable Properties.
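To make the span properties concrete, here is a minimal sketch that models them as a Python dataclass. The field names follow the list above; the Span class itself, the duration property, the use of float seconds for timestamps, the "ok" status value, and the sample parent_span_id are assumptions for illustration and do not describe the SDK's actual types.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Span:
    op: str                      # type or category of operation
    description: str             # longer description of the operation
    start_timestamp: float       # assumed: seconds, when the span was opened
    end_timestamp: float         # assumed: seconds, when the span was closed
    parent_span_id: Optional[str] = None  # ties the span to its parent span
    status: str = "ok"           # assumed status code for a successful operation
    tags: Dict[str, str] = field(default_factory=dict)
    data: Dict[str, Any] = field(default_factory=dict)

    @property
    def duration(self) -> float:
        """Elapsed time between opening and closing the span."""
        return self.end_timestamp - self.start_timestamp


span = Span(
    op="db.query",
    description="SELECT * FROM users WHERE last_active < %s",
    start_timestamp=100.0,
    end_timestamp=100.25,
    parent_span_id="b7ad6b7169203331",  # hypothetical ID
)
```

Because each span records both timestamps, its duration can be derived rather than stored, which is what the duration property does here.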

Applications consist of interconnected components or services. For example, a modern web application may include:

  • Frontend (Single-Page Application)
  • Backend (REST API)
  • Task Queue
  • Database Server
  • Cron Job Scheduler

Each component can be instrumented individually using a Sentry SDK to capture error data or crash reports, but this doesn't provide the full picture. Distributed tracing ties all the data together.

Distributed tracing allows you to follow a request from the frontend to the backend and back, pulling in data from any background tasks or notification jobs that the request creates. This helps you correlate Sentry error reports and see which services may be hurting your application's performance.

A trace is the record of the entire operation you want to measure or track, such as a page load or a user action. When a trace includes work in multiple services, it's called a distributed trace.

Each trace consists of one or more tree-like structures called transactions, with nodes called spans. Each transaction represents a single instance of a service being called, and each span represents a single unit of work. Here's an example trace, broken down into transactions and spans: