- Distribution mode (always on): when no heatmap selection exists, every attribute’s value distribution is shown for the current span population. Useful for spotting dominant or unusually rare values (cardinality outliers).
- Comparison mode: drag a rectangle on the heatmap to compare the spans inside (Selection) against everything outside (Background). Useful for isolating deviations.
- Iterative drill-down: click any bar to filter (or exclude) on that value. The heatmap re-renders against the filtered population, so you can keep narrowing until the cause is obvious.
Prerequisites
Event deltas require a Trace data source with a duration expression. Any OpenTelemetry-instrumented service producing span data works. Available in all ClickStack deployments (Managed, Open Source, ClickHouse Cloud).
Getting started
- From the Data Source dropdown, select a source that holds traces. Source names are arbitrary; what matters is that the source is configured as a Trace type. The Event Deltas tab is only enabled for such sources.
- In the Analysis Mode section, click the Event Deltas tab.
The heatmap
The heatmap plots spans across two dimensions:
- X axis: time
- Y axis: a numeric value, defaulting to span duration in milliseconds (logarithmic scale)
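As a rough illustration of what the two axes mean, the sketch below bins spans into a grid of time buckets by logarithmic duration buckets and counts the spans in each cell. The function name `heatmap_buckets`, the `(timestamp, duration_ms)` tuple shape, and the bin counts are hypothetical and chosen only for the example; they are not taken from the product.

```python
import math
from collections import Counter

def heatmap_buckets(spans, t_start, t_end, x_bins=4, y_bins=3):
    """Count spans per (time bucket, log-duration bucket).

    spans: list of (timestamp_seconds, duration_ms) tuples (hypothetical shape).
    """
    if t_end <= t_start:
        raise ValueError("t_end must be greater than t_start")
    durations = [d for _, d in spans if d > 0]
    if not durations:
        return Counter()
    # Work in log10 space so that 1 ms..10 ms gets as much height as 100 ms..1 s.
    lo, hi = math.log10(min(durations)), math.log10(max(durations))
    counts = Counter()
    for ts, dur in spans:
        if dur <= 0 or not (t_start <= ts <= t_end):
            continue
        x = min(int((ts - t_start) / (t_end - t_start) * x_bins), x_bins - 1)
        frac = (math.log10(dur) - lo) / (hi - lo) if hi > lo else 0.0
        y = min(int(frac * y_bins), y_bins - 1)
        counts[(x, y)] += 1
    return counts

spans = [(5, 1.0), (15, 30.0), (25, 300.0), (35, 1000.0), (35, 900.0)]
cells = heatmap_buckets(spans, t_start=0, t_end=40)
```

Each key in `cells` is a grid cell, and its count is what a count-based color scale would encode.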
Distribution mode: cardinality outliers
With no selection on the heatmap, the analysis panel shows one bar chart per attribute, computed across all matching spans. The legend reads All spans (visible in the overview screenshot above). Attributes are ranked by how concentrated their values are: those dominated by a few values appear first; uniform, high-entropy attributes are deprioritized. Use distribution mode when you want to understand the cardinality shape of your data:
- Highs: which services, endpoints, status codes, or hosts dominate your span population? Often surfaces a single tenant, version, or route doing most of the traffic.
- Lows: values that occur but rarely. A status code that appears in just 0.5% of spans, or one host that barely shows up, can be the most interesting signal. The long tail is where regressions and bad actors hide.
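The ranking idea above can be sketched with normalized Shannon entropy: attributes whose values are concentrated score low and sort first, while near-uniform attributes score close to 1 and sort last. This is only one plausible way to measure concentration; the source does not state the scoring that is actually used, and the names `normalized_entropy`, `rank_attributes`, and `rare_values` are hypothetical.

```python
import math
from collections import Counter

def normalized_entropy(values):
    """Shannon entropy divided by its maximum, so the result lies in [0, 1]."""
    counts = Counter(values)
    if len(counts) <= 1:
        return 0.0
    total = sum(counts.values())
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return h / math.log2(len(counts))

def rank_attributes(spans):
    """Order attribute names from most concentrated to most uniform."""
    keys = {k for s in spans for k in s}
    scores = {k: normalized_entropy([s[k] for s in spans if k in s]) for k in keys}
    return sorted(scores, key=lambda k: (scores[k], k))

def rare_values(values, max_share=0.01):
    """Values whose share of the population is at most max_share (the 'lows')."""
    counts = Counter(values)
    total = len(values)
    return sorted(v for v, c in counts.items() if c / total <= max_share)

spans = [
    {"service": "api", "trace_id": "t1"},
    {"service": "api", "trace_id": "t2"},
    {"service": "api", "trace_id": "t3"},
    {"service": "worker", "trace_id": "t4"},
]
ranking = rank_attributes(spans)
status_codes = ["200"] * 199 + ["503"]
lows = rare_values(status_codes)
```

Here `service` ranks ahead of `trace_id` because three of four spans share one value, and `503` is flagged as a low because it accounts for 0.5% of the status codes.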
Comparison mode: deviations from normal
Click and drag a rectangle on the heatmap to enter comparison mode. The selected spans become the Selection (orange bars); everything outside becomes the Background (green bars). Each attribute chart then shows both populations side by side, sorted so the attributes with the largest divergence appear first. A value present almost exclusively in one side, or absent from one side, is the strongest candidate for what differs. The shape of the rectangle you draw changes the question you’re asking. The two common shapes are described below.
Use case 1: Before vs after a regression
When the heatmap shows latency drifting upward over the timeline (the slow band thickens, the bright band climbs, or a clear inflection point separates a healthy period from a degraded one), drag a rectangle from the climb inflection to the right edge of the window. To sharpen the comparison, set the bottom of the rectangle at the healthy baseline rather than at the bottom of the axis: this isolates the spans that are genuinely slower than normal in the degraded window, instead of dragging in still-healthy fast spans that happen to fall in the same time range. The attribute bars below the heatmap are sorted with the largest divergences first. In this example, the top-row charts surface the strongest signals: SpanKind, SpanName, and ScopeName each show a sharp orange-vs-green split between the slow Selection and the healthy Background. Read together, they fingerprint what changed at the inflection.
This is the right shape when you want to ask “what changed?” A tighter variant uses the same workflow: when a small knot of slow spans sits in an otherwise quiet band (a brief burst on the right edge, a cluster in the middle of a steady period), draw a small box around just that cluster instead. The shape changes the question: a vertical strip asks what changed in time; a small focused box asks what is special about this cluster.
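The rectangle-based comparison can be sketched as follows: spans inside the time and duration bounds form the Selection, all others form the Background, and each attribute is scored by the largest gap between the two populations' value shares. The scoring (maximum absolute share difference) is an assumption made for illustration, since the source does not specify how divergence is computed; the span dictionary shape and all function names are hypothetical.

```python
from collections import Counter

def split_by_rectangle(spans, t_min, t_max, d_min, d_max):
    """Partition spans into (selection, background) by a time/duration rectangle."""
    selection, background = [], []
    for s in spans:
        inside = t_min <= s["ts"] <= t_max and d_min <= s["duration_ms"] <= d_max
        (selection if inside else background).append(s)
    return selection, background

def value_shares(spans, attr):
    """Fraction of spans carrying each value of attr."""
    counts = Counter(s["attrs"][attr] for s in spans if attr in s["attrs"])
    total = sum(counts.values())
    return {v: c / total for v, c in counts.items()} if total else {}

def divergence(selection, background, attr):
    """Largest absolute share gap for any single value (assumed metric)."""
    sel, bg = value_shares(selection, attr), value_shares(background, attr)
    gaps = (abs(sel.get(v, 0.0) - bg.get(v, 0.0)) for v in set(sel) | set(bg))
    return max(gaps, default=0.0)

def rank_by_divergence(selection, background, attrs):
    """Most divergent attributes first, ties broken by name."""
    return sorted(attrs, key=lambda a: (-divergence(selection, background, a), a))

def host(i):
    return "h1" if i % 2 == 0 else "h2"

healthy = [{"ts": i, "duration_ms": 10, "attrs": {"SpanName": "GET /a", "host": host(i)}}
           for i in range(5)]
degraded = [{"ts": i, "duration_ms": 500, "attrs": {"SpanName": "GET /b", "host": host(i)}}
            for i in range(6, 10)]

# Rectangle: from the inflection (t=5) to the right edge, bottom set above the healthy baseline.
selection, background = split_by_rectangle(healthy + degraded, 5, 10, 100, 1000)
ranking = rank_by_divergence(selection, background, ["host", "SpanName"])
```

In this toy data `SpanName` splits completely between the two populations while `host` is nearly identical in both, so `SpanName` ranks first.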
Use case 2: Slow versus fast
When the heatmap shows two latency populations clearly separated on the duration axis, drag a wide rectangle that spans the entire time range but covers only the upper, cleanly-separated band. The slow population becomes the Selection; the fast bulk becomes the Background. Draw the rectangle tightly around the upper band, with a visible horizontal gap between it and the dense bulk. A loose rectangle that bleeds into the fast population washes out the divergence. The 100 s ceiling line is informative on its own: a constant horizontal line at a round number is the signature of a fixed timeout. If no span attribute differentiates the two populations cleanly, that’s a useful result too: it points you to host- and runtime-level metrics (GC pauses, I/O contention, scheduler latency, cold-cache effects, noisy neighbors) rather than to span attributes. This is the right shape when you want to ask “what makes the slow spans different from the fast ones?” rather than chasing a specific anomaly. A divergent attribute points at a code-path or input cause; a flat comparison points at a systemic one.
Iterative drill-down
Comparison and distribution modes are most powerful when chained. Click any bar to open a popover with three actions:
- Filter: keep only spans with this value
- Exclude: remove spans with this value
- Copy: copy the value to the clipboard
Aggregated Other (N) buckets that collapse low-frequency values aren’t clickable. To filter for a specific value within that bucket, use the search bar directly.
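Conceptually, each Filter or Exclude click narrows the span population that the next heatmap and distributions are computed from. A minimal sketch of that chaining, assuming spans are flat dictionaries and using a hypothetical helper named `apply_filters`:

```python
def apply_filters(spans, include=None, exclude=None):
    """Keep spans matching every include pair and none of the exclude pairs."""
    include = include or {}
    exclude = exclude or {}
    return [
        s for s in spans
        if all(s.get(k) == v for k, v in include.items())
        and not any(s.get(k) == v for k, v in exclude.items())
    ]

spans = [
    {"ServiceName": "api", "StatusCode": "Error"},
    {"ServiceName": "api", "StatusCode": "Ok"},
    {"ServiceName": "worker", "StatusCode": "Error"},
]

# First click: Filter on ServiceName = api. Second click: Exclude StatusCode = Ok.
step1 = apply_filters(spans, include={"ServiceName": "api"})
step2 = apply_filters(step1, exclude={"StatusCode": "Ok"})
```

Each step returns a smaller population, which is what makes repeated drill-down converge on a small set of attributes.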
Customize the heatmap
The gear icon in the top-right of the heatmap opens the Display Settings drawer.
| Parameter | Default | Description |
|---|---|---|
| Scale | Log | Log handles wide latency ranges; Linear is better for narrow, uniform distributions. |
| Value | (Duration)/1e6 | Any numeric expression: response size, error rate, a custom span attribute. |
| Count | count() | Aggregation for color. Switch to avg(), sum(), p95(), or expressions like countDistinct(field). |
- Switch Scale to Linear when the latency band is narrow (for example, a service whose spans all run between 5 and 50 ms). Log scale wastes vertical range on the upper end where there is no data.
- Plot something other than duration on the Y axis. Setting Value to SpanAttributes.http.response.size lets you investigate slow large responses; an expression like if(StatusCode = 'Error', 1, 0) plots error frequency over time across services.
- Color by something other than count. Setting Count to p95(Duration) colors each bucket by tail latency rather than volume, surfacing rare-but-slow pockets that a count-based view washes out. countDistinct(TraceId) distinguishes trace volume from span volume when one trace produces many spans.
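To see why the choice of aggregation matters, the sketch below computes three candidate color values for the same buckets: span count, p95 duration, and distinct trace count. It is simplified to time-only buckets, uses a nearest-rank percentile, and assumes a hypothetical span dictionary shape; it is not how the product evaluates these expressions.

```python
from collections import defaultdict

def p95(values):
    """Nearest-rank 95th percentile, using integer arithmetic for the rank."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]

def color_values(spans, bucket_seconds=60):
    """Per time bucket, compute three alternative values a color scale could encode."""
    buckets = defaultdict(list)
    for s in spans:
        buckets[s["ts"] // bucket_seconds].append(s)
    return {
        b: {
            "count": len(group),
            "p95_ms": p95([s["duration_ms"] for s in group]),
            "distinct_traces": len({s["trace_id"] for s in group}),
        }
        for b, group in buckets.items()
    }

# Bucket 0: one chatty trace with 20 fast spans. Bucket 1: two slow spans from two traces.
spans = [{"ts": i, "duration_ms": 10, "trace_id": "t1"} for i in range(20)]
spans += [
    {"ts": 60, "duration_ms": 900, "trace_id": "t2"},
    {"ts": 61, "duration_ms": 900, "trace_id": "t3"},
]
colors = color_values(spans)
```

By count, bucket 0 looks ten times hotter than bucket 1. By p95 the second bucket dominates, and by distinct traces the first bucket shrinks to a single trace.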
Tips for effective use
A few practices make Event deltas substantially more useful:
- Filter to a single service first. Latency varies widely across services and mixing them obscures the signal. Use the search bar to narrow to one ServiceName (or one endpoint) before you start, so the heatmap and distributions reflect a comparable population.
- Pick selections with clear visual contrast. Comparison mode works best when the Selection band is visibly distinct from the Background, for example a degraded period that begins at a recognizable moment, or a slow tail clearly separated from the bulk. Selections that overlap heavily with the rest of the data tend to surface noise rather than the actual deviation.
- Iterate filter, heatmap, filter. A single selection rarely identifies the cause. Treat the first comparison as a hypothesis, filter on the most divergent value, and re-read the new heatmap and distributions. Two or three iterations usually narrow a regression to one or two attributes.
- Use distribution mode without a selection when no contrast is yet visible (you know there is an issue but the heatmap looks uniform). Apply a hypothesis filter such as only error spans, only client spans, or only one endpoint, and let the attribute distributions point you at the highest-impact values before you draw any rectangle.