03-20-2026 02:24 PM - edited 03-20-2026 03:10 PM
This is the second article in a two-part series on ThousandEyes Transaction Tests. The first article covers how to fix transaction test timeouts by splitting long scripts into parallel tests: How to Fix Transaction Test Timeouts by Running Tests in Parallel. This article investigates a related phenomenon we discovered while building that solution: why markers sometimes appear out of script order in Average Metrics, and how to predict the exact display order using the ThousandEyes MCP Server.
While investigating the transaction timeout problem described in How to Fix Transaction Test Timeouts by Running Tests in Parallel, we noticed something unexpected. After splitting a 10-marker script into two parallel 5-marker tests against a SuiteCRM demo environment, we opened the Average Metrics view on a round where some agents had timed out — and the markers were not in script order.
Instead of seeing:
1: Login 2: Accounts 3: Contacts 4: Leads 5: Opportunities
We were seeing orderings like:
1: Login 2: Accounts 4: Leads 3: Contacts 5: Opportunities
Or even more striking, in some rounds:
2: Cases 1: Login 3: Campaigns 4: Calls 5: Meetings 6: Documents
Login — the first marker in the script — appearing in second position. This isn't a display bug. It's a deterministic behavior with a clear algorithm behind it. This article documents what we found.
We used two parallel web transaction tests against demo.suiteondemand.com, a publicly accessible SuiteCRM instance hosted in the UK. SuiteCRM is a free and open-source CRM, so you can recreate these tests yourself without any licensing cost.
Both tests ran on 9 agents: New York, Hamburg, Cambridge, Barcelona, Los Angeles, Mexico City, Buenos Aires, Glasgow, and Manchester — with a 2-minute interval.
One agent — Buenos Aires — consistently experienced connectivity issues ranging from ECONNREFUSED to full transaction timeouts. Mexico City also experienced periodic timeouts. This created a natural laboratory of error conditions across hundreds of rounds.
The natural assumption is that markers always appear in script order. And that assumption is correct — most of the time. When all agents complete all markers successfully, ThousandEyes displays them in the exact order they were defined in the script.
The unexpected behavior only emerges when some agents fail partway through a transaction. In those rounds, markers shift positions in ways that initially seem random, but turn out to be completely predictable once you understand the underlying rule.
This analysis uses a different set of rounds than the timeout investigation in Part 1. The timeout event described there occurred at 18:00 CST on March 18, 2026. The marker ordering examples in this article come from rounds throughout the rest of the evening, where the pattern of partial failures was more varied and visible.
The core algorithm and all primary examples in this article are validated exclusively against WEB_TRANSACTION_TIMEOUT errors. Timeouts produce the cleanest failure pattern: the agent runs the script up to a hard time limit, so the null boundary always falls cleanly between two markers. This makes the n values fully readable from the MCP data without requiring per-agent detail queries. Other error types — WEB_TRANSACTION_OTHER_ERROR, WEB_TRANSACTION_ASSERT_ERROR — follow the same underlying sort mechanism, but introduce edge cases covered separately at the end of this article.
Every marker in Average Metrics shows two things: an average time and a bar representing that average relative to the others. What isn't shown explicitly — but is critical — is the denominator used to calculate each average.
ThousandEyes uses a null-excluding average. When an agent fails partway through a transaction, it doesn't contribute a value of zero to the markers it never reached — it contributes nothing at all. Those markers are excluded from both the numerator and the denominator.
This means that if 9 agents run a test and 2 time out after completing Login but before completing Accounts, the averages look like this:
| Marker | Agents with values | Average |
|---|---|---|
| 1: Login | 9 | includes the 2 timeout agents' Login times |
| 2: Accounts | 7 | only the 7 agents that completed it |
| 3: Contacts | 7 | same |
This null exclusion is why Login appears inflated when agents time out. An agent that spends 28 seconds trying to load the dashboard before timing out contributes its full Login time to the average, while contributing nothing to any subsequent marker.
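To make the denominator difference concrete, here is a minimal JavaScript sketch of a null-excluding average. It assumes per-agent marker times are available as an array with `null` for markers the agent never reached. The function names and the millisecond values are hypothetical, chosen only to mirror the 9-agent, 2-timeout scenario in the table above; this illustrates the averaging rule, not ThousandEyes code.

```javascript
// Null-excluding average: a null (marker never reached) is dropped from both
// the sum and the count, rather than being treated as 0.
function nullExcludingAverage(values) {
  const present = values.filter((v) => v !== null);
  const n = present.length;
  const avg = n === 0 ? null : present.reduce((sum, v) => sum + v, 0) / n;
  return { n, avg };
}

// For contrast: the average you would get if nulls were counted as 0.
function zeroFilledAverage(values) {
  const sum = values.reduce((acc, v) => acc + (v ?? 0), 0);
  return sum / values.length;
}

// Hypothetical times (ms) for 9 agents. The last two agents time out after
// Login: they have a (slow) Login value but null for Accounts.
const loginTimes = [3000, 3200, 2900, 3100, 3000, 2800, 3000, 28000, 26000];
const accountsTimes = [1500, 1600, 1400, 1500, 1500, 1400, 1600, null, null];

console.log(nullExcludingAverage(accountsTimes)); // { n: 7, avg: 1500 }
console.log(zeroFilledAverage(accountsTimes));    // lower: nulls would dilute the sum
console.log(nullExcludingAverage(loginTimes).n);  // 9
```

Login keeps n=9 and absorbs the two slow timeout values, while Accounts is averaged over only the 7 agents that reached it.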
Where to find n in the UI: ThousandEyes doesn't show the denominator directly in the Average Metrics bar chart — but it's one click away. Switch to the Table tab (next to Map and Waterfall). You'll see a Markers column for each agent.
One important nuance: the Markers column shows the number of markers the agent reached and entered, not strictly the number it completed cleanly before the timeout fired. In the Table tab shown in Figure 1, Buenos Aires shows Markers: 5 even though it timed out and took only 4 screenshots. This is because the timeout occurred during the execution of the 5th marker, after the agent had already started it. ThousandEyes counts that marker as reached, and the ordering algorithm uses this number as n.
This is the same mechanic behind Case 4 in this article: when an agent fails inside a marker rather than between markers, the UI and the algorithm disagree on what n to use — the Table tab shows the marker as reached, but the ordering treats it as not completed. For the clean timeout cases that are the focus of this article, the Markers column is a reliable proxy for n: an agent with Markers: 3 in a 5-marker test contributed values to markers 1–3 and null to markers 4–5.
Agents with fewer Markers than the script total are your timeout agents — and the Errors column confirms it in red (e.g., 1 Timeout).
Figure 1: Markers and Timeouts
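For the clean-timeout cases, turning the Markers column into per-marker n values is a simple prefix count. The sketch below assumes, as described above, that an agent with Markers: k contributed values to markers 1 through k and null to the rest; that assumption does not hold for mid-marker failures such as Case 4. The function name and the Markers values are hypothetical, arranged to reproduce the n values of Case 1.

```javascript
// Derive per-marker n from the Table tab's Markers column, assuming clean
// timeouts: an agent with Markers: k has values for markers 1..k only.
function nFromMarkersColumn(markersPerAgent, totalMarkers) {
  const n = new Array(totalMarkers).fill(0);
  for (const k of markersPerAgent) {
    // Each agent adds 1 to every marker up to the last one it reached.
    for (let i = 0; i < Math.min(k, totalMarkers); i++) {
      n[i] += 1;
    }
  }
  return n; // n[i] is the count for the marker at script position i + 1
}

// Hypothetical Markers column for a 5-marker test on 9 agents:
// eight agents reached all 5 markers, one stopped after marker 2.
const markersColumn = [5, 5, 5, 5, 5, 5, 5, 5, 2];
console.log(nFromMarkersColumn(markersColumn, 5)); // [ 9, 9, 8, 8, 8 ]
```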
The MCP metric WEB_TRANSACTION_MARKER_TIME_DECOMPOSED with group_by=ALL returns exactly the same averages shown in the bar chart, with the same denominators — useful when you want to read n programmatically rather than by counting rows in the Table tab.
After analyzing hundreds of rounds across both tests — focusing specifically on WEB_TRANSACTION_TIMEOUT errors, which produce the cleanest and most predictable failure pattern — the ordering rule is:
ThousandEyes orders markers in Average Metrics by descending n (denominator), and within markers that share the same n, by sequential script order.
That's the complete rule. Two levels of sorting: first by n, descending; then, among markers with the same n, by script position, ascending.
Why timeouts produce the cleanest case: when an agent times out, it has been executing the script up to a specific point. Every marker it completed has a real value; every marker it hadn't reached yet is null. The null boundary is always a clean cut between two markers — never inside one. This means the n values you see in the MCP data directly and unambiguously reflect the display order. No per-agent detail queries needed.
Algorithmically, this is a two-key stable sort. In JavaScript — the language ThousandEyes transaction scripts run in — it maps directly to:
markers.sort((a, b) => b.n - a.n || a.scriptPos - b.scriptPos);
The || operator is the tiebreaker: if two markers have the same n (difference = 0, which is falsy), the sort falls through to script position. You can run this in your browser console with the exact values from Case 3 below and reproduce the display order ThousandEyes shows.
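Here is that experiment as a self-contained snippet you can paste into a browser console or run with Node.js. The n values are the Case 3 figures; the helper name `displayOrderOf` and the object shape `{ name, scriptPos, n }` are assumptions for illustration. Sorting a reversed copy as well shows that the result depends only on the two keys, not on the input order.

```javascript
// Case 3 (Part 2, March 18, 2026 at 23:00 CST): n per marker, in script order.
const case3 = [
  { name: "Login", scriptPos: 1, n: 9 },
  { name: "Cases", scriptPos: 2, n: 8 },
  { name: "Campaigns", scriptPos: 3, n: 7 },
  { name: "Calls", scriptPos: 4, n: 7 },
  { name: "Meetings", scriptPos: 5, n: 7 },
  { name: "Documents", scriptPos: 6, n: 6 },
];

// n descending; ties broken by script position ascending.
// slice() copies the input so the caller's array is not reordered in place.
function displayOrderOf(markers) {
  return markers
    .slice()
    .sort((a, b) => b.n - a.n || a.scriptPos - b.scriptPos)
    .map((m) => `${m.scriptPos}: ${m.name}`)
    .join(" > ");
}

console.log(displayOrderOf(case3));
// 1: Login > 2: Cases > 3: Campaigns > 4: Calls > 5: Meetings > 6: Documents
console.log(displayOrderOf(case3.slice().reverse()) === displayOrderOf(case3)); // true
```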
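The stability guarantee is worth understanding: Array.prototype.sort() has been required to be stable since ES2019, meaning elements that compare as equal retain their original relative order. Before that, Chrome's V8 engine used an unstable QuickSort for arrays longer than 10 elements. In 2018 (V8 7.0, Chrome 70), V8 switched to TimSort, a hybrid merge/insertion sort originally developed for CPython, which is stable. Strictly speaking, the comparator above does not rely on stability: script positions are unique, so it never returns 0 for two different markers and produces the same order on any correct sort implementation. Stability matters if you sort by n alone on a list that is already in script order; a stable sort then keeps script order within each n group automatically.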
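The snippet below shows what stability buys when sorting by n alone, with no script-position tiebreaker. The marker list and n values are hypothetical: a conditional marker (named after max_wait_exceeded from the technology-dashboard example later in this article) fires for only one of three agents and sits mid-script, and the other marker names are invented. On an engine with ES2019-compliant stable sort, such as Node.js 12 or later, the n=3 markers keep their script order.

```javascript
// Hypothetical round, listed in script order. Only n is used as a sort key.
const markers = [
  { name: "Login", n: 3 },
  { name: "Dashboard", n: 3 },
  { name: "max_wait_exceeded", n: 1 }, // conditional marker, fired for 1 agent
  { name: "Reports", n: 3 },
  { name: "Logout", n: 3 },
];

// n descending only. Equal-n markers compare as 0, so a stable sort keeps
// them in their original (script) order.
const byNOnly = markers.slice().sort((a, b) => b.n - a.n);
console.log(byNOnly.map((m) => m.name).join(", "));
// Login, Dashboard, Reports, Logout, max_wait_exceeded
```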
The full operation is also a textbook Schwartzian transform, the decorate-sort-undecorate idiom named after Perl programmer Randal Schwartz: attach (n, scriptPos) to each marker, sort by that composite key, then display only the marker names. This is efficient because the sort key is computed once per marker rather than re-evaluated on every comparison.
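A minimal decorate-sort-undecorate sketch follows. It assumes per-agent marker times arrive as `{ markerName: [ms or null per agent] }`, a hypothetical shape rather than the MCP Server's actual response format. The three-agent values are invented to mirror the Case 4 pattern (one agent with no Login value but a Cases value). Counting non-null values happens once per marker in the decorate step, which is where the transform saves work.

```javascript
// scriptOrder: marker names in script order.
// perAgent: { markerName: [time in ms, or null if no value, per agent] }
function decorateSortUndecorate(scriptOrder, perAgent) {
  return scriptOrder
    // Decorate: compute (n, scriptPos) once per marker.
    .map((name, i) => ({
      name,
      scriptPos: i + 1,
      n: perAgent[name].filter((v) => v !== null).length,
    }))
    // Sort: compare only the precomputed keys.
    .sort((a, b) => b.n - a.n || a.scriptPos - b.scriptPos)
    // Undecorate: keep just the display label.
    .map((d) => `${d.scriptPos}: ${d.name}`);
}

// Hypothetical 3-agent round: agent 3 has no Login value but does have Cases.
const perAgent = {
  Login: [7000, 8000, null],
  Cases: [3000, 3500, 3200],
  Campaigns: [2000, null, null],
};
console.log(decorateSortUndecorate(["Login", "Cases", "Campaigns"], perAgent).join(" > "));
// 2: Cases > 1: Login > 3: Campaigns
```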
This rule applies to any web transaction test with multiple agents where at least one agent times out partway through a transaction. Other error types follow the same underlying mechanism but introduce complications covered in the edge cases section below.
Cases 1, 2, and 3 are drawn directly from the MCP data pulled for this article and verified in the UI. Case 4 is also MCP-confirmed, from a different round. It's treated separately because it involves a different error type (WEB_TRANSACTION_OTHER_ERROR) and produces the most counterintuitive output, illustrating a distinct mechanism that comes up beyond clean timeouts.
Case 1: One agent times out mid-script
Test: Part 1 · Round: March 18, 2026 at 21:02 CST
Buenos Aires times out after completing Login and Accounts, but before Contacts. The n values become:
| Marker | n |
|---|---|
| 1: Login | 9 |
| 2: Accounts | 9 |
| 3: Contacts | 8 |
| 4: Leads | 8 |
| 5: Opportunities | 8 |
Group 1 (n=9): Login, Accounts → script order: 1, 2
Group 2 (n=8): Contacts, Leads, Opportunities → script order: 3, 4, 5
Result: 1: Login > 2: Accounts > 3: Contacts > 4: Leads > 5: Opportunities
The order looks sequential — because within each n group, script order applies. But the underlying mechanism is n-based grouping, not a simple "show in script order always" rule.
Case 2: One agent times out after Login only
Test: Part 1 · Round: March 18, 2026 at 20:36 CST
| Marker | n |
|---|---|
| 1: Login | 9 |
| 2: Accounts | 8 |
| 3: Contacts | 8 |
| 4: Leads | 8 |
| 5: Opportunities | 8 |
Result: 1: Login > 2: Accounts > 3: Contacts > 4: Leads > 5: Opportunities
Again looks sequential. Login's average is inflated because Buenos Aires' slow Login time is included in the n=9 calculation while contributing nothing to the markers it never completed.
Case 3: Multiple agents fail at different markers (the interesting case)
Test: Part 2 · Round: March 18, 2026 at 23:00 CST
| Marker | n |
|---|---|
| 1: Login | 9 |
| 2: Cases | 8 |
| 3: Campaigns | 7 |
| 4: Calls | 7 |
| 5: Meetings | 7 |
| 6: Documents | 6 |
Group 1 (n=9): Login → 1
Group 2 (n=8): Cases → 2
Group 3 (n=7): Campaigns, Calls, Meetings → script order: 3, 4, 5
Group 4 (n=6): Documents → 6
Predicted order: 1: Login > 2: Cases > 3: Campaigns > 4: Calls > 5: Meetings > 6: Documents — verified in the UI
The UI confirmed: Login=11,959ms, Cases=4,593ms, Campaigns=2,607ms, Calls=4,135ms, Meetings=3,692ms, Documents=2,004ms — four visually distinct bar lengths matching the four n groups exactly.
Case 4: An agent fails inside the Login marker itself
Test: Part 2 · Round: March 18, 2026 at 22:02 CST · Error type: WEB_TRANSACTION_OTHER_ERROR
Los Angeles, Barcelona, and Mexico City all fail inside the Login marker — the page loaded but the login form never rendered, so Login was started but never completed for those three agents. Buenos Aires completes Login (22,301ms) but then fails before Cases. The aggregate n values are:
| Marker | n |
|---|---|
| 1: Login | 6 (3 of the 9 agents did not complete Login; MCP group_by=ALL reports 9, see note) |
| 2: Cases | 6 |
| 3: Campaigns | 5 |
| 4: Calls | 5 |
| 5: Meetings | 5 |
| 6: Documents | 4 |
Group 1 (n=6): Cases → 1
Group 2 (n=6): Login → 2
Group 3 (n=5): Campaigns, Calls, Meetings → script order: 3, 4, 5
Group 4 (n=4): Documents → 6
Result: 2: Cases > 1: Login > 3: Campaigns > 4: Calls > 5: Meetings > 6: Documents
Login at 7,932ms appeared in second position. Cases at 3,240ms appeared first.
Note on the n discrepancy: The MCP group_by=ALL query reports Login n=9 for this round — it counts agents that initiated the marker, not agents that completed it. The group_by=SOURCE_AGENT query reveals the null Login values for LA, Barcelona, and Mexico City, showing the true completion count. This is the one scenario where the aggregate n is misleading for predicting display order.
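When you do pull group_by=SOURCE_AGENT data, the useful output is the per-marker list of agents with a null value, alongside the completed count. The sketch below assumes the per-agent results have been flattened into rows of `{ agent, marker, timeMs }` with `timeMs: null` where no value was recorded, and that an agent appears in a marker's rows if it initiated that marker. That row shape and the function name are hypothetical, and the three rows are an illustrative excerpt with made-up values, not the full Case 4 round.

```javascript
// rows: [{ agent, marker, timeMs }], timeMs null when the agent has no value.
function markerCompletion(rows) {
  const byMarker = new Map();
  for (const { agent, marker, timeMs } of rows) {
    if (!byMarker.has(marker)) {
      byMarker.set(marker, { initiated: 0, completed: 0, nullAgents: [] });
    }
    const entry = byMarker.get(marker);
    entry.initiated += 1; // every agent row for this marker (the aggregate-style count)
    if (timeMs === null) {
      entry.nullAgents.push(agent);
    } else {
      entry.completed += 1; // the effective n that drives display order
    }
  }
  return byMarker;
}

// Illustrative excerpt only: agent names from Case 4, timing values hypothetical.
const rows = [
  { agent: "New York", marker: "Login", timeMs: 6500 },
  { agent: "Los Angeles", marker: "Login", timeMs: null },
  { agent: "Barcelona", marker: "Login", timeMs: null },
];
const login = markerCompletion(rows).get("Login");
console.log(login.initiated, login.completed, login.nullAgents.join(", "));
// 3 1 Los Angeles, Barcelona
```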
You don't need the MCP Server to apply this methodology. The Average Metrics view in the ThousandEyes UI gives you everything you need — if you know what to look for.
Step 1 — Open Average Metrics and look at the marker list
Navigate to the test's Average Metrics view and select the round you want to inspect. The markers appear as a horizontal bar chart. Read the list from top to bottom — that order is the answer.
Step 2 — Count the groups
Markers that share the same n appear in script order within a group. When you see a jump in bar length between consecutive markers, that's a group boundary — the n changed, meaning one or more agents failed between those two markers.
Step 3 — Locate where agents failed
The first marker with a shorter bar than the one above it is the failure boundary. The longer bars above represent the complete group (n = total agents). The shorter bars below represent the incomplete group (n = total minus failing agents).
Step 4 — Check Login's position
Login is normally first because every agent that runs any part of the transaction must pass through it. If Login is not first, at least one agent failed before completing Login itself — not just before completing later markers. This is the signal from Case 4: it points to a WEB_TRANSACTION_OTHER_ERROR or WEB_TRANSACTION_ASSERT_ERROR mid-marker, not a clean timeout. Open the Waterfall view and look for a null Login value on the failing agent.
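If you have n values for the displayed markers (from the Table tab or from MCP data), Steps 2 through 4 can be checked mechanically. This is a hypothetical helper, not a ThousandEyes feature; it assumes the input is the marker list in displayed order, each entry carrying `name` and `n`, and it treats every drop in n between adjacent markers as a group boundary.

```javascript
// displayed: markers in the order Average Metrics lists them, each { name, n }.
function triage(displayed) {
  const boundaries = [];
  for (let i = 1; i < displayed.length; i++) {
    const dropped = displayed[i - 1].n - displayed[i].n;
    if (dropped > 0) {
      // Step 3: agents that stopped between these two adjacent markers.
      boundaries.push({ after: displayed[i - 1].name, before: displayed[i].name, dropped });
    }
  }
  return {
    groups: displayed.length === 0 ? 0 : boundaries.length + 1, // Step 2
    boundaries,
    loginFirst: displayed.length > 0 && displayed[0].name === "Login", // Step 4
  };
}

// Case 3 as displayed: Login 9, Cases 8, Campaigns/Calls/Meetings 7, Documents 6.
const case3Displayed = [
  { name: "Login", n: 9 },
  { name: "Cases", n: 8 },
  { name: "Campaigns", n: 7 },
  { name: "Calls", n: 7 },
  { name: "Meetings", n: 7 },
  { name: "Documents", n: 6 },
];
const result = triage(case3Displayed);
console.log(result.groups, result.loginFirst); // 4 true
for (const b of result.boundaries) {
  console.log(`${b.dropped} agent(s) stopped between ${b.after} and ${b.before}`);
}
```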
The UI-only process, applied to each case:
Case 1 — Buenos Aires times out after Accounts
Login and Accounts share n=9. The break between Accounts and Contacts is where Buenos Aires dropped. Login is first — BA completed Login and Accounts before failing.
Case 2 — Buenos Aires times out right after Login
Login is alone at the top. BA completed Login but nothing else. The inflation is visible: Login at 4,406ms is much higher than the others — that's BA's slow Login value pulling the average up.
Case 3 — Multiple agents fail at different markers
Four distinct bar lengths = four n groups, separated by three failure points. Between Login and Cases, 1 agent dropped (n 9 → 8). Between Cases and Campaigns, 1 more (8 → 7). Between Meetings and Documents, 1 more (7 → 6). The failure timeline is readable directly from the bar lengths.
Case 4 — An agent fails inside the Login marker itself
Login is not first. That single observation tells you immediately that something unusual happened: an agent failed inside Login, not after it. Cases completed by more agents than Login completed. This is the signal to open the Waterfall view and check which agent shows a null Login value.
The three-second diagnostic:
Login not first → a WEB_TRANSACTION_OTHER_ERROR or assert error occurred inside Login (Case 4). Check the Waterfall for null Login values. This will not happen with a clean timeout.

No MCP required. No scripts. Just the bar chart.
Figure 2: Markers divided by groups
The UI approach above works well for manual triage. If you want to automate the analysis — generating predictions before opening the UI, or building tooling that explains marker order at scale — the ThousandEyes MCP Server provides the raw data to do it.
Three metric calls are sufficient to predict the marker display order for any round:
1. WEB_TRANSACTION_TIMEOUT with group_by=SOURCE_AGENT
Identifies which agents timed out and in which test. A timed-out agent completed Login before failing — its Login time is included in the average and inflates it.
2. WEB_TRANSACTION_OTHER_ERROR with group_by=SOURCE_AGENT
Identifies ECONNREFUSED and other errors. Most occur after Login completes. The exception is errors inside the Login marker itself — which reduce Login's effective n.
3. WEB_TRANSACTION_MARKER_TIME_DECOMPOSED with group_by=ALL
Returns the exact averages and n values shown in the UI. Sort markers by n descending, break ties by script position, and you have the display order.
The prediction algorithm:
1. Get marker averages with group_by=ALL → get n per marker
2. Group markers by n value
3. Within each group, sort by script position (ascending)
4. Concatenate groups from highest n to lowest n → Result = display order
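The four steps translate almost line for line into code. This sketch assumes each marker is `{ name, scriptPos, n }` with n read from the group_by=ALL results; the shape and the function name are hypothetical. It builds the n groups explicitly instead of using a single comparator, and should give the same order as the one-line sort shown earlier.

```javascript
// markers: [{ name, scriptPos, n }], n taken from group_by=ALL results.
function predictDisplayOrder(markers) {
  // Step 2: group markers by n value.
  const groups = new Map();
  for (const m of markers) {
    if (!groups.has(m.n)) groups.set(m.n, []);
    groups.get(m.n).push(m);
  }
  // Steps 3 and 4: sort each group by script position, then concatenate
  // the groups from highest n to lowest n.
  return [...groups.keys()]
    .sort((a, b) => b - a)
    .flatMap((n) => [...groups.get(n)].sort((a, b) => a.scriptPos - b.scriptPos))
    .map((m) => `${m.scriptPos}: ${m.name}`);
}

// Case 1 (Part 1, March 18, 2026 at 21:02 CST), deliberately given out of order.
const case1 = [
  { name: "Leads", scriptPos: 4, n: 8 },
  { name: "Login", scriptPos: 1, n: 9 },
  { name: "Opportunities", scriptPos: 5, n: 8 },
  { name: "Accounts", scriptPos: 2, n: 9 },
  { name: "Contacts", scriptPos: 3, n: 8 },
];
console.log(predictDisplayOrder(case1).join(" > "));
// 1: Login > 2: Accounts > 3: Contacts > 4: Leads > 5: Opportunities
```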
Validated across dozens of rounds with up to 4 distinct n levels, multiple simultaneous agent failures, and different error types — with 100% match against the UI in every case except the rare "failure inside Login marker" edge case.
The SuiteCRM demo environment above is ideal for illustrating the rule clearly — Buenos Aires and Mexico City provided consistent, reproducible failures across hundreds of rounds, and we had full MCP access to verify every prediction programmatically. But the methodology works identically on any ThousandEyes account, including accounts where you have only UI access and no MCP integration.
The three examples below come from production environments across different industries. In each case, the analysis was done exclusively through the ThousandEyes UI — no API calls, no external tooling. Customer names, application names, and identifying details have been omitted.
Example 1: Financial Services — 8-marker SSO flow (2 Cloud Agents)
A multi-step SSO and account management workflow monitored from two geographically distributed Cloud Agents. The script defines 8 sequential markers covering authentication, navigation, and data retrieval steps.
In one round, the Average Metrics view showed an unusual pattern: the first three markers had noticeably longer bars than the last five. The lengths were consistent within each group — suggesting a denominator difference, not a performance difference.
Reading the bar chart directly: Markers 1, 2, and 3 had identical bar lengths (n=2 — both agents completed them). Markers 4 through 8 had shorter, identical bars (n=1 — only one agent completed them). The break was between marker 3 and marker 4.
Opening the Waterfall view for the round confirmed it: one agent showed "Incomplete" starting at marker 4, with real values for markers 1–3. That agent timed out between markers 3 and 4.
Display order in the UI: 1 → 2 → 3 → 4 → 5 → 6 → 7 → 8
Looks sequential — but it's produced by the n-grouping. If the timeout had happened one marker earlier, the split would be visible at a different point.
Rule confirmed: Two distinct bar lengths, clean break at marker 4, script order within each group.
Example 2: Technology Sector — 10-marker iframe-heavy dashboard (3 Cloud Agents)
A complex transaction test that loads a multi-panel analytics dashboard with nested iframes. The script has 10 markers, one of which (max_wait_exceeded) is conditional — it only fires when a wait threshold is exceeded.
In one round, the Average Metrics view showed something unusual: nine markers at one bar length, and one marker (max_wait_exceeded) significantly shorter — appearing last in the list, below markers defined after it in the script.
Reading the bar chart directly: Nine markers had identical long bars (n=3, all agents completed them). The max_wait_exceeded marker had a much shorter bar and appeared last — its n=1 pushed it to the bottom regardless of its script position. The conditional marker had only fired for one of the three agents in this round.
Display order in the UI: standard markers 1–9 in script order, max_wait_exceeded last.
This behavior is actually useful: a conditional marker that indicates a problem rises to prominence in the display as more agents trigger it, and recedes naturally when conditions normalize. The display is self-organizing.
Rule confirmed: Two distinct bar lengths, conditional marker correctly sorted to bottom, no configuration needed.
Example 3: Enterprise SaaS — 11-marker customer workflow (3 Cloud Agents)
A 3-agent test monitoring a multi-step end-user workflow with 11 sequential markers covering login, search, selection, and confirmation steps.
In one round, the Average Metrics view showed what appeared to be a normal sequential display — all 11 markers in order, 1 through 11. A quick glance might dismiss this as a clean round. But the bar lengths told a different story.
Reading the bar chart directly: Markers 1 through 7 had longer, identical bars. Markers 8 through 11 had shorter, identical bars. The break was at marker 8. Two distinct bar lengths = one failure event. One agent completed markers 1–7 and stopped.
Opening the Waterfall view confirmed it: one agent showed "Incomplete" for markers 8–11 with a WEB_TRANSACTION_TIMEOUT error.
Display order in the UI: 1 → 2 → 3 → 4 → 5 → 6 → 7 → 8 → 9 → 10 → 11
Sequential — but the bar length split is the diagnostic signal. Without knowing to look for it, this round could appear clean. With the methodology, the failure is visible in two seconds.
Rule confirmed: Two distinct bar lengths, break at marker 8, failure boundary correctly identified from bar lengths alone.
Summary across all configurations:
| Configuration | Method | Agents | Markers | n groups | Rule confirmed |
|---|---|---|---|---|---|
| SuiteCRM demo (Part 1) | MCP + UI | 9 Cloud | 5 | up to 4 | Yes |
| SuiteCRM demo (Part 2) | MCP + UI | 9 Cloud | 6 | up to 4 | Yes |
| Financial services SSO | UI only | 2 Cloud | 8 | 2 | Yes |
| Technology dashboard | UI only | 3 Cloud | 10 | 2 (+ conditional) | Yes |
| Enterprise SaaS workflow | UI only | 3 Cloud | 11 | 2 | Yes |
The methodology held across every configuration — whether confirmed programmatically via MCP or read directly from the bar chart in the UI. The underlying rule is the same either way. The MCP gives you exact n values and the ability to predict before opening the UI. The UI alone gives you the diagnosis in real time, from the bar lengths, without any additional tooling.
The validation above used exclusively WEB_TRANSACTION_TIMEOUT errors, which produce clean, predictable behavior. Real environments produce other error types that introduce additional complexity. Case 4 from the previous section is the clearest example of what "additional complexity" looks like in practice — it's referenced throughout this section as a concrete illustration.
WEB_TRANSACTION_TIMEOUT — Clean and predictable
The simplest case. Every marker an agent completed has a real value; every marker not yet reached is null. The null boundary is always a clean cut, making n straightforward to reason about from the outside. Cases 1, 2, and 3 in this article are all timeout cases — the MCP data directly and reliably reflects what the UI shows.
WEB_TRANSACTION_OTHER_ERROR — Usually clean, with one exception
Covers connection refusals and network errors. Most happen early, contributing null to all markers. The exception — illustrated by Case 4 — is when the error occurs inside a marker already started: that specific marker also becomes null for the failing agent, even though earlier markers completed normally. In Case 4, Los Angeles loaded the page (starting Login) but the login form never rendered (no completion value). Login received null from LA while Cases received a real value — which is why Cases appeared before Login.
This cannot happen with a clean timeout, because timeouts always fire at step boundaries. WEB_TRANSACTION_OTHER_ERROR can fire mid-marker. If you see an earlier-script marker appearing below a later-script marker and this error type is present, a mid-marker failure is the likely cause. Use WEB_TRANSACTION_MARKER_TIME_DECOMPOSED with group_by=SOURCE_AGENT to identify which agent has the null value.
WEB_TRANSACTION_ASSERT_ERROR — More complex
Assert errors can fire at any point within a marker's execution window — the same mid-marker mechanism as Case 4, but applicable at any marker position, not just Login. When this happens, the marker in progress is null for the failing agent, which may reduce its effective n and shift its display position.
Detection: Use WEB_TRANSACTION_MARKER_TIME_DECOMPOSED with group_by=SOURCE_AGENT. If the failing agent shows null for a marker that appears out of expected position, a mid-marker assert is the cause.
Why the demo environment was ideal for clean validation
Buenos Aires consistently produced WEB_TRANSACTION_TIMEOUT failures with a clean marker boundary. Case 4 came from a WEB_TRANSACTION_OTHER_ERROR on Los Angeles in a separate round — a different error type, different agent, different mechanism. It's included not as part of the core algorithm validation, but as a demonstration that the display order is always consistent with whatever effective n ThousandEyes calculates, regardless of which error type produced the partial completion.
Single Enterprise Agent deployments: the rule doesn't apply
With only one agent, each marker's n is either 1 (completed) or 0 (not reached). There's no cross-agent comparison to produce a reordering. The ordering behavior is only observable, and only matters, with two or more agents where at least one fails partway through in a given round.
Understanding this algorithm has concrete implications for how you read and act on Transaction Test data.
When markers appear out of script order, it is always a sign that at least one agent has no value for some markers: usually because it failed partway through the transaction, or, as in the technology-dashboard example above, because a conditional marker fired for only some agents. The reordering is not cosmetic; it is diagnostic. The marker that appears last has the fewest agents with completed values. The markers at the top have the most. You can immediately see which parts of the transaction were most affected.
Two distortions happen simultaneously when agents time out:
First, the averages of incomplete markers are calculated only from the agents that completed them — typically the faster agents. This makes the averages look artificially low for those markers. Second, Login's average is inflated because it includes the timeout agents' Login times, which are longer than normal. These two effects compound: Login looks very slow, later markers look deceptively fast. Worth knowing when setting SLA thresholds or alert rules.
The position of Login is a quick diagnostic signal. In normal operation, Login appears first because it has the highest n — every agent that runs any part of the transaction passes through Login. If Login appears anywhere other than first, at least one agent failed before completing the Login marker itself. That narrows the problem immediately: it's not a slow application — it's an agent that can't reach the login form at all.
This behavior is not documented in the ThousandEyes documentation. The null-excluding average calculation is implied by how transaction markers work, but the sorting algorithm — n descending, with script order as the tiebreaker — is not stated anywhere in the product documentation or knowledge base as of this writing.
We derived this rule empirically, by comparing MCP-calculated predictions against the UI across dozens of rounds, iterating until the rule explained 100% of observed cases (with the exception of the "failure inside Login marker" edge case, which requires per-agent marker detail data to predict with certainty).
If you have encountered this behavior and assumed it was a display anomaly, it isn't. It's a feature. And now you can read it.
The analysis in this article uses the following test and snapshot data. Each link opens directly to the round referenced in the article:
Case 4: Multi-agent failure round, Cases-before-Login ordering (Part 2, 22:02 CST)
The marker display order in ThousandEyes Average Metrics is determined by two rules applied in sequence: markers with higher completion counts (n) appear first, and markers with identical completion counts appear in script order.
This makes the display order diagnostic rather than purely cosmetic — it tells you exactly how far through the transaction each agent made it. The MCP Server provides the data to reproduce this calculation independently, which means you can build tooling that explains not just what failed, but how far the transaction progressed before failing, for every round, for every agent.
The companion article, How to Fix Transaction Test Timeouts by Running Tests in Parallel, covers the timeout problem that led to this investigation and how splitting a long script into parallel tests resolved it.