Silent Failure Rate, measured: 3,013 monitored automation runs across Zapier, Make, n8n and Pipedream
Live benchmark · updated 2026-07-05 · raw data downloadable below · no affiliate links on this page
Across 3,013 monitored automation runs (July 1–5, 2026), we recorded zero silent failures on n8n (0%, 95% CI 0–0.17%, n=2,315), Make (0%, 95% CI 0–6.0%, n=60) and Zapier (0%, 95% CI 0–9.9%, n=35). But the moment Pipedream's free-tier quota ran out, its webhooks kept answering "success" while silently dropping 14 of 14 deliveries.
A silent failure is the automation failure that hurts most: the platform told you it worked, and it didn't. This page is a continuously running measurement of that rate — not a review, not an opinion round-up. Every number below comes from runs we fired ourselves, with both endpoints under our own control, reconciled one by one.
Scoreboard (webhook-triggered workflows)
| Platform (plan measured) | Output-expected runs | Silent failures | SFR | Latency p50 | p95 |
|---|---|---|---|---|---|
| n8n — self-hosted, bulk sampler | 2,315 | 0 | 0% (95% CI 0%–0.17%) | 724 ms | 3.31 s |
| Make — Make Plan (paid) | 60 | 0 | 0% (95% CI 0%–6.0%) | 1.01 s | 3.17 s |
| Zapier — Professional (paid) | 35 | 0 | 0% (95% CI 0%–9.9%) | 4.25 s | 8.91 s |
| Pipedream — free tier, before quota exhaustion | 76 | 0 | 0% (95% CI 0%–4.8%) | 2.72 s | 5.06 s |
Latency is measured from fired-at to receipt at our receiver, over an identical network path for every platform, so the numbers are comparable with each other (not with a vendor's internal benchmark). Median delivery: n8n 724 ms, Make 1.01 s, Zapier 4.25 s. That is roughly a 6× spread on the same workload (4.25 s ÷ 724 ms ≈ 5.9).
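To make the latency columns concrete, here is a minimal Python sketch of how a p50 and p95 could be derived from per-run timestamps. It is an illustration, not the benchmark's own code: the function names, the ISO 8601 timestamp format and the nearest-rank percentile method are all assumptions made for the example, and the sample values are made up.

```python
import math
from datetime import datetime, timedelta


def latency_ms(fired_at: str, received_at: str) -> float:
    """Milliseconds from firing an event to its receipt (ISO 8601 inputs, an assumed format)."""
    delta = datetime.fromisoformat(received_at) - datetime.fromisoformat(fired_at)
    return delta / timedelta(milliseconds=1)


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile (assumed method; the page does not say which one it uses)."""
    if not values:
        raise ValueError("need at least one latency sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up sample latencies in milliseconds, for illustration only.
sample = [100.0, 200.0, 300.0, 400.0, 500.0, 600.0, 700.0, 800.0, 900.0, 1000.0]
p50 = percentile(sample, 50)
p95 = percentile(sample, 95)
```

Because both timestamps are taken on infrastructure the measurer controls, the difference includes the receiver's network hop, which is why the figures are only comparable with each other.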
Beyond delivered/missed, we also check partial executions (a 2-action or 2-branch run that only half-completed), duplicates, and filter leaks (runs a filter should have stopped but didn't). All three counts are zero so far.
The one real failure mode we caught: the quota wall
On July 2 at 14:24 UTC, our Pipedream free-tier credits ran out mid-measurement. From one event to the next, five seconds apart, delivery went from 100% to 0%. The part that matters: Pipedream's webhooks continued to return a success response for every event the platform then silently discarded (14 of 14 output-expected runs; drop rate 95% CI 78–100%). The sender gets no error, no queue, no replay: just an "ok" and a black hole. We re-verified the behaviour 19 hours later, past the documented daily reset time: still accepting, still dropping.
We do not count these runs in Pipedream's SFR — they are a billing-edge behaviour, not an engine failure (before the wall, Pipedream ran 76 runs without a single drop). But if you run production workloads on a metered free tier, this is the failure semantics you are signing up for: at the quota boundary, "accepted" stops meaning "delivered".
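One practical consequence is that a sender should treat the webhook response as a claim and the receipt at its own endpoint as the evidence. The sketch below flags events that were acknowledged but never arrived. It is a hypothetical illustration: the `Event` record, its field names, and the use of a 2xx HTTP status as the "success" signal are assumptions, not details taken from the measurement setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    run_id: str
    webhook_status: int  # status the platform's webhook returned (assumed to be HTTP)
    received: bool       # did our own receiver see a receipt carrying this run ID?


def silent_drops(events: list[Event]) -> list[str]:
    """Run IDs the platform acknowledged with 2xx but that never reached the receiver."""
    return [
        e.run_id
        for e in events
        if 200 <= e.webhook_status < 300 and not e.received
    ]


# Made-up events: one delivered, two acknowledged then lost, one rejected outright.
events = [
    Event("run-a", 200, True),
    Event("run-b", 200, False),
    Event("run-c", 200, False),
    Event("run-d", 429, False),
]
dropped = silent_drops(events)
```

A rejected event (`run-d` here) is a loud failure the sender can retry; only the acknowledged-and-missing ones are silent.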
Scheduled (polling) workflows
A separate always-on workflow polls our data source every 30 minutes on each platform, and we track whether every scheduled tick actually happened and whether every new item was picked up.
| Platform | Scheduled polls observed | New items delivered |
|---|---|---|
| n8n | 218 | 50 / 50 |
| Make | 57 | 7 / 7 since activation* |
*One additional item changed before the Make scenario was first switched on and was superseded before its first poll. This is a setup artifact, not a platform miss; it is excluded above and flagged in the raw data.

A cost observation, previewed here ahead of the full cost benchmark: on Make, every empty poll of this workflow consumes 2 billable operations by design (poll + state lookup); on self-hosted n8n the same empty poll costs $0. Polling-heavy workloads pay a standing tax on per-operation platforms even when nothing happens.
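As a rough worked example of that standing tax, the arithmetic can be sketched in Python. The function name and the 30-day month are assumptions for illustration, and the figures are an upper bound for the case where every poll finds nothing new; no prices are applied, because the cost benchmark is not published yet.

```python
def empty_poll_operations(interval_minutes: int, ops_per_empty_poll: int, days: int) -> int:
    """Billable operations consumed if every scheduled poll in the period is empty."""
    polls_per_day = (24 * 60) // interval_minutes
    return polls_per_day * ops_per_empty_poll * days


# 30-minute polling at 2 operations per empty poll, the figures quoted for Make.
per_day = empty_poll_operations(30, 2, 1)     # 48 polls/day * 2 ops
per_month = empty_poll_operations(30, 2, 30)  # assumed 30-day month
```

Under these assumptions an idle 30-minute poll costs 96 operations a day, or 2,880 over 30 days, before a single item is processed.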
Method — why these numbers are comparable
- Identical workflows everywhere. Four canonical flows (single action; filter + two actions; 30-minute poll; two parallel branches) rebuilt step-for-step on every platform — same trigger type, same step count, same HTTP calls.
- Both endpoints are ours. Events enter via each platform's webhook and exit as an HTTP POST to our own receiver. No third-party connectors — a Gmail outage can't masquerade as a platform failure.
- Every run is ID-tagged and reconciled. The controller writes a ledger entry before it fires; every receipt echoes the run ID; a reconciler classifies each run as delivered / missed / partial / duplicate / filtered. No sampling of logs — full census.
- Plans measured: Zapier Professional and Make's paid plan (both paid for by us — nobody gives us free accounts), n8n self-hosted on a 1 GB cloud VM, Pipedream free tier. Sampling rates differ by billing model: per-task platforms get smaller samples (hence wider intervals, honestly labeled); self-hosted n8n carries the bulk sample.
- Window: continuous since July 1, 2026. Editor-mode test runs are excluded; the full per-run ledger is downloadable below.
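The ledger-and-reconcile step described above can be sketched as follows. This is a simplified illustration, not the actual reconciler: `LedgerEntry`, the `expected_receipts` encoding (0 for a run a filter should stop) and the extra `filter_leak` label are assumptions, and a real reconciler would likely track receipts per action rather than as a simple count.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class LedgerEntry:
    run_id: str
    expected_receipts: int  # 0 when a filter should stop the run (assumed encoding)


def classify(expected: int, received: int) -> str:
    """Map expected vs. received receipt counts to an outcome label."""
    if expected == 0:
        return "filtered" if received == 0 else "filter_leak"
    if received == 0:
        return "missed"
    if received < expected:
        return "partial"
    if received == expected:
        return "delivered"
    return "duplicate"


def reconcile(ledger: list[LedgerEntry], receipt_run_ids: list[str]) -> dict[str, str]:
    """Classify every ledger entry (full census) using the run IDs echoed in receipts."""
    seen = Counter(receipt_run_ids)
    return {e.run_id: classify(e.expected_receipts, seen[e.run_id]) for e in ledger}


# Made-up ledger and receipts, for illustration only.
ledger = [
    LedgerEntry("run-1", 1),  # single action
    LedgerEntry("run-2", 2),  # two actions or two branches
    LedgerEntry("run-3", 1),
    LedgerEntry("run-4", 0),  # should be stopped by the filter
    LedgerEntry("run-5", 1),
]
receipts = ["run-1", "run-2", "run-5", "run-5"]
outcomes = reconcile(ledger, receipts)
```

Writing the ledger entry before firing is what makes a missed run detectable: the absence of a receipt only means something if the expectation was recorded first.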
Limitations we know about: paid-platform samples are still small (their intervals say so); latency includes our receiver's network hop (identical for all platforms); results describe these specific plans in this specific window, and the meter keeps running — numbers tighten every week.
Reading 0% honestly
Every platform above currently shows zero silent failures — and those zeros are not equal. Zero in 35 runs still allows a true rate near 9.9%; zero in 2,315 runs pins it below 0.17%. That is why this page reports Wilson 95% confidence intervals and keeps accumulating: rare failures only become visible in large samples. If a vendor quotes you a reliability number without a sample size, they are quoting a feeling.
FAQ
What is a silent failure in workflow automation?
A run the platform accepted (its webhook returned success) but that never produced the expected output — and no error was ever surfaced to the user. The sender believes the work happened; it did not. BenchTruth measures this as the Silent Failure Rate (SFR): (missed + partial executions) ÷ all runs expected to produce output, with a Wilson 95% confidence interval.
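As a small illustration of that definition, the rate can be computed from a list of per-run outcome labels. This is a sketch under stated assumptions: the label strings are hypothetical, filtered runs are left out of the denominator because they are not expected to produce output, and duplicates are counted as output-expected runs that are not silent failures.

```python
from collections import Counter
from typing import Optional

OUTPUT_EXPECTED = ("delivered", "missed", "partial", "duplicate")  # assumed labels


def silent_failure_rate(outcomes: list[str]) -> tuple[int, int, Optional[float]]:
    """Return (silent failures, output-expected runs, SFR); SFR is None with no expected runs."""
    counts = Counter(outcomes)
    expected = sum(counts[label] for label in OUTPUT_EXPECTED)
    failures = counts["missed"] + counts["partial"]
    rate = failures / expected if expected else None
    return failures, expected, rate


# Made-up outcomes: 8 delivered, 1 missed, 1 partial, 5 stopped by a filter.
example = ["delivered"] * 8 + ["missed", "partial"] + ["filtered"] * 5
failures, expected, rate = silent_failure_rate(example)
```

In this made-up example the five filtered runs do not dilute the rate: 2 silent failures over 10 output-expected runs gives an SFR of 20%.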
Which automation platform is the most reliable in 2026?
In BenchTruth's measurements so far, no platform has silently dropped a run under normal operation: n8n 0% SFR in 2,315 runs (95% CI up to 0.17%), Make 0% in 60, Zapier 0% in 35, and Pipedream 0% in 76 before its quota ran out. The one reliability risk we measured was a billing edge, not an engine fault: when Pipedream's free-tier quota ran out, it accepted and silently dropped 14 of 14 deliveries while still returning success.
Why do you publish confidence intervals instead of just a percentage?
Because 0 failures in 35 runs and 0 failures in 2,315 runs are very different statements. The Wilson 95% interval makes the difference explicit: after 35 clean runs the true rate could still be as high as ~10%; after 2,315 clean runs it is below 0.17%. Any reliability claim without a sample size and interval is marketing, not measurement.
Do runs stopped by a filter still cost money on Zapier?
No — measured, not assumed. Across our filtered runs on a Zapier Professional plan, runs halted by a Filter step consumed zero tasks; Zapier's own task meter matched our count of executed action steps exactly (45 = 45). The folklore that 'filtered Zaps still burn a task' did not hold in July 2026.
Raw data
Full per-run ledger (run ID, platform, workflow, fired-at, expected vs received, outcome, per-receipt latency): benchtruth-runs.csv · CC BY 4.0 · cite as "BenchTruth reliability dataset, benchtruth.com/reliability".