Deadlock & Infinite-Loop Prevention in Multi-Agent Sales

· 24 min read
Vadim Nicolai
Senior Software Engineer

How to Prevent Deadlocks and Infinite Loops in Multi-Agent Sales Workflows

Deadlock and infinite-loop prevention in multi-agent sales workflows starts with one ugly trace: a sales agent sits idle while a competitor closes the deal. Two nodes trade the same lead back and forth — rechecking CRM fields, re-requesting approval, re-updating scores — until the opportunity ages out. No cancellation, no escalation, no crash. Just an infinite loop that burns credits, writes no value, and slips past every per-message quality gate, because each individual draft looks fine.

This is article #8 of The Autonomous Sales Fleet — one production LangGraph + DeepSeek + Cloudflare-D1 + LangSmith system where each article realizes one 2026 reliability paper as one real graph node. The constraints stay constant across the series. A three-plane architecture splits the work: a LangGraph control plane, a Cloudflare data plane, and a LangSmith observability plane. DeepSeek-only egress runs through a single AI Gateway. A 0.80 eval gate sits on every prompt path. Grounding-First provenance tags every persisted decision, and every send waits on draft-first human approval. This piece adds the liveness layer: structural deadlock and infinite-loop prevention that runs before any model judges anything.

This is a guardrail, not a rung on the autonomy ladder. It is one of the constraints that earns the autonomy the higher rungs exercise — the CRM orchestrator, the coach→worker teams, the lead-to-proposal pipeline. Every plan→act→verify loop that runs unattended needs a deterministic floor under it. That floor proves the loop will actually terminate; without it, the act step has no safe upper bound. This guard is the thing that lets the fleet trust a self-directed loop at all.

Why multi-agent sales workflows deadlock: structural, not stylistic, causes

A per-message judge reads one draft at a time and asks "is this good?" It cannot see that the same good draft has now been produced four times, or that two nodes are each waiting on a state the other was supposed to write. Deadlock (every actor blocked, waiting on another) and livelock (actors keep acting but make no forward progress) are properties of the trajectory, not of any single output — so the defense has to live at the trajectory level too.

The anchor result for this article makes the stakes concrete. In TraceFix: Repairing Agent Coordination Protocols with TLA+ Counterexamples, Shuren Xia, Qiwei Li, Taqiya Ehsan, and Jorge Ortiz generate a multi-agent coordination protocol in PlusCal, iteratively repair it using TLA+/TLC model-checker counterexamples until it verifies, then deploy the verified logic as a runtime topology monitor that rejects out-of-protocol transitions. The paper reports that TLC-verified protocols cut deadlock/livelock from 31.1% down to 14.1% (arXiv:2605.07935). All 48 tasks reached full TLC verification, 62.5% of them on the first attempt. Average task completion reached 89.4%, with 81.5% full completion once the topology monitor was attached, and every verification ran in under 60 seconds. The 31.1% baseline is the number that should worry anyone shipping multi-agent sales workflows: roughly one run in three got stuck before any verification was applied.

Two ideas in that paper transfer directly to a production fleet without running a model checker on the hot path: (1) a known-good expected transition set per graph, and (2) a runtime monitor that rejects out-of-set transitions and detects no-progress cycles. We do not run TLA+ in production — a 60-second verification per step is a non-starter. Instead the expected ordered path of each graph is encoded as a Python allow-set and the observed trajectory is checked against it deterministically, with zero LLM calls. Same shape, microseconds instead of seconds.

The agent-reliability research behind deadlock and livelock detection

The fleet's design is grounded in five verifiable papers from 2025 and 2026, each contributing a distinct piece of the liveness picture. The numbered paragraphs below keep each claim tied to its source.

  1. TraceFix: Repairing Agent Coordination Protocols with TLA+ Counterexamples is the anchor paper. It treats deadlock/livelock as a verifiable property of a coordination protocol and shows a 31.1% → 14.1% reduction once the protocol is TLC-verified (arXiv:2605.07935). First-attempt verification succeeded on 62.5% of the 48 tasks, meaning 37.5% needed at least one repair iteration. Average task completion reached 89.4%, with 81.5% full completion under the deployed topology monitor, and every verification finished in under 60 seconds. The production takeaway is not "run TLA+"; it is the verify-then-monitor pattern: derive an expected-transition set, then enforce it at runtime. Our two guard nodes are exactly that monitor, minus the model checker, and the residual 14.1% that TraceFix still leaves stuck is precisely the gap a runtime hard veto is there to catch.

  2. Agents of Chaos (arXiv:2602.20021) is an exploratory red-teaming study of autonomous LLM agents, run in a live laboratory with persistent memory, email, Discord, and shell access, in which twenty AI researchers probed six agents over a two-week window. It documents looping as a first-class failure mode rather than a rare bug. Its case study on waste of resources shows agents readily spawning persistent background processes with no termination condition, converting short-lived tasks into unbounded ones, and entering self-reinforcing conversational loops driven by circular reasoning (arXiv:2602.20021). For a sales fleet this is the empirical justification for the two default thresholds the guard ships with: a node-revisit ceiling of 3 and an excessive-loop ceiling of 4, evaluated over a bounded window of 24 trajectory steps. Looping is not an edge case to defer; it is what autonomous agents do when nothing structurally stops them, and a guard that trips once a node has been visited more than 3 times stops it in milliseconds rather than after a multi-second timeout.

  3. LLMDR: LLM-Driven Deadlock Detection and Resolution in Multi-Agent Pathfinding (arXiv:2503.00717) attacks the same problem in a different domain — multi-agent pathfinding — by integrating an LLM's inference with learnt MAPF models and prioritized planning to detect deadlocks and supply customized resolution strategies, reporting improved success rates specifically in deadlock-prone scenarios where learnt models stall (arXiv:2503.00717). The transferable insight is that deadlock detection and deadlock resolution are two separate concerns: detection can and should be cheap and deterministic — our guard catches a cycle once a node has been visited more than 3 times, within milliseconds, scanning at most 24 trajectory steps — while resolution (here, a human interrupt) is where judgment belongs, never on the hot path and never at the cost of an LLM call when a counter will do.

  4. Continuum: Efficient and Robust Multi-Turn LLM Agent Scheduling with KV Cache Time-to-Live (arXiv:2511.02230) shows the scheduler-level face of the same hazard: when space contention prevents scheduling new requests, a naive policy can wedge, so Continuum unpins requests to break the stall, improving average job-completion time by over 8× while raising throughput across multiple model sizes on the SWE-Bench, BFCL, and OpenHands agent benchmarks (arXiv:2511.02230). It is a reminder that "deadlock" appears at every layer (protocol, trajectory, and scheduler) and that each layer needs its own cheap, explicit unwedging rule rather than a hope that timeouts will paper over the cycle.

  5. Formalizing the Safety, Security, and Functional Properties of Agentic AI Systems (arXiv:2510.14133) supplies the vocabulary the guard nodes implement. Allegrini, Shreekumar, and Celik formalize 16 properties for the host-agent model and 14 for the task-lifecycle model, categorized into liveness, safety, completeness, and fairness, and use them to enable detection of coordination edge cases and prevention of deadlocks (arXiv:2510.14133). Our loop_guard is precisely a liveness property made runtime-cheap: a node-revisit cycle (count > 3) or a no-progress stall (≥ 2 consecutive identical steps) is a liveness violation that fails the run closed, while trajectory_anomaly's illegal-transition check is a safety property over the allowed transition set. Two of that paper's four property classes map directly onto two deterministic nodes — no temporal-logic engine required on the hot path.

Taken together these establish the design constraint: detection must be deterministic and cheap enough to run on every step, while resolution escalates to a human. That is the opposite of the common reflex — throwing an LLM judge at every anomaly — because a model judge is both expensive and, per the Agents of Chaos looping findings, itself susceptible to the circular reasoning it is meant to catch.

How to detect an infinite loop in real time: two deterministic guard nodes

The fleet's multi-level eval harness — internally "Agentic CLEAR" — lives in backend/graphs/agent_eval_graph.py. It scores a finished target-graph run on three levels (step / trajectory / outcome) and emits a composite verdict against the 0.80 threshold (CELL_PASS, the same bar the offline LangSmith golden datasets use). The deadlock and infinite-loop prevention work is two LLM-free deterministic nodes plus the existing veto channel. The graph runs the cheap structural guard first, so the expensive judges never even fire on a run that is already structurally broken.

loop_guard — the liveness floor (spec AA05, default ON)

loop_guard is a deterministic, LLM-free guard over state["trajectory"], wired plan_levels → loop_guard → step_eval so it runs before any judge. It enforces two fail-closed invariants:

  • node-revisit cycle — any node name recurring more than AGENT_EVAL_MAX_REVISITS (default 3, so the violation fires on the 4th visit via a strict count > 3 comparison) yields a hard node_revisit_cycle violation. A run that keeps re-entering the same node — say compose_touch → gate_draft → compose_touch because the draft keeps failing the gate — is making no topological progress. That is the infinite-loop catch.
  • no-progress stall — two or more consecutive steps with an identical node + summary pair yield a hard no_progress_stall violation. The agent is acting, but the state is not changing. That is the livelock catch.

The guard bounds its scan to _MAX_TRAJECTORY_STEPS = 24 steps. It logs counts and codes only — never step text — a PII discipline, since trajectory summaries can carry scraped or inbound message content. It also degrades safely: no trajectory, or AGENT_EVAL_LOOP_GUARD=0, is a no-op that preserves prior behavior. Because it is deterministic it also survives an LLM outage or the LLM_KILL_SWITCH — the liveness floor never depends on the model being up.
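
The two invariants above can be sketched in a few lines of plain Python. This is a minimal illustration, not the fleet's actual node: the function name, the step shape (a dict with "node" and "summary" keys), and the choice to scan the most recent 24 steps are assumptions made for the example; only the threshold values come from the article.

```python
from collections import Counter
from typing import Any

MAX_REVISITS = 3            # mirrors AGENT_EVAL_MAX_REVISITS (default from the article)
MAX_TRAJECTORY_STEPS = 24   # mirrors _MAX_TRAJECTORY_STEPS


def loop_guard_sketch(trajectory: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return hard violations for node-revisit cycles and no-progress stalls."""
    # Assumption: the bounded window keeps the most recent steps.
    steps = trajectory[-MAX_TRAJECTORY_STEPS:]
    violations: list[dict[str, Any]] = []

    # Invariant 1: a node name recurring more than MAX_REVISITS times (strict >).
    counts = Counter(step.get("node") for step in steps)
    for node, count in counts.items():
        if count > MAX_REVISITS:
            violations.append(
                {"code": "node_revisit_cycle", "severity": "hard",
                 "node": node, "count": count}
            )

    # Invariant 2: two consecutive steps with an identical (node, summary) pair.
    for prev, cur in zip(steps, steps[1:]):
        if (prev.get("node"), prev.get("summary")) == (cur.get("node"), cur.get("summary")):
            # Only the code and node name are recorded, never the summary text.
            violations.append(
                {"code": "no_progress_stall", "severity": "hard", "node": cur.get("node")}
            )
            break  # one stall finding is enough to fail the run closed

    return violations
```

An empty trajectory yields no violations, which matches the no-op behavior described above.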

trajectory_anomaly — the TraceFix topology monitor, realized (spec AA26)

trajectory_anomaly is the runtime monitor from the anchor paper, encoded as a per-graph allow-set EXPECTED_PATHS[target_graph]. For the durable campaign engine, the real ordered transitions form a legal cycle: the back-edge to check_reply is deliberate, not a bug. That legal loop is what the guard must distinguish from the illegal ones it catches.

The check is deterministic for the hard structural cases:

  • illegal_transition — a node→node hop not in the allow-set (hard violation).
  • excessive_loop — a node visited more than _DEFAULT_LOOP_CEILING = 4 times (hard).
  • dead_end — the run halted on a node whose allow-set excludes __end__ (soft).
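
The three structural checks can be sketched against a small allow-set. The node names and edges in EXPECTED_PATHS below are hypothetical and chosen only to include a legal back-edge to check_reply; the real per-graph allow-set is not shown in this article. The ceiling value matches _DEFAULT_LOOP_CEILING.

```python
from collections import Counter

LOOP_CEILING = 4   # mirrors _DEFAULT_LOOP_CEILING
END = "__end__"

# Hypothetical allow-set: each node maps to its set of legal successors.
EXPECTED_PATHS = {
    "campaign": {
        "check_reply": {"compose_touch", END},
        "compose_touch": {"gate_draft"},
        "gate_draft": {"hold_for_approval", "compose_touch"},
        "hold_for_approval": {"check_reply"},  # the deliberate back-edge
    }
}


def trajectory_anomalies(target_graph: str, nodes: list[str]) -> list[dict]:
    """Deterministic structural checks over an observed node sequence."""
    allowed = EXPECTED_PATHS[target_graph]
    anomalies: list[dict] = []

    # illegal_transition: a hop that is not in the allow-set (hard).
    for src, dst in zip(nodes, nodes[1:]):
        if dst not in allowed.get(src, set()):
            anomalies.append(
                {"code": "illegal_transition", "severity": "hard", "edge": (src, dst)}
            )

    # excessive_loop: a node visited more than the ceiling (hard).
    for node, count in Counter(nodes).items():
        if count > LOOP_CEILING:
            anomalies.append(
                {"code": "excessive_loop", "severity": "hard", "node": node, "count": count}
            )

    # dead_end: the run halted on a node that may not legally end (soft).
    if nodes and END not in allowed.get(nodes[-1], set()):
        anomalies.append({"code": "dead_end", "severity": "soft", "node": nodes[-1]})

    return anomalies
```

In this sketch a full pass around the legal cycle produces no anomalies, while a hop that skips compose_touch is flagged immediately.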

Only the genuinely ambiguous case escalates to a model. The node first computes hard_flagged = any(...) over its anomalies, and if any hard structural flag already fired it skips the LLM call entirely — there is no point spending a judge token on a run a counter has already failed. A legal-but-unusual ordering with no hard flag is the only path that reaches a single DeepSeek judge call, and even that call reuses trajectory_eval's existing prompt rather than adding a second one, so PROMPT_VERSION (clear-v1-2026-06) is unchanged. The node respects LLM_KILL_SWITCH and is gated behind AGENT_EVAL_TRAJECTORY_ANOMALY (default OFF) for staged rollout.
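
The short-circuit described above might look like the following sketch. The function name, the unusual_ordering flag, and the judge callable are illustrative assumptions; the article only states that hard_flagged is computed with any(...) and that a hard flag or the kill switch prevents the model call.

```python
from typing import Callable, Optional


def judge_if_ambiguous(
    anomalies: list[dict],
    unusual_ordering: bool,
    judge: Callable[[], float],
    kill_switch: bool = False,
) -> Optional[float]:
    """Call the judge only for a flag-free, unusual ordering; otherwise return None."""
    hard_flagged = any(a.get("severity") == "hard" for a in anomalies)
    if hard_flagged or kill_switch or not unusual_ordering:
        return None  # a counter already decided, or the model is off: spend no tokens
    return judge()
```

Passing the judge in as a callable keeps the structural path testable without any model dependency.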

The single-writer rule and the hard veto

A LangGraph subtlety drives the wiring: the framework silently drops node-returned keys that are not declared State channels. loop_guard and trajectory_anomaly therefore cannot write the shared violations channel directly — aggregate owns it. Each guard instead carries its findings through the declared, key-wise-merged graph_meta.telemetry channel, and aggregate (the single writer) re-collects them:

# aggregate is the single writer: re-collect each guard's findings from telemetry.
for node in ("loop_guard", "trajectory_anomaly", "detect_defects"):
    violations += [v for v in (telemetry.get(node) or {}).get("violations") or []]

aggregate then partitions on severity — hard = [v for v in violations if v.get("severity") == "hard"] — and any hard violation forces passed=False before the composite mean is even compared to the 0.80 threshold. The deadlock guard wins over the score: a run can earn a perfect judge score on every level and still fail closed because it looped.
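
A self-contained sketch of the single-writer flow, under stated assumptions: merge_telemetry stands in for the key-wise merge on graph_meta.telemetry, the verdict dict shape is invented for the example, and the comparison against 0.80 is assumed to be inclusive. It does not use LangGraph itself.

```python
CELL_PASS = 0.80  # the eval gate from the article
GUARD_NODES = ("loop_guard", "trajectory_anomaly", "detect_defects")


def merge_telemetry(left: dict, right: dict) -> dict:
    """Key-wise merge: each node writes under its own key, so entries do not collide."""
    merged = dict(left or {})
    merged.update(right or {})
    return merged


def aggregate_sketch(telemetry: dict, level_scores: list[float]) -> dict:
    """Collect guard findings, then let any hard violation veto the score."""
    violations: list[dict] = []
    for node in GUARD_NODES:
        violations += (telemetry.get(node) or {}).get("violations") or []

    hard = [v for v in violations if v.get("severity") == "hard"]
    composite = sum(level_scores) / len(level_scores) if level_scores else 0.0

    # The hard veto is checked first; the threshold only matters for clean runs.
    passed = (not hard) and composite >= CELL_PASS
    return {"passed": passed, "composite": composite, "violations": violations}
```

With this shape, a run scoring 1.0 on every level still fails when loop_guard has reported a node_revisit_cycle.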

Timeout and retry strategies for agent hand-offs: why they fail alone

The standard reflex for a stuck agent hand-off is a timeout plus a retry, and on its own it makes things worse. A timeout is a reactive, time-based crutch. It waits seconds or minutes, then typically retries the exact operation that just failed. That converts a deadlock into a credit-burning livelock — precisely the Agents of Chaos failure pattern of unbounded retries with no termination condition. A retry without a structural cycle check is fuel, not a fix. loop_guard is proactive and state-based instead: it catches a node-revisit cycle once a node has been visited more than 3 times, within milliseconds, and instead of retrying it raises a hard violation that routes to a human. Detection is deterministic and free; resolution is a person — exactly the separation LLMDR draws between detecting a deadlock and resolving it. Use timeouts only as a backstop behind the structural guard, never as the primary defense.
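
To make the contrast concrete, here is a small sketch of a hand-off loop where the structural check runs before every attempt. The HardViolation exception and run_handoff helper are hypothetical names; the point is only that the loop terminates on a visit count, not on elapsed time.

```python
from collections import Counter
from typing import Callable

MAX_REVISITS = 3  # mirrors AGENT_EVAL_MAX_REVISITS


class HardViolation(Exception):
    """Raised instead of retrying; the caller routes the run to a human."""


def run_handoff(node: str, attempt: Callable[[], bool], trajectory: list[str]) -> bool:
    """Retry a hand-off, but let the revisit counter bound the loop."""
    while True:
        trajectory.append(node)
        # Structural check first: fires on the 4th visit, before any further work.
        if Counter(trajectory)[node] > MAX_REVISITS:
            raise HardViolation("node_revisit_cycle")
        if attempt():
            return True
```

A hand-off that never succeeds is attempted three times here and then escalated, whereas a timeout-and-retry loop without the counter would keep going.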

Wiring into the draft-first approval path

The guard does more than fail an offline eval — it protects the live send path. verdict_clears_autoapprove(verdict) returns False if any violation is hard, so a looping campaign thread can never bypass the human interrupt under AGENT_EVAL_AUTOAPPROVE=1. The fleet is draft-first by constitution: the orchestrator composes and HOLDS. A deadlocked or stalled run is therefore held for a person, never auto-sent — the liveness guard and the approval gate reinforce each other rather than duplicating work.
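
A hedged sketch of that gate follows. The real verdict_clears_autoapprove takes only the verdict; the env parameter, the verdict dict shape, and the extra "passed" check are assumptions added so the example runs on its own.

```python
import os
from typing import Mapping, Optional


def clears_autoapprove_sketch(verdict: dict, env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when auto-approve is enabled and no hard violation exists."""
    env = os.environ if env is None else env
    if env.get("AGENT_EVAL_AUTOAPPROVE") != "1":
        return False  # draft-first default: hold for a human
    if any(v.get("severity") == "hard" for v in verdict.get("violations", [])):
        return False  # a looping or stalled run never bypasses the interrupt
    return bool(verdict.get("passed"))
```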

The full eval topology, with the structural guard first and the veto last:

START → plan_levels → loop_guard → step_eval → trajectory_eval → outcome_gate
→ trajectory_anomaly → detect_defects → aggregate → escalate_borderline → END

loop_guard runs first because it is the cheapest and the most decisive. The LLM judges (step_eval, trajectory_eval, outcome_gate) run only after the structural liveness checks pass. trajectory_anomaly and the defect scan feed the same aggregate veto, and escalate_borderline sends only the genuinely close calls to a judge panel. Every persisted verdict carries Grounding-First provenance — confidence / reason / source / evidence — with loop findings recorded as counts and codes, never step bodies.

Where deadlock prevention sits in the multi-agent sales fleet

This liveness layer is the connective tissue between three sibling articles. Article #7, Hierarchical Coach-Worker Agent Teams, introduces the multi-node coordination that creates the deadlock surface in the first place — more agents, more handoffs, more circular-wait potential. Article #9, Evidence-Driven Release Gates for LLM Sales Agents, consumes the verdicts this guard produces: a window of runs with any hard liveness violation forces ROLLBACK rather than PROMOTE. And article #10, Agent Defect & Drift Detection in Production, shares the very same trajectory board — the detect_defects node sits one edge downstream of trajectory_anomaly, reading the same step list to catch tool-entropy collapse and role drift. Deadlock prevention, defect detection, and release gating are three reads of one trajectory. All three feed through one aggregate veto.

Circuit-breaker pattern and state-locking for multi-agent coordination

The hard veto is a circuit-breaker pattern for multi-agent coordination. A classic circuit breaker watches a failure signal and, past a threshold, opens to stop further attempts before they cascade. Here the failure signal is structural, not a 500 error. A node-revisit count above 3, or an illegal transition outside the allow-set, opens the breaker. The run fails closed and routes to a human, and no retry fires behind it.
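
As an illustration of the breaker framing, the sketch below keeps an explicit open state so that nothing proceeds once a structural signal has fired. The StructuralBreaker class and its allow-set argument are hypothetical; the fleet's guards are graph nodes, not a class like this.

```python
from typing import Optional


class StructuralBreaker:
    """Opens on a structural signal and then refuses every further step."""

    def __init__(self, allowed: dict[str, set[str]], max_revisits: int = 3) -> None:
        self.allowed = allowed
        self.max_revisits = max_revisits
        self.visits: dict[str, int] = {}
        self.last: Optional[str] = None
        self.open_reason: Optional[str] = None

    def record(self, node: str) -> bool:
        """Return True if the step may proceed, False once the breaker is open."""
        if self.open_reason is not None:
            return False  # already open: no retry fires behind it
        if self.last is not None and node not in self.allowed.get(self.last, set()):
            self.open_reason = "illegal_transition"
            return False
        self.visits[node] = self.visits.get(node, 0) + 1
        if self.visits[node] > self.max_revisits:
            self.open_reason = "node_revisit_cycle"
            return False
        self.last = node
        return True
```

Unlike a classic breaker there is no half-open state in this sketch, which reflects the article's rule that resolution belongs to a person.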

State-locking and escalation logic follow the same separation. The single-writer rule is the lock: only aggregate may write the violations channel, so two guards can never race to half-write a verdict. The escalation logic is deterministic-first. A hard structural flag escalates straight to a person with zero model calls, and only a genuinely ambiguous, flag-free ordering ever reaches a single judge. Detection stays cheap; judgment stays human.

Deadlock and infinite-loop prevention FAQ

What causes a deadlock in a multi-agent sales system? A deadlock occurs when two or more agents wait for each other to release a resource or complete a hand-off, and none can proceed without the other acting first. In a sales fleet this looks like two nodes each blocked on a state the other was supposed to write.

How can I detect an infinite loop in an automated sales workflow? Track the trajectory, not just the latest draft. Use a node-revisit counter, a bounded step window, and a no-progress check that flags two or more consecutive steps repeating the same node and summary. Trip a hard violation once a node recurs more than your configured limit; the fleet uses 3.

What is the circuit-breaker pattern in agent coordination? It monitors a failure signal across agent hand-offs and opens the circuit once a threshold is crossed, halting retries to prevent cascading failures and resource exhaustion. Here the breaker opens on a structural liveness violation rather than on an error rate.

Should I use timeouts or retries first for deadlock prevention? Neither, on its own. A retry without a structural cycle check is fuel for a livelock. Put a deterministic loop guard first, then keep a timeout only as a backstop behind it.

What we claim — and what we don't

The honest framing matters. TraceFix contributes the formal pattern: verify a protocol against an expected-transition set, then monitor it at runtime. The production contribution here is the deterministic realization of that monitor — two LLM-free guard nodes and a single-writer hard veto, with no model checker on the hot path. The numbers in this article are either on-disk constants (AGENT_EVAL_MAX_REVISITS = 3, _DEFAULT_LOOP_CEILING = 4, _MAX_TRAJECTORY_STEPS = 24, the 0.80 gate, PROMPT_VERSION = "clear-v1-2026-06") or figures from the cited paper landing pages (TraceFix's 31.1% → 14.1%, 62.5%, 89.4%, 81.5%; Continuum's 8×). No throughput lift, conversion number, or deployment anecdote is claimed that is not traceable to one of those two sources.

No method is foolproof. TraceFix still left 14.1% of runs stuck after full verification, and 37.5% of its tasks needed at least one repair iteration (arXiv:2605.07935). The production answer to that residual is not a better timeout — it is a deterministic, LLM-free guard that fails closed and wakes a human. As multi-agent sales systems move from stateless task-executors to proactive, goal-owning agents, deadlock and infinite loops become a dominant failure mode, invisible to any per-message judge. The fix is structural, and it has to be baked in: hit a guard, flag a hard violation, hold the draft, and escalate to a person — reliability by design, not by timeout.


The Autonomous Sales Fleet — full series

This is Part 8 of 10 in a series on building one production autonomous-agentic-sales system on LangGraph + DeepSeek + Cloudflare D1, where each part adds one capability that moves the fleet up the autonomy ladder — from human-triggered assistants to self-directed plan→act→verify loops, gated by autonomy guardrails. The arc runs orchestration → enablement & analytics → campaign strategy → reliability & evaluation.

Orchestration

  1. Autonomous CRM Orchestrator (reason→decompose→act→verify) (autonomy: high)
  2. Multi-Step Lead Qualification (high)
  3. Lead-to-Proposal Multi-Agent Pipeline (high)
  7. Hierarchical Coach→Worker Delegation (high)

Enablement & analytics

  4. Sales-Enablement Copilot: Deal Coaching & Objection Handling (medium)
  5. NL-to-SQL CRM Analytics over Cloudflare D1 (medium)

Campaign strategy

  6. Design-Thinking Expert Panels for Campaign Strategy (medium)

Reliability & evaluation: the autonomy guardrails

  8. Deadlock & Infinite-Loop Prevention (guardrail)
  9. Evidence-Driven Release Gates (PROMOTE/HOLD/ROLLBACK) (guardrail)
  10. Detecting Agent Defects & Drift in Production (guardrail)

References