Towards self-improving error diagnosis in multi-agent systems

Large Language Model (LLM)-based Multi-Agent Systems (MAS) enable complex problem-solving but introduce significant debugging challenges, characterized by long interaction traces, inter-agent dependencies, and delayed error manifestation. Existing diagnostic approaches often rely on expensive expert annotation or 'LLM-as-a-judge' paradigms, which struggle to pinpoint decisive error steps within extended contexts. In this paper, we introduce ERRORPROBE, a self-improving framework for semantic failure attribution that identifies responsible agents and the originating error step. The framework operates via a three-stage pipeline: (1) operationalizing the MAS failure taxonomy to detect local anomalies, (2) performing symptom-driven backward tracing to prune irrelevant context, and (3) employing a specialized multi-agent team (Strategist, Investigator, Arbiter) to validate error hypotheses through tool-grounded execution. Crucially, ERRORPROBE maintains a verified episodic memory that updat
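As an illustration of stage (2), the following is a minimal sketch of symptom-driven backward tracing over a recorded interaction trace. It is not ERRORPROBE's implementation: the abstract does not say how steps or dependencies are represented, so the `Step` class, the `depends_on` field, the `anomalous` flag (standing in for the output of a stage (1) style local check), and both function names are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One step of a multi-agent trace (hypothetical representation)."""
    index: int
    agent: str
    content: str
    # Assumption: each step records which earlier steps it consumed.
    depends_on: list = field(default_factory=list)
    # Assumption: a local anomaly detector has already flagged suspicious steps.
    anomalous: bool = False


def backward_trace(trace, symptom_index):
    """Return only the steps the symptom step transitively depends on."""
    by_index = {s.index: s for s in trace}
    seen = set()
    stack = [symptom_index]
    while stack:
        i = stack.pop()
        if i in seen or i not in by_index:
            continue
        seen.add(i)
        stack.extend(by_index[i].depends_on)
    # Preserve the original trace order in the pruned result.
    return [s for s in trace if s.index in seen]


def earliest_anomaly(steps):
    """Pick the earliest flagged step as the candidate originating error."""
    for s in sorted(steps, key=lambda s: s.index):
        if s.anomalous:
            return s
    return None


trace = [
    Step(0, "planner", "split task into subtasks"),
    Step(1, "retriever", "fetch document A", depends_on=[0], anomalous=True),
    Step(2, "retriever", "fetch document B", depends_on=[0]),
    Step(3, "writer", "summarize document A", depends_on=[1]),
    Step(4, "reviewer", "final answer is wrong", depends_on=[3]),
]

relevant = backward_trace(trace, symptom_index=4)
candidate = earliest_anomaly(relevant)
```

In this toy trace, step 2 is pruned because the failing step never consumed its output, which is the sense in which backward tracing removes irrelevant context. The surviving candidate would still be only a hypothesis, to be validated by something like the tool-grounded checks the abstract describes in stage (3).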

Source

https://www.amazon.science/publications/towards-self-improving-error-diagnosis-in-multi-agent-systems