Can a Rubric Gate Stop an Agent From Taking the Wrong Action?

Inspired by Claude Outcomes, I tested a small outcome-gated retry loop on 30 support decisions. Wrong final actions dropped from 6 out of 30 to 2 out of 30, but the remaining failures showed why detection is not the same as repair.

[Figure: Baseline Path vs Gated Path with one retry. Source: Created by the author using OpenAI Images 2.0]

Anthropic’s Claude Outcomes feature caught my attention because it formalizes a pattern many teams are already building by hand: a worker agent produces an answer, a separate grader checks the answer against a rubric, and the agent gets another pass when the output misses the mark. I liked the direction, but I wanted to test the smaller pattern underneath it without depending on the managed-agent product itself. I rebuilt the basic loop in a controlled setup: agent output, rubric judge, and one retry.

I narrowed the experiment to one practical failure mode:

Can a rubric-gated retry loop reduce wrong final actions when an agent has to make a structured decision?

I focused on final actions because many production agent failures are not writing failures. The answer can sound reasonable, the explanation can be fluent, and the JSON can be valid, while the selected action is still wrong. In a support workflow, that becomes an operational mistake, not just a response-quality issue. An agent that says DENY when it should say NO_ACTION is still wrong, even if the reason sounds cautious.

So I built a small experiment around that failure mode, using 30 synthetic support cases and a single model, gpt-5.4-mini. The sample is small, so I would not call this a benchmark. The useful part is the shape of the failures: what the gate fixed, what it missed, and what that tells you about where this pattern is worth using.

What I built

Each case gives the agent a short customer scenario, a policy, and a closed set of possible final actions:

    APPROVE
    DENY
    ASK_CLARIFYING_QUESTION
    ESCALATE
    NO_ACTION

I compared two loops.
The baseline sends the case straight to the agent and records whatever action comes back. The gated loop adds a rubric judge after the agent’s first answer. If the judge fails the answer, the agent gets one retry with the failed criteria.

Before getting into the results, this is the pattern I would use as a quick reference for outcome-gated agent workflows.

[Figure: A quick guide for outcome-gated agent workflows, showing how rubric gates check proposed actions against policy contracts before accepting a final decision. Source: Created by the author using OpenAI Images 2.0]

In this experiment, I kept the contract intentionally narrow: one closed action set, explicit policy constraints, and one retry after a failed rubric check.

    Baseline:
    case -> agent -> final action

    Gated:
    case -> agent -> rubric judge
    if failed -> retry once
    retry answer -> final action

The cases were synthetic but designed around common production-style constraints:

I used a real LLM provider rather than a deterministic simulator because the interesting part is how the model actually behaves when its first answer gets judged and revised.

The important design choice

The judge should not simply know the answer key.

Each case has an expected_action, but that field is used only for offline metrics after the run. The judge never sees it. Instead, the judge receives the agent's selected action and checks whether that action violates objective policy constraints.

For example, a case may say that a refund request already has an open chargeback in the payments workflow. The expected action may be NO_ACTION, because support should not decide the case while payments owns the resolution.

A weak judge would check:

“Expected action is NO_ACTION. Did the model choose NO_ACTION?”

That is an answer-key check. It is useful for offline scoring, but it is not a rubric gate.
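To make the difference between an answer-key check and a constraint check concrete, here is a minimal sketch in Python. It is not the judge from the repo, which may well be model-based; it is a deterministic stand-in, and the names `Case`, `owning_workflow`, `answer_key_check`, and `rubric_gate` are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

ACTIONS = {"APPROVE", "DENY", "ASK_CLARIFYING_QUESTION", "ESCALATE", "NO_ACTION"}


@dataclass
class Case:
    scenario: str
    policy: str
    # Hypothetical structured flag: which other workflow owns the decision, if any.
    owning_workflow: Optional[str]
    # Offline label only. The runtime gate below must never read it.
    expected_action: str


def answer_key_check(case: Case, action: str) -> bool:
    """Offline scoring: compare the final action against the label."""
    return action == case.expected_action


def rubric_gate(case: Case, action: str) -> List[str]:
    """Runtime gate: return the failed criteria; an empty list means pass."""
    failed = []
    if action not in ACTIONS:
        failed.append("action_not_in_closed_set")
    # Assumed rule: support must not decide a case that another workflow owns.
    if case.owning_workflow and action in {"APPROVE", "DENY"}:
        failed.append(f"support_decides_while_{case.owning_workflow}_owns_case")
    return failed


case = Case(
    scenario="Refund request with an open chargeback.",
    policy="Support must not issue refunds while a chargeback workflow is active.",
    owning_workflow="payments",
    expected_action="NO_ACTION",
)
print(rubric_gate(case, "DENY"))       # ['support_decides_while_payments_owns_case']
print(rubric_gate(case, "NO_ACTION"))  # []
```

Note that this gate would also pass ESCALATE for the same case. A constraint check only rules out actions that break the contract, so passing the gate is weaker than matching the label, which is one reason to keep the two functions separate.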
The rubric gate asks something closer to:

“If the selected action is DENY, does it incorrectly make support the decision owner while another workflow owns the case?”

In production, you usually do not want the runtime judge to be a hidden answer key. You want it to check whether the agent’s proposed action violates a clear contract. In this experiment, the contract was simple: choose one final action, and do not choose an action that conflicts with the policy constraints in the case.

The result

In one run across 30 cases, the baseline agent made 6 wrong final-action decisions. After adding the rubric judge and one retry, wrong final actions dropped to 2.

The improvement was meaningful, but the table is not the whole story. The gated loop did not make the agent generally smarter. It helped only when the judge could point to a specific mismatch between the selected action and the policy contract. That distinction became clearer in the failure examples.

What the gate fixed

The gated loop fixed 4 of the 6 baseline failures.

Most of the fixes came from cases where the baseline agent was too quick to deny the request. The policy did not support immediate approval, but it also did not require denial. The better action was to ask for clarification or wait for another workflow.

Here is one example. A requester asked support to change the account email. The request came from a new device, and the requester had not completed MFA. The requester did know the billing ZIP code. The policy said that account email changes require completed MFA or another approved ownership check before support can make the change.

The baseline chose DENY. The expected action was ASK_CLARIFYING_QUESTION.

The baseline answer was too final. The requester still had a possible path to complete an approved ownership check, so the right next step was to ask about that, not to close the door. The rubric judge failed the answer because denial was premature.
After retry, the model changed the action to ASK_CLARIFYING_QUESTION.

Another fixed case involved a replacement device. The customer asked support to resend a replacement, but an RMA had already been created that morning and was waiting for warehouse scan.

The baseline chose DENY. The expected action was NO_ACTION.

Denying the request misrepresented the workflow. The issue was not that the customer was ineligible. The issue was that the replacement process was already underway and support should not create a duplicate RMA. The retry corrected this to NO_ACTION.

These examples show the value of the pattern at a practical level. The gate did not make the model smarter in a broad sense. It forced the model to repair a specific mismatch between its selected action and the policy contract.

What still failed

The two remaining failures were more interesting than the aggregate score.

Both were existing workflow cases. In both, the expected action was NO_ACTION, but the final gated answer remained DENY even after retry.

The first case involved a refund request with an open chargeback in the payments workflow. The policy said support must not issue refunds while a chargeback workflow is active, because the payment workflow owns the resolution. The expected action was NO_ACTION. The final action after gating was DENY.

The second case involved a buyer asking for manual shipment release while a fraud review workflow was still open. The policy said support must not release shipments while fraud review is open. Again, the expected action was NO_ACTION. The final action after gating was DENY.

In both cases, the judge identified the issue and returned failed criteria. The retry still kept the wrong action.

This was the part that changed how I would think about the pattern in production. A rubric gate can detect that an action violates a rule, but detection is not the same as repair.
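The gap between detection and repair can be made visible in the loop itself. Below is a minimal sketch in Python, assuming hypothetical `agent` and `judge` callables (in a real run both would wrap model calls). It follows the gated flow of one judge pass and one retry, with the retry answer becoming the final action. Judging the retry a second time is an assumption added here purely to label the record; it is not stated as part of the original experiment.

```python
from typing import Callable, Dict, List

# Hypothetical interfaces; real implementations would call a model provider.
Agent = Callable[[str, List[str]], str]  # (case_text, failed_criteria) -> action
Judge = Callable[[str, str], List[str]]  # (case_text, action) -> failed criteria


def run_gated(case_text: str, agent: Agent, judge: Judge) -> Dict[str, object]:
    """Run one case through the gate with a single retry and return a log record."""
    first = agent(case_text, [])
    failed = judge(case_text, first)
    if not failed:
        return {"original_action": first, "failed_criteria": [],
                "revised_action": None, "final_action": first,
                "status": "passed_first_try"}

    revised = agent(case_text, failed)       # the one allowed retry
    still_failed = judge(case_text, revised)  # used only to label the outcome
    status = "repaired" if not still_failed else "detected_not_repaired"
    return {"original_action": first, "failed_criteria": failed,
            "revised_action": revised, "final_action": revised,
            "status": status}


def stubborn_agent(case_text: str, failed_criteria: List[str]) -> str:
    """Stub that keeps choosing DENY regardless of the judge's feedback."""
    return "DENY"


def ownership_judge(case_text: str, action: str) -> List[str]:
    """Stub judge: flags a support decision on a case with an open chargeback."""
    if "chargeback" in case_text and action in {"APPROVE", "DENY"}:
        return ["support_decides_while_payments_owns_case"]
    return []


record = run_gated("Refund request; open chargeback in payments workflow.",
                   stubborn_agent, ownership_judge)
print(record["status"])  # detected_not_repaired
```

Counting `detected_not_repaired` records per failed criterion is one way to spot an action pair, such as DENY versus NO_ACTION, that a retry does not resolve.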
If the model does not understand the operational distinction between “deny the request” and “take no action because another workflow owns the decision,” the retry can preserve the wrong action even after the judge flags it.

A safe-sounding action is not always the correct action. DENY may sound conservative, but it is still wrong if the agent has no authority to decide the case.

Why objective outcomes matter

This pattern worked because the outcome was checkable. The agent had to choose from a closed action set, the policy constraints were explicit, and the judge could inspect the selected action and ask whether it violated a specific rule.

For production systems, I would trust outcome-gated retries more in cases like these:

I would be more cautious when the outcome is subjective:

What I would change in production

I would not use this pattern as a universal reliability layer. I would use it where the cost of the wrong action is high enough to justify an extra model call, and where the expected result can be expressed as a contract.

The most important production change would be treating repeated non-repair cases as design feedback. If the agent keeps confusing DENY and NO_ACTION, that is a signal to improve the action definitions, add examples, or restructure the workflow contract. Another retry is unlikely to fix a conceptual gap in the model's understanding.

Beyond that, I would separate runtime gating from offline scoring. The runtime judge checks constraints. The offline evaluator compares final actions against labeled expected outcomes. Mixing those two roles makes the system look stronger than it is. I would also log every retry with the original action, failed criteria, revised action, and final pass/fail status, because the pattern of repairs and non-repairs is often more useful than the final accuracy number.

Closing

Outcome gates are most useful when “done” can be written as a contract.
In this experiment, that contract was simple: choose one final action that does not violate the policy constraints. Under those conditions, the gated loop reduced wrong final actions from 6 out of 30 to 2 out of 30.

The remaining failures are the part I would pay most attention to in production. The judge detected the problem, but the retry still returned the wrong final action. That means the reliability question is not only whether a system can grade an output. It is whether the full loop can turn that feedback into the right decision. That is the part I would want evidence for before putting an outcome-gated loop into a real workflow.

The repo for this experiment includes the synthetic cases, baseline runner, gated runner, judge logic, and generated result files from this run.

References

Anthropic Claude API Docs: Define outcomes
Anthropic Cookbook: Outcomes, agents that verify their own work
GitHub repo for this experiment

Can a Rubric Gate Stop an Agent From Taking the Wrong Action? was originally published in Towards AI on Medium, where people are continuing the conversation by highlighting and responding to this story.

Source

https://pub.towardsai.net/can-a-rubric-gate-stop-an-agent-from-taking-the-wrong-action-982480285982?source=rss----98111c9905da---4