Evaluator-Optimiser Pattern: Generator and Critic in a Loop (2026)
A generator agent produces a candidate output. An evaluator agent scores it. The generator revises. The loop terminates when the criteria are met or iterations are exhausted.
The pattern (Anthropic, December 2024)
Anthropic’s essay presents the evaluator-optimiser as one of its two core multi-agent workflow shapes, the other being orchestrator-workers. The structural property is a closed loop between two agents with different roles: one generates, one evaluates, and the generator’s next iteration is conditioned on the evaluator’s feedback.
The pattern presupposes that quality criteria can be made explicit. The evaluator is asked to score against a rubric (factuality, citation coverage, code-passes-tests, fluency, alignment with brief). Where the rubric is not explicit, the evaluator becomes a vibes machine, and the loop produces unpredictable results.
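The point about explicit criteria can be made concrete as a scoring schema. A minimal sketch in Python; the criterion names, weights, and thresholds here are illustrative assumptions, not a prescribed schema:

```python
# Illustrative rubric: each criterion carries a weight and its own pass
# threshold, so the evaluator cannot pass a candidate on vibes.
RUBRIC = {
    "factuality":        {"weight": 0.4, "pass_at": 0.9},
    "citation_coverage": {"weight": 0.3, "pass_at": 0.8},
    "fluency":           {"weight": 0.2, "pass_at": 0.7},
    "brief_alignment":   {"weight": 0.1, "pass_at": 0.8},
}

def weighted_score(scores: dict[str, float]) -> float:
    """Combine per-criterion scores (each 0..1) into one number."""
    return sum(RUBRIC[k]["weight"] * scores[k] for k in RUBRIC)

def passes(scores: dict[str, float]) -> bool:
    """Every criterion must clear its own threshold, not just the average."""
    return all(scores[k] >= RUBRIC[k]["pass_at"] for k in RUBRIC)
```

The per-criterion thresholds matter: an averaged score lets one strong criterion mask a failing one, which is exactly the ambiguity the rubric is meant to remove.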
Anthropic’s framing treats the evaluator-optimiser as a workhorse for tasks where iteration genuinely improves the output. Drafting tasks, code generation, and content-against-brief tasks are paradigmatic. Tasks where the first attempt is usually correct and the iteration cost is high (long-form synthesis, expensive tool calls per attempt) are not good candidates.
Related patterns
Self-refine (Madaan et al., 2023; arxiv.org/abs/2303.17651, accessed 30 April 2026). The single-agent variant: one model alternates between generating and critiquing its own output.
Reflexion (Shinn et al., 2023; arxiv.org/abs/2303.11366, accessed 30 April 2026). A single-agent variant that adds episodic verbal-feedback memory across attempts, framed as “verbal reinforcement learning”.
The two-agent evaluator-optimiser is the multi-agent generalisation of self-refine: the critic role is given to a separate agent (often a separate model, often with a different system prompt) so that the critique is genuinely external to the generator.
When the pattern is the right shape
When quality criteria are well-defined. Tests passing, citation coverage, fluency thresholds, structural-format checks all qualify. Vibes do not.
When iterative improvement converges within a bounded number of rounds. If the loop runs more than three to five iterations on average, the cost-per-output usually exceeds the marginal quality gain.
When the evaluator’s judgement is cheaper than the generator’s regeneration. If the evaluator can score a candidate quickly and the generator’s next attempt is also fast, the loop is economical. If either side is expensive, the pattern can blow out costs.
Common failure modes
Infinite looping. No convergence, no termination. The remedy is a hard cap on iterations plus an explicit quality threshold that the evaluator scores against, so the loop ends as soon as either condition is met.
Sycophantic evaluator. The evaluator approves nearly everything, especially if its prompt is vague. The remedy is an explicit rubric, a strict scoring schema, and (where possible) a few held-out examples to calibrate the evaluator.
Cost explosion. Each iteration spends tokens on both generator and evaluator. Budget the total token spend at design time, not after a runaway run.
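The design-time budget is simple arithmetic: the worst case pays for both agents on every round. A sketch with purely illustrative figures:

```python
def worst_case_tokens(max_iterations: int,
                      gen_tokens_per_round: int,
                      eval_tokens_per_round: int) -> int:
    """Upper bound on one loop run: every round pays for both agents."""
    return max_iterations * (gen_tokens_per_round + eval_tokens_per_round)

# Illustrative numbers only: 5 rounds, 4k generator tokens, 1k evaluator tokens.
budget = worst_case_tokens(5, 4_000, 1_000)
print(budget)  # 25000
```

If that worst-case figure is unacceptable, reduce the iteration cap or the per-round spend before deployment, not after a runaway run.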
Reference example
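The loop can be sketched in a few lines of Python. The `generate` and `evaluate` callables stand in for two separately prompted agents; they are placeholders, not a real API:

```python
from typing import Callable

def evaluator_optimiser_loop(
    generate: Callable[[str, str], str],           # (task, feedback) -> candidate
    evaluate: Callable[[str], tuple[float, str]],  # candidate -> (score, feedback)
    task: str,
    threshold: float = 0.9,
    max_iterations: int = 5,
) -> str:
    """Generate-evaluate loop with a hard iteration cap and a score threshold.

    Terminates when the evaluator's score clears the threshold or the cap
    is exhausted, whichever comes first.
    """
    candidate, feedback = "", ""
    for _ in range(max_iterations):
        candidate = generate(task, feedback)
        score, feedback = evaluate(candidate)
        if score >= threshold:
            break  # criteria met: stop paying for further rounds
    return candidate
```

In a real deployment the two callables would wrap two differently prompted model calls, and the evaluator would return structured rubric scores rather than a single float; the control flow is unchanged.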
Orchestrator-workers (the dynamic-spawn variant)
Anthropic’s December 2024 essay distinguishes the orchestrator-workers pattern (where the orchestrator dynamically decomposes the goal into sub-tasks at runtime and spawns worker agents to execute them) from the simpler supervisor pattern (where the workers are fixed at design time). The structural difference is whether worker scope is determined statically or dynamically.
Related on this site
- Supervisor pattern: the related orchestrator-workers shape with fixed workers.
- Human-in-the-loop: where the evaluator is human rather than another agent.
- Single-agent topology: the precursor to self-refine.
For the engineering reference, see buildingeffectiveagents.com.