Agentic Workflows¶
Summary¶
Agentic workflows are composable architectural patterns for LLM-powered systems, ranging from simple sequential chains to fully autonomous agents. Anthropic's canonical taxonomy distinguishes workflows (predefined code paths) from agents (LLMs dynamically directing their own processes), and identifies five workflow patterns plus a general agent pattern. The guiding principle: start with the simplest solution; add complexity only when it demonstrably improves outcomes.
Details¶
Workflows vs. Agents¶
The term "agent" is overloaded. Anthropic's formal distinction:
- Workflows — LLMs and tools orchestrated through predefined code paths; predictable, consistent, lower cost
- Agents — LLMs that dynamically direct their own processes and tool usage; flexible, higher cost, higher error risk
Most real-world "agentic" systems are workflows. True agents are warranted only for open-ended problems where the required steps cannot be predicted in advance.
Start simple: for many applications, a single optimized LLM call with retrieval and in-context examples is sufficient. Agentic systems trade latency and cost for task performance — confirm the trade-off is worth it before building.
The Augmented LLM (Building Block)¶
The foundation of all patterns is an LLM enhanced with:
- Retrieval — searching external knowledge stores
- Tools — calling APIs, running code, writing files
- Memory — storing and retrieving context across steps
Modern LLMs can actively use these — generating their own search queries, selecting tools, deciding what to retain. The Model Context Protocol (MCP) provides a standard interface for tool integration.
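The building block can be sketched in a few lines. This is a minimal illustration, not a real SDK: `retrieve` and `call_llm` are hypothetical deterministic stubs standing in for a vector search and a model call.

```python
# Minimal sketch of the augmented LLM: a model call wrapped with
# retrieval, tools, and memory. All functions are illustrative stubs.
def retrieve(query: str) -> str:
    # Stand-in for searching an external knowledge store.
    return f"[documents matching '{query}']"

def call_llm(prompt: str) -> str:
    # Stand-in for the base model call.
    return f"[model answer grounded in: {prompt[:40]}...]"

class AugmentedLLM:
    def __init__(self):
        self.memory: list[str] = []      # context retained across steps

    def answer(self, question: str) -> str:
        context = retrieve(question)     # retrieval: external knowledge
        self.memory.append(question)     # memory: persists across calls
        return call_llm(f"{context}\n{question}")
```

In a real system the model would also generate its own search queries and choose tools; here only the wiring is shown.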
Pattern 1: Prompt Chaining¶
Decomposes a task into a sequential chain of LLM calls, where each call processes the output of the previous. Programmatic gates can validate intermediate outputs before proceeding.
Use when: The task cleanly decomposes into fixed sequential subtasks; you want to trade latency for higher accuracy by making each call an easier sub-problem.
Examples:
- Generate marketing copy → translate to another language
- Write an outline → validate it meets criteria → write the full document
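The outline-then-write example can be sketched as a two-step chain with a programmatic gate between the calls. `call_llm` is a hypothetical stub, not a real API:

```python
# Prompt chaining sketch: two sequential calls with a programmatic
# gate that validates the intermediate output before proceeding.
def call_llm(prompt: str) -> str:
    # Deterministic stub standing in for a real model call.
    return f"[output for: {prompt}]"

def chained_document(topic: str) -> str:
    outline = call_llm(f"Write an outline about {topic}")
    # Gate: stop the chain if the intermediate output fails validation.
    if not outline.strip():
        raise ValueError("outline failed validation")
    return call_llm(f"Write the full document from this outline: {outline}")
```

Each call sees an easier sub-problem than the original task, which is the accuracy-for-latency trade the pattern makes.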
Pattern 2: Routing¶
Classifies an input and directs it to a specialized downstream LLM or prompt. Separates concerns; lets each sub-prompt be optimized for its specific input type without hurting other types.
Use when: The task has distinct categories that are better handled separately; classification is reliable (either by LLM or a traditional classifier).
Examples:
- Route customer queries (general / refund / technical) to different prompts and tools
- Route easy questions to Claude Haiku (cheaper/faster) and hard questions to Claude Sonnet/Opus
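A routing sketch combining both examples: a classifier picks a category, and each category maps to its own model tier and specialized prompt. The categories and model names follow the examples above; `classify` is a naive keyword stub standing in for an LLM or traditional classifier.

```python
# Routing sketch: classify the input, then dispatch to a specialized
# (model, system prompt) pair optimized for that category.
ROUTES = {
    "refund":    ("claude-haiku",  "You process refund requests."),
    "technical": ("claude-sonnet", "You are a support engineer."),
    "general":   ("claude-haiku",  "You answer general questions."),
}

def classify(query: str) -> str:
    # Keyword stub; in practice an LLM or a traditional classifier.
    q = query.lower()
    if "refund" in q or "charge" in q:
        return "refund"
    if "error" in q or "crash" in q:
        return "technical"
    return "general"

def route(query: str) -> tuple[str, str]:
    return ROUTES[classify(query)]
```

Because each branch has its own prompt, optimizing one category cannot degrade the others.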
Pattern 3: Parallelization¶
Multiple LLM calls run simultaneously; results are aggregated. Two variants:
- Sectioning — Break a task into independent subtasks, run in parallel, combine outputs
- Voting — Run the same task multiple times with diverse prompts; majority vote or threshold determines result
Use when: Subtasks can be parallelized for speed, or you need multiple independent perspectives for higher-confidence results.
Examples (sectioning):
- Run a content-safety guardrail in parallel with the main response generation
- Evaluate multiple aspects of model performance with separate LLM calls
Examples (voting):
- Code security review: multiple prompts flag vulnerabilities independently
- Content moderation: require multiple scorers to agree before blocking
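The voting variant can be sketched with `asyncio.gather`: the same moderation task runs under diverse prompts in parallel, and a threshold of agreeing votes is required before blocking. `call_llm` is a deterministic stub for illustration.

```python
import asyncio
from collections import Counter

# Voting sketch: run the same task under diverse prompts in parallel,
# then require a threshold of agreement before acting.
async def call_llm(prompt: str) -> str:
    # Stub classifier standing in for a real model call.
    return "block" if "attack" in prompt else "allow"

async def moderate(content: str, prompts: list[str], threshold: int) -> bool:
    votes = await asyncio.gather(*(call_llm(p + content) for p in prompts))
    return Counter(votes)["block"] >= threshold
```

Sectioning uses the same `gather` pattern but with different subtasks per call and a combining step instead of a vote.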
Pattern 4: Orchestrator-Workers¶
A central orchestrator LLM dynamically breaks down a task, delegates subtasks to worker LLMs, and synthesizes their results. Unlike parallelization, the subtasks are not predefined — the orchestrator determines them based on the specific input.
Use when: The required subtasks cannot be predicted in advance; the task structure depends on the input.
Examples:
- Coding products: the orchestrator decides which files to edit and how, based on the specific task
- Multi-source research: orchestrator determines which sources to consult; workers retrieve and analyze each
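The control flow can be sketched as plan → delegate → synthesize. All three roles are illustrative stubs; the essential point is that `plan` runs at request time, so the subtasks are not known until the input arrives.

```python
# Orchestrator-workers sketch: the orchestrator decides the subtasks
# dynamically, workers handle each one, and results are synthesized.
def plan(task: str) -> list[str]:
    # In a real system this is an orchestrator LLM call whose output
    # depends on the specific input.
    return [f"research {task}", f"summarize findings on {task}"]

def worker(subtask: str) -> str:
    # Stand-in for a worker LLM handling one delegated subtask.
    return f"[result of {subtask}]"

def orchestrate(task: str) -> str:
    subtasks = plan(task)
    results = [worker(s) for s in subtasks]
    return " | ".join(results)   # synthesis step (stubbed)
```

Contrast with parallelization, where the list of subtasks is fixed in code rather than produced by the model.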
Pattern 5: Evaluator-Optimizer¶
One LLM generates a response; a second evaluates it and provides feedback; the generator revises in a loop.
Use when: (1) LLM responses demonstrably improve when a human articulates feedback — meaning an evaluator LLM can do the same; (2) Clear evaluation criteria exist; (3) Iterative refinement provides measurable value.
Examples:
- Literary translation: the generator translates; the evaluator critiques nuance the generator missed
- Complex research tasks: the evaluator decides whether further searches are warranted
Fully Autonomous Agents¶
When workflows aren't enough, agents operate independently over many turns:
- Begin with a command or interactive discussion to clarify the task
- Plan and operate independently, using tools based on environmental feedback
- Gain "ground truth" at each step (tool results, code execution) to assess progress
- Pause at checkpoints or blockers for human input
- Stop when complete or when stopping conditions are met (e.g., max iterations)
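The steps above reduce to a loop: the model picks an action, tool execution supplies ground truth, and the loop exits on completion or a guardrail. Everything here is an illustrative stub, including the trivially simple policy.

```python
# Autonomous agent loop sketch: model-chosen actions, ground-truth
# feedback from tools, and a max-iteration stopping condition.
def pick_action(history: list[str]) -> str:
    # Stub policy standing in for the model: run tests once, then stop.
    return "DONE" if any("passed" in h for h in history) else "run_tests"

TOOLS = {"run_tests": lambda: "tests passed"}

def run_agent(task: str, max_iterations: int = 10) -> list[str]:
    history = [task]
    for _ in range(max_iterations):
        action = pick_action(history)
        if action == "DONE":             # stopping condition: complete
            break
        history.append(TOOLS[action]())  # ground truth at each step
    return history
```

A production loop would also pause at checkpoints for human input; that branch is omitted here for brevity.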
Use when: Open-ended problems where the number of steps cannot be predicted; model-driven decision-making at scale in trusted environments.
Risk: Higher cost, compounding errors. Requires extensive sandboxed testing and appropriate guardrails.
Validated domains (Anthropic):
- Coding agents resolving SWE-bench tasks (GitHub issues)
- Computer use: Claude operating a desktop to accomplish tasks
Combining Patterns¶
These patterns are composable, not prescriptive. A real system might route by query type, then use orchestrator-workers within a branch, with an evaluator-optimizer loop inside one worker. The key is measuring performance and adding complexity only when it demonstrably improves outcomes.
ACI: Agent-Computer Interface¶
Tool definitions are as important as overall prompts — invest in them accordingly. Anthropic's recommendations:
- Think from the model's perspective — is it obvious how to use this tool from its description and parameters?
- Use precise parameter names — treat tool definitions like docstrings for a junior developer
- Avoid formatting overhead — e.g., don't require counting thousands of lines before writing a diff
- Give the model tokens to think — don't put the model in a corner with constrained output formats
- Poka-yoke your tools — design parameters that make mistakes hard (e.g., require absolute filepaths instead of relative)
- Test in the workbench — run many example inputs; iterate on what the model gets wrong
The term ACI (agent-computer interface) parallels HCI (human-computer interface): both deserve equal engineering investment.
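A tool definition sketch applying these principles: a precise name, a docstring-quality description, and a poka-yoke parameter that makes a common mistake hard. The schema shape follows Anthropic's tool-use JSON format, but the tool itself is hypothetical.

```python
# Hypothetical tool definition illustrating ACI guidance: precise
# parameter names, clear description, and an absolute-path requirement
# that designs a common mistake (relative paths) out of the interface.
write_file_tool = {
    "name": "write_file",
    "description": (
        "Write text content to a file, replacing any existing content. "
        "Use this after you have decided on the complete new file body."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "absolute_path": {
                "type": "string",
                "description": "Absolute path to the file (must start "
                               "with '/'); relative paths are rejected.",
            },
            "content": {
                "type": "string",
                "description": "Complete new file contents.",
            },
        },
        "required": ["absolute_path", "content"],
    },
}
```

Note what is absent: no line-counting, no exotic diff format, nothing that burdens the model with formatting overhead.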
When to Use Frameworks¶
Frameworks (Claude Agent SDK, Strands Agents, Rivet, Vellum) simplify getting started but add abstraction layers that:
- Obscure underlying prompts and responses (harder to debug)
- Make it tempting to add complexity unnecessarily
Recommendation: start with direct LLM API calls. Most patterns are implementable in a few lines of code. Adopt a framework only if the low-level complexity becomes unmanageable — and ensure you understand what's underneath it.
Sub-Agents vs. Agent Teams¶
When building multi-agent systems, the first architectural question is not whether to use multiple agents, but what kind of coordination the task needs. Claude-style systems offer two distinct approaches:
Sub-Agents (Parallelism with Isolation)¶
A sub-agent is a specialized instance that runs in its own isolated context — a delegation primitive. It accepts a focused task and returns only the final output (no reasoning or intermediate steps). This makes sub-agents about compression, not just speed: they turn messy exploration into a clean signal.
Each sub-agent gets: a system prompt defining its role, a limited toolset, an isolated context, and a single well-scoped task.
Hard constraints:
- Sub-agents cannot communicate with each other
- Sub-agents cannot spawn child agents
- All coordination flows through the parent orchestrator
The `description` field in the `AgentDefinition` acts as a routing signal — the parent uses it to decide which sub-agent handles each task:
```python
import asyncio

from claude_agent_sdk import query, ClaudeAgentOptions, AgentDefinition

async def main():
    async for message in query(
        prompt="Review the authentication module for issues",
        options=ClaudeAgentOptions(
            allowed_tools=["Read", "Grep", "Glob", "Agent"],
            agents={
                "security-reviewer": AgentDefinition(
                    description="Find vulnerabilities and security risks",
                    prompt="You are a security expert.",
                    tools=["Read", "Grep", "Glob"],
                    model="sonnet",
                ),
                "performance-optimizer": AgentDefinition(
                    description="Identify performance bottlenecks",
                    prompt="You are a performance engineer.",
                    tools=["Read", "Grep", "Glob"],
                    model="sonnet",
                ),
            },
        ),
    ):
        print(message)

asyncio.run(main())
```
Agent Teams (Coordination Through Communication)¶
Agent teams are built for collaboration: agents that maintain their own context, communicate with each other, and adapt in real time. The structure: a lead agent that assigns work and synthesizes results, teammates that execute, and a shared task layer tracking progress and dependencies. A frontend agent can flag a backend change, for example, and the shared state updates for every teammate instantly.
Choosing Between Them¶
| Dimension | Sub-Agents | Agent Teams |
|---|---|---|
| Isolation | Isolated context | Shared context |
| State | Stateless / one-shot | Persistent / interactive |
| Control | Parent-controlled | Peer-to-peer |
| Best for | Independent parallel tasks | Interdependent tasks |
Rule: Use sub-agents when tasks are independent. Use teams when tasks depend on each other.
Context-Based Decomposition (The Right Mental Model)¶
Most multi-agent system failures come from role-based decomposition (planner → implementer → tester), which loses context at every handoff: the implementer doesn't know what the planner knew, and the tester doesn't know what the implementer decided.
The correct approach is context-based decomposition: ask "What information does this task actually need?" Keep two tasks in the same agent if they share deep context. Split only when context can be cleanly separated. This principle applies whether using sub-agents or teams.
[source: suryanshti-sub-agents-vs-agent-teams-2026.md]
Key Claims & Data Points¶
- Workflow vs. agent distinction: workflows use predefined paths; agents dynamically direct their own — [source: building-effective-agents.md]
- For SWE-bench, Anthropic spent more time optimizing tools (ACI) than the overall prompt — [source: building-effective-agents.md]
- Agentic systems trade latency and cost for task performance; confirm the trade-off before building — [source: building-effective-agents.md]
- Three core principles: simplicity, transparency, ACI — [source: building-effective-agents.md]
- Routing can direct easy questions to cheaper models (Claude Haiku) and hard ones to capable models — [source: building-effective-agents.md]
- Sub-agents return only final output, not reasoning — about "compression, not just speed" — [source: suryanshti-sub-agents-vs-agent-teams-2026.md]
- Context-based decomposition beats role-based decomposition: keep tasks together when they share deep context — [source: suryanshti-sub-agents-vs-agent-teams-2026.md]
- Sub-agent `description` field acts as a routing signal for parent orchestrators — [source: suryanshti-sub-agents-vs-agent-teams-2026.md]
Open Questions¶
- When does an evaluator-optimizer loop converge vs. oscillate? What stopping criteria work in practice? (raised by: concepts/agentic-workflows, 2026-04-09)
- Is there a principled way to choose among the five patterns for a new problem, or is it always empirical? (raised by: concepts/agentic-workflows, 2026-04-09)
- How do agentic workflow patterns map to formal computer science constructs (DAGs, state machines, recursive descent)? (raised by: concepts/agentic-workflows, 2026-04-09)
- Is there a principled way to choose between sub-agents vs. agent teams for a given problem, beyond the independence/dependence heuristic? (raised by: suryanshti-sub-agents-vs-agent-teams-2026.md, 2026-04-25)
- How does the Claude Agent SDK's `description`-based routing compare to explicit orchestrator logic — which is more reliable in practice? (raised by: suryanshti-sub-agents-vs-agent-teams-2026.md, 2026-04-25)
Related Articles¶
- concepts/agent-harness
- concepts/agentic-engineering
- concepts/harness-engineering
- concepts/mcp-authentication
Sources¶
- Building Effective AI Agents — Anthropic engineering guide; five workflow patterns, ACI principles, agent use cases (Dec 2024)
- Sub-Agents vs Agent Teams — Suryansh Tiwari X article; sub-agent vs. agent team distinction, context-based decomposition, Claude Agent SDK example (Apr 2026)