There is a reflex in agentic system design to decompose everything. Break the task into subtasks, break subtasks into steps, break steps into micro-operations. The reasoning seems sound: smaller pieces are easier to manage, easier to retry, easier to parallelize. But in practice, you hit a floor. Decompose past it and your system doesn’t get more efficient. It gets more fragile, more verbose, and paradoxically slower. Understanding where that floor sits is one of the most underappreciated design decisions in agent orchestration.
The Decomposition Problem
When you’re building a multi-agent workflow, every task decomposition involves a trade-off. Splitting work into smaller pieces enables parallelism and fault isolation. But every handoff between agents carries a cost. You have to pass state across that handoff, you have to prompt the sub-agent with enough context to do its job, and you have to verify that the output from one stage is compatible with the input the next stage expects. Each of these handoffs creates a boundary in your system, and those boundaries add up.
These costs are not fixed. They scale with the number of boundaries you create. Below a certain granularity, the coordination overhead outweighs the benefit of decomposing further.
This is the granularity floor. It is a cost-benefit tipping point: the place where splitting work smaller starts costing you more in coordination than you gain in simplicity.
Why the Floor Exists
The floor has three root causes.
Context coupling. No matter how small you make a sub-task, the agent doing it still needs to understand the project around it. Consider a task like “rename this variable across the codebase.” You could decompose it into: find all occurrences, check each occurrence for semantic correctness, make the replacement. But the agent doing the semantic check needs to understand the surrounding code. You haven’t eliminated the context requirement; you’ve duplicated it. Every sub-agent gets its own copy of the same background, the same project structure, the same constraints. The tokens you saved by shrinking the task get spent again passing context into each new agent. At some point you’re paying more in duplicated context than you saved by decomposing.
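A back-of-envelope calculation makes the duplication cost concrete; the token counts below are invented for illustration, not measured from any real workload.

```python
# Back-of-envelope token arithmetic; all figures are assumed, not measured.
shared_context = 4_000    # project structure, constraints, background
task_work = 1_500         # tokens of actual reasoning and output

# One agent doing the whole task pays for the shared context once.
single_agent = shared_context + task_work               # 5500

# Split into five sub-tasks and each sub-agent needs its own copy.
n_subtasks = 5
decomposed = n_subtasks * shared_context + task_work    # 21500, mostly duplicated background
```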
Prompt overhead. Every agent invocation requires a prompt. That prompt has to establish the task, the constraints, the output format, and enough background for the agent to act. For very small sub-tasks, the prompt scaffold can be longer than the actual work. You end up with agents that spend most of their context budget just understanding what they’re being asked to do.
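As an illustration, here is a hypothetical prompt for one of those micro-tasks; the repository name, file, and constraints are all invented, but the shape is typical: the scaffold dwarfs the one-line task.

```python
# Hypothetical prompt for a micro-task; every concrete detail is invented.
PROMPT = """You are a code-editing agent working in the acme-pipeline repository.
Constraints: do not modify files outside src/, preserve public function
signatures, follow the existing style guide, and add no new dependencies.
Output format: return a unified diff only, with no commentary.
Background: Python 3.11 project with a src/ layout; tests live in tests/
and run under pytest.

Task: rename the local variable `tmp` to `temp_path` in src/io_utils.py.
"""
# Roughly eight lines of scaffold for one line of actual work.
```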
State management between sub-tasks. When agents hand off work, something has to manage the state between them, typically an orchestrator that collects outputs, validates them, and formats them for the next stage. This orchestration logic is real code with real failure modes. Every additional handoff is another surface for mismatched assumptions, schema drift, and partial failure handling. The thinner the sub-task, the more handoffs you need, and the more brittle the seams between them become.
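A minimal sketch of one such seam, assuming a hand-rolled orchestrator and an invented handoff schema; no real framework is implied.

```python
from dataclasses import dataclass


@dataclass
class StageOutput:
    """Hypothetical handoff record passed between two pipeline stages."""
    sample_id: str
    artifact_path: str
    status: str  # "ok" or "failed"


def validate_handoff(out: StageOutput) -> None:
    # Every boundary needs checks like these, and every check is a place
    # where mismatched assumptions, schema drift, or partial failures surface.
    if out.status != "ok":
        raise RuntimeError(f"upstream stage failed for {out.sample_id}")
    if not out.artifact_path.endswith(".g.vcf.gz"):
        raise ValueError(f"unexpected artifact type: {out.artifact_path}")


def run_next_stage(out: StageOutput) -> None:
    validate_handoff(out)
    # ...format the input for the next agent, invoke it, and repeat the
    # same validation on the other side of the boundary.
```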
An Old Lesson from Software Engineering
This problem has a direct analog in function design. Early enthusiasm for decomposition in software produces the same failure mode: functions so small they carry no meaningful logic of their own. A one-line function that does nothing except call another function has a name, a signature, and a docstring. It shows up in stack traces. It has to be mocked in tests. It costs more to maintain than the abstraction saves.
The advice to write small functions is sound, up to a point. A function should encapsulate a complete, meaningful unit of logic. “Complete” means it can be understood and tested without needing to read its callers. “Meaningful” means it makes a decision or performs a transformation that has value in isolation. A function that just dispatches to another function does neither.
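A minimal illustration of the anti-pattern; the function names are invented.

```python
import json


def load_config(path: str) -> dict:
    """Parse a JSON config file into a dict."""
    with open(path) as fh:
        return json.load(fh)


# The wrapper below has a name, a signature, and a docstring, shows up in
# stack traces, and has to be mocked in tests -- but it makes no decision
# and performs no transformation of its own.
def get_config(path: str) -> dict:
    """Return the pipeline configuration."""
    return load_config(path)
```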
Agent tasks work the same way. A task should represent a complete, verifiable unit of work. If you can’t look at the task specification and its output independently and determine whether the task was done correctly, it’s probably not a real unit.
What the Floor Looks Like in Practice
I build pipelines in bioinformatics. Take a GATK-based variant calling pipeline. You could decompose HaplotypeCaller into micro-tasks: one agent opens the BAM file and reads the header, another iterates through reads and computes coverage, another decides which regions to call, and so on. But you’ve just reimplemented HaplotypeCaller in agent form, badly. The tool already handles those internal steps. The agent’s job is to orchestrate the tool, not to decompose the tool’s internals.
The smallest useful unit of work here is something like: “Run HaplotypeCaller on this BAM file for this sample, using these parameters, and produce a GVCF.” That task is self-contained. It has clear inputs (BAM, reference genome, parameter set), a clear output (GVCF), and an unambiguous success condition (output file exists, passes validation, contains variant calls).
The correct unit of work is the tool call plus its validation, not the internals of the tool.
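A sketch of that unit in Python, with the tool call and its validation in one place. The paths and the validation details are illustrative, and the flags shown are typical GATK4 usage; check them against your installed version.

```python
import subprocess
from pathlib import Path


def call_variants(bam: Path, reference: Path, gvcf_out: Path) -> Path:
    """One complete, verifiable unit: run HaplotypeCaller, then validate."""
    subprocess.run(
        [
            "gatk", "HaplotypeCaller",
            "-I", str(bam),
            "-R", str(reference),
            "-O", str(gvcf_out),
            "-ERC", "GVCF",
        ],
        check=True,  # fail loudly if the tool exits non-zero
    )
    # Success condition from the task spec: the output exists and is non-empty.
    # A fuller check might also run GATK's ValidateVariants on the result.
    if not gvcf_out.exists() or gvcf_out.stat().st_size == 0:
        raise RuntimeError(f"HaplotypeCaller produced no usable output for {bam}")
    return gvcf_out
```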
Contrast that with a higher-level task: “Process all 200 samples in this cohort through the variant calling pipeline.” That’s too coarse to hand to a single agent. The work can be parallelized, the samples are independent, and a single agent would blow past its context budget trying to track 200 in-flight jobs. The right decomposition is one agent per sample, orchestrated by a coordinator.
The floor in this case is the sample, not the read. The ceiling is the cohort. Everything in between is a legitimate point of decomposition; everything below the sample level becomes noise.
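At that granularity the coordinator is little more than a fan-out loop. The sketch below reuses the call_variants function from the previous sketch; the worker count and output layout are assumptions.

```python
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def process_cohort(bams: list[Path], reference: Path, out_dir: Path) -> list[Path]:
    """Fan out one task per sample; each task is independently verifiable."""
    out_dir.mkdir(parents=True, exist_ok=True)
    with ProcessPoolExecutor(max_workers=8) as pool:
        futures = {
            bam: pool.submit(call_variants, bam, reference,
                             out_dir / f"{bam.stem}.g.vcf.gz")
            for bam in bams
        }
        gvcfs, failed = [], []
        for bam, fut in futures.items():
            try:
                gvcfs.append(fut.result())
            except Exception:
                failed.append(bam)  # one bad sample doesn't take down the cohort
    if failed:
        print(f"{len(failed)} sample(s) failed and can be retried individually")
    return gvcfs
```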
The Agent’s Capability Determines the Floor
This is a bigger deal than it might seem at first. The floor is not a fixed property of a task. It moves depending on the agent you’re using.
If you’re running a frontier model with a large context window, the floor drops significantly. The model can hold a complex task in mind, track dependencies across files, and coordinate multiple concerns in a single pass. At that tier, you’re mostly optimizing around the context window itself, not the model’s ability to reason about the work. The constraint is how much fits, not whether the agent can handle it.
But if you’re trying to save on inference cost by using a smaller, less capable model, the calculus changes. Those models tend to have smaller context windows, yes, but the real bottleneck shifts to capability. A weaker model can’t coordinate as many moving parts. It can’t hold as strong a mental model of your codebase. You have to reduce the task not because it won’t fit in context, but because the agent will lose the thread if you ask it to juggle too many concerns at once. The floor rises, and you need more decomposition to compensate, which means more handoffs, more duplicated context, more coordination overhead.
This creates a counterintuitive dynamic: a model that is cheaper per token can end up costing more overall. You decompose further to accommodate its limits, which means more agents, more prompts, more duplicated context, and more orchestration logic to maintain. Sometimes the more expensive model is the cheaper option. Anthropic’s recent introduction of advisor mode in Claude Code speaks to exactly this: letting a stronger model handle the planning and coordination while lighter models execute the individual steps. That’s a decomposition strategy informed by the capability floor.
There’s probably a full article in model-tier optimization for agentic workflows. For now, the point is this: as models improve, the floor drops. Tasks that required multi-agent decomposition two years ago can be handled in a single pass today. Build your decomposition logic so it can be collapsed as capabilities improve, rather than baking in artificial granularity that becomes overhead when you upgrade.
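One way to keep the decomposition collapsible is to make granularity a configuration decision rather than a structural one. A sketch of that pattern, with invented names, assuming a generic run_agent callable:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GranularityPolicy:
    """Granularity as configuration: flip single_pass as model capability improves."""
    single_pass: bool


def run_task(task: str, subtasks: list[str], policy: GranularityPolicy,
             run_agent: Callable[[str], str]) -> list[str]:
    if policy.single_pass:
        # Capable model: hand over the whole task in one invocation.
        return [run_agent(task)]
    # Weaker model: fall back to per-subtask invocations and pay the
    # coordination overhead only for as long as you have to.
    return [run_agent(sub) for sub in subtasks]
```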
The Sweet Spot
A well-decomposed agent task has four properties.
- It is self-contained. The agent can execute it without needing to reach back to the orchestrator for clarifying information mid-task.
- It has clear inputs and outputs. Both can be specified in advance and validated after the fact without running the task again.
- It can be independently verified. There is a ground truth against which the output can be checked (a schema, a test, a known-good reference) without needing to understand the full pipeline context.
- It fits comfortably in context. The task specification, the necessary background, and the expected output can all fit in the agent’s context window with room to spare. If you’re fighting the context window to fit the task, the task boundary is probably wrong.
That last point is not incidental. The context window is the physical constraint that makes the granularity floor real. An agent that cannot hold its entire task in mind cannot complete the task coherently. Decomposition is, at its core, the art of shaping work so it fits within that constraint without losing the context coupling that makes the work meaningful.
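A crude way to enforce that property is to budget the task against the window before dispatching it. The characters-per-token estimate and the headroom factor below are assumptions, and the window size is a stand-in for whatever model you run.

```python
def fits_in_context(spec: str, background: str, expected_output_tokens: int,
                    context_window: int = 200_000, headroom: float = 0.5) -> bool:
    """Crude fit check: reject task boundaries that leave no room to spare.

    Uses a rough four-characters-per-token estimate; swap in a real tokenizer
    for anything beyond a sanity check.
    """
    estimated = len(spec) // 4 + len(background) // 4 + expected_output_tokens
    return estimated <= context_window * headroom
```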
Get the granularity right and your agents run fast, fail cleanly, and compose predictably. Get it wrong (too coarse or too fine) and you spend your engineering time on coordination overhead instead of capability. The floor is worth finding.