Vol. XV / Issue 08

The McKinnie Dispatch

Filed from the reducer queue

Agent systems · Software work · 2026

The old laws still apply

Agent swarms still obey Amdahl's Law.

More agents can buy search breadth, context isolation, and independent checks. They do not remove the serial path where someone has to merge the work, judge the result, and own the final answer.

Agent swarms do not repeal the old laws of software engineering. They give us a faster way to violate them. That does not make parallel agents useless. It means we should talk about them like a concurrency tool, not like magic headcount.

What scales: Breadth

More agents can search, compare, test, and summarize more branches at once.

What leaks: State

Prompts, repo context, permissions, assumptions, and edits cross boundaries.

What caps speed: Reducer

One path still has to merge the work, verify behavior, and own the answer.

[Illustration: many small AI agent machines feeding parallel work streams into one narrow review gate.]

Plate I: The reducer queue. The wide side is easy to demo: many agents can explore, inspect, and prepare candidate work at once. The narrow side is where the system still has to choose, merge, verify, and own one result.

01 Fan out

Split only the work that can run independently.

02 Isolate

Keep noisy logs, searches, and candidate edits out of the main thread.

03 Reduce

Collapse findings into one coherent decision, not a pile of summaries.

04 Verify

Trust checks, tests, and evidence before trusting agent confidence.

The current pitch is easy to understand: one agent is useful, so several agents should be much more useful. Let one inspect the repo, one write tests, one patch the backend, one patch the frontend, one review the result, and maybe the whole thing finishes in a fraction of the time.

Sometimes that works. A lot of the time, especially in software, the hard part is not producing more work in parallel. The hard part is deciding which work should exist, keeping shared state coherent, validating the behavior, and turning several partial answers into one responsible answer.

The forgotten laws were never only about CPUs. They were about coordination. LLM agents changed the workers, not the math.

The useful vendor docs are boring in the right way.

The vendor pages I trust do not say "spawn as many workers as possible." They talk about isolation, supervision, and the places where parallel work stops being free.

OpenAI's Codex subagents documentation frames subagents as a way to keep noisy work out of the main thread. Use them for exploration, tests, triage, and summarization. Be more careful when several agents are writing code at once, because conflicts and coordination overhead become part of the work. That is not anti-agent guidance. It is systems guidance.

OpenAI's Codex app announcement points in the same direction from the product side: multiple agents, separate threads, isolated worktrees, long-running tasks, and a UI built around supervision. The hard product problem is making parallel work reviewable after the agents come back.

Anthropic's multi-agent research-system writeup is even more explicit. Their Research system gets value from a lead agent delegating to parallel subagents with separate context windows. That worked well for broad research, where different workers can explore different branches and compress findings back to the lead. The same post also says multi-agent systems spend many more tokens, are only economically sensible when the task value justifies that extra work, and that most coding tasks have fewer truly parallel pieces than research.

Anthropic's Claude Code subagents docs make the same tradeoff visible in product language: subagents get their own context and focused instructions, but they can add latency because they start from a clean slate and have to gather enough context to be useful.

That is the pattern. The people building these systems are not saying coordination is free. They are building places to contain the cost.

More agents can be the right answer.

The pro-swarm argument is real. Multiple model samples, agents, or reasoning paths can improve results when the task has a clean way to judge the answer. The paper "More Agents Is All You Need" found that a simple sampling-and-voting setup can scale performance with the number of agents instantiated, with the benefit tied to task difficulty.

That makes intuitive sense. If the problem is search-heavy and the answer can be judged cleanly, more attempts can help. If ten agents propose ten candidate fixes and a reliable test suite picks the winner, that is useful. If several agents inspect different logs, docs, services, or versions and return concise findings, that is useful too.

The load-bearing phrase is "can be judged cleanly." Without that, a swarm does not give you ten answers. It gives you ten new things to reason about.
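A minimal sketch of what "judged cleanly" can look like for sampling and voting. It is not the paper's implementation. The `normalize` rule and the function names are assumptions for illustration; real tasks usually need a task-specific way to decide that two answers are the same.

```typescript
// Hypothetical sketch of sampling-and-voting: several agents answer the same
// question independently and the most common normalized answer wins.

interface VoteResult {
  answer: string; // winning answer after normalization
  votes: number;  // how many agents gave it
  total: number;  // how many agents answered
}

// Assumption: trimming and lowercasing is enough to make equivalent answers
// compare equal. Real tasks usually need a task-specific normalizer.
function normalize(answer: string): string {
  return answer.trim().toLowerCase();
}

function majorityVote(answers: string[]): VoteResult | null {
  if (answers.length === 0) {
    return null;
  }
  // Null-prototype object so answers like "constructor" are counted safely.
  const counts: Record<string, number> = Object.create(null);
  let best = "";
  let bestCount = 0;
  for (const raw of answers) {
    const key = normalize(raw);
    const count = (counts[key] || 0) + 1;
    counts[key] = count;
    // Strictly greater: on a tie, the answer that reached the count first wins.
    if (count > bestCount) {
      best = key;
      bestCount = count;
    }
  }
  return { answer: best, votes: bestCount, total: answers.length };
}

// Five hypothetical agent answers; three agree once whitespace is trimmed.
console.log(majorityVote(["42", " 42 ", "41", "42", "7"]));
```

The vote only works because equality is a clean judge here. When answers are patches or design proposals, there is nothing to count, and the reducer is back to reading all of them.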

Exhibit A: The fit check. Parallel agents are strongest when the split is natural and the reducer has a clear way to judge the result. They are weakest when every worker needs the same changing shared state.
const swarmFit = {
  strong: [
    "repo exploration",
    "log triage",
    "independent test runs",
    "source comparison",
    "first-pass research"
  ],
  risky: [
    "shared API redesign",
    "cross-cutting refactors",
    "ambiguous product calls",
    "stateful migrations"
  ],
  reducerCost: [
    "choose the path",
    "merge the work",
    "verify behavior",
    "own the final answer"
  ]
};

The old laws are still sitting there.

01

Amdahl's Law

Speedup is capped by the part that cannot be parallelized. In agent work, that serial path is synthesis, review, merge, and final validation.

02

Gustafson's Law

Parallel speedup looks better when the problem grows with the worker count. For agents, the gain comes from increasing useful search space, not from pretending coupled work became independent.

03

Brooks's Law

Adding workers adds communication and review overhead. Agents still need task boundaries, context, permissions, conventions, tests, and handoff format.

04

Little's Law

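Average work in progress equals arrival rate times average time in the system. Feed the reducer candidates faster than it can review them and the backlog grows. The bottleneck moves from producing candidates to responsibly accepting or rejecting them.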

05

Goodhart's Law

Optimizing for agent count, PR count, benchmark pass rate, or autonomy theater can make the metric better while the product judgment gets worse.
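The first, second, and fourth laws are plain arithmetic, so here is a small sketch of them. The formulas are the standard ones. The function names and the example numbers (70% parallel work, ten agents, twelve candidates an hour) are assumptions for illustration, not measurements from any real agent system.

```typescript
// Amdahl's Law: speedup from n workers when a fraction p of the work is
// parallelizable and the remaining (1 - p) stays serial.
function amdahlSpeedup(parallelFraction: number, workers: number): number {
  if (!(parallelFraction >= 0 && parallelFraction <= 1)) {
    throw new RangeError("parallelFraction must be between 0 and 1");
  }
  if (!(workers >= 1)) {
    throw new RangeError("workers must be at least 1");
  }
  return 1 / (1 - parallelFraction + parallelFraction / workers);
}

// Ceiling on Amdahl speedup as the worker count grows without bound: 1 / (1 - p).
function amdahlCeiling(parallelFraction: number): number {
  return amdahlSpeedup(parallelFraction, Infinity);
}

// Gustafson's Law: scaled speedup when the problem grows with the workers,
// where p is the parallel fraction of the scaled run.
function gustafsonSpeedup(parallelFraction: number, workers: number): number {
  return 1 - parallelFraction + parallelFraction * workers;
}

// Little's Law: average items in the system equals the arrival rate
// multiplied by the average time each item spends in the system.
function littleBacklog(arrivalsPerHour: number, avgHoursInSystem: number): number {
  return arrivalsPerHour * avgHoursInSystem;
}

// Illustrative numbers only.
console.log(amdahlSpeedup(0.7, 10).toFixed(2));    // "2.70"
console.log(amdahlCeiling(0.7).toFixed(2));        // "3.33"
console.log(gustafsonSpeedup(0.7, 10).toFixed(2)); // "7.30"
console.log(littleBacklog(12, 0.5));               // 6
```

Under those assumed numbers, ten agents buy about 2.7x on a fixed task and no agent count gets past about 3.3x. The 7.3x Gustafson figure only applies if the extra agents are doing extra useful work, which is the breadth case, not the coupled-refactor case.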

Software has shared state.

Research work often decomposes cleanly. One agent can search vendor docs. Another can search papers. Another can inspect pricing. A lead agent can compress that into a report. The subagents do not usually mutate the same source of truth while they work.

Software work is more often coupled. A backend change affects the frontend. A schema change affects tests, migrations, docs, and customer behavior. A product decision affects copy, permissions, analytics, support expectations, and rollout order. Even when the code edits are in different files, the design state is shared.

That is why write-heavy swarms are harder. The agents can all be locally right and globally wrong. One changes the API shape. Another writes UI against the old shape. Another updates tests around a third interpretation. The final problem is not intelligence. It is consistency.

This is also where benchmarks need care. MultiAgentBench exists because single-agent benchmarks do not capture all of the coordination and competition dynamics. Multi-SWE-bench widens software-agent evaluation beyond mostly Python issue resolution. And the failure taxonomy paper "Why Do Multi-Agent LLM Systems Fail?" is blunt about the failure classes: system design issues, inter-agent misalignment, and task verification. Those are not side details. Those are the job.

Field note

The Archibot translation sweep was the good version.

The recent Archibot example was Console globalization: many locale catalogs, lots of repeated review work, and one product voice that still had to hold together.

Batch: 3 + 5 + 4

Early RTL catalogs, western European catalogs, then CJK and Korean.

Split: 1 per locale

Worker agents reviewed isolated catalog files instead of fighting over one change.

Reducer: Checks

i18n validation, build, persona smoke tests, and a final human cleanup pass.

Why that parallel job worked

Locale catalogs are naturally parallel. French, German, Portuguese, Italian, Dutch, Japanese, Korean, Simplified Chinese, and Traditional Chinese can be reviewed in separate worktrees and separate agent contexts without each worker needing to redesign the product. I kept the manual sweep, western-locale batch, and CJK/Korean batch isolated so branch state stayed clean while the agents worked one catalog at a time. The ownership boundary is obvious: this agent owns this catalog, this batch, and this review pass.

The part that still needed a reducer was everything the agents were tempted to over-touch. Automated translation tried to localize technical literals: CSS utility classes, reserved example domains, provider hostnames, credential labels, and acronym-heavy UI copy. The fix was not "more agents." The fix was protected literals, stronger i18n checks, a build, smoke tests, and one final pass that decided what was product language and what was code-adjacent material.

Rule

The more parallel the candidate work became, the more the shared contract mattered: stable English source, protected technical strings, one catalog per worker, and deterministic checks before merge.
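A rough sketch of the kind of deterministic check that rule implies: confirm that protected technical strings survive translation. This is not the actual Archibot check. The literal list, catalog keys, and function names below are all hypothetical.

```typescript
// Hypothetical protected literals. The article names the categories (CSS
// utility classes, reserved example domains, acronyms), not these values.
const PROTECTED_LITERALS: string[] = ["example.com", "flex-row", "API"];

type Catalog = Record<string, string>;

interface CatalogIssue {
  key: string;
  problem: "missing-key" | "lost-literal";
  literals: string[]; // protected literals that vanished; empty for missing-key
}

// Literals that appear in the source string but not in the translation.
function lostLiterals(source: string, translated: string, literals: string[]): string[] {
  return literals.filter(function (literal) {
    return source.indexOf(literal) !== -1 && translated.indexOf(literal) === -1;
  });
}

// Compare one translated catalog against the stable English source.
function checkCatalog(source: Catalog, translated: Catalog, literals: string[]): CatalogIssue[] {
  const issues: CatalogIssue[] = [];
  for (const key of Object.keys(source)) {
    const original = source[key];
    const target = translated[key];
    if (original === undefined || target === undefined) {
      // The worker dropped or never produced this key.
      issues.push({ key: key, problem: "missing-key", literals: [] });
      continue;
    }
    const lost = lostLiterals(original, target, literals);
    if (lost.length > 0) {
      issues.push({ key: key, problem: "lost-literal", literals: lost });
    }
  }
  return issues;
}

// Hypothetical catalogs: the French draft localized the reserved domain.
const englishSource: Catalog = {
  "login.help": "Use your API key from example.com",
  "layout.hint": "Add the flex-row class",
};
const frenchDraft: Catalog = {
  "login.help": "Utilisez votre clé API depuis exemple.com",
  "layout.hint": "Ajoutez la classe flex-row",
};
console.log(checkCatalog(englishSource, frenchDraft, PROTECTED_LITERALS));
```

With these sample catalogs the check flags `login.help` for losing `example.com` and passes `layout.hint`. A substring match is a blunt instrument, but it is deterministic, which is what the reducer needs before a merge.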

How I would use them.

I like parallel agents as sidecars, not as an unmanaged crowd. One lead thread owns the problem. Sidecars get bounded work that can run without blocking the next decision: inspect this module, compare these two approaches, run this test slice, summarize these docs, audit this narrow risk, prepare one candidate patch.

The sidecar should return evidence, not vibes. File paths. Commands. Test output. Exact assumptions. A clear "I did not check this" section. If it changed files, the write scope should be small and disjoint. If it only explored, the output should be concise enough that the lead can use it without inheriting all the noise.
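One way to make "evidence, not vibes" enforceable is to give the sidecar a report shape and reject reports that skip the evidence. A minimal sketch follows. The field names, the `reportProblems` helper, and the file-count threshold are assumptions for illustration, not a format any vendor defines.

```typescript
// Hypothetical report a sidecar hands back to the lead thread.
interface SidecarReport {
  task: string;
  filesInspected: string[]; // file paths the sidecar actually read
  commandsRun: string[];    // exact commands, so the lead can rerun them
  testOutput: string;       // raw output, not a paraphrase
  assumptions: string[];
  notChecked: string[];     // the "I did not check this" section
  filesChanged: string[];   // empty for exploration-only work
}

// Return the reasons a report should be bounced; an empty list means usable.
function reportProblems(report: SidecarReport, maxFilesChanged: number): string[] {
  const problems: string[] = [];
  if (report.filesInspected.length === 0 && report.commandsRun.length === 0) {
    problems.push("no evidence: list file paths or commands");
  }
  if (report.commandsRun.length > 0 && report.testOutput.trim() === "") {
    problems.push("commands were run but no output was attached");
  }
  if (report.notChecked.length === 0) {
    problems.push("missing 'I did not check this' section");
  }
  if (report.filesChanged.length > maxFilesChanged) {
    problems.push("write scope too large: " + report.filesChanged.length + " files changed");
  }
  return problems;
}
```

Requiring a non-empty `notChecked` list is a deliberate choice in this sketch: a sidecar that claims to have checked everything is usually the one to trust least.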

Then the reducer matters. The reducer is the boring part people skip in demos: choose the winning path, merge the state, run the checks, notice contradictions, and decide whether the final answer is good enough to ship. A swarm without a reducer is just parallel note taking with side effects.

Exhibit B: The contract. The lead agent is not there for ceremony. It is there because every parallel system needs a consistency boundary somewhere.
agent_swarm_contract:
  lead:
    owns: decision, integration, final verification
  sidecars:
    own: bounded subtasks with disjoint context
    output: evidence, patch, test result, caveat
  shared_state:
    default: read-only
    writes: isolated until reviewed
  done:
    requires: deterministic checks plus one responsible reducer
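
The contract above can be partly checked by machine. Here is a hedged sketch of a merge gate that enforces three of its lines: disjoint writes, deterministic checks, and a named reducer. The `Candidate` shape and both function names are hypothetical.

```typescript
// Hypothetical candidate patch returned by one sidecar.
interface Candidate {
  agent: string;
  filesChanged: string[];
  checksPassed: boolean; // result of deterministic checks (tests, build, lint)
}

// Map each file touched by more than one agent to the agents that touched it.
function findWriteConflicts(candidates: Candidate[]): Record<string, string[]> {
  const owners: Record<string, string[]> = Object.create(null);
  for (const candidate of candidates) {
    for (const file of candidate.filesChanged) {
      const agents = owners[file] || [];
      // Count each agent once per file, even if it lists the file twice.
      if (agents.indexOf(candidate.agent) === -1) {
        agents.push(candidate.agent);
      }
      owners[file] = agents;
    }
  }
  const conflicts: Record<string, string[]> = Object.create(null);
  for (const file of Object.keys(owners)) {
    const agents = owners[file];
    if (agents !== undefined && agents.length > 1) {
      conflicts[file] = agents;
    }
  }
  return conflicts;
}

// Merge only when writes are disjoint, every check passed, and someone owns it.
function readyToMerge(candidates: Candidate[], reducer: string | null): boolean {
  const noConflicts = Object.keys(findWriteConflicts(candidates)).length === 0;
  const allChecked = candidates.every(function (c) {
    return c.checksPassed;
  });
  const hasReducer = reducer !== null && reducer.trim() !== "";
  return noConflicts && allChecked && hasReducer;
}
```

A file-level overlap check catches the easy conflicts only. Two agents can edit different files and still disagree about the API shape, which is why the gate also insists on a reducer and not just green checks.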

The practical rule.

Use more agents when the task has independent branches, expensive search, noisy logs, broad comparison work, or a clean oracle. Keep one strong agent when the task requires coherent product judgment, shared architectural state, or a long chain of dependent decisions.

For software teams, the useful future is probably not a thousand agents chatting with each other. It is more likely to look like good systems engineering: isolated worktrees, explicit task ownership, scoped permissions, artifact contracts, deterministic checks, queues, traces, review surfaces, and one place where responsibility lands.

That is less glamorous than the swarm pitch. It is also why it might work.

The danger is not that AI agents are too dumb to work in parallel. The danger is that they are just smart enough to recreate the coordination problems we were trying to automate away.