Levels 6–8 of Agentic Engineering
This phase is reference material for advanced orchestration patterns. You may not execute all of it immediately after Phase 9. Return to it as projects grow. The progression from Level 5 to Level 6 typically takes months of daily practice, not days.
This phase assumes you can: evaluate AI-generated code and identify common failure patterns (Ch 38), write effective CLAUDE.md files and manage context (Ch 40), and use Claude Code for single-task development with tests (Ch 39-42).
A single Claude Code session is powerful. But some work doesn't fit in one session:
- Context limits: One session can't hold an entire large codebase. Specialized agents focus on specific areas.
- Separation of concerns: The agent that writes code shouldn't grade its own work. Splitting implementation and review catches more bugs.
- Parallelism: Three independent features built simultaneously finish in the time of one.
- Specialization: An agent configured for security auditing produces better results than a general-purpose agent asked to check security as an afterthought.
Think of it like a team at a company: one person writes code, another reviews it, another writes tests, another updates documentation. More people means more throughput—but also more coordination overhead. More agents without constraints produces chaos, not productivity. That's why this phase focuses on harnesses and guardrails as much as on dispatching work.
Chapter 43 Level 6 — Harness Engineering & Automated Feedback Loops
At Levels 1-5, you're the quality gate—you review every change. That works for single features, but it doesn't scale. Level 6 builds automated quality gates so agents can catch their own mistakes, freeing you to focus on design decisions instead of line-by-line review.
At Level 5, you gave the agent capabilities. At Level 6, you give the agent guardrails that let it verify its own work. The shift: your job moves from reviewing code to designing the harness.
Backpressure
Backpressure is automated feedback that lets agents self-correct without human intervention. Instead of you catching errors in code review, the tools catch them immediately:
| Mechanism | What It Catches |
|---|---|
| TypeScript strict / mypy | Type errors |
| Linter (ESLint, Ruff) | Style violations |
| Test suite | Behavioral regressions |
| Pre-commit hooks | Format, lint, type-check before every commit |
| CI pipeline | Integration failures |
When the agent runs git commit and a pre-commit hook fails, the agent sees the error, fixes the code, and tries again. That's backpressure in action.
You ask an agent to add a feature. It writes code with a subtle type error. Without a type checker, the code looks fine. You deploy it. It crashes in production. With pre-commit hooks running the type checker, the agent catches and fixes the error before it ever reaches you. The fix costs seconds instead of hours.
Constraints > Instructions
Step-by-step instructions tell the model how to work. Constraints tell it what success looks like and let it figure out the how. Constraints scale better because the model adapts its approach to the problem.
Step-by-step instructions (brittle):
1. Add an email field.
2. Add RFC 5322 validation.
3. Update the migration.
4. Update the tests.
Constraint-based prompt (scales):
Requirements: RFC 5322 validation, unique constraint.
Acceptance: All tests pass, new tests for valid/invalid/duplicate/null.
Run: Work until npm test passes.
Security Boundaries
Agents, code, and secrets live in separate trust domains. Never give an agent direct access to production credentials. Use environment variables, secret managers, and least-privilege access.
Docs-as-Navigation
Keep CLAUDE.md to a ~100-line table of contents. Detailed docs live elsewhere (architecture.md, API docs, deployment guide); the model discovers them on demand via the progressive disclosure pattern from Chapter 22.
TaskForge Connection
You'll set up a pre-commit hook for TaskForge that runs Ruff (linter) and pytest. This lets Claude Code self-correct when adding features.
Case Study: The Docker Scaffold
Here's a real-world instruction scaffold that applies every principle from this chapter and from Chapter 21's scaffold section. The goal: tell Claude Code how to containerize any project for multi-agent development. This scaffold was refined over dozens of iterations where Claude got things wrong—each failure became a new constraint.
What Makes It Work
The scaffold is a CLAUDE.md file, roughly 300 lines. Here's why each section exists:
One-sentence goal
"One container. When you exec into it, you get Claude Code pointed at your repo. CMD is sleep infinity." Leaves zero room for interpretation. The model knows what "done" looks like.
Exact file contents, not descriptions
The Dockerfile, docker-compose service, and shell script are provided verbatim—not described. FROM debian:bookworm-slim, not "use a Debian-based image." curl -fsSL https://claude.ai/install.sh | bash, not "install Claude Code." Every line the model generates from scratch is a line that could vary between runs.
Explicit prohibitions from real failures
Each "NEVER" line exists because Claude did the wrong thing in practice:
- "NEVER use npm install—it causes auth/PATH corruption" — Claude's training data includes old npm-based tutorials
- "Do NOT add CLAUDE_MODEL to the environment block—it is not a real env var" — Claude hallucinated this env var repeatedly
- "NEVER overwrite .env—append only" — Claude destroyed existing configuration by rewriting the file
- "Do NOT add any environment: block to the claude service" — Claude added redundant env vars that conflicted with env_file
Failure mode table
An authentication reference table maps symptoms to causes to fixes. "Onboarding wizard appears" → ".claude.json missing" → "Dockerfile pre-creates it." This gives the model a debugging playbook, not just a build recipe.
Validation checklist
15 yes/no checks that verify correctness: "Is the base image debian:bookworm-slim?" "Was .env NOT overwritten?" "Does agent.sh use --dangerously-skip-permissions?" These are deterministic guardrails. The model can't rationalize its way past a checklist.
This scaffold works because it minimizes the decisions the model needs to make. A vague "set up Docker" gives Claude thousands of possible configurations. The scaffold narrows that to essentially one. This is the fundamental pattern: the tighter the constraints, the more deterministic the output. Every unspecified detail is a coin flip. Good scaffolds eliminate coin flips.
Notice the scaffold doesn't just instruct—it protects. Append-only rules for .env prevent data loss. Separate Dockerfile.claude prevents clobbering the project's existing Docker setup. Volume mounts for auth prevent re-authentication on every restart. The scaffold is defensive engineering: it assumes the model will try to do something destructive and makes it structurally impossible.
When you build your own scaffolds—for deployment, testing, code generation, or any repeatable task—follow this pattern: goal → exact content → prohibitions → failure modes → validation. The more specific you are, the more reliable the output becomes, across every run, regardless of the model's non-determinism.
Micro-Exercises
Create .pre-commit-config.yaml for TaskForge with a Ruff linter step. Run pre-commit install. Make a deliberate style violation and commit—watch it get caught.
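A minimal config for this exercise might look like the following sketch — the `rev` tag is illustrative, so pin whatever release of the Ruff pre-commit mirror is current:

```yaml
# .pre-commit-config.yaml — Ruff lint step for TaskForge
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.6.9   # illustrative tag — check the repo for the current release
    hooks:
      - id: ruff
```

After `pre-commit install`, every `git commit` runs this hook and blocks the commit on lint failures — that is the backpressure the agent will feel.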
Write a constraint-based prompt (not step-by-step) for adding a feature to TaskForge. Include: what to build, acceptance criteria, and the command to verify.
Set up pre-commit (linter + pytest) for TaskForge. Then ask Claude Code:
Add a `priority` field to tasks (high/medium/low) with validation.
Work on it until pre-commit passes cleanly.
Fix failures yourself. Don't ask me unless genuinely stuck.
Watch the agent self-correct through backpressure.
Verification: The pre-commit hook passes. The feature works. You didn't intervene.
If this doesn't work: (1) Pre-commit not running → pre-commit install must be run inside the git repo. (2) Agent enters infinite loop → constraints might be contradictory. Simplify. (3) Agent asks for help immediately → your CLAUDE.md might lack necessary context.
Interactive Exercises
Code Validator
Write validate_code(code_str) that checks Python code for common issues. Return a list of issue strings. Check for: bare except: (should specify exception type), from X import *, functions longer than 30 lines, and TODO comments.
Iterate through lines. Check each line for patterns.
For bare except: look for lines matching except: (with colon, no exception type).
For function length: track when a def starts and count indented lines until the next def or unindented line.
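Following the hints above, here is one minimal sketch of `validate_code`. It checks the four cases described — real linters (Ruff, Pylint) cover these far more robustly, and the function-length tracking here is deliberately simple (it ignores nested functions):

```python
def validate_code(code_str):
    """Return a list of issue strings for common Python code smells."""
    issues = []
    lines = code_str.splitlines()
    func_start = None   # (name, start line index) of the function being tracked
    func_indent = 0

    def close_function(end_idx):
        # Report the function if it ran longer than 30 lines.
        if func_start is not None:
            name, start = func_start
            length = end_idx - start
            if length > 30:
                issues.append(f"function '{name}' is {length} lines (max 30)")

    for i, line in enumerate(lines):
        stripped = line.strip()
        indent = len(line) - len(line.lstrip())
        if stripped.startswith("except:"):
            issues.append(f"line {i + 1}: bare except (specify an exception type)")
        if stripped.startswith("from ") and stripped.endswith("import *"):
            issues.append(f"line {i + 1}: wildcard import")
        if "TODO" in stripped:
            issues.append(f"line {i + 1}: TODO comment")
        if stripped.startswith("def "):
            # A def at the same or lower indent closes the previous function.
            if func_start is not None and indent <= func_indent:
                close_function(i)
                func_start = None
            if func_start is None:
                name = stripped[4:].split("(")[0]
                func_start = (name, i)
                func_indent = indent
        elif func_start is not None and stripped and indent <= func_indent:
            # Any unindented statement ends the current function body.
            close_function(i)
            func_start = None
    close_function(len(lines))
    return issues
```

For example, `validate_code("except:\n    pass")` reports a bare `except`, and a 36-line function triggers the length check.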
Knowledge Check
What is 'backpressure' in agentic engineering?
Harness Setup
Chapter 44 Level 7 — Background Agents
Level 7 is where AI stops being a tool you use and starts being a team you manage. Instead of working with one agent at a time, you dispatch specialists—like a project manager assigning tasks to team members with different expertise. Each agent works in its own space, and results come back to you for review.
At Level 6, the agent self-corrects. At Level 7, you dispatch multiple agents on independent tasks while your session stays lean. This is where the leverage multiplies.
Background Agent Orchestration
Your main session becomes a command center. Workers execute in isolated contexts (fresh context windows, separate worktrees). Stuck workers surface questions back to you.
Remember: each "background agent" is just a Claude Code session with a fresh context, a specific task, and access to your project's files and tools. There's no magic—it uses the same model, the same CLAUDE.md, the same test suite. The leverage comes from parallelism and specialization, not from any new capability.
Dispatch is one such orchestrator. Install it with npx skills add bassimeledath/dispatch -g. Each worker gets a fresh context window, and stuck workers surface their questions back to you.
/dispatch pre-launch sweep for TaskForge:
1) security audit the auth flow — use opus, worktree
2) write missing integration tests — use sonnet
3) update documentation — use haiku
Multi-Model Dispatch
Different models for different tasks. Each has strengths:
| Model | Best For |
|---|---|
| Opus | Architecture, security audits, complex reasoning |
| Sonnet | Implementation, feature building, test writing |
| Haiku | Formatting, documentation, simple transforms |
| Gemini | Research, large-context analysis |
| Codex | Code review, parallel generation |
Choosing the Right Model for the Task
Not every task needs the most powerful model. Matching model capability to task complexity saves cost and often improves speed without sacrificing quality.
| Task Type | Recommended Tier | Why | Example |
|---|---|---|---|
| Architecture decisions, complex refactors, ambiguous specs | Highest capability (e.g., Opus) | Requires deep reasoning, handling ambiguity, and maintaining coherence across many files | "Redesign TaskForge's storage layer from JSON files to SQLite, maintaining all existing tests" |
| Feature implementation, bug fixes, test writing | Mid-tier (e.g., Sonnet) | Well-defined tasks with clear acceptance criteria; strong capability at lower cost | "Add a --priority flag to the add command with values low/medium/high" |
| Documentation, formatting, boilerplate, simple edits | Fast/lightweight (e.g., Haiku) | Mechanical tasks where speed matters more than deep reasoning | "Add docstrings to all public functions in taskforge/api.py" |
Model names change. "Opus," "Sonnet," and "Haiku" are current as of March 2026. The principle is stable: match model capability to task complexity. Use the most capable model for ambiguous or high-stakes work; use faster models for well-defined, mechanical tasks. Check docs.anthropic.com/models for the current lineup.
When in doubt, use the mid-tier model. It handles 80% of development tasks well. Escalate to the highest tier only when the task is ambiguous, touches many files, or requires architectural judgment. Drop to the lightweight tier for bulk operations where you'd review the output regardless.
Implementer/Reviewer Separation
Never let the same model grade its own exam. If Sonnet writes the code, use Opus or a different model to review it. Biased self-evaluation is a known failure mode.
The Ralph Loop
The Ralph Loop is an autonomous looping agent pattern: the agent runs a task, checks the results against success criteria, and if the criteria aren't met, it iterates—fixing issues and re-checking—until the task succeeds or a maximum iteration limit is hit. Each iteration gets a fresh context window, which prevents context pollution from failed attempts. The loop is self-correcting: the agent doesn't retry blindly, it reads the failure output and adapts its approach. This makes it powerful for tasks with clear, testable success criteria—and dangerous for vague ones. The loop amplifies ambiguity: an under-specified PRD (Product Requirements Document) becomes a confidently wrong implementation repeated across 10 iterations.
Set up a simple Ralph Loop for TaskForge:
claude --background "Fix all failing tests in TaskForge.
Run: python3 -m pytest
If any tests fail, read the failure output, fix the code, and re-run.
Repeat until all tests pass.
Maximum 5 iterations. If still failing after 5, stop and report what's left."
To test this, intentionally break something first: introduce a bug in models.py (e.g., change a return value, break a validation check). Then launch the loop and watch the agent find and fix it.
Verification: The agent's final output shows all tests passing. Check git diff to confirm the fix is sensible—not a hack like deleting the failing test.
If this doesn't work: (1) Agent loops forever → the iteration limit is essential, never omit it. (2) Agent deletes tests instead of fixing code → add to the prompt: "Do NOT delete or skip tests. Fix the source code." (3) --background not available → use dispatch instead: /dispatch fix all failing tests, max 5 iterations — use sonnet.
A Ralph Loop without a maximum iteration count can run indefinitely, burning API credits and potentially making the codebase worse with each pass. Always specify a hard limit (3-5 iterations for most tasks). If the agent can't fix it in 5 tries, the problem needs human judgment, not more iterations.
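If `--background` isn't available in your setup, the same loop can be approximated with a plain shell script. This is a hedged sketch, assuming the `claude` CLI with the `-p` and `--dangerously-skip-permissions` flags described in Chapter 45; the prompt text is illustrative:

```shell
#!/usr/bin/env bash
# Ralph-style loop: run tests, let the agent fix failures, hard cap at 5.
MAX_ITERATIONS=5
for i in $(seq 1 "$MAX_ITERATIONS"); do
  if python3 -m pytest; then
    echo "All tests pass after $i iteration(s)."
    exit 0
  fi
  # Each invocation is a fresh session — no context pollution between tries.
  claude --dangerously-skip-permissions \
    -p "Tests are failing. Read the pytest output and fix the source code. Do NOT delete or skip tests."
done
echo "Still failing after $MAX_ITERATIONS iterations — needs human judgment."
exit 1
```

Note the hard limit is baked into the loop itself, not left to the prompt.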
CI-Triggered Agents
Agents that activate on repository events: PR review bots on pull request, docs updaters on merge, security scanners on dependency changes.
TaskForge Connection
You'll dispatch multiple independent improvements to TaskForge using different models. This is the first time TaskForge benefits from parallel AI development.
Micro-Exercises
Run npx skills add bassimeledath/dispatch -g. Then: /dispatch use sonnet to list all functions in the project without docstrings.
Dispatch two tasks simultaneously to different models. Compare output quality. Which model was better for which task?
Dispatch three independent TaskForge improvements:
/dispatch three TaskForge improvements:
1) add due dates with reminder logic — use sonnet
2) add tags with filtering — use sonnet
3) improve error handling across all functions — use haiku
Review all three outputs. Are there conflicts? Fix them.
Verification: All three features work. Tests pass. No conflicts in the merged result.
If this doesn't work: (1) Dispatch not found → ensure install completed, restart Claude Code. (2) Workers fail silently → check .dispatch/ for logs. (3) Merge conflicts → expected with parallel work. Resolve manually.
Interactive Exercises
Knowledge Check
What is a Ralph Loop?
Knowledge Check
Why should you set a maximum iteration count for autonomous agent loops?
Background Agent Usage
Chapter 45 Claude Code in Docker
Running Claude Code in Docker gives you isolated, reproducible AI coding environments. This is how you scale from one agent to many without polluting your local machine.
Why Run Claude Code in Docker?
Running Claude Code directly on your host machine works fine for single sessions. But as you scale to multiple agents, Docker solves problems that become unavoidable:
- Isolated environments per project/agent: Each agent gets its own filesystem, tools, and dependencies—no cross-contamination.
- Reproducible setups: Same tools, same config, every time. No drift between machines or sessions.
- Safe experimentation with --dangerously-skip-permissions: Container isolation makes this flag safe—the agent can't touch anything outside the container.
- Multiple parallel agents: Run several agents simultaneously on the same codebase using volume mounts.
- CI/CD integration: Trigger Claude Code agents from pipelines—automated code review, test generation, documentation updates.
Prerequisites
Before proceeding, ensure you have:
- Docker installed and running (Chapter 16)
- Claude Max or Pro subscription (or an Anthropic API key)
The Dockerfile.claude Pattern
Here's the production-grade Dockerfile for running Claude Code in a container. This is the result of the scaffold pattern from Chapter 43—every line exists because leaving it out caused a failure.
FROM debian:bookworm-slim
# System packages — minimal set for Claude Code to operate
RUN apt-get update && apt-get install -y --no-install-recommends \
git bash curl ca-certificates build-essential \
jq tree ripgrep python3 python3-pip \
&& rm -rf /var/lib/apt/lists/*
# Non-root user
RUN groupadd -r claude && useradd -r -g claude -m -s /bin/bash claude
# Install Claude Code (NATIVE installer — never npm)
USER claude
WORKDIR /home/claude
RUN curl -fsSL https://claude.ai/install.sh | bash
ENV PATH="/home/claude/.local/bin:/home/claude/.claude/bin:${PATH}"
# Pre-configure auth so the onboarding wizard never appears
RUN mkdir -p /home/claude/.claude && \
echo '{"hasCompletedOnboarding":true,"installMethod":"native"}' \
> /home/claude/.claude/.claude.json && \
ln -sf /home/claude/.claude/.claude.json /home/claude/.claude.json && \
echo '{"permissions":{"allow":["*"],"deny":[]},"env":{"CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC":"1"}}' \
> /home/claude/.claude/settings.json
WORKDIR /workspace
CMD ["sleep", "infinity"]
Why each decision matters:
| Decision | Why |
|---|---|
| debian:bookworm-slim | Must be glibc-based. Alpine/musl breaks the native Claude Code binary. |
| Native installer, not npm | The npm package uses different config paths and causes auth corruption. The native installer is Anthropic's official method. |
| Non-root claude user | Security best practice. Limits blast radius if something goes wrong. |
| Pre-created .claude.json | Without it, the onboarding wizard appears every time—blocking headless operation. The symlink covers both config paths different versions check. |
| settings.json with "allow":["*"] | Equivalent to --dangerously-skip-permissions baked into config. No prompts in headless mode. |
| CMD ["sleep", "infinity"] | Container stays alive. You docker exec into it for each session. Multiple execs = multiple Claude Code sessions, all sharing the workspace. |
This Dockerfile is the scaffold pattern in action. A vague "create a Dockerfile for Claude Code" instruction produces wildly different results: npm vs native, Alpine vs Debian, root vs non-root, ENTRYPOINT vs CMD. Each wrong choice causes a specific failure—auth loops, binary crashes, permission errors. The scaffold eliminates every coin flip. That's how you make non-deterministic output reliable.
Authentication Methods
Claude Code needs authentication. Two approaches, and they must never coexist in the same .env:
OAuth Token (Claude Max/Pro)
Generate a token on your host machine by running claude setup-token, then add the resulting token to your .env file:
CLAUDE_CODE_OAUTH_TOKEN=sk-ant-oat01-your-token-here
The token is passed into the container via env_file: .env in docker-compose.yml. It is never baked into the Docker image.
API Key
For per-token billing (without a subscription):
ANTHROPIC_API_KEY=sk-ant-api03-your-key-here
If both ANTHROPIC_API_KEY and CLAUDE_CODE_OAUTH_TOKEN exist, the API key takes precedence and you get billed per-token instead of using your subscription. Pick one.
Running a Single Claude Agent
The pattern: start the container once, then docker exec into it for each session. Each exec is an independent Claude Code session sharing the same workspace.
The -v $(pwd):/workspace volume mount maps your current directory into the container. Changes the agent makes inside the container appear on your host filesystem immediately.
Running with --dangerously-skip-permissions
The --dangerously-skip-permissions flag tells Claude Code to skip all permission prompts—file writes, command execution, network requests. The agent acts without asking.
On your host machine, this flag is genuinely dangerous—the agent could modify any file, run any command. Inside a Docker container, the risk is contained: the agent can only affect the container's filesystem and the mounted volume. This is why containers and --dangerously-skip-permissions are a natural pair: container isolation makes autonomous operation safe.
Run a non-interactive task with full autonomy:
The -p flag passes a prompt directly instead of opening an interactive session. Combined with --dangerously-skip-permissions, this lets the agent work completely autonomously: read files, edit code, run tests, and iterate until done.
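A one-shot invocation might look like this sketch — the container name `claude-agents` matches the compose default shown below, and the prompt text is illustrative:

```shell
# Run a fully autonomous, non-interactive task inside the container
docker exec claude-agents claude \
  --dangerously-skip-permissions \
  -p "Run the test suite. Fix any failures and re-run until all tests pass."
```

The command returns when the agent finishes; the changes appear in your mounted workspace.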
Docker Compose Setup
The compose file defines the claude service alongside any existing project services:
# docker-compose.yml
services:
claude:
build:
context: .
dockerfile: Dockerfile.claude
container_name: claude-${COMPOSE_PROJECT_NAME:-agents}
env_file: .env
volumes:
- .:/workspace
- claude-auth:/home/claude/.claude
working_dir: /workspace
stdin_open: true
tty: true
volumes:
claude-auth:
name: claude-auth-${COMPOSE_PROJECT_NAME:-default}
Key details: claude-auth volume persists authentication across container restarts. .:/workspace mounts your project. env_file: .env passes the OAuth token. The container name includes the project directory name so each repo gets a unique container.
When multiple agents share the same volume mount, they can write to the same files simultaneously. This can cause conflicts—one agent overwrites another's changes. Strategies: (1) assign agents to non-overlapping directories, (2) use git worktrees so each agent has its own working copy, (3) run agents sequentially in a pipeline instead of in parallel.
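Strategy (2) can be sketched with git's built-in worktree command — the directory and branch names here are illustrative:

```shell
# One working copy per agent, each on its own branch
git worktree add ../taskforge-feature-a -b feature-a
git worktree add ../taskforge-feature-b -b feature-b
# Mount each directory into its own container; merge the branches when done
git worktree list
```

Each agent then edits its own checkout, so parallel writes can't clobber each other; conflicts surface only at merge time, where git can manage them.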
The agent.sh Pattern
A thin convenience script that ensures the container is running, then execs in. The model is forced via --model flag (environment variables don't work for model selection).
#!/usr/bin/env bash
set -euo pipefail
PROJECT_NAME="${COMPOSE_PROJECT_NAME:-$(basename "$(pwd)")}"
CONTAINER="claude-${PROJECT_NAME}"
# Start container if not running
if ! docker ps --format '{{.Names}}' | grep -q "^${CONTAINER}$"; then
docker compose up -d claude
sleep 2
fi
# Interactive or one-shot
if [ $# -gt 0 ]; then
docker exec -it "$CONTAINER" claude \
--dangerously-skip-permissions --model claude-opus-4-6 -p "$*"
else
docker exec -it "$CONTAINER" claude \
--dangerously-skip-permissions --model claude-opus-4-6
fi
Usage:
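Per the script above, both modes look like this (the one-shot prompt is illustrative):

```shell
./agent.sh                           # interactive Claude Code session in the container
./agent.sh "fix the failing tests"   # one-shot: runs the prompt, then exits
```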
To run multiple agents, open more terminal panes and run ./agent.sh in each one. Each exec is an independent session, all sharing the same workspace.
Practical Patterns
Sequential Pipeline
Agent 1 writes code, Agent 2 reviews it, Agent 3 writes tests. Each agent's output feeds into the next. This ensures review before testing and catches issues early.
Parallel Workers
Multiple agents work on independent features simultaneously. When all finish, merge the results. Best when features don't share files.
Watcher Agent
An agent that monitors test results (or CI output) and dispatches fix agents when tests fail. This creates a self-healing pipeline: break something, and the watcher assigns an agent to fix it.
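A minimal watcher can be sketched as a polling loop in shell — assuming the `claude` CLI from this chapter; a production version would react to CI webhooks rather than polling, and the prompt is illustrative:

```shell
#!/usr/bin/env bash
# Poll the test suite; dispatch a fix agent whenever it goes red.
while true; do
  if ! python3 -m pytest --quiet; then
    claude --dangerously-skip-permissions \
      -p "Tests just failed. Read the pytest output, fix the source code, and re-run until green. Do NOT delete tests."
  fi
  sleep 300   # poll every 5 minutes
done
```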
Resource Management
Docker lets you limit resources per container so one runaway agent doesn't consume your entire machine:
# docker-compose.yml
services:
agent-backend:
build:
context: .
dockerfile: Dockerfile.claude
command: ["--dangerously-skip-permissions", "-p", "implement API endpoints"]
volumes:
- .:/workspace
environment:
- ANTHROPIC_API_KEY
deploy:
resources:
limits:
memory: 2G
cpus: "1.0"
Clean up stopped containers regularly to reclaim disk space:
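For example, with Docker's built-in prune commands:

```shell
docker container prune -f   # remove all stopped containers
docker image prune -f       # remove dangling images
```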
Common Failure Modes
| Symptom | Cause | Fix |
|---|---|---|
| Onboarding wizard appears | .claude.json missing or malformed | Dockerfile must pre-create it + symlink both paths |
| Token expires on restart | Auth files not persisted | claude-auth volume on /home/claude/.claude |
| API billing instead of subscription | Both ANTHROPIC_API_KEY and OAuth token in .env | Remove the API key; keep only the OAuth token |
| Binary crashes on startup | Alpine/musl base image | Use debian:bookworm-slim (glibc) |
| npm install breaks auth flow | npm version uses different config paths | Use native installer only |
TaskForge Connection
Run two Claude agents in Docker simultaneously—one to add a new feature to TaskForge, another to write tests for existing features. Then merge the results. This is the first time you'll see parallel AI development on your own project with full container isolation.
Micro-Exercises
Create Dockerfile.claude for TaskForge using the pattern above. Build with docker compose build claude. Verify the container starts: docker compose up -d claude && docker exec claude-taskforge-project echo "ok".
Run ./agent.sh "list all Python files in the project" and examine the output. Verify the agent found your project files inside /workspace.
Set up the full Docker scaffold for TaskForge: Dockerfile.claude, docker-compose.yml (with claude-auth volume), agent.sh, and .env with your OAuth token. Then open two terminal panes and run:
Both sessions share the same container and workspace.
Verification: Both agents complete their tasks. git diff shows modifications from both agents. The container stayed running throughout.
If this doesn't work: (1) Onboarding wizard appears → docker compose down -v && docker compose build --no-cache claude && docker compose up -d claude. (2) Token expired → run claude setup-token on host, update .env, restart container. (3) Agents conflict on the same file → expected; resolve manually or use git worktrees.
Interactive Exercises
Knowledge Check
Why is --dangerously-skip-permissions acceptable inside a Docker container?
Knowledge Check
What does -v $(pwd):/workspace do when running Claude Code in Docker?
Docker Agent
Chapter 46 Level 8 — Autonomous Agent Teams
Level 8 is the frontier—and it's important to understand both its power and its limits. Agent teams can tackle large projects where coordination between frontend, backend, and testing is essential. But more agents also means more coordination overhead, more potential for conflicting changes, and more need for robust CI. This chapter teaches you when teams are worth the complexity—and when simpler patterns are better.
Level 7 dispatches workers that report back to you. Level 8 is agents that communicate with each other—peer-to-peer coordination, not hub-and-spoke. This is experimental territory.
Peer-to-Peer Agent Coordination
Instead of all workers reporting to a single orchestrator, agents in a team share a mailbox and coordinate directly. A frontend agent can ask a backend agent about an API contract without routing through you.
Claude Code Agent Teams (experimental): export CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1. Team lead + workers with shared mailbox.
What Pioneers Found
Anthropic (16 agents building a C compiler): needed CI to prevent regressions. Without automated tests, agents would break each other's work.
Cursor (hundreds of agents for codebase migration): without hierarchy, agents churned—making and reverting the same changes repeatedly.
Current Orchestrators
| Tool | Pattern |
|---|---|
| Dispatch | Local, hub-and-spoke |
| Gas Town | Structured workflows |
| Multiclaude | Simple parallel execution |
| Claude Flow | Complex multi-step workflows |
| Ramp Inspect | Cloud VMs for isolation |
"For day-to-day work, Level 7 is where the leverage is." Level 8 is for very large projects where the coordination cost is justified. Most TaskForge-sized projects never need it.
More agents = more throughput—three features built simultaneously instead of sequentially. But more agents = more coordination overhead—merge conflicts, inconsistent patterns, duplicated work. And more agents without constraints = chaos—agents that undo each other's changes, introduce conflicting patterns, or confidently build the wrong thing faster. Verification and CI become more important as concurrency increases, not less.
Decision Tree: Choosing the Right Pattern
More agents is not always better. Match the coordination pattern to the task:
When NOT to Use Multiple Agents
The decision tree above starts with "Simple, one-file" routing to a single session. But that category is larger than it looks. Most tasks fit a single agent. Here's the concrete decision framework:
Use a single well-configured agent when:
- The task fits in one context window (roughly <20 files touched)
- The feature touches fewer than 5 files
- There are no independent subtasks that could run in parallel
- You can describe the entire change in one prompt
Escalate to multi-agent when:
- Independent features can genuinely be parallelized (not just "it would be nice")
- Implementation and review should be separated (builder-validator pattern)
- The task exceeds one context window and has natural split points
- You need different tool configurations for different subtasks
TaskForge at its current size? Single agent. A full-stack app with separate frontend, backend, and infrastructure changes? Consider dispatch. If you find yourself writing more orchestration code than feature code, step back to a simpler pattern.
TaskForge Connection
Would agent teams be appropriate for TaskForge? No—it's too small. Subagents or Dispatch are sufficient. Knowing when not to use a pattern is as important as knowing how.
Micro-Exercises
Write a decision tree (on paper or in markdown) for when you'd use each agent pattern. Include at least 5 branches.
For your TaskForge project: would agent teams be appropriate? Why or why not? (Expected: no—it's too small.)
"Perfection is achieved, not when there is nothing more to add, but when there is nothing left to take away."
—Antoine de Saint-Exupéry, Airman's Odyssey
Dispatch two agents to work on different TaskForge features simultaneously, then resolve the resulting merge conflict:
/dispatch two TaskForge features in parallel:
1) add an "archive_task" command that moves completed tasks
to an "archived" list — use sonnet, worktree
2) add a "task_stats" command that prints counts of
pending/completed tasks — use sonnet, worktreeBoth agents will modify shared files (main.py, models.py, tests). When both finish:
- Try to merge both branches into
main. Expect a merge conflict. - Open the conflicting file(s). Look for
<<<<<<<markers. - Resolve by keeping both features—don't discard either agent's work.
- Run
python3 -m pytestto verify both features work together.
Expected outcome: Both archive_task and task_stats work. Tests pass. Git log shows the merge commit.
Verification: git log --graph --oneline shows two branches merging. Both new commands work when tested manually. All tests pass.
If this doesn't work: (1) No merge conflict → agents may have touched different files. This is fine—it means the features were truly independent. (2) Tests fail after merge → both agents may have added conflicting test fixtures. Reconcile the test setup. (3) One agent's feature breaks the other → this is the coordination cost from the chapter introduction. Fix by reading both implementations and adjusting the integration points.
Interactive Exercises
Knowledge Check
When should you NOT use autonomous agent teams?
Dependency Planner
Write plan_execution(tasks) that takes a dict mapping task names to their dependencies (list of task names) and returns a list of sets, where each set contains tasks that can run in parallel. Tasks in later sets depend on tasks in earlier sets. Raise ValueError if there's a circular dependency.
Start by finding tasks with no dependencies (no unresolved deps). Those go in the first set.
After processing a set, remove those tasks from all dependency lists. Repeat until all tasks are planned.
If a round produces no new tasks but some remain, there's a cycle.
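The steps above can be sketched directly in Python. This is one of several valid implementations, following the hints exactly—find the ready tasks, plan them as a round, remove them from the remaining dependency lists, and repeat:

```python
def plan_execution(tasks: dict[str, list[str]]) -> list[set[str]]:
    """Group tasks into rounds; tasks within a round can run in parallel."""
    remaining = {name: set(deps) for name, deps in tasks.items()}
    plan = []
    while remaining:
        # Tasks with no unresolved dependencies can run in this round.
        ready = {name for name, deps in remaining.items() if not deps}
        if not ready:
            # Some tasks remain but none are ready: they depend on each other.
            raise ValueError("circular dependency detected")
        plan.append(ready)
        for name in ready:
            del remaining[name]
        # Mark this round's tasks as resolved for everything still waiting.
        for deps in remaining.values():
            deps -= ready
    return plan
```

For example, `plan_execution({"a": [], "b": ["a"], "c": ["a"], "d": ["b", "c"]})` returns `[{"a"}, {"b", "c"}, {"d"}]`: run `a` first, then `b` and `c` in parallel, then `d`.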
Multi-Agent Experience
Chapter 47 The Multiplayer Effect and What Comes Next
Individual skill has a ceiling. Teams have compounding leverage. This chapter connects your personal growth through the 8 Levels to the broader challenge of making your team effective—because the bottleneck is always the least-equipped member of the workflow.
Your individual level matters, but your team's level matters more: raising the team's floor is a different problem from raising your own ceiling.
The Multiplayer Bottleneck
"If you're Level 7 raising PRs while you sleep, but your reviewer is Level 2, your throughput is capped at Level 2."
Team Skills Registry
At Block (and other companies), shared skills get PRs, reviews, and versions—same as code. A team skills registry means everyone benefits from every improvement.
Self-Assessment
The full self-assessment quiz with detailed checklists and next-step guidance for each level is in Appendix C: Self-Assessment Quiz. Take it now—identify your current level and the specific item blocking you from the next one.
What Comes Next
The field is moving fast. Expect voice-to-voice coding, tighter CI/CD integration, cross-model coordination protocols, and the software development loop itself being reimagined. But the fundamentals from Phases 1-4 don't change. Code is still logic expressed in text. Tests still verify behavior. Architecture still matters.
TaskForge Connection
Look at how far TaskForge has come: from a 40-line script to a tested, structured, AI-configured, multi-agent-ready project. That progression mirrors the 8 Levels. Your next project starts at whatever level you've reached.
Micro-Exercises
Use the full checklist in Appendix C. Write down your level honestly. Identify the specific item that blocks you from the next level.
If you work on a team (even a team of 2): estimate each member's level. Find the bottleneck. What's one thing you could share (a skill, a CLAUDE.md template, a workflow) to raise the team's floor?
Try This Now
Take the self-assessment above. Identify your current level. Write a 5-sentence action plan for reaching the next level within 30 days. Be specific: what tool to install, what skill to create, what habit to build.
Verification: Your action plan has concrete dates and deliverables, not just intentions.
"The computer programmer is a creator of universes for which he alone is the lawgiver. No playwright, no combiner of things ever fashioned universes complete with their own laws."
—Joseph Weizenbaum, Computer Power and Human Reason
Interactive Exercises
Knowledge Check
At what levels do most professional developers operate?
Self Assessment & Action Plan
Phase 10 Gate Checkpoint & TaskForge Multi-Agent
Minimum Competency
Pre-commit hooks providing backpressure. 2+ background agent tasks dispatched and reviewed. Output review with error identification. Agent pattern decision criteria articulated.
Your Artifact
TaskForge with: git log showing agent-implemented features with backpressure verification. A written decision tree covering all 5 agent patterns.
Verification
Pre-commit passes. Agent output was reviewed (corrections documented). Decision tree covers all patterns with specific criteria.
If background agents produce output you cannot evaluate → return to Phase 9. Evaluation skill (Chapter 38) is the prerequisite for everything in Phase 10.
TaskForge Checkpoint
TaskForge now has multi-agent-implemented features, automated quality gates, and a decision framework for future agent coordination. The curriculum is complete.
What You Can Now Do
- Design automated feedback loops (harnesses) that let agents self-correct
- Dispatch background agents for parallel development
- Choose the right agent pattern for the right task
- Evaluate and merge multi-agent output
- Articulate when multi-agent coordination is worth the overhead—and when it isn't
The Full Arc
Look at how far you've come. In Phase 1, you couldn't read a line of code. Now you can orchestrate multiple AI agents working in parallel on a structured project with automated quality gates. Every phase was necessary: you can't supervise AI code you can't read (Phase 1), test code you can't write (Phase 2), manage projects without professional tools (Phase 3), deploy without infrastructure knowledge (Phase 4), or direct agents without context engineering skills (Phase 9). The phases aren't detours—they're the prerequisites that make competent AI supervision possible.