📄 Spec Document as Input
At the advanced level, the input is no longer a plan.md with 5 tasks. It is a complete spec document that describes an entire application. The orchestrator reads the spec, splits the work into rounds, and dispatches workers to build everything.
"We send in our initial spec to the orchestrator and it has to reason about that and then figure out how many workers to spin off."
📋 Example Spec (Kanban Board)
# Spec: Single Page Kanban Board
## Goal
Build a Kanban board as a static web app (HTML + CSS + JS).
## Requirements
- 3 columns: To Do, In Progress, Done
- Drag and drop between columns
- Persistence via localStorage
- Responsive (mobile-first)
- Dark theme with amber/neutral colors
## Acceptance Criteria
- App loads without errors in the browser
- Drag and drop works
- Data persists after reload
- Correct layout on mobile (320px+)
Cole Medin demonstrated exactly this scenario: "building a single page Kanban board as a static web app, I just take this prompt and I'll show you it running live right now." The orchestrator received the spec and split the work into rounds automatically.
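Before any planning can happen, something has to turn the spec into discrete items. The sketch below is one minimal way to do that, assuming the spec uses `## ` headings and `- ` bullets as in the example. The function name `parse_spec` and the shortened sample text are illustrative, not part of the system shown in the source.

```python
def parse_spec(markdown: str) -> dict[str, list[str]]:
    """Group the lines of a markdown spec under their '## ' section headings."""
    sections: dict[str, list[str]] = {}
    current = None
    for raw in markdown.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None and line:
            # Bullet items lose their "- " prefix; plain prose is kept as-is.
            item = line[2:].strip() if line.startswith("- ") else line
            sections[current].append(item)
    return sections


# Shortened, hypothetical copy of the Kanban spec, used only for the demo.
SPEC = """\
# Spec: Single Page Kanban Board
## Goal
Build a Kanban board as a static web app (HTML + CSS + JS).
## Requirements
- 3 columns: To Do, In Progress, Done
- Drag and drop between columns
- Persistence via localStorage
## Acceptance Criteria
- App loads without errors in the browser
- Data persists after reload
"""

spec = parse_spec(SPEC)
for requirement in spec["Requirements"]:
    print("requirement:", requirement)
```

Keeping requirements and acceptance criteria as separate lists makes it easier to hand each worker only the part of the spec it needs, and to check the criteria one by one later.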
🎯 Orchestrator Plans the Rounds
The orchestrator analyzes the spec and decides the strategy: how many rounds, how many workers per round, and which tasks each worker receives. This is the most critical decision in the system, which is why it requires a strong model.
"The orchestrator is deciding how to split up the work right now. The orchestrator spent 6,000 tokens with that initial planning and then prompting our first three workers in round number one."
📋 Multi-Agent Orchestrator Prompt
You are the orchestrator for a multi-agent build system.
Input: spec.md (attached)
State store: Postgres (Neon) at DATABASE_URL
Your job:
1. Read spec.md and break it into independent work units
2. Plan rounds: group units that can run in parallel
3. For each round, dispatch workers with specific prompts
4. After each round, read worker results from state store
5. Validate: does the output match spec requirements?
6. If validation passes, plan the next round. If it fails, retry or escalate.
7. When all spec requirements are met, report completion.
Pause for human approval before starting each new round.
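In the source, the orchestrator model itself decides how to split the work. Steps 1 and 2 of the prompt can still be illustrated with a deterministic stand-in: given work units and their dependencies, put every unit whose dependencies are already built into the next round. The unit names, the dependency map, and the cap of three workers per round are assumptions made for this sketch.

```python
def plan_rounds(deps: dict[str, set[str]], max_workers: int = 3) -> list[list[str]]:
    """Group work units into rounds; a unit runs only after all its dependencies."""
    remaining = {unit: set(needs) for unit, needs in deps.items()}
    done: set[str] = set()
    rounds: list[list[str]] = []
    while remaining:
        # A unit is ready once everything it depends on has been built.
        ready = sorted(u for u, needs in remaining.items() if needs <= done)
        if not ready:
            raise ValueError(f"cycle or unknown dependency among: {sorted(remaining)}")
        batch = ready[:max_workers]  # cap the number of parallel workers
        rounds.append(batch)
        done.update(batch)
        for unit in batch:
            del remaining[unit]
    return rounds


# Hypothetical breakdown of the Kanban spec into work units.
UNITS = {
    "html_skeleton": set(),
    "css_theme": set(),
    "storage_module": set(),
    "drag_and_drop": {"html_skeleton"},
    "responsive_layout": {"html_skeleton", "css_theme"},
    "persistence_wiring": {"storage_module", "drag_and_drop"},
}

rounds = plan_rounds(UNITS)
for number, batch in enumerate(rounds, start=1):
    print(f"round {number}: {batch}")
```

With this map, the three independent units land in round 1, the two units that build on them in round 2, and the final wiring in round 3. A real orchestrator would also have to infer the dependencies, which is the part that needs model reasoning.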
⚙ Workers Execute and Report
Each worker receives a specific task from the orchestrator and runs in its own session with an isolated worktree and database branch. When it finishes, it writes its results to the state store (Postgres), not to the context window.
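The write-then-read cycle through the state store can be sketched as follows. The source uses Postgres (Neon); this example swaps in Python's built-in `sqlite3` so that it runs without a server. The table name `worker_results`, its columns, and the helper names are all assumptions for illustration.

```python
import json
import sqlite3


def open_state_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the store and create the (hypothetical) results table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS worker_results (
               round_number  INTEGER NOT NULL,
               worker_id     TEXT    NOT NULL,
               status        TEXT    NOT NULL CHECK (status IN ('success', 'failure')),
               files_changed TEXT    NOT NULL,
               notes         TEXT    NOT NULL DEFAULT '',
               PRIMARY KEY (round_number, worker_id)
           )"""
    )
    return conn


def report_result(conn, round_number, worker_id, status, files_changed, notes=""):
    """Called by a worker when it finishes; a retry overwrites the earlier row."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO worker_results VALUES (?, ?, ?, ?, ?)",
            (round_number, worker_id, status, json.dumps(files_changed), notes),
        )


def read_round(conn, round_number):
    """Called by the orchestrator on its next run to pick up the updated state."""
    rows = conn.execute(
        "SELECT worker_id, status, files_changed, notes FROM worker_results "
        "WHERE round_number = ? ORDER BY worker_id",
        (round_number,),
    ).fetchall()
    return [
        {"worker_id": w, "status": s, "files_changed": json.loads(f), "notes": n}
        for w, s, f, n in rows
    ]


store = open_state_store()
report_result(store, 1, "w1", "success", ["index.html"])
report_result(store, 1, "w2", "failure", [], notes="drag events not firing")
round_one = read_round(store, 1)
print(round_one)
```

Because results live in a table rather than in a conversation, the orchestrator can be restarted between rounds and still see exactly what each worker reported.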
"Those workers are going to go back and they're going to update the state that we have in our database. And so this is our loop, right? Because then the next time the orchestrator runs, it's going to get that updated state from the workers."
📋 Worker Prompt Template
You are Worker {{worker_id}} in Round {{round_number}}.
Your task: {{task_description}}
Spec context: {{relevant_spec_section}}
Previous round results: {{previous_results_summary}}
Instructions:
1. Execute the task in your worktree
2. Run validation: {{validation_command}}
3. Report results to state store:
- Status: success/failure
- Files changed: list
- Test results: pass/fail count
- Notes: any issues encountered
Do NOT modify files outside your assigned scope.
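Filling the `{{placeholder}}` slots of a template like this one takes only a few lines. The sketch below is one possible approach using the standard `re` module; the function name `render_prompt` and the sample values are hypothetical. It refuses to render when a value is missing, so a worker never receives a prompt with a literal `{{task_description}}` in it.

```python
import re

# Matches {{name}} with optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template: str, values: dict[str, str]) -> str:
    """Substitute every {{name}} in the template, failing loudly on gaps."""
    missing = sorted(set(PLACEHOLDER.findall(template)) - set(values))
    if missing:
        raise KeyError(f"missing template values: {', '.join(missing)}")
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


# Shortened version of the worker template, for the demo only.
TEMPLATE = (
    "You are Worker {{worker_id}} in Round {{round_number}}.\n"
    "Your task: {{task_description}}"
)

prompt = render_prompt(
    TEMPLATE,
    {
        "worker_id": "2",
        "round_number": "1",
        "task_description": "Implement drag and drop between columns",
    },
)
print(prompt)
```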
✅ Orchestrator Validates and Launches the Next Round
After each round, the orchestrator reads the results from the state store, checks whether the spec's criteria have been met, and decides whether to launch another round or declare completion. This control point is what separates a robust system from a blind loop.
"It'll go round by round, doing validation each time. And we can even have human in the loop so that we get to actually take a look at what has happened in the first round before the orchestrator moves on to the next."
🔧 Validation Logic Between Rounds
After Round {{N}} completes:
1. Query state store for all worker results in this round
2. Check: did all workers report success?
- If any failed: retry failed workers (max 2 retries)
- If still failing: pause and escalate to human
3. Check: do combined results satisfy spec requirements?
- Run integration tests across merged worktrees
- Compare output against spec acceptance criteria
4. If all good: plan Round {{N+1}} with remaining work
5. If spec is fully satisfied: merge all branches, report done
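The branching in these steps can be written as a small pure function, which makes the policy easy to test in isolation. This is a sketch under assumptions: the names are illustrative, and the steps above do not say what happens when workers succeed but the integration check fails, so the code treats that case as an escalation.

```python
from enum import Enum


class Decision(Enum):
    RETRY = "retry failed workers"
    ESCALATE = "pause and escalate to human"
    NEXT_ROUND = "plan next round"
    DONE = "merge all branches and report done"


MAX_RETRIES = 2  # taken from step 2 above


def decide(
    statuses: dict[str, str],
    retries_used: int,
    integration_ok: bool,
    remaining_work: bool,
) -> Decision:
    """Map one round's results to the orchestrator's next action."""
    failed = [worker for worker, status in statuses.items() if status != "success"]
    if failed:
        return Decision.RETRY if retries_used < MAX_RETRIES else Decision.ESCALATE
    if not integration_ok:
        # Assumption: a failed integration check goes to a human, not a retry.
        return Decision.ESCALATE
    return Decision.NEXT_ROUND if remaining_work else Decision.DONE


first_attempt = decide(
    {"w1": "success", "w2": "failure"},
    retries_used=0,
    integration_ok=True,
    remaining_work=True,
)
print(first_attempt.value)
```

Keeping the decision separate from the code that queries the store or runs tests means the retry limit and escalation rules can be changed without touching the rest of the loop.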
🛑 Human-in-the-Loop Checkpoints
Human-in-the-loop (HITL) is what separates a reliable system from an experiment. Without HITL, you risk letting the loop run for hours and coming back to find that the result is unusable.
"A lot of times people set up these systems to just go, go, go, go. And then you have a run for a day. And by the time it comes back, you just have crap."
📋 Where to Place Checkpoints
After planning
Review the rounds the orchestrator has planned before it starts dispatching workers.
Between rounds
Validate the results of the previous round before authorizing the next one.
Before the final merge
Review all changes before merging into the main branch.
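A checkpoint can be as simple as a function that blocks until a person answers. The sketch below assumes a terminal prompt via `input`; the checkpoint names, the `CheckpointRejected` exception, and the injectable `ask` parameter are illustrative choices, not something the source prescribes.

```python
from typing import Callable

# The three checkpoint positions listed above.
CHECKPOINTS = ("after_planning", "between_rounds", "before_final_merge")


class CheckpointRejected(Exception):
    """Raised when the human reviewer does not approve continuing."""


def checkpoint(name: str, summary: str, ask: Callable[[str], str] = input) -> None:
    """Block until a human approves; anything other than yes stops the run."""
    if name not in CHECKPOINTS:
        raise ValueError(f"unknown checkpoint: {name}")
    answer = ask(f"[{name}] {summary}\nApprove? [y/N] ").strip().lower()
    if answer not in {"y", "yes"}:
        raise CheckpointRejected(name)


# Demo with a scripted reviewer instead of a real terminal prompt.
checkpoint("after_planning", "3 rounds, 6 work units", ask=lambda _prompt: "y")
print("plan approved, dispatching round 1")
```

Defaulting to "no" means an empty answer or a typo pauses the system rather than letting it continue. Passing `ask` as a parameter lets the same gate be wired to a chat message or a web form later.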
🎯 Reflection Point
Cole Medin sums up the philosophy: "That is the kind of reliability that I feel like we really need to have right now in order to build anything more than simple demos." HITL is not optional in production systems. It is what turns a demo into something usable.
❓ Open Questions
The multi-agent orchestrator is the peak of complexity. These are the questions that come up in real-world practice.
Is the cost justifiable?
Cole reports more than 1M tokens for a single simple application. A multi-agent setup multiplies that. The honest answer: it depends on the value of the output versus the cost of a human developer. For PoCs, it may be worth it. For production, run the numbers first.
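The arithmetic is short enough to script. In the sketch below, every number is a placeholder: the input/output token split, the per-million-token prices, and the developer rate are made up for illustration and do not reflect any provider's real pricing. Substitute your own figures.

```python
def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    usd_per_m_input: float,
    usd_per_m_output: float,
) -> float:
    """Token cost of one run, with prices quoted per million tokens."""
    return (
        input_tokens / 1_000_000 * usd_per_m_input
        + output_tokens / 1_000_000 * usd_per_m_output
    )


# Placeholder figures: a 1M-token run, split 800k input / 200k output.
run_cost = estimate_cost_usd(
    input_tokens=800_000,
    output_tokens=200_000,
    usd_per_m_input=3.0,    # hypothetical price
    usd_per_m_output=15.0,  # hypothetical price
)

DEVELOPER_USD_PER_HOUR = 60.0  # hypothetical rate
break_even_minutes = run_cost / DEVELOPER_USD_PER_HOUR * 60

print(f"run cost: ${run_cost:.2f}")
print(f"pays off if it saves more than {break_even_minutes:.1f} developer minutes")
```

A single run is only part of the picture. Retries, failed rounds, and the human time spent at checkpoints all add to the real cost, so treat this as a lower bound.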
When complexity becomes the enemy
Not every project needs a multi-agent orchestrator. Cole is clear: "I would just fold loop engineering into harness engineering." If a simple plan.md does the job, do not scale up to multi-agent just because it looks more sophisticated.
Orchestrator model vs. worker model
Cole uses Pi/Kimi for the orchestrator (routing decisions) and Claude for the workers (the actual implementation). The orchestrator does not need the most expensive model, but it does need reliable reasoning. Test before you commit.
🎯 Reflection Point
Multi-agent is a powerful tool, but it is not a universal solution. Use it when the problem genuinely calls for parallelism, rounds, and validation between stages. For most scenarios, the issue bot (4.2) or even the task runner (4.1) is enough.
📋 Module Summary
Next Module:
4.4 -- Anti-Patterns and Troubleshooting (what goes wrong and how to fix it)