MODULE 4.4

🔎 Anti-hallucination + Code Review

Investigate before answering, coverage (not "be conservative"), and a code-review harness ready to paste into your next PR.

7 Topics · 45 Minutes · Level: Expert · Type: Quality
1

👁️ Snippet <investigate_before_answering>

The canonical block that forces Claude to read before asserting. It drastically reduces hallucinations about code it has not read.

<investigate_before_answering>
Before you claim anything specific about the code in this
repository, actually read the relevant files with the Read
tool. Do not guess, do not reconstruct from memory, do not
rely on your training data about similar libraries.

If a user asks "what does function X do?" and you haven't
read the file that defines X in this session, read it first.
If a user asks "where is Y used?", run Grep first.

When you are uncertain about a fact — a file path, a function
signature, a type definition, a config value — say so
explicitly, investigate with a tool call, and then answer.
Never fabricate identifiers, imports, or paths.

If, after investigating, you still don't have enough
information to answer confidently, say "I need to look at
[specific file / specific tool output] to answer this
correctly" and then do it. Do not produce a plausible-sounding
answer that could be wrong.
</investigate_before_answering>
2

📝 Never speculate about unread code

A guess is a confident hallucination. It is the #1 source of bugs like "Claude cited a function that doesn't exist" or "Claude imported a module that isn't installed".

❌ Hallucinated answer

"You can use db.findOne({id})", but that API doesn't exist in your ORM: it's Prisma, not Mongoose.

✓ Investigated answer

Reads src/db.ts first, identifies that it uses the Prisma client, then proposes prisma.user.findUnique({where: {id}}).

3

🔍 Code review with COVERAGE, not "be conservative"

A crucial change of framing. "Be conservative" suppresses legitimate findings out of fear of false positives. Coverage mode asks for every finding; the filtering comes afterwards.

❌ Old framing

Be conservative.
Only flag issues you're
highly confident about.
Avoid false positives.

→ Claude self-censors and hides real bugs.

✓ Modern framing

Report every issue you
find. Add confidence
(0.0–1.0) and severity
(low/medium/high). We
filter downstream.

→ Claude reports everything; the pipeline ranks.

4

📢 "Report every finding" + confidence + severity

The canonical code-review prompt. Separate generation (coverage) from ranking (an external filter).

<code_review_coverage>
Report every issue you find in this diff. Do not self-censor.
Do not merge similar issues. Do not skip findings because you
think they're minor or because you're unsure.

For each finding, output:

{
  "file": "path/to/file.ts",
  "line": 42,
  "severity": "low" | "medium" | "high",
  "confidence": 0.0 to 1.0,
  "category": "bug" | "perf" | "security" | "style" | "maintainability",
  "description": "one-sentence summary",
  "detail": "1–3 sentence explanation with rationale",
  "suggestion": "concrete fix or 'needs discussion'"
}

Include issues you're only 30% confident about — we filter
downstream. It is better to include a false positive at
confidence 0.3 than to miss a real bug.

If a finding depends on context you don't have, say so in
"detail" and set confidence ≤ 0.5. Do not guess.

At the end of your output, produce a summary line:
"Total findings: N (high: X, medium: Y, low: Z)"
</code_review_coverage>
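Before findings enter the pipeline, it is worth rejecting malformed model output. A minimal validator sketch: the field names come from the schema above, but the helper name and error format are our own illustrative choices.

```python
# Validate one finding against the <code_review_coverage> schema.
# Allowed values are copied from the prompt above.
ALLOWED_SEVERITY = {"low", "medium", "high"}
ALLOWED_CATEGORY = {"bug", "perf", "security", "style", "maintainability"}

REQUIRED_FIELDS = ("file", "line", "severity", "confidence",
                   "category", "description", "detail", "suggestion")

def validate_finding(f: dict) -> list[str]:
    """Return a list of problems; an empty list means the finding is valid."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in f:
            problems.append(f"missing field: {field}")
    if f.get("severity") not in ALLOWED_SEVERITY:
        problems.append(f"bad severity: {f.get('severity')!r}")
    if f.get("category") not in ALLOWED_CATEGORY:
        problems.append(f"bad category: {f.get('category')!r}")
    conf = f.get("confidence")
    if not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
        problems.append(f"confidence out of range: {conf!r}")
    return problems
```

Findings that fail validation can be dropped or sent back to the model for repair, depending on how strict your pipeline needs to be.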
5

🎯 Downstream filter — coverage × ranking

A two-step architecture. Same model, two distinct roles.

# Note: claude.review, claude.filter and post_to_pr are pseudocode
# wrappers around your model and PR APIs, not a real SDK.

# Step 1: coverage (find everything)
findings = claude.review(diff, prompt=coverage_prompt)

# Step 2: filter + rank (separate signal from noise)
ranked = claude.filter(findings, prompt="""
  Group duplicates. Re-rank by severity × confidence.
  Drop findings with confidence < 0.4 AND severity = low.
  Return top 15 only. Keep full JSON schema.
""")

# Step 3: post to the PR
post_to_pr(ranked)
Step 1: maximum coverage → Step 2: filter + rank → Step 3: post to the PR
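Step 2 can also be done deterministically in code instead of with a second model call. A minimal sketch of the filter + rank logic, assuming findings follow the JSON schema from topic 4; the severity weights are an illustrative assumption, while the confidence < 0.4 AND severity = low rule and the top-15 cut mirror the prompt above.

```python
# Illustrative weights: rank score = severity weight × confidence.
SEVERITY_WEIGHT = {"low": 1, "medium": 2, "high": 3}

def filter_and_rank(findings: list[dict], max_findings: int = 15) -> list[dict]:
    """Drop weak findings, dedupe by (file, line, category), rank the rest."""
    # Drop findings that are both low-severity AND low-confidence.
    kept = [
        f for f in findings
        if not (f["confidence"] < 0.4 and f["severity"] == "low")
    ]
    # Dedupe: keep the highest-confidence finding per (file, line, category).
    best: dict[tuple, dict] = {}
    for f in kept:
        key = (f["file"], f["line"], f["category"])
        if key not in best or f["confidence"] > best[key]["confidence"]:
            best[key] = f
    # Rank by severity × confidence, descending, and cap the list.
    ranked = sorted(
        best.values(),
        key=lambda f: SEVERITY_WEIGHT[f["severity"]] * f["confidence"],
        reverse=True,
    )
    return ranked[:max_findings]
```

The trade-off versus a model-based filter: this version is cheap and reproducible, but it cannot group near-duplicate findings that differ in wording or line number.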
6

🧪 Benchmark: 4.7 finds +11pp more bugs than 4.6

Anthropic's official figure: on an internal code-review benchmark, 4.7 in coverage mode finds 11 percentage points more real bugs than 4.6, with the same or a lower false-positive rate.

| Setup                    | Recall (real bugs) | Precision |
|--------------------------|--------------------|-----------|
| 4.6 + "be conservative"  | 62%                | 81%       |
| 4.7 + "be conservative"  | 68%                | 82%       |
| 4.7 + coverage + filter  | 73%                | 85%       |

Insight: the gain comes from both the model and the framing. Coverage + filter beats conservative mode on 4.6 and even on 4.7 without the filter.
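To read the two columns: recall is the share of real bugs the review catches, precision is the share of reported findings that are real. A quick sketch with toy numbers, not Anthropic's data:

```python
# recall = TP / (TP + FN): of all real bugs, how many were reported?
def recall(tp: int, fn: int) -> float:
    return tp / (tp + fn)

# precision = TP / (TP + FP): of all reports, how many were real bugs?
def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp)

# Toy example: 73 of 100 real bugs found, 13 false positives among 86 reports.
# recall(73, 27) == 0.73, precision(73, 13) ≈ 0.849
```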

7

📋 A ready-to-use review harness

Copy and paste. Adapt only the repo-specific edges (lint rules, target language).

# prompts/pr-review.md

<investigate_before_answering>
... (block from topic 1) ...
</investigate_before_answering>

<code_review_coverage>
... (block from topic 4) ...
</code_review_coverage>

<scope>
Review ONLY the files in this diff. Do not flag pre-existing
issues in unchanged code. Do not refactor. Do not propose new
abstractions — only flag them if the diff is adding bad ones.
</scope>

<repo_context>
Language: TypeScript strict mode.
Framework: Next.js 15 / React 19.
Tests: Vitest.
Lint: ESLint with airbnb + custom rules in .eslintrc.
Style: prefer functional components, named exports, no default
exports in src/lib/*.
</repo_context>

PR diff follows. Analyze and output JSON findings array.
investigate: don't hallucinate · coverage: find everything · scope + repo: no drift
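Assembling the blocks into one prompt is plain string concatenation. A sketch, where build_review_prompt and its signature are our own illustrative names, not a library API:

```python
def build_review_prompt(diff: str, blocks: list[str]) -> str:
    """Join the harness blocks, the instruction line, and the PR diff."""
    parts = list(blocks)  # e.g. investigate, coverage, scope, repo_context
    parts.append("PR diff follows. Analyze and output JSON findings array.")
    parts.append(diff)
    return "\n\n".join(parts)

# Usage sketch: load each block from your prompts/ directory, e.g.
# blocks = [Path(p).read_text() for p in sorted(Path("prompts").glob("*.md"))]
```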

📋 Summary

investigate_before_answering: the standard anti-hallucination block
Coverage, not conservative: 4.7 finds +11pp more bugs
Confidence + severity: the finding schema
Filter downstream: generate and rank in separate calls
Ready-made harness: 4 blocks you adapt only at the edges

Next module:

4.5 — The Terracotta Default