ADR-001: Multi-Agent Safety Verification Pipeline
Status
Accepted
Context
ZenCursor needs to verify commands before execution to prevent disasters like the 2026-01-23 incident, in which an rsync command with a typo wiped production data. A single verification step could miss edge cases or be bypassed.
We need a robust safety system that:
- Catches obvious dangerous patterns quickly (rm -rf /, etc.)
- Understands context and intent
- Provides alternative suggestions
- Has defense in depth through multiple verification stages
Decision
Implement a multi-agent pipeline for command verification:
          Command Input
                │
                ▼
        ┌────────────────┐
        │  Safety Agent  │ ◄─── Local pattern matching (free, instant)
        │    (Local)     │      Catches: rm -rf, format, dd, etc.
        └───────┬────────┘      Critical: fail fast, no further stages
                │ If risky but not critical
                ▼
        ┌────────────────┐
        │  Coder Agent   │ ◄─── Haiku (cheap, fast)
        │    (Haiku)     │      Proposes safer alternatives
        └───────┬────────┘
                │
                ▼
        ┌────────────────┐
        │ Reviewer Agent │ ◄─── Sonnet (thorough)
        │    (Sonnet)    │      Final security review
        └───────┬────────┘
                │
                ▼
    Decision (approve / deny)
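The flow in the diagram can be sketched as a small orchestrator. This is a minimal sketch, assuming each agent is an injected callable; `Verdict`, `run_pipeline`, and the callable signatures are hypothetical. It also assumes, since the diagram does not say, that commands matching no pattern are approved locally without any LLM call.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Verdict:
    """One agent's output (hypothetical shape)."""
    agent: str
    approved: bool
    reason: str
    alternatives: List[str] = field(default_factory=list)


# Hypothetical signatures: the local check returns "critical", "risky" or "safe";
# each LLM agent receives the command plus all earlier verdicts.
LocalCheck = Callable[[str], str]
LlmAgent = Callable[[str, List[Verdict]], Verdict]


def run_pipeline(
    command: str, local_check: LocalCheck, coder: LlmAgent, reviewer: LlmAgent
) -> Tuple[bool, List[Verdict]]:
    """Run Safety -> Coder -> Reviewer and return (approved, verdict trail)."""
    level = local_check(command)
    if level == "critical":
        # Fail fast: no LLM call is made for critical threats.
        return False, [Verdict("safety", False, "critical pattern matched")]
    if level == "safe":
        # Assumption: unmatched commands are approved locally at zero cost.
        return True, [Verdict("safety", True, "no dangerous pattern matched")]

    # Risky but not critical: the Safety Agent passes the command on.
    trail = [Verdict("safety", True, "risky pattern matched; escalating")]
    trail.append(coder(command, list(trail)))
    trail.append(reviewer(command, list(trail)))
    # Default consensus rule: every agent in the trail must approve.
    return all(v.approved for v in trail), trail
```

Injecting the agents as plain callables keeps the orchestration testable with stubs and leaves room for the extra domain-specific agents mentioned under Consequences.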
Agent Responsibilities
Safety Agent (Local)
- Pattern matching against known dangerous commands
- Zero cost, instant response
- Catches 90%+ of obvious threats
- Critical threats fail fast (no further processing)
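The Safety Agent's tiering can be sketched as a table of regular expressions. This is a minimal sketch, assuming a three-level severity scale; `Severity`, `RULES`, and `classify_command` are hypothetical names, and the rules are illustrative examples (including an `rsync --delete` rule in the spirit of the incident above), not the project's actual rule set.

```python
import re
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical severity tiers; higher values are more dangerous."""
    SAFE = 0      # no rule matched
    RISKY = 1     # escalate to the Coder and Reviewer agents
    CRITICAL = 2  # fail fast: deny with no further processing


# Illustrative rules only, not the project's rule set.
RULES = [
    # Recursive force-delete of the filesystem root.
    (re.compile(r"\brm\s+-(?:rf|fr)\s+/(?:\s|$)"), Severity.CRITICAL),
    # Writing raw bytes to a block device with dd.
    (re.compile(r"\bdd\b.*\bof=/dev/"), Severity.CRITICAL),
    # Formatting a filesystem (mkfs, mkfs.ext4, ...).
    (re.compile(r"\bmkfs(?:\.\w+)?\b"), Severity.CRITICAL),
    # Recursive force-delete of anything else.
    (re.compile(r"\brm\s+-(?:rf|fr)\b"), Severity.RISKY),
    # rsync that deletes files at the destination.
    (re.compile(r"\brsync\b.*--delete\b"), Severity.RISKY),
]


def classify_command(command: str) -> Severity:
    """Return the most severe tier among all matching rules."""
    return max(
        (severity for pattern, severity in RULES if pattern.search(command)),
        default=Severity.SAFE,
    )
```

A table like this also shows the limitation noted under Alternative 2: `rm -rf /home/user` only scores RISKY here, so catching it depends on the LLM stages.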
Coder Agent (Haiku)
- Only invoked for risky but non-critical commands
- Proposes safer alternatives
- Explains why original is risky
- Cost: ~$0.001 per check
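One way to keep the Coder Agent's output machine-checkable is to ask Haiku for JSON and validate the reply before it is shown to the user or passed to the Reviewer. This is a sketch under assumptions: the prompt wording, the reply schema, and the names `build_coder_prompt`, `parse_coder_reply`, and `CoderSuggestion` are hypothetical, and the actual model call is left out.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class CoderSuggestion:
    explanation: str         # why the original command is risky
    alternatives: List[str]  # safer commands to offer instead


def build_coder_prompt(command: str) -> str:
    """Hypothetical prompt that asks the model for a JSON-only reply."""
    return (
        "You are reviewing a shell command before it runs.\n"
        f"Command: {command}\n"
        "Explain why it is risky and propose safer alternatives.\n"
        'Reply with JSON only: {"explanation": "...", "alternatives": ["..."]}'
    )


def parse_coder_reply(reply: str) -> CoderSuggestion:
    """Validate the model's reply; raise ValueError if it is malformed."""
    data = json.loads(reply)  # json.JSONDecodeError subclasses ValueError
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    explanation = data.get("explanation")
    alternatives = data.get("alternatives")
    if not isinstance(explanation, str):
        raise ValueError("'explanation' must be a string")
    if not isinstance(alternatives, list) or not all(isinstance(a, str) for a in alternatives):
        raise ValueError("'alternatives' must be a list of strings")
    return CoderSuggestion(explanation, alternatives)
```

Treating a malformed reply as an error rather than guessing keeps this stage conservative: in this sketch, unvalidated model output never reaches the next agent.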
Reviewer Agent (Sonnet)
- Final security review
- Considers context and intent
- Makes approval/denial decision
- Cost: ~$0.01 per check
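The per-check figures above imply a simple cost model. This is a sketch under two assumptions: every escalated command runs through both the Coder and the Reviewer, as in the diagram, and commands the Safety Agent resolves locally (critical or unmatched) make no LLM calls. `expected_cost_per_command`, `monthly_cost`, and the escalation rate are hypothetical.

```python
# Approximate per-check costs quoted in this ADR.
CODER_COST_USD = 0.001    # Coder Agent (Haiku)
REVIEWER_COST_USD = 0.01  # Reviewer Agent (Sonnet)


def expected_cost_per_command(escalation_rate: float) -> float:
    """Average LLM spend per command.

    escalation_rate is the (hypothetical) fraction of commands the local
    Safety Agent classifies as risky but not critical; only those reach
    the Coder and the Reviewer.
    """
    if not 0.0 <= escalation_rate <= 1.0:
        raise ValueError("escalation_rate must be in [0, 1]")
    return escalation_rate * (CODER_COST_USD + REVIEWER_COST_USD)


def monthly_cost(commands_per_day: int, escalation_rate: float, days: int = 30) -> float:
    """Projected LLM spend for one user over `days` days."""
    return commands_per_day * days * expected_cost_per_command(escalation_rate)
```

For a hypothetical user issuing 500 commands a day with 10% escalated, `monthly_cost(500, 0.10)` comes to about $16.50 over 30 days; at 100% escalation it would be $165. Sonnet accounts for roughly 91% of each escalated check (0.01 / 0.011), so the local filter rate is the main cost lever.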
Consensus Requirement
By default, all agents must agree for a command to be approved. This can be relaxed for specific use cases.
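The consensus rule can be expressed as a quorum over agent votes. This is a minimal sketch, assuming each agent's verdict reduces to a boolean vote; `is_approved` and its `min_approvals` parameter are hypothetical, with the default reproducing the unanimous rule and a lower quorum standing in for a relaxed use case.

```python
from typing import Mapping, Optional


def is_approved(votes: Mapping[str, bool], min_approvals: Optional[int] = None) -> bool:
    """Decide from per-agent votes.

    min_approvals=None (the default) requires every agent to approve.
    Passing a smaller number relaxes the rule to a quorum.
    """
    if not votes:
        return False  # no verdicts means nothing was verified
    needed = len(votes) if min_approvals is None else min_approvals
    if not 1 <= needed <= len(votes):
        raise ValueError("min_approvals must be between 1 and the number of agents")
    return sum(1 for vote in votes.values() if vote) >= needed
```

With votes `{"safety": True, "coder": True, "reviewer": False}`, the default call denies the command while `min_approvals=2` approves it. Because critical threats fail fast in the Safety Agent, relaxing the quorum only affects commands that were already classed as risky but not critical.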
Consequences
Positive
- Defense in depth - multiple verification layers
- Cost-effective - cheap/free agents filter most traffic
- Flexible - can adjust thresholds per environment
- Auditable - each agent's decision is logged
- Extensible - can add more agents (e.g., domain-specific)
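Auditability can be as simple as appending one JSON line per verified command. This is a sketch, assuming a JSON Lines log; the record fields and the names `AgentDecision` and `write_audit_record` are hypothetical.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import List, TextIO


@dataclass
class AgentDecision:
    """One agent's verdict as recorded in the audit log (hypothetical shape)."""
    agent: str  # e.g. "safety", "coder", or "reviewer"
    approved: bool
    reason: str


def write_audit_record(
    out: TextIO, command: str, decisions: List[AgentDecision], final_approved: bool
) -> None:
    """Append one JSON object per line (JSON Lines) describing a verification."""
    record = {
        "timestamp": time.time(),  # seconds since the Unix epoch
        "command": command,
        "decisions": [asdict(d) for d in decisions],
        "final_approved": final_approved,
    }
    out.write(json.dumps(record) + "\n")
    out.flush()  # push the entry out before the command is allowed to run
```

One object per line keeps the log append-only and easy to grep or stream into other tools, and each agent's verdict stays individually inspectable.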
Negative
- Latency - full pipeline takes 2-5 seconds
- Complexity - more moving parts
- Cost - Sonnet calls add up for heavy users
- False positives - a conservative system may block legitimate commands
Neutral
- Requires API keys for Haiku/Sonnet
- Local-only mode degrades to pattern matching only
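A sketch of how the local-only fallback might be selected, assuming the key for both Haiku and Sonnet is read from the `ANTHROPIC_API_KEY` environment variable (the variable the Anthropic SDKs read by default). `Mode`, `select_mode`, and `local_only_decision` are hypothetical, and blocking risky commands when no LLM stage is available is an assumed conservative default, not something this ADR specifies.

```python
import os
from enum import Enum
from typing import Mapping, Optional


class Mode(Enum):
    FULL = "full"              # Safety -> Coder (Haiku) -> Reviewer (Sonnet)
    LOCAL_ONLY = "local_only"  # Safety Agent pattern matching only


def select_mode(env: Optional[Mapping[str, str]] = None) -> Mode:
    """Use the full pipeline only when a non-empty API key is configured."""
    env = os.environ if env is None else env
    return Mode.FULL if env.get("ANTHROPIC_API_KEY") else Mode.LOCAL_ONLY


def local_only_decision(severity: str, allow_risky: bool = False) -> bool:
    """Hypothetical policy when no LLM stage can review a risky command."""
    if severity == "critical":
        return False  # critical threats are always denied
    if severity == "risky":
        return allow_risky  # assumed default: block rather than guess
    return True  # nothing matched
```

Passing `env` explicitly keeps mode selection testable without touching the real process environment.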
Alternatives Considered
Alternative 1: Single LLM Verification
Use one powerful model (Opus) for all verification.
Rejected because:
- Too expensive for every command
- Single point of failure
- Overkill for obvious patterns
Alternative 2: Rule-Based Only
Use only pattern matching without LLM involvement.
Rejected because:
- Cannot understand context/intent
- Cannot suggest alternatives
- Misses novel attack patterns
- Too many false positives/negatives
Alternative 3: User-Only Confirmation
Simply ask the user to confirm dangerous commands.
Rejected because:
- Users click through confirmations
- Doesn't prevent social engineering
- No learning or adaptation