ADR-001: Multi-Agent Safety Verification Pipeline

Last updated 25 Jan 2026, 15:10

Status

Accepted

Context

ZenCursor needs to verify commands before execution to prevent disasters like the 2026-01-23 incident, in which a mistyped rsync command wiped production data. Any single-point verification system could miss edge cases or be bypassed.

We need a robust safety system that:

  1. Catches obviously dangerous patterns quickly (rm -rf /, etc.)
  2. Understands context and intent
  3. Provides alternative suggestions
  4. Has defense in depth through multiple verification stages

Decision

Implement a multi-agent pipeline for command verification:

Command Input
       │
       ▼
┌──────────────┐
│ Safety Agent │ ◄─── Local pattern matching (free, instant)
│   (Local)    │      Catches: rm -rf, format, dd, etc.
└──────┬───────┘
       │ If risky but not critical
       ▼
┌──────────────┐
│ Coder Agent  │ ◄─── Haiku (cheap, fast)
│   (Haiku)    │      Proposes safer alternatives
└──────┬───────┘
       │
       ▼
┌──────────────┐
│Reviewer Agent│ ◄─── Sonnet (thorough)
│   (Sonnet)   │      Final security review
└──────┬───────┘
       │
       ▼
    Decision
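
A minimal orchestration sketch of this flow in Python. The RiskLevel and Verdict types and the agent method names (classify, propose_alternative, review) are illustrative assumptions, not ZenCursor's actual API:

from dataclasses import dataclass
from enum import Enum, auto

# Illustrative types; the names are assumptions, not ZenCursor's real API.
class RiskLevel(Enum):
    SAFE = auto()
    RISKY = auto()     # risky but not critical: escalate to the LLM agents
    CRITICAL = auto()  # fail fast: block with no further processing

@dataclass
class Verdict:
    approved: bool
    reason: str
    alternative: str | None = None  # safer command from the Coder Agent

def verify(command: str, safety, coder, reviewer) -> Verdict:
    """Run the three-stage pipeline shown in the diagram above."""
    risk = safety.classify(command)  # local pattern matching: free, instant
    if risk is RiskLevel.SAFE:
        return Verdict(True, "no dangerous pattern matched")
    if risk is RiskLevel.CRITICAL:
        return Verdict(False, "critical pattern: blocked before any LLM call")
    # Risky but not critical: Haiku proposes a safer alternative ...
    alternative = coder.propose_alternative(command)
    # ... and Sonnet makes the final approve/deny decision in context.
    review = reviewer.review(command, alternative)
    return Verdict(review.approved, review.reason, alternative)

The fail-fast branch is the key design choice here: critical commands are blocked by the local stage and never reach an LLM.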

Agent Responsibilities

  1. Safety Agent (Local)

    • Pattern matching against known dangerous commands (sketched after this list)
    • Zero cost, instant response
    • Catches 90%+ of obvious threats
    • Critical threats fail fast (no further processing)
  2. Coder Agent (Haiku)

    • Only invoked for risky but non-critical commands
    • Proposes safer alternatives
    • Explains why original is risky
    • Cost: ~$0.001 per check
  3. Reviewer Agent (Sonnet)

    • Final security review
    • Considers context and intent
    • Makes approval/denial decision
    • Cost: ~$0.01 per check
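
A sketch of the Safety Agent's local check. The patterns below are examples built from this ADR's threat list (rm -rf, dd, format); the mkfs reading of "format", the regexes themselves, and the string risk labels are all assumptions:

import re

# Example patterns only; a real deny-list would be broader and maintained.
CRITICAL_PATTERNS = [
    r"\brm\s+-[a-z]*r[a-z]*f[a-z]*\s+/\s*$",  # rm -rf /
    r"\bdd\s+.*\bof=/dev/(sd|nvme)",          # dd writing to a raw disk device
    r"\bmkfs(\.\w+)?\b",                      # formatting a filesystem
]
RISKY_PATTERNS = [
    r"\brm\s+-[a-z]*r",                       # recursive delete anywhere
    r"\brsync\b.*--delete",                   # rsync --delete can wipe a destination tree
]

def classify(command: str) -> str:
    """Zero-cost local check: critical fails fast, risky escalates to LLMs."""
    for pattern in CRITICAL_PATTERNS:
        if re.search(pattern, command):
            return "critical"  # no further processing
    for pattern in RISKY_PATTERNS:
        if re.search(pattern, command):
            return "risky"     # hand off to the Coder and Reviewer Agents
    return "safe"

Keeping this stage to pure pattern matching is what keeps it free and instant; anything requiring context or intent is deliberately deferred to the LLM stages.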

Consensus Requirement

By default, all agents must agree for a command to be approved. This can be relaxed for specific use cases.
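
A minimal sketch of that consensus check, assuming each agent's decision reduces to a boolean verdict. The function name and the majority-vote fallback are illustrative; the ADR only specifies the unanimous default:

def is_approved(verdicts: list[bool], require_unanimous: bool = True) -> bool:
    """Unanimous by default; relaxed mode falls back to a simple majority."""
    if not verdicts:
        return False  # no agent ran: deny by default
    if require_unanimous:
        return all(verdicts)
    return sum(verdicts) > len(verdicts) / 2

# Example: Safety Agent approved, Coder Agent flagged, Reviewer approved.
is_approved([True, False, True])                           # False (unanimous)
is_approved([True, False, True], require_unanimous=False)  # True (2 of 3)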

Consequences

Positive

  • Defense in depth - multiple verification layers
  • Cost-effective - cheap/free agents filter most traffic
  • Flexible - can adjust thresholds per environment
  • Auditable - each agent's decision is logged
  • Extensible - can add more agents (e.g., domain-specific)

Negative

  • Latency - full pipeline takes 2-5 seconds
  • Complexity - more moving parts
  • Cost - Sonnet calls add up for heavy users
  • False positives - a conservative system may block legitimate commands

Neutral

  • Requires API keys for Haiku/Sonnet
  • Local-only mode degrades to pattern matching only (see the sketch below)
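
One way that degradation could be wired up. The environment variable name and the stage functions are hypothetical; the ADR only states that the pipeline falls back to pattern matching when no API keys are configured:

import os

def local_pattern_check(command: str) -> bool:
    """Stand-in for the Safety Agent's matcher; always available, zero cost."""
    return "rm -rf /" not in command

def llm_review(command: str) -> bool:
    """Placeholder for the Haiku/Sonnet stages; requires an API key."""
    raise NotImplementedError("needs Anthropic API access")

def build_pipeline() -> list:
    stages = [local_pattern_check]           # local-only baseline
    if os.environ.get("ANTHROPIC_API_KEY"):  # env var name is an assumption
        stages.append(llm_review)            # LLM stages only when keyed
    return stages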

Alternatives Considered

Alternative 1: Single LLM Verification

Use one powerful model (Opus) for all verification.

Rejected because:

  • Too expensive for every command
  • Single point of failure
  • Overkill for obvious patterns

Alternative 2: Rule-Based Only

Use only pattern matching without LLM involvement.

Rejected because:

  • Cannot understand context/intent
  • Cannot suggest alternatives
  • Misses novel attack patterns
  • Too many false positives/negatives

Alternative 3: User-Only Confirmation

Just ask the user to confirm dangerous commands.

Rejected because:

  • Users habitually click through confirmations
  • Doesn't prevent social engineering
  • No learning or adaptation
