The decision pipeline every tool call passes through, the four outcomes that can result, layered defenses against prompt injection and exfiltration, harness-level governance for manifest drift and sub-agent delegation, and the YAML policy DSL that drives all of it. One technical reference for architects and security engineers.
Every tool call (outbound from your agent, or inbound from an external caller) passes through the same five-stage pipeline before anything executes. Each stage is observable; the full sequence is below.
Every tool call receives one of four outcomes, determined by policy, not inferred from silence. Each decision is traceable and token-bound where applicable; ambiguous cases are denied by default.
query_db on a read-only replica within business hours, matching analytics-readonly. Token issued, query runs.
write_file with path ../../etc/passwd. Path-traversal inspector fires. Request denied; event hash-chained.
send_email to an external domain outside business hours. Ops reviews via webhook. Expires if not acted on within TTL.
The write path is hardened beyond rule evaluation. Sessions carry a measurable trust posture, untrusted tool output is spotlit before it reaches the model, and exfil channels at the egress boundary are stripped or rate-capped per session.
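As one illustration of stripping an exfil channel at the egress boundary, the sketch below removes image tags whose host is not on an allowlist. It is a minimal sketch, not the product's implementation: the function name `strip_untrusted_images`, the `ALLOWED_IMAGE_HOSTS` set, and the regex-based matching are all assumptions made for illustration, and a production filter would use a real HTML parser.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist; a real deployment would load this from policy.
ALLOWED_IMAGE_HOSTS = {"cdn.example.com"}

IMG_TAG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
SRC_ATTR = re.compile(
    r"""\bsrc\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))""", re.IGNORECASE
)

def strip_untrusted_images(html, allowed_hosts=ALLOWED_IMAGE_HOSTS):
    """Drop every <img> tag whose src host is not on the allowlist."""
    def replace(match):
        tag = match.group(0)
        src = SRC_ATTR.search(tag)
        if src is None:
            return ""  # no src attribute: drop the tag
        url = next(g for g in src.groups() if g is not None)
        host = urlparse(url).hostname  # None for relative or empty URLs
        # Default-deny: only absolute URLs on an allowed host survive.
        return tag if host in allowed_hosts else ""
    return IMG_TAG.sub(replace, html)
```

Under these assumptions, a model response that embeds document contents in an image URL on an unlisted host loses the tag before it leaves the boundary, while images from allowlisted hosts pass through unchanged.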
<img> tags are stripped or neutralized on egress, with per-domain allowlists. The classic "summarize a doc, then leak it via an image URL" pattern is blocked by default.
Tool-call filtering catches a single bad action. Harness governance catches the broader pattern: an agent that claimed one job at session start and is now doing something else. Manifest declared, drift detected, sub-agent contracts enforced, workspace mutations recorded for causal diagnosis.
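Manifest drift detection can be pictured as a comparison between what the agent declared at session start and what it later tries to call. The following is a minimal sketch under stated assumptions: the `Manifest` and `Session` classes, their fields, and the `undeclared_tool` pattern name are hypothetical and not taken from the product.

```python
from dataclasses import dataclass, field

@dataclass
class Manifest:
    """What the agent declared at session start (hypothetical shape)."""
    purpose: str
    allowed_tools: frozenset

@dataclass
class Session:
    manifest: Manifest
    drift_events: list = field(default_factory=list)

    def check(self, tool):
        """Return ALLOW for declared tools; record drift and DENY otherwise."""
        if tool in self.manifest.allowed_tools:
            return "ALLOW"
        # Assumed failure-pattern record, kept for later diagnosis.
        self.drift_events.append({"tool": tool, "pattern": "undeclared_tool"})
        return "DENY"

session = Session(Manifest("summarize docs", frozenset({"read_file"})))
first = session.check("read_file")    # declared in the manifest
second = session.check("shell.exec")  # never declared: drift
```

The design point is that an undeclared call is refused outright and leaves a structured record behind, rather than being logged and allowed to proceed.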
shell.exec doesn't get to log "anomaly"; it gets DENY, with a failure pattern attached for the diagnose API.
The DSL matches on subject attributes, action shape, resource patterns, runtime risk signals, and context. The rule below is one example.
# policies/payments-write.yaml
version: 2
rules:
  - id: PAY-WRITE-001
    priority: 10
    match:
      capability: github.write
      tool: "github.push"
      resource: "acme-corp/payments:main"
      principal:
        trust_level: [high, medium]
      risk:
        secrets: deny_on_match
        prompt_injection: { score_gte: 0.7, decision: deny }
    decision: require_approval
    obligations:
      - approver_groups: [ciso, devops-lead]
      - approval_ttl_seconds: 900
      - webhook: "https://ops.acme.com/approvals"
    audit:
      tags: ["sox-relevant", "prod-write"]
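To make the rule's behavior concrete, here is a rough sketch of how an evaluator might apply PAY-WRITE-001 to a single request. The evaluation semantics are assumptions, not the documented engine: the rule is hand-copied into a dict (with `principal` and `risk` assumed to nest under `match`), matching is exact string comparison, and the request fields `trust_level`, `secrets_found`, and `injection_score` are hypothetical names.

```python
RULE = {
    "id": "PAY-WRITE-001",
    "match": {
        "capability": "github.write",
        "tool": "github.push",
        "resource": "acme-corp/payments:main",
        "principal": {"trust_level": ["high", "medium"]},
        "risk": {
            "secrets": "deny_on_match",
            "prompt_injection": {"score_gte": 0.7, "decision": "deny"},
        },
    },
    "decision": "require_approval",
}

def evaluate(rule, request):
    """Return the decision for one request; unmatched requests are denied."""
    match = rule["match"]
    for key in ("capability", "tool", "resource"):
        if request.get(key) != match[key]:
            return "deny"  # assumed default-deny when the rule does not match
    if request.get("trust_level") not in match["principal"]["trust_level"]:
        return "deny"
    risk = match["risk"]
    # Risk signals override the rule's base decision (assumed semantics).
    if risk["secrets"] == "deny_on_match" and request.get("secrets_found", False):
        return "deny"
    injection = risk["prompt_injection"]
    if request.get("injection_score", 0.0) >= injection["score_gte"]:
        return injection["decision"]
    return rule["decision"]

push = {
    "capability": "github.write",
    "tool": "github.push",
    "resource": "acme-corp/payments:main",
    "trust_level": "high",
    "injection_score": 0.1,
}
clean_decision = evaluate(RULE, push)
risky_decision = evaluate(RULE, {**push, "injection_score": 0.9})
```

In this reading, a clean push from a high-trust principal is routed to approval, while the same push with an injection score at or above 0.7 is denied before any approver is asked.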
The interactive demo walks the same pipeline against a real agent in 13 minutes. If you'd rather have an architect on the call, we'll do an architecture review against your actual stack.