Track — Architecture · How enforcement works end to end

The decision pipeline

Five stages. No default-allow. No post-hoc recovery.

Every tool call (outbound from your agent, or inbound from an external caller) passes through the same five-stage pipeline before anything executes. Each stage is observable; the full sequence is below.

01 · INTERCEPT

Intercept

Every tool call enters via a typed adapter and is normalized into a unified action model before evaluation.

HTTP · MCP · GitHub · Shell · Slack
5 protocol adapters
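
As a rough illustration of normalization, the sketch below maps two adapter-specific payloads onto one record. The field names borrow the Subject · Action · Resource · Context vocabulary listed under the Evaluate stage; the Action class, the adapter functions, and the payload shapes are hypothetical and are not Track's actual API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Action:
    """Hypothetical unified action model; field names are illustrative."""
    subject: str   # who is acting (agent or caller identity)
    action: str    # normalized verb, e.g. "read", "write", "execute"
    resource: str  # what is being acted on
    context: dict = field(default_factory=dict)  # adapter-specific extras


def from_http(subject: str, method: str, url: str) -> Action:
    # Assumed mapping: safe HTTP methods are reads, everything else is a write.
    verb = "read" if method.upper() in {"GET", "HEAD", "OPTIONS"} else "write"
    return Action(subject, verb, url, {"protocol": "http", "method": method.upper()})


def from_shell(subject: str, argv: list[str]) -> Action:
    # Assumed mapping: a shell command is an "execute" on the binary being run.
    return Action(subject, "execute", argv[0], {"protocol": "shell", "argv": tuple(argv)})


http_action = from_http("agent-7", "post", "https://api.example.com/orders")
shell_action = from_shell("agent-7", ["rm", "-rf", "/tmp/cache"])
```

Once both calls share one shape, the later stages only need to understand Action, not each protocol.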

02 · EVALUATE

Evaluate

Risk inspectors run in parallel: PII, secrets, prompt/SQL/command injection, output inspection, vector-store injection, manifest drift, contextual PII. Payloads that use encoding to evade inspection are decoded and re-inspected.

Subject · Action · Resource · Context · Manifest
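
A minimal sketch of the evaluate stage, assuming each inspector is a plain function that returns a finding name or nothing. The two inspectors, their regular expressions, and the choice of base64 as the only decoded form are assumptions made for illustration.

```python
import base64
import binascii
import re
from concurrent.futures import ThreadPoolExecutor


# Hypothetical inspectors: each returns a finding name or None.
def secrets_inspector(text: str):
    return "secret" if re.search(r"AKIA[0-9A-Z]{16}", text) else None


def command_injection_inspector(text: str):
    return "command_injection" if re.search(r";\s*rm\s+-rf", text) else None


INSPECTORS = [secrets_inspector, command_injection_inspector]


def decoded_variants(text: str) -> list[str]:
    """Return the raw text plus a base64-decoded form when one exists."""
    variants = [text]
    try:
        variants.append(base64.b64decode(text, validate=True).decode("utf-8"))
    except (binascii.Error, UnicodeDecodeError, ValueError):
        pass  # not valid base64 text; inspect the raw form only
    return variants


def inspect(text: str) -> set[str]:
    findings = set()
    with ThreadPoolExecutor() as pool:
        for variant in decoded_variants(text):
            # Run every inspector on this variant concurrently.
            for result in pool.map(lambda fn: fn(variant), INSPECTORS):
                if result:
                    findings.add(result)
    return findings
```

The point of the second pass is that a payload which looks harmless in encoded form yields the same finding as its plain-text equivalent.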

03 · ENFORCE

Enforce

One of four explicit decisions: allow, deny, require human approval, or allow under enforced obligations.

Webhook approvals · TTLs · rate caps

04 · BIND

Bind

Approval is cryptographically bound to the exact action hash. Single-byte changes between approval and execution invalidate the token. No TOCTOU window.

Ed25519 · SHA-256 · 60s TTL
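
The binding idea can be sketched with the standard library alone. Note the substitution: this example signs with HMAC-SHA256 as a stand-in for Ed25519, because Python's standard library has no Ed25519 implementation. The key, token layout, and function names are hypothetical, and single-use tracking is omitted.

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"demo-key-not-for-production"  # hypothetical; stands in for an Ed25519 key
TTL_SECONDS = 60


def action_hash(action: dict) -> str:
    # Canonical JSON so the same action always produces the same digest.
    canonical = json.dumps(action, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(canonical).hexdigest()


def _sign(digest: str, expires: float) -> str:
    return hmac.new(SIGNING_KEY, f"{digest}|{expires}".encode(), hashlib.sha256).hexdigest()


def issue_token(action: dict, now: float) -> dict:
    digest, expires = action_hash(action), now + TTL_SECONDS
    return {"action_hash": digest, "expires": expires, "sig": _sign(digest, expires)}


def verify_token(token: dict, action: dict, now: float) -> bool:
    expected = _sign(token["action_hash"], token["expires"])
    return (hmac.compare_digest(expected, token["sig"])       # token not forged
            and token["action_hash"] == action_hash(action)   # action unchanged
            and now <= token["expires"])                       # still within the TTL
```

Because the token carries the hash of the approved action, changing even one character of the action after approval makes verification fail.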

05 · TRACE

Trace

Every decision is recorded in a hash-chained audit log. Chain tips are anchored to RFC 3161 timestamp authorities (TSAs) and Sigstore Rekor, which bounds how far the log could be rolled back undetected and lets third parties verify it independently.

SHA-256 chain · RFC 3161 · Sigstore Rekor
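
A minimal sketch of a SHA-256 hash chain, assuming each entry commits to the hash of its predecessor. The entry layout and the all-zero genesis value are assumptions; anchoring chain tips to an RFC 3161 TSA or to Rekor is not shown.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder for the first entry's predecessor


def _entry_hash(prev: str, decision: dict) -> str:
    body = json.dumps(decision, sort_keys=True)
    return hashlib.sha256((prev + body).encode()).hexdigest()


def append_entry(log: list, decision: dict) -> None:
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"prev": prev, "decision": decision, "hash": _entry_hash(prev, decision)})


def verify_chain(log: list) -> bool:
    prev = GENESIS
    for entry in log:
        # Each entry must point at its predecessor and hash to its recorded value.
        if entry["prev"] != prev or entry["hash"] != _entry_hash(prev, entry["decision"]):
            return False
        prev = entry["hash"]
    return True
```

Editing any earlier entry changes its hash, which breaks the link held by every entry after it.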

→ Outbound · agent initiates a tool call

Tool Call

Agent attempts action

Normalize

Unified action model

Inspect

Parallel inspectors

Policy Engine

YAML policy rules

Bind

Ed25519 one-time token

Trace

Hash-chained audit trail

← Inbound · external system calls into a governed agent (stages listed in reverse: the flow starts at External Call and ends at Admit)

Admit

Session token or reject

Policy Engine

Inbound rules

Inspect

Payload + injection

Delegation

RFC 8693 chain

Authenticate

JWKS / OIDC / API key

External Call

Arrives at node

Enforcement outcomes

Four explicit decisions. No default-allow.

Every tool call receives one of four outcomes, determined by policy, not inferred from silence. Each decision is traceable and token-bound where applicable; ambiguous cases are denied by default.

Allow

Passed all checks. Execution token issued. Action proceeds.

Example. Agent calls query_db on a read-only replica within business hours, matching analytics-readonly. Token issued, query runs.

Routine read paths should not require approval friction.

Deny

Blocked before execution. Rejection reason and inspector findings recorded.

Example. Agent calls write_file with path ../../etc/passwd. Path-traversal inspector fires. Request denied; event hash-chained.

Hard denials are non-bypassable and non-silenceable.

Require Approval

Execution paused. Human sign-off requested via webhook. Token issued only on explicit approval.

Example. Agent attempts send_email to an external domain outside business hours. Ops reviews via webhook. Expires if not acted on within TTL.

High-blast-radius actions stay gated on a human, at machine speed.

Allow with Obligations

Permitted under enforced constraints: response masking, sandbox, rate cap, session quarantine.

Example. Agent queries user records; PII fields auto-redacted before response. The obligation is token-bound, so post-execution tampering triggers an alert.

Necessary actions ship with structural guardrails, not convention.
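
The "no default-allow" rule can be sketched as a closed set of outcomes plus a fallback. The four value names match the decision values used in the policy DSL; the resolve and may_execute helpers are hypothetical.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    REQUIRE_APPROVAL = "require_approval"
    ALLOW_WITH_OBLIGATIONS = "allow_with_obligations"


def resolve(raw) -> Decision:
    """Map a policy result onto one of the four outcomes; anything else is denied."""
    try:
        return Decision(raw)
    except ValueError:
        return Decision.DENY  # missing, unknown, or malformed results never allow


def may_execute(decision: Decision, human_approved: bool = False) -> bool:
    if decision is Decision.REQUIRE_APPROVAL:
        return human_approved  # proceeds only on explicit approval
    return decision in (Decision.ALLOW, Decision.ALLOW_WITH_OBLIGATIONS)
```

Silence, in this sketch, is just another input that resolves to deny.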

Layered defenses against prompt injection & exfiltration

Trust the session. Scrub the egress.

The write path is hardened beyond rule evaluation. Sessions carry a measurable trust posture, untrusted tool output is spotlit before it reaches the model, and exfil channels at the egress boundary are stripped or rate-capped per session.

01 · SESSION TRUST

Two-axis trust on every session

Sessions are scoped (allowed_tools, resource_prefixes) and scored on auth_type × credential_strength × trust_level. A flagged session forces write/execute/delete/export to require_approval; ending it eagerly revokes every linked execution token.

SessionToken · mTLS · OIDC SSO · Eager revocation
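
A minimal sketch of the flagged-session rule and eager revocation, assuming a session record that tracks its own execution tokens. The Session class and both functions are hypothetical; only allowed_tools and the four gated action types come from the description above.

```python
from dataclasses import dataclass, field

GATED_ACTIONS = {"write", "execute", "delete", "export"}


@dataclass
class Session:
    """Hypothetical session record; attribute names are illustrative."""
    allowed_tools: set
    flagged: bool = False
    tokens: set = field(default_factory=set)  # execution tokens linked to this session
    ended: bool = False


def decide(session: Session, tool: str, action: str) -> str:
    if session.ended or tool not in session.allowed_tools:
        return "deny"  # session closed or tool out of scope
    if session.flagged and action in GATED_ACTIONS:
        return "require_approval"  # flagged sessions lose unattended writes
    return "allow"


def end_session(session: Session, revoked: set) -> None:
    # Eager revocation: every linked token is revoked at once, not lazily on use.
    revoked.update(session.tokens)
    session.tokens.clear()
    session.ended = True
```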

02 · SPOTLIGHT

Trusted instructions vs. untrusted tool data

Tool output is wrapped via datamark, encode, or delimit modes per Microsoft Spotlighting (arXiv 2403.14720) before reaching the model, so an injected instruction in a fetched page is read as data, not as a command.

datamark · encode · delimit
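
The three modes can be sketched roughly as follows. The marker character, the delimiter strings, and the use of base64 for the encode mode are assumptions chosen for illustration; in practice the system prompt must also tell the model how to read the transformed text, which is not shown here.

```python
import base64

MARKER = "\u02c6"  # "ˆ", an assumed marker character unlikely to occur in normal text


def delimit(text: str) -> str:
    # Wrap untrusted text in explicit boundary markers.
    return f"<<{text}>>"


def datamark(text: str) -> str:
    # Replace whitespace runs with the marker so the text is tagged throughout.
    return MARKER.join(text.split())


def encode(text: str) -> str:
    # Base64-encode so instructions in the text no longer read as natural language.
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def spotlight(text: str, mode: str) -> str:
    modes = {"delimit": delimit, "datamark": datamark, "encode": encode}
    return modes[mode](text)
```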

03 · EGRESS SCRUB

Block EchoLeak-class exfil at the boundary

Markdown image beacons and HTML <img> tags are stripped or neutralized on egress, with per-domain allowlists. The classic "summarize a doc, then leak it via an image URL" pattern is blocked by default.

strip_markdown_images · sanitize_html_egress · except_domains
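
A minimal sketch of egress scrubbing with a per-domain allowlist. The regular expressions are deliberately simple and will not cover every Markdown or HTML variant; the scrub_egress function is hypothetical, and only the except_domains name is taken from the tags above.

```python
import re
from urllib.parse import urlparse

MD_IMAGE = re.compile(r"!\[([^\]]*)\]\(\s*([^)\s]+)[^)]*\)")
HTML_IMG = re.compile(r"<img\b[^>]*>", re.IGNORECASE)
IMG_SRC = re.compile(r"""src\s*=\s*["']?([^"'\s>]+)""", re.IGNORECASE)


def scrub_egress(text: str, except_domains: frozenset = frozenset()) -> str:
    def allowed(url: str) -> bool:
        return (urlparse(url).hostname or "") in except_domains

    def md_repl(match: re.Match) -> str:
        # Keep allowlisted images; otherwise keep only the alt text.
        return match.group(0) if allowed(match.group(2)) else match.group(1)

    def html_repl(match: re.Match) -> str:
        # Keep allowlisted <img> tags; drop the rest entirely.
        src = IMG_SRC.search(match.group(0))
        return match.group(0) if src and allowed(src.group(1)) else ""

    return HTML_IMG.sub(html_repl, MD_IMAGE.sub(md_repl, text))
```

With no allowlist configured, every image reference is removed, so a URL carrying leaked data never renders.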

04 · EXFIL RATE LIMIT

Per-session bytes-out and resource caps

Configurable caps on bytes-out and unique resources per session. Breaching a cap auto-quarantines the entire session, not just the offending request, so a runaway agent stops at the first hop.

Auto-quarantine
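
The cap-then-quarantine behavior might look like the sketch below. The EgressBudget class and its cap values are hypothetical; the source states only that bytes-out and unique resources are capped per session.

```python
from dataclasses import dataclass, field


@dataclass
class EgressBudget:
    """Hypothetical per-session counters; the cap values are illustrative."""
    max_bytes_out: int
    max_unique_resources: int
    bytes_out: int = 0
    resources: set = field(default_factory=set)
    quarantined: bool = False

    def record(self, resource: str, payload: bytes) -> bool:
        """Return True if the request may proceed, False once quarantined."""
        if self.quarantined:
            return False
        self.bytes_out += len(payload)
        self.resources.add(resource)
        if (self.bytes_out > self.max_bytes_out
                or len(self.resources) > self.max_unique_resources):
            self.quarantined = True  # the whole session stops, not just this request
            return False
        return True
```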

05 · OUTPUT INSPECTION

Inspect tool output before it reaches the model

The output inspector screens tool responses for embedded instructions, secrets, and exfil markers before they re-enter the model's context. It pairs with vector injection detection to cover retrieval-augmented attacks in which the poison sits in the index.

Output inspection · Vector injection · Contextual PII

06 · CANARY & RISK SCORE

Risk-tiered escalation with canary detection

A unified risk score (0–1) resolves to CRITICAL / HIGH / MEDIUM / LOW tiers. CRITICAL quarantines the session; HIGH escalates write/delete/export to approval. The canary API plants beacons that fire only if data leaves the boundary.

Risk tiers · Canary API · Auto-escalation
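
A minimal sketch of tier resolution and escalation. The numeric cut-offs are assumptions: the source gives the 0 to 1 range and the tier names but not the thresholds. The function names and return strings are also hypothetical.

```python
# Assumed cut-offs; only the score range and tier names come from the source.
TIERS = [(0.9, "CRITICAL"), (0.7, "HIGH"), (0.4, "MEDIUM"), (0.0, "LOW")]
ESCALATED_ACTIONS = {"write", "delete", "export"}


def tier_for(score: float) -> str:
    if not 0.0 <= score <= 1.0:
        raise ValueError("risk score must be in [0, 1]")
    # TIERS is sorted from highest floor to lowest, so the first match wins.
    return next(name for floor, name in TIERS if score >= floor)


def respond(score: float, action: str) -> str:
    tier = tier_for(score)
    if tier == "CRITICAL":
        return "quarantine_session"
    if tier == "HIGH" and action in ESCALATED_ACTIONS:
        return "require_approval"
    return "continue"  # fall through to normal policy evaluation
```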

Harness governance

Bind the agent to its declared scope. Catch drift in flight, not in the post-mortem.

Tool-call filtering catches a single bad action. Harness governance catches the broader pattern: an agent that claimed one job at session start and is now doing something else. The manifest is declared up front, drift is detected in flight, sub-agent contracts are enforced, and workspace mutations are recorded for causal diagnosis.

01 · MANIFEST

Sessions declare scope at start

Every governed session opens with a declared manifest: role, allowed tools, resource prefixes, budget. The manifest becomes the structural ground truth that policy and inspectors reason against.

Session manifest · Role-bound

02 · DRIFT

Drift inspector blocks scope violations

The drift inspector compares each action against the active manifest. A code-reviewer that suddenly calls shell.exec doesn't get to log "anomaly" — it gets DENY, with a failure pattern attached for the diagnose API.

Drift inspector · Failure patterns
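
A drift check of this kind could be sketched as a comparison of each action against the declared manifest. The Manifest shape, the failure-pattern strings, and the check_drift function are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Manifest:
    """Hypothetical manifest shape; budget is omitted for brevity."""
    role: str
    allowed_tools: frozenset
    resource_prefixes: tuple


def check_drift(manifest: Manifest, tool: str, resource: str) -> dict:
    if tool not in manifest.allowed_tools:
        return {"decision": "DENY", "failure_pattern": "tool_outside_manifest"}
    if not any(resource.startswith(p) for p in manifest.resource_prefixes):
        return {"decision": "DENY", "failure_pattern": "resource_outside_manifest"}
    return {"decision": "PASS", "failure_pattern": None}
```

In this sketch a code-reviewer manifest that lists only read and comment tools yields DENY for shell.exec, with the failure pattern attached to the result.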

03 · SUB-AGENT CONTRACTS

Parent declares the child's budget

When a parent agent spawns a child, it publishes a delegation contract: scope, tool budget, isolation flags. The runtime denies any child action that exceeds it — privilege creep across parent → child becomes a structural impossibility.

Delegation contract · Budget-bound
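
A minimal sketch of a delegation contract, assuming the budget is a count of tool calls. The Contract class is hypothetical, and the rule that a child's contract must fit inside the parent's remaining scope and budget is an assumption about how the guarantee could be implemented.

```python
from dataclasses import dataclass


@dataclass
class Contract:
    """Hypothetical delegation contract; field names are illustrative."""
    tools: frozenset
    tool_budget: int  # maximum number of tool calls
    calls_made: int = 0


def delegate(parent: Contract, tools: frozenset, tool_budget: int) -> Contract:
    # Assumed rule: a child can never receive more than the parent still holds.
    remaining = parent.tool_budget - parent.calls_made
    if not tools <= parent.tools or tool_budget > remaining:
        raise PermissionError("child contract exceeds parent scope or budget")
    return Contract(tools, tool_budget)


def authorize(contract: Contract, tool: str) -> bool:
    # Deny anything outside the contract or past its budget.
    if tool not in contract.tools or contract.calls_made >= contract.tool_budget:
        return False
    contract.calls_made += 1
    return True
```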

04 · WORKSPACE STATE

Every write/delete is a first-class event

Workspace mutations emit a trace event with the artifact, the role that made the change, and the failure pattern (if any). The diagnose API reconstructs the causal chain on demand.

workspace_state · Diagnose API · Causal chain

The policy DSL

Policy as YAML. Reviewable, signable, version-controlled.

The DSL matches on subject attributes, action shape, resource patterns, runtime risk signals, and context. Each clause in the rule below is explained in the notes that follow it.

# policies/payments-write.yaml
version: 2
rules:
  - id: PAY-WRITE-001
    priority: 10
    match:
      capability: github.write
      tool: "github.push"
      resource: "acme-corp/payments:main"
      principal:
        trust_level: [high, medium]
    risk:
      secrets: deny_on_match
      prompt_injection: { score_gte: 0.7, decision: deny }
    decision: require_approval
    obligations:
      - approver_groups: [ciso, devops-lead]
      - approval_ttl_seconds: 900
      - webhook: "https://ops.acme.com/approvals"
    audit:
      tags: ["sox-relevant", "prod-write"]

id, priority

id is a stable identifier for tracking policy changes. When several rules match the same call, the rule with the higher priority overrides the lower-priority ones.

capability, tool, resource

Match on the capability class the agent invokes (a coarse-grained permission), the specific tool, and the resource. Resource patterns support wildcards.

principal.trust_level

Caller's runtime trust posture, computed from auth method, credential strength, and session signals. Rules can require a minimum trust band before allowing risky operations.

risk signals

Inspector findings on the payload: detected secrets, prompt-injection score, PII classes, encoding evasion. Each signal can short-circuit to deny or escalate the decision.

decision

One of allow, deny, require_approval, or allow_with_obligations. If no rule matches, the configured default is deny.

obligations

Constraints carried into execution: approver groups for human review, webhook for notification, TTL for the approval window, response-masking rules for PII, sandbox flags. The execution token is bound to these.
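
To make the clause semantics concrete, here is a minimal evaluator sketch for the PAY-WRITE-001 rule. It is not Track's engine: the rule is hand-translated into a dict to avoid a YAML dependency, the request fields (trust_level, secrets_found, prompt_injection_score) are hypothetical, and shell-style wildcards via fnmatch are an assumed pattern syntax.

```python
from fnmatch import fnmatchcase

# The PAY-WRITE-001 rule, hand-translated to a dict (obligations and audit omitted).
RULES = [{
    "id": "PAY-WRITE-001", "priority": 10,
    "match": {"capability": "github.write", "tool": "github.push",
              "resource": "acme-corp/payments:main",
              "principal": {"trust_level": ["high", "medium"]}},
    "risk": {"secrets": "deny_on_match",
             "prompt_injection": {"score_gte": 0.7, "decision": "deny"}},
    "decision": "require_approval",
}]


def matches(rule: dict, req: dict) -> bool:
    m = rule["match"]
    return (m["capability"] == req["capability"] and m["tool"] == req["tool"]
            and fnmatchcase(req["resource"], m["resource"])  # wildcard-capable
            and req["trust_level"] in m["principal"]["trust_level"])


def evaluate(rules: list, req: dict) -> str:
    hits = sorted((r for r in rules if matches(r, req)),
                  key=lambda r: r["priority"], reverse=True)
    if not hits:
        return "deny"  # no rule matched: default deny
    rule = hits[0]  # highest priority wins
    risk = rule.get("risk", {})
    if risk.get("secrets") == "deny_on_match" and req.get("secrets_found"):
        return "deny"
    pi = risk.get("prompt_injection")
    if pi and req.get("prompt_injection_score", 0.0) >= pi["score_gte"]:
        return pi["decision"]  # risk signal short-circuits the rule's decision
    return rule["decision"]
```

A clean push from a high-trust principal resolves to require_approval, while a detected secret, a high injection score, or a non-matching request resolves to deny.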
