What Shield is

Shield is the third leg of Smartflow's request-time enforcement stack. Where access control answers “is this caller allowed to invoke this tool at all?” and compliance answers “does this content violate org policy (PII, jailbreak, prompt injection)?”, Shield answers a different question:

“This call is authorised and clean — but is it destructive?”

It is the second pair of eyes on every tools/call the agent makes and every assistant message the model produces. When a Cursor / Claude Code / Codex agent decides to run DROP DATABASE prod;, git push --force main, or rm -rf /, Shield catches it before the upstream tool ever executes — or before the IDE parses the assistant's plan into another tool call.

Why it exists

An AI coding agent is, in effect, an autonomous user with broad tool access — your database, your terminal, your git remotes, your filesystem, your cloud APIs. Pre-existing controls catch a lot but not this:

Existing layerCatchesMisses
Access control (allowlists)Tools the key isn't allowed to callAllowed-but-destructive calls
Compliance (PII, jailbreak)Sensitive data, prompt-injection patternsPerfectly compliant DROP DATABASE SQL
MAESTRO policyOrg-level allowlists, model selectionOperation semantics within an allowed call

Shield closes that gap. The threat model is not a malicious agent — it's a competent agent operating on slightly stale or partially incorrect context, executing a perfectly valid command that happens to wipe customer data.

The two intercept seams

Shield runs at two points in the request lifecycle. Each fires independently — a request can be cleared by one and caught by the other.

1. MCP tools/call seam

Hooked into MCPProxyHandler::handle_request, immediately after tool-level access control passes and before the compliance scan. The Shield engine sees the full, decoded JSON-RPC request:

{
  "method": "tools/call",
  "params": {
    "name": "execute_sql",
    "arguments": { "query": "DROP DATABASE prod;" }
  }
}

The matcher walks every string in params against each rule's pre-compiled regexes. SQL-aware extractors automatically pull from query/sql/statement keys so rules don't have to know each tool's individual parameter schema.

2. LLM response seam

Hooked into handle_provider_request immediately after MAESTRO response validation. The proxy decompresses the upstream body, extracts the assistant's content string(s) (OpenAI choices[].message.content and Anthropic content[].text shapes are both supported) and runs where: llm_response rules against the text.

Why both? The MCP seam catches what the agent does; the LLM seam catches what the agent plans to do. Many destructive plans show up in the assistant message — “Now I'll run git push --force…” — before the IDE has parsed them into a tool call. Catching the plan beats catching the action.

Severity tiers and outcomes

SeverityOutcomeReturned to callerLatency on hit
CriticalHard blockHTTP 403 with structured shield_blocked errorNone — instant deny
HighApproval queueHTTP 403 with shield_approval_required + ticket_idUntil human approves or denies
MediumAllow with warnPass through; x-shield-warn header set; audit-loggedNone
LowAudit onlyPass through silently; audit-loggedNone

Dogfooding mode (SHIELD_ENFORCE=false)

In Phase-1 deployments the engine demotes every Block / Approval to AllowWithWarn so customers can validate the rule corpus against real traffic without breaking workflows. The audit log still records the original would-have-been decision under details.decision with details.enforce=false, so the dashboard's Shield Activity panel shows what would have happened.

Recommended path: run with SHIELD_ENFORCE=false for one week, tune any false positives by editing shieldset.yaml, then flip to true on a single canary pod and watch for 24-48h before rolling out cluster-wide.

Rule schema — shieldset.yaml

The shieldset is loaded at startup from SHIELD_RULESET_PATH (default /etc/smartflow/shieldset.yaml) or falls back to a built-in default ruleset bundled into the binary. Each rule is compiled once at load time — the hot path is just a tight loop over pre-built regex::Regex objects.

shieldset:
  version: 1

  rules:
    # SQL — Critical
    - id: sql.drop_database
      severity: Critical
      where: tool_call
      match:
        tool: ["execute_sql", "postgres.query", "mysql.query"]
        sql_matches: ['(?i)\bDROP\s+DATABASE\b']
      reason: "DROP DATABASE is never auto-allowed."

    # Git — Critical
    - id: git.force_push_protected
      severity: Critical
      where: tool_call
      match:
        tool: ["run_terminal", "bash", "git"]
        any_param_matches:
          - '\bgit\s+push\s+.*(--force|-f)\s+(origin\s+)?(main|master|prod)\b'
      reason: "Force-push to a protected branch is forbidden."

    # LLM response — Medium
    - id: llm.suggests_force_push
      severity: Medium
      where: llm_response
      match:
        text_matches: ['(?i)git\s+push\s+.*--force\b.*\b(main|master|prod)\b']
      reason: "Assistant plan suggests force-push to a protected branch."

    # Anomaly — sliding-window destructive-verb burst
    - id: anomaly.destructive_burst
      severity: High
      where: tool_call
      anomaly:
        kind: destructive_verb_burst
        window_seconds: 300
        threshold: 5
      reason: "Destructive operation burst detected — pausing for review."

Built-in default ruleset — 15 starter rules

Smartflow Shield ships with 15 starter rules covering the most common destructive patterns. The defaults are embedded into the proxy binary via include_str! so the engine never starts ruleless even if shieldset.yaml is missing.

Looking for the bigger corpus? The standalone aperion-shield binary ships 45 rules across 12 categories — adding secrets, supply-chain, reverse-shells, sudo / privilege, cloud (AWS / GCP / Azure), Kubernetes and Docker — plus an adaptive scoring layer and an identity-gating subsystem. The same YAML schema works in both, so you can author once and run anywhere. See Aperion Shield → Built-in defaults.
CategoryRule IDSeverityWhat it catches
SQLsql.drop_databaseCriticalDROP DATABASE in any SQL tool
sql.drop_table_or_schemaHighDROP TABLE / DROP SCHEMA / TRUNCATE TABLE
sql.unscoped_deleteHighDELETE FROM <t>; with no WHERE clause
sql.unscoped_updateHighUPDATE <t> SET …; with no WHERE clause
sql.grant_or_revoke_allMediumGRANT ALL / REVOKE ALL on schemas or roles
Gitgit.force_push_protectedCriticalgit push --force to main/master/prod
git.history_rewriteHighgit filter-repo, filter-branch, reset --hard HEAD~
git.branch_force_deleteMediumgit branch -D
Filesystemfs.recursive_delete_rootCriticalrm -rf /, rm -rf $HOME, rm -rf ~
fs.dd_to_block_deviceCriticaldd if=… of=/dev/sd*
fs.delete_production_pathHighTool-driven delete of /etc, /var, /usr, /opt
LLM responsellm.suggests_drop_databaseHighAssistant plans DROP DATABASE / TRUNCATE TABLE
llm.suggests_force_pushMediumAssistant plans force-push to a protected branch
llm.suggests_rm_rfMediumAssistant plans rm -rf / or similar
Anomalyanomaly.destructive_burstHigh5+ destructive tool calls per (actor, server, tool) in 5 minutes

Match keys

Regex flavour: Shield uses Rust's regex crate, which does not support lookbehind ((?<!…)) or arbitrary lookahead ((?=…)). Rules using those constructs are rejected at load time with a warning. If you need either, express the constraint with anchors and alternation, or split the rule into multiple narrower rules.

Example 1 — DROP DATABASE via SQL tool (MCP seam)

The agent decides to clean up a stale environment and issues:

{
  "method": "tools/call",
  "params": {
    "name": "execute_sql",
    "arguments": { "query": "DROP DATABASE prod;" }
  }
}

Rule sql.drop_database matches with severity Critical. With enforce on, the proxy returns:

HTTP/1.1 403 Forbidden
x-shield-blocked: block
x-shield-rule: sql.drop_database
x-shield-severity: Critical

{
  "error": {
    "type": "shield_blocked",
    "rule_id": "sql.drop_database",
    "severity": "Critical",
    "reason": "DROP DATABASE is never auto-allowed."
  }
}

Example 2 — git push --force main (MCP seam)

{
  "method": "tools/call",
  "params": {
    "name": "run_terminal",
    "arguments": { "command": "git push origin main --force" }
  }
}

Rule git.force_push_protected matches with severity Critical. The pattern is anchored to the protected branch list (main/master/prod), so a force-push to feature/widgets is fine.

Example 3 — rm -rf / (MCP seam)

{
  "method": "tools/call",
  "params": {
    "name": "bash",
    "arguments": { "command": "rm -rf $HOME" }
  }
}

Rule fs.recursive_delete_root matches with severity Critical. Variants caught by the same rule include: rm -rf /, rm -rf ~, rm -rf $HOME, rm -rf $PWD.

A more targeted variant — deleting /etc/nginx/nginx.conf via a dedicated filesystem.delete_file tool — is caught by fs.delete_production_path (severity High) and queued for approval rather than hard-blocked.

Example 4 — assistant plans destruction (LLM seam)

Even when no tool is called, the assistant's text gets scanned. A response like:

// Assistant says, embedded in chat completion:
“Let me clean this up by running:

  DROP DATABASE customer_archive;
  TRUNCATE TABLE orders;

Then we'll re-import from the backup.”

matches rule llm.suggests_drop_database (severity High). In dogfooding mode the response passes through with these added headers:

x-shield-rule: llm.suggests_drop_database
x-shield-severity: High
x-shield-warn: [shield-shadow] would have approval: Assistant plan contains
               destructive SQL — confirm before executing.

With enforce on, the proxy returns 403 shield_approval_required and the IDE polls until a human approves or denies.

Example 5 — destructive-verb burst (anomaly seam)

Each destructive tool call (anything matching the SQL/git/fs destructive verb set) increments a Redis sorted-set counter keyed by (actor, mcp_server, tool). When the count crosses the rule's threshold, the rule fires — even if no individual call would have on its own.

Default rule: 5 destructive ops in 300 seconds → anomaly.destructive_burst (severity High). Storage uses a Redis ZSET with timestamp scores so the window slides cleanly without cron sweeps. When Redis is unreachable, an in-process fallback keeps single-pod deployments working.

Approval workflow — end to end Paid · Enterprise

Tier: the Redis-backed, dashboard-driven approval queue described below is part of the paid Smartflow product. The free standalone aperion-shield implements the same severity model with a local file-based approval inbox (./.aperion-shield/inbox) — same Critical / High / Medium / Low tiers, no cross-machine queue, no Approve / Deny buttons. See Aperion Shield → Local approval inbox.
  1. Agent issues a request that matches a High-severity rule.
  2. Shield creates a Redis ticket: smartflow:shield:approvals:{ticket_id} (24h TTL) and pushes the id onto the pending zset.
  3. Proxy returns HTTP 403 with shield_approval_required body and the x-shield-ticket header.
  4. The dashboard's Shield Approvals panel renders the pending ticket with Approve / Deny buttons.
  5. The agent's retry loop polls GET /api/shield/approvals/{ticket_id}/status on a tight interval (this endpoint is unauthenticated and ultra-cheap — single Redis GET).
  6. When a human clicks Approve, the ticket transitions to Approved; the next poll returns "approved".
  7. The agent re-issues the original request, which now passes Shield (the rule re-evaluates with the ticket consumed).

Approval queue REST API Paid · Enterprise

MethodPathAuthPurpose
GET/api/shield/approvals?status=pending&limit=NAdminList pending tickets
GET/api/shield/approvals/{ticket_id}AdminFetch a single ticket
GET/api/shield/approvals/{ticket_id}/statusNoneCheap polling for the IDE retry loop
POST/api/shield/approvals/{ticket_id}/approveAdminApprove
POST/api/shield/approvals/{ticket_id}/denyAdminDeny

Admin endpoints require Authorization: Bearer $SF_SHIELD_ADMIN_KEY. Set the SF_SHIELD_ADMIN_KEY env var on api-server in production. Leave it unset in dev for open access.

IDE-side error contract

The agent's retry path should treat the headers as the source of truth — no body parsing required:

HeaderMeaning
x-shield-blocked: blockHard block. Surface the error to the user; do not retry.
x-shield-blocked: approval_requiredQueued. Read x-shield-ticket; poll /api/shield/approvals/{ticket_id}/status until "approved" or "denied"; on approval, re-issue.
x-shield-warn: <banner>Non-blocking warning. Show the banner; the response is still complete.
x-shield-rule: <rule_id>Always present when Shield matched. Useful for telemetry and per-rule troubleshooting.
x-shield-severity: Critical|High|Medium|LowTier the rule fired at.

Sample retry loop (Python)

import time, requests

def call_with_shield_retry(url, json, max_wait_seconds=600):
    r = requests.post(url, json=json)
    if r.status_code != 403 or r.headers.get("x-shield-blocked") != "approval_required":
        return r

    ticket = r.headers["x-shield-ticket"]
    deadline = time.time() + max_wait_seconds
    while time.time() < deadline:
        s = requests.get(f"https://your-proxy/api/shield/approvals/{ticket}/status").json()
        if s["status"] == "approved":
            return requests.post(url, json=json)   # re-issue
        if s["status"] == "denied":
            raise RuntimeError(f"Shield denied {ticket}")
        time.sleep(2)
    raise TimeoutError(f"Shield ticket {ticket} not resolved in {max_wait_seconds}s")

Configuration reference (env vars)

VariableDefaultPurpose
SHIELD_RULESET_PATH/etc/smartflow/shieldset.yamlPath to the YAML rule file
SHIELD_ENFORCEfalseWhen false, demote Block / Approval to logged-only AllowWithWarn (Phase-1 dogfooding)
SF_SHIELD_ADMIN_KEYunsetBearer token for the approval-queue admin endpoints
REDIS_URLunsetRequired for cross-pod approval queue and anomaly counters; in-process fallback when missing

Audit log integration

Every actionable decision (Warn / Approval / Block) is appended to the existing tamper-evident audit chain with event_type: Custom("shield"). The entry's details map carries:

Query directly:

GET /api/audit/logs?event_type=shield&limit=100

Spend report dimensions Paid · Enterprise

Tier: Spend tracking is part of the paid Smartflow proxy and requires the Smartflow billing/usage subsystem. The free standalone aperion-shield does not proxy LLM calls and therefore has nothing to bill — there is no equivalent on the OSS side.

For finance/compliance “what did Shield catch this month and at what cost” reports, three dimensions are exposed on /spend/report:

Dashboard panels Paid · Enterprise

Tier: the web dashboard is part of the paid Smartflow product. The free standalone aperion-shield has no web UI — every decision is logged to stderr (Cursor / Claude Code surface that in their tool-call panel) and High-severity approvals are handled via the local file inbox at ./.aperion-shield/inbox. To get the dashboard panels below across a fleet of standalone shields, enroll them in Shield Org Mode.

The admin dashboard exposes Shield at /dashboard/shield.html:

  1. Stats banner — counts of Block, Approval, Warn, Audit-only and Shadow events in the last query window.
  2. Shield Activity — real-time feed of every actionable decision with rule, severity, surface, actor, and reason. Filterable by severity / decision / rule substring; auto-refreshes every 30 seconds.
  3. Pending Approvals — High-severity tickets with Approve / Deny buttons. Resolution propagates to the IDE polling loop within seconds.

Org Mode adds three more pages — shield_fleet.html, shield_policy.html, shield_settings.html — for managing enrolled standalone shields. See Shield Org Mode → Dashboard pages.

Rollout playbook

  1. Deploy v1.7.37+ with SHIELD_ENFORCE=false. The default ruleset loads automatically.
  2. Watch the Shield Activity panel for ≥24 hours. Every match is logged with the would-have-been decision.
  3. Tune any false positives by editing shieldset.yaml: drop the severity, restrict the tool: list, or remove the rule. Pod restart picks up changes; ShieldEngine::replace_ruleset supports in-process hot-reload for advanced setups.
  4. Flip SHIELD_ENFORCE=true on a single canary pod. Watch the same panel for 24-48h.
  5. Roll out cluster-wide. The approval queue + IDE polling now provides the human-in-the-loop step for High-severity ops.

Fail-open semantics

Shield is fail-open by design. Any internal error inside Shield (regex panic, anomaly tracker failure, missing ruleset, Redis outage) returns Allow and logs a warn!. The proxy will never 5xx because Shield broke; the trade-off is that Shield can't be a load-bearing authorisation layer — keep MCP access control in front of it.

Concretely, the engine will fall back rather than fail in these scenarios:

Shield Org Mode — fleet-managed standalone shields

Shield Org Mode extends the Smartflow control plane to manage every aperion-shield running on a developer laptop, CI runner, or production box across your organisation. The proxy-resident shield documented above and Org Mode are complementary: the proxy enforces at the gateway, Org Mode enforces at the developer's machine before the call ever leaves it.

An enrolled standalone shield pulls its policy from Smartflow, ships every actionable decision back as an audit event, delegates identity verification through the central dashboard, and respects a fleet-wide killswitch. Standalone mode (no enrollment) still works exactly the same; Org Mode is purely additive.

Control-plane API

All endpoints live under /api/enterprise/shield/*. Device-scoped routes auth with the HMAC vkey issued at enrollment; admin-scoped routes auth with Authorization: Bearer $SF_SHIELD_ADMIN_KEY.

MethodPathAuthPurpose
GET/api/enterprise/shield/infovkeyProvider catalog & readiness, killswitch state
GET/api/enterprise/shield/shieldset/{group}vkeyPull the active shieldset YAML + hash + version
GET/api/enterprise/shield/shieldset/{group}/versionvkeyCheap version probe (poll target, no body)
PUT/api/enterprise/shield/shieldset/{group}adminPublish a new shieldset version
POST/api/enterprise/shield/eventsvkeyBatched audit-event ingest
GET/api/enterprise/shield/eventsadminRead the audit timeline (with filters)
POST/api/enterprise/shield/identity/checkvkeyCache check for a valid proof on (subject, scope)
POST/api/enterprise/shield/identity/beginvkeyStart an identity flow; returns a verify_url + challenge id
GET/api/enterprise/shield/identity/result/{id}vkeyPoll for the signed proof when the user finishes
GET / POST/api/enterprise/shield/killswitchadmin (POST)Get / flip the fleet killswitch

Enrollment, heartbeats, vkey issuance and fleet listing reuse the existing device API at /api/enterprise/devices/* — same primitives as the SDK device fleet.

Dashboard pages

Three new pages render the Org Mode control plane. All ship with Smartflow v1.7.38+ and use the existing dark theme.

PagePathWhat it does
Shield Fleet/dashboard/shield_fleet.htmlList enrolled devices (active / stale / revoked), issue one-time enrollment tokens, flip the fleet killswitch.
Shield Policy/dashboard/shield_policy.htmlYAML editor for the active shieldset.yaml per policy group. Version + content hash displayed; saves trigger a version bump that enrolled devices pick up on the next 30 s poll.
Shield Settings/dashboard/shield_settings.htmlIdentity-provider catalog + readiness, env-var setup hints, killswitch toggle, live filterable timeline of every Org Mode audit event.

Lifecycle on the developer's machine

# 1) Admin issues a one-time enrollment token from /dashboard/shield_fleet.html

# 2) Developer enrolls on their laptop:
aperion-shield --enroll \
  --smartflow-url https://smartflow.example.com \
  --token enroll_o7s9...3jk \
  --device-name "alice-laptop"

# 3) Subsequent runs pull policy, send audit, delegate identity verification:
aperion-shield -- npx -y @modelcontextprotocol/server-postgres "$PG_URL"

# 4) Disenroll when the laptop is decommissioned:
aperion-shield --disenroll --revoke   # also revokes the vkey server-side
Same YAML schema, two enforcement points. Author a shieldset once. The proxy enforces it at /v1/messages / /anthropic/v1/messages / /cursor/* / /api/mcp/*; enrolled standalones enforce it at tools/call on every dev machine. Audit lands in the same chain.