Shield — AI Coding Agent Guardrails

What Shield is

Shield is the third leg of Smartflow's request-time enforcement stack. Where access control answers “is this caller allowed to invoke this tool at all?” and compliance answers “does this content violate org policy (PII, jailbreak, prompt injection)?”, Shield answers a different question:

“This call is authorised and clean — but is it destructive?”

It is the second pair of eyes on every tools/call the agent makes and every assistant message the model produces. When a Cursor / Claude Code / Codex agent decides to run DROP DATABASE prod;, git push --force main, or rm -rf /, Shield catches it before the upstream tool ever executes — or before the IDE parses the assistant's plan into another tool call.

Why it exists

An AI coding agent is, in effect, an autonomous user with broad tool access: your database, your terminal, your git remotes, your filesystem, your cloud APIs. Pre-existing controls catch a lot, but none of them is built to stop a call that is allowed, compliant and destructive:

Existing layer	Catches	Misses
Access control (allowlists)	Tools the key isn't allowed to call	Allowed-but-destructive calls
Compliance (PII, jailbreak)	Sensitive data, prompt-injection patterns	Perfectly compliant `DROP DATABASE` SQL
MAESTRO policy	Org-level allowlists, model selection	Operation semantics within an allowed call

Shield closes that gap. The threat model is not a malicious agent — it's a competent agent operating on slightly stale or partially incorrect context, executing a perfectly valid command that happens to wipe customer data.

The two intercept seams

Shield runs at two points in the request lifecycle. Each fires independently — a request can be cleared by one and caught by the other.

1. MCP `tools/call` seam

Hooked into MCPProxyHandler::handle_request, immediately after tool-level access control passes and before the compliance scan. The Shield engine sees the full, decoded JSON-RPC request:

{
  "method": "tools/call",
  "params": {
    "name": "execute_sql",
    "arguments": { "query": "DROP DATABASE prod;" }
  }
}

The matcher walks every string in params against each rule's pre-compiled regexes. SQL-aware extractors automatically pull from query/sql/statement keys so rules don't have to know each tool's individual parameter schema.
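Here is a minimal Python sketch of that walk. It assumes params has already been decoded from JSON into dicts, lists and scalars. iter_strings and iter_sql are hypothetical names, since the real matcher runs inside the Rust proxy. The sketch yields string values only, not object keys.

```python
from typing import Any, Iterator

# Keys the SQL-aware extractor looks under, per the description above.
SQL_KEYS = frozenset({"query", "sql", "statement"})

def iter_strings(value: Any) -> Iterator[str]:
    """Yield every string value in a decoded JSON value, depth-first."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for v in value.values():
            yield from iter_strings(v)
    elif isinstance(value, list):
        for v in value:
            yield from iter_strings(v)
    # numbers, booleans and null carry no text to match

def iter_sql(value: Any) -> Iterator[str]:
    """Yield strings stored under a query/sql/statement key at any depth."""
    if isinstance(value, dict):
        for k, v in value.items():
            if k in SQL_KEYS and isinstance(v, str):
                yield v
            else:
                yield from iter_sql(v)
    elif isinstance(value, list):
        for v in value:
            yield from iter_sql(v)

params = {
    "name": "execute_sql",
    "arguments": {"query": "DROP DATABASE prod;", "opts": {"timeout": 30}},
}
print(list(iter_strings(params)))  # ['execute_sql', 'DROP DATABASE prod;']
print(list(iter_sql(params)))      # ['DROP DATABASE prod;']
```

Because sql_matches rules only see iter_sql output, a rule never has to know whether a tool calls its SQL argument query, sql or statement.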

2. LLM response seam

Hooked into handle_provider_request immediately after MAESTRO response validation. The proxy decompresses the upstream body and extracts the assistant's content string(s). Both the OpenAI choices[].message.content shape and the Anthropic content[].text shape are supported. It then runs every rule declared with where: llm_response against that text.
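The extraction step can be sketched in Python as follows. The sketch assumes the upstream body arrives as raw bytes with an optional Content-Encoding value. Only gzip is decoded here, so other encodings are out of scope. assistant_texts is a hypothetical name, not a function in the proxy.

```python
import gzip
import json
from typing import Any, Dict, List, Optional

def assistant_texts(raw: bytes, content_encoding: Optional[str] = None) -> List[str]:
    """Return the assistant text strings found in an upstream LLM response body."""
    if content_encoding == "gzip":
        raw = gzip.decompress(raw)
    body: Dict[str, Any] = json.loads(raw)
    texts: List[str] = []
    # OpenAI shape: choices[].message.content (null when the turn is only a tool call)
    for choice in body.get("choices") or []:
        content = (choice.get("message") or {}).get("content")
        if isinstance(content, str):
            texts.append(content)
    # Anthropic shape: content[].text (non-text blocks such as tool_use carry no "text")
    blocks = body.get("content")
    if isinstance(blocks, list):
        for block in blocks:
            text = block.get("text") if isinstance(block, dict) else None
            if isinstance(text, str):
                texts.append(text)
    return texts

anthropic = {"content": [{"type": "text", "text": "Now I'll run git push --force"},
                         {"type": "tool_use", "name": "bash", "input": {}}]}
print(assistant_texts(json.dumps(anthropic).encode()))  # ["Now I'll run git push --force"]
```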

Why both? The MCP seam catches what the agent does; the LLM seam catches what the agent plans to do. Many destructive plans show up in the assistant message — “Now I'll run git push --force…” — before the IDE has parsed them into a tool call. Catching the plan beats catching the action.

Severity tiers and outcomes

Severity	Outcome	Returned to caller	Latency on hit
Critical	Hard block	HTTP 403 with structured `shield_blocked` error	None — instant deny
High	Approval queue	HTTP 403 with `shield_approval_required` + `ticket_id`	Until human approves or denies
Medium	Allow with warn	Pass through; `x-shield-warn` header set; audit-logged	None
Low	Audit only	Pass through silently; audit-logged	None

Dogfooding mode (`SHIELD_ENFORCE=false`)

In Phase-1 deployments the engine demotes every Block / Approval to AllowWithWarn so customers can validate the rule corpus against real traffic without breaking workflows. The audit log still records the original would-have-been decision under details.decision with details.enforce=false, so the dashboard's Shield Activity panel shows what would have happened.
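The severity table and the dogfooding demotion combine into one small decision function. The following is a hedged Python sketch. The decide function, the enum names and the "audit" label for Low hits are assumptions. The block / approval / warn values and the string-valued enforce flag follow the audit-log fields this document lists under Audit log integration.

```python
from enum import Enum
from typing import Dict, Tuple

class Severity(Enum):
    CRITICAL = "Critical"
    HIGH = "High"
    MEDIUM = "Medium"
    LOW = "Low"

class Decision(Enum):
    BLOCK = "block"
    APPROVAL = "approval"
    WARN = "warn"      # AllowWithWarn
    AUDIT = "audit"    # hypothetical label for Low / audit-only hits

# Default outcome per severity, from the severity table.
OUTCOME = {
    Severity.CRITICAL: Decision.BLOCK,
    Severity.HIGH: Decision.APPROVAL,
    Severity.MEDIUM: Decision.WARN,
    Severity.LOW: Decision.AUDIT,
}

def decide(severity: Severity, enforce: bool) -> Tuple[Decision, Dict[str, str]]:
    """Return (effective decision, audit details) for one rule hit."""
    would_have = OUTCOME[severity]
    effective = would_have
    if not enforce and would_have in (Decision.BLOCK, Decision.APPROVAL):
        # SHIELD_ENFORCE=false: demote, but keep the original decision in the audit details
        effective = Decision.WARN
    details = {"decision": would_have.value, "enforce": "true" if enforce else "false"}
    return effective, details

# Critical hit while dogfooding: effective WARN, audit records decision="block"
print(decide(Severity.CRITICAL, enforce=False))
```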

Recommended path: run with SHIELD_ENFORCE=false for one week, tune any false positives by editing shieldset.yaml, then flip to true on a single canary pod and watch for 24-48h before rolling out cluster-wide.

Rule schema — `shieldset.yaml`

Shield loads the shieldset at startup from SHIELD_RULESET_PATH (default /etc/smartflow/shieldset.yaml). If that file is missing or fails to parse, Shield falls back to the built-in default ruleset bundled into the binary. Each rule is compiled once at load time, so the hot path is just a tight loop over pre-built regex::Regex objects.

shieldset:
  version: 1

  rules:
    # SQL — Critical
    - id: sql.drop_database
      severity: Critical
      where: tool_call
      match:
        tool: ["execute_sql", "postgres.query", "mysql.query"]
        sql_matches: ['(?i)\bDROP\s+DATABASE\b']
      reason: "DROP DATABASE is never auto-allowed."

    # Git — Critical
    - id: git.force_push_protected
      severity: Critical
      where: tool_call
      match:
        tool: ["run_terminal", "bash", "git"]
        any_param_matches:
          - '\bgit\s+push\s+.*(--force|-f)\s+(origin\s+)?(main|master|prod)\b'
          - '\bgit\s+push\s+(origin\s+)?(main|master|prod)\b.*\s(--force|-f)\b'
      reason: "Force-push to a protected branch is forbidden."

    # LLM response — Medium
    - id: llm.suggests_force_push
      severity: Medium
      where: llm_response
      match:
        text_matches: ['(?i)git\s+push\s+.*--force\b.*\b(main|master|prod)\b']
      reason: "Assistant plan suggests force-push to a protected branch."

    # Anomaly — sliding-window destructive-verb burst
    - id: anomaly.destructive_burst
      severity: High
      where: tool_call
      anomaly:
        kind: destructive_verb_burst
        window_seconds: 300
        threshold: 5
      reason: "Destructive operation burst detected — pausing for review."

Built-in default ruleset — 15 starter rules

Smartflow Shield ships with 15 starter rules covering the most common destructive patterns. The defaults are embedded into the proxy binary via include_str! so the engine never starts ruleless even if shieldset.yaml is missing.

Looking for the bigger corpus? The standalone aperion-shield binary ships 45 rules across 12 categories — adding secrets, supply-chain, reverse-shells, sudo / privilege, cloud (AWS / GCP / Azure), Kubernetes and Docker — plus an adaptive scoring layer and an identity-gating subsystem. The same YAML schema works in both, so you can author once and run anywhere. See Aperion Shield → Built-in defaults.

Category	Rule ID	Severity	What it catches
SQL	`sql.drop_database`	Critical	`DROP DATABASE` in any SQL tool
	`sql.drop_table_or_schema`	High	`DROP TABLE` / `DROP SCHEMA` / `TRUNCATE TABLE`
	`sql.unscoped_delete`	High	`DELETE FROM <t>;` with no `WHERE` clause
	`sql.unscoped_update`	High	`UPDATE <t> SET …;` with no `WHERE` clause
	`sql.grant_or_revoke_all`	Medium	`GRANT ALL` / `REVOKE ALL` on schemas or roles
Git	`git.force_push_protected`	Critical	`git push --force` to `main`/`master`/`prod`
	`git.history_rewrite`	High	`git filter-repo`, `filter-branch`, `reset --hard HEAD~`
	`git.branch_force_delete`	Medium	`git branch -D`
Filesystem	`fs.recursive_delete_root`	Critical	`rm -rf /`, `rm -rf $HOME`, `rm -rf ~`
	`fs.dd_to_block_device`	Critical	`dd if=… of=/dev/sd*`
	`fs.delete_production_path`	High	Tool-driven delete of `/etc`, `/var`, `/usr`, `/opt`
LLM response	`llm.suggests_drop_database`	High	Assistant plans `DROP DATABASE` / `TRUNCATE TABLE`
	`llm.suggests_force_push`	Medium	Assistant plans force-push to a protected branch
	`llm.suggests_rm_rf`	Medium	Assistant plans `rm -rf /` or similar
Anomaly	`anomaly.destructive_burst`	High	5+ destructive tool calls per `(actor, server, tool)` in 5 minutes

Match keys

tool: […] — optional whitelist of tool names. Omit to match every tool.
any_param_matches: ['regex', …] — match any string in params, recursively.
sql_matches: ['regex', …] — match against extracted SQL strings only (recognises query/sql/statement).
text_matches: ['regex', …] — match the LLM response body. Only valid with where: llm_response.
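The match keys above can be sketched in Python as a load-time compile step plus a cheap per-request check. This is an illustration under assumptions, not Shield's Rust code. The names CompiledRule, compile_rule, hits_tool_call and hits_llm_text are hypothetical. The sketch assumes that the patterns within a rule are OR-ed together and that the tool: whitelist is AND-ed with them. Python's re stands in for the regex crate.

```python
import re
from dataclasses import dataclass, field
from typing import Any, Iterator, Optional

SQL_KEYS = ("query", "sql", "statement")

def _strings(value: Any) -> Iterator[str]:
    """Every string value inside decoded JSON, recursively."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for v in value.values():
            yield from _strings(v)
    elif isinstance(value, list):
        for v in value:
            yield from _strings(v)

@dataclass
class CompiledRule:
    id: str
    where: str
    tools: Optional[frozenset] = None                 # None: the rule applies to every tool
    any_param: list = field(default_factory=list)
    sql: list = field(default_factory=list)
    text: list = field(default_factory=list)

def compile_rule(spec: dict) -> CompiledRule:
    """Validate one rule and compile its regexes once, at load time."""
    m = spec.get("match", {})
    if m.get("text_matches") and spec["where"] != "llm_response":
        raise ValueError(f"{spec['id']}: text_matches is only valid with where: llm_response")
    tools = m.get("tool")
    return CompiledRule(
        id=spec["id"],
        where=spec["where"],
        tools=frozenset(tools) if tools else None,
        any_param=[re.compile(p) for p in m.get("any_param_matches", [])],
        sql=[re.compile(p) for p in m.get("sql_matches", [])],
        text=[re.compile(p) for p in m.get("text_matches", [])],
    )

def hits_tool_call(rule: CompiledRule, tool: str, arguments: dict) -> bool:
    """Hot path: only searches with pre-built patterns, no compilation."""
    if rule.where != "tool_call":
        return False
    if rule.tools is not None and tool not in rule.tools:
        return False
    # Simplification: this sketch only inspects top-level SQL keys.
    sql_values = [v for k, v in arguments.items() if k in SQL_KEYS and isinstance(v, str)]
    if any(p.search(s) for p in rule.sql for s in sql_values):
        return True
    return any(p.search(s) for p in rule.any_param for s in _strings(arguments))

def hits_llm_text(rule: CompiledRule, text: str) -> bool:
    return rule.where == "llm_response" and any(p.search(text) for p in rule.text)

drop_db = compile_rule({
    "id": "sql.drop_database",
    "where": "tool_call",
    "match": {"tool": ["execute_sql", "postgres.query", "mysql.query"],
              "sql_matches": [r"(?i)\bDROP\s+DATABASE\b"]},
})
print(hits_tool_call(drop_db, "execute_sql", {"query": "DROP DATABASE prod;"}))  # True
print(hits_tool_call(drop_db, "bash", {"command": "DROP DATABASE prod;"}))       # False
```

The second call returns False only because bash is outside the rule's tool: list. The same text sent to execute_sql is caught.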

Regex flavour: Shield uses Rust's regex crate, which supports no look-around at all. That means no lookahead ((?=…), (?!…)) and no lookbehind ((?<=…), (?<!…)). Rules using those constructs are rejected at load time with a warning. If you need look-around, express the constraint with anchors and alternation, or split the rule into multiple narrower rules.
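Here is an illustration of the anchors-and-alternation approach. The natural way to say "DELETE with no WHERE" is a negative lookahead, which the regex crate rejects. Requiring the statement terminator, or end of input, immediately after the table name expresses the same constraint without look-around. The pattern and the check_portable helper below are hypothetical, not the shipped sql.unscoped_delete rule. Python's re is used only to exercise the pattern. The helper is a crude textual check, because Python itself accepts look-around.

```python
import re

# A look-around version would read (?i)\bDELETE\s+FROM\s+\S+(?!.*\bWHERE\b), which the
# regex crate refuses. The look-around-free version below requires ';' or end of input
# right after the table name, so there is no room for a WHERE clause.
UNSCOPED_DELETE = re.compile(r'(?i)\bDELETE\s+FROM\s+[\w."]+\s*(;|$)')

LOOKAROUND_TOKENS = ("(?=", "(?!", "(?<=", "(?<!")

def check_portable(pattern: str) -> None:
    """Crude load-time guard: reject look-around, which Python allows but Rust does not."""
    for token in LOOKAROUND_TOKENS:
        if token in pattern:
            raise ValueError(f"look-around {token!r} not supported: {pattern!r}")

check_portable(UNSCOPED_DELETE.pattern)  # passes: no look-around tokens
for sql in ["DELETE FROM users;", "delete from public.users", "DELETE FROM users WHERE id = 7;"]:
    print(sql, "->", bool(UNSCOPED_DELETE.search(sql)))
# DELETE FROM users; -> True
# delete from public.users -> True
# DELETE FROM users WHERE id = 7; -> False
```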

Example 1 — `DROP DATABASE` via SQL tool (MCP seam)

The agent decides to clean up a stale environment and issues:

{
  "method": "tools/call",
  "params": {
    "name": "execute_sql",
    "arguments": { "query": "DROP DATABASE prod;" }
  }
}

Rule sql.drop_database matches with severity Critical. With enforce on, the proxy returns:

HTTP/1.1 403 Forbidden
x-shield-blocked: block
x-shield-rule: sql.drop_database
x-shield-severity: Critical

{
  "error": {
    "type": "shield_blocked",
    "rule_id": "sql.drop_database",
    "severity": "Critical",
    "reason": "DROP DATABASE is never auto-allowed."
  }
}

Example 2 — `git push --force main` (MCP seam)

{
  "method": "tools/call",
  "params": {
    "name": "run_terminal",
    "arguments": { "command": "git push origin main --force" }
  }
}

Rule git.force_push_protected matches with severity Critical. The rule only fires for the protected branch list (main/master/prod), so this rule does not catch a force-push to feature/widgets.

Example 3 — `rm -rf /` (MCP seam)

{
  "method": "tools/call",
  "params": {
    "name": "bash",
    "arguments": { "command": "rm -rf $HOME" }
  }
}

Rule fs.recursive_delete_root matches with severity Critical. Variants caught by the same rule include: rm -rf /, rm -rf ~, rm -rf $HOME, rm -rf $PWD.

A more targeted variant — deleting /etc/nginx/nginx.conf via a dedicated filesystem.delete_file tool — is caught by fs.delete_production_path (severity High) and queued for approval rather than hard-blocked.

Example 4 — assistant plans destruction (LLM seam)

Even when no tool is called, the assistant's text gets scanned. A response like:

// Assistant says, embedded in chat completion:
“Let me clean this up by running:

  DROP DATABASE customer_archive;
  TRUNCATE TABLE orders;

Then we'll re-import from the backup.”

matches rule llm.suggests_drop_database (severity High). In dogfooding mode the response passes through with these added headers:

x-shield-rule: llm.suggests_drop_database
x-shield-severity: High
x-shield-warn: [shield-shadow] would have approval: Assistant plan contains
               destructive SQL — confirm before executing.

With enforce on, the proxy returns 403 shield_approval_required and the IDE polls until a human approves or denies.

Example 5 — destructive-verb burst (MCP seam, anomaly rule)

Each destructive tool call (anything matching the SQL/git/fs destructive verb set) increments a Redis sorted-set counter keyed by (actor, mcp_server, tool). When the count inside the window reaches the rule's threshold, the rule fires. This happens even if no individual call would have fired on its own.

Default rule: 5 destructive ops in 300 seconds → anomaly.destructive_burst (severity High). Storage uses a Redis ZSET with timestamp scores so the window slides cleanly without cron sweeps. When Redis is unreachable, an in-process fallback keeps single-pod deployments working.
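Here is a Python sketch of the per-pod fallback counter. It assumes a monotonic clock and one deque of timestamps per (actor, mcp_server, tool) key. BurstCounter is a hypothetical name. The comments show the Redis sorted-set commands that would play the same role, though the exact commands Shield issues are an assumption.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Tuple

Key = Tuple[str, str, str]  # (actor, mcp_server, tool)

class BurstCounter:
    """Per-pod sliding-window counter standing in for the Redis ZSET."""

    def __init__(self, window_seconds: float = 300, threshold: int = 5,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.window = window_seconds
        self.threshold = threshold
        self.clock = clock
        self._events: Dict[Key, Deque[float]] = defaultdict(deque)

    def record(self, actor: str, server: str, tool: str) -> bool:
        """Count one destructive call; True means the burst rule fires."""
        now = self.clock()
        q = self._events[(actor, server, tool)]
        # Redis analogue (assumed): ZADD key <now> <unique-member>
        q.append(now)
        # Redis analogue (assumed): ZREMRANGEBYSCORE key -inf <now - window>
        while q and q[0] <= now - self.window:
            q.popleft()
        # Redis analogue (assumed): ZCARD key
        return len(q) >= self.threshold

# Usage with a fake clock: five calls ten seconds apart trip the default rule.
fake_now = [0.0]
counter = BurstCounter(clock=lambda: fake_now[0])
fired = []
for i in range(5):
    fake_now[0] = i * 10.0
    fired.append(counter.record("alice", "postgres", "execute_sql"))
print(fired)  # [False, False, False, False, True]
```

Old timestamps are trimmed on every write, so the window slides without any background sweep. That mirrors the "no cron sweeps" property of the ZSET approach.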

Approval workflow — end to end Paid · Enterprise

Tier: the Redis-backed, dashboard-driven approval queue described below is part of the paid Smartflow product. The free standalone aperion-shield implements the same severity model with a local file-based approval inbox (./.aperion-shield/inbox) — same Critical / High / Medium / Low tiers, no cross-machine queue, no Approve / Deny buttons. See Aperion Shield → Local approval inbox.

1. The agent issues a request that matches a High-severity rule.
2. Shield creates a Redis ticket, smartflow:shield:approvals:{ticket_id} (24h TTL), and pushes the id onto the pending zset.
3. The proxy returns HTTP 403 with a shield_approval_required body and the x-shield-ticket header.
4. The dashboard's Shield Approvals panel renders the pending ticket with Approve / Deny buttons.
5. The agent's retry loop polls GET /api/shield/approvals/{ticket_id}/status on a tight interval. This endpoint is unauthenticated and ultra-cheap: a single Redis GET.
6. When a human clicks Approve, the ticket transitions to Approved; the next poll returns "approved".
7. The agent re-issues the original request, which now passes Shield (the rule re-evaluates with the ticket consumed).
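The ticket lifecycle above can be modelled in memory. This is a hypothetical Python sketch, not the Redis implementation. ApprovalStore, the "expired" status and the consume step are illustrative. The consume step assumes that a re-issued request can be tied back to its ticket. Only the pending / approved / denied states and the 24h TTL come from the steps above.

```python
import time
import uuid
from dataclasses import dataclass
from typing import Callable, Dict, Optional

TTL_SECONDS = 24 * 60 * 60  # 24h, as for smartflow:shield:approvals:{ticket_id}

@dataclass
class Ticket:
    rule_id: str
    created_at: float
    status: str = "pending"    # pending -> approved | denied
    consumed: bool = False

class ApprovalStore:
    """Hypothetical in-memory model of the approval queue."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self.clock = clock
        self.tickets: Dict[str, Ticket] = {}

    def create(self, rule_id: str) -> str:
        ticket_id = uuid.uuid4().hex
        self.tickets[ticket_id] = Ticket(rule_id, self.clock())
        return ticket_id

    def _live(self, ticket_id: str) -> Optional[Ticket]:
        t = self.tickets.get(ticket_id)
        if t is not None and self.clock() - t.created_at >= TTL_SECONDS:
            del self.tickets[ticket_id]  # emulate Redis key expiry
            return None
        return t

    def status(self, ticket_id: str) -> str:
        """What the unauthenticated /status endpoint would report."""
        t = self._live(ticket_id)
        return t.status if t else "expired"

    def resolve(self, ticket_id: str, approve: bool) -> None:
        """Admin approve/deny; only pending tickets can be resolved."""
        t = self._live(ticket_id)
        if t is None or t.status != "pending":
            raise KeyError(f"no pending ticket {ticket_id}")
        t.status = "approved" if approve else "denied"

    def consume(self, ticket_id: str) -> bool:
        """Let exactly one re-issued request through per approval."""
        t = self._live(ticket_id)
        if t is not None and t.status == "approved" and not t.consumed:
            t.consumed = True
            return True
        return False
```

Making consume single-use means one human approval authorises one destructive call, not an open-ended window.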

Approval queue REST API Paid · Enterprise

Method	Path	Auth	Purpose
GET	`/api/shield/approvals?status=pending&limit=N`	Admin	List pending tickets
GET	`/api/shield/approvals/{ticket_id}`	Admin	Fetch a single ticket
GET	`/api/shield/approvals/{ticket_id}/status`	None	Cheap polling for the IDE retry loop
POST	`/api/shield/approvals/{ticket_id}/approve`	Admin	Approve
POST	`/api/shield/approvals/{ticket_id}/deny`	Admin	Deny

Admin endpoints require Authorization: Bearer $SF_SHIELD_ADMIN_KEY. Set the SF_SHIELD_ADMIN_KEY env var on api-server in production. When it is unset, the admin endpoints accept any caller, which is only acceptable in dev.
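Here is a hedged Python sketch of the admin side, using only the standard library. It builds the POST for the approve or deny route and attaches the bearer token when SF_SHIELD_ADMIN_KEY is set. The empty request body and the approval_request helper name are assumptions. The table above does not document a request payload.

```python
import os
import urllib.request

def approval_request(base_url: str, ticket_id: str, action: str) -> urllib.request.Request:
    """Build, without sending, an admin approve/deny call for one Shield ticket."""
    if action not in ("approve", "deny"):
        raise ValueError("action must be 'approve' or 'deny'")
    req = urllib.request.Request(
        f"{base_url}/api/shield/approvals/{ticket_id}/{action}", method="POST")
    key = os.environ.get("SF_SHIELD_ADMIN_KEY")
    if key:  # unset (dev only): the admin endpoints accept unauthenticated calls
        req.add_header("Authorization", f"Bearer {key}")
    return req

# To send it: urllib.request.urlopen(approval_request("https://your-proxy", ticket_id, "approve"))
```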

IDE-side error contract

The agent's retry path should treat the headers as the source of truth — no body parsing required:

Header	Meaning
`x-shield-blocked: block`	Hard block. Surface the error to the user; do not retry.
`x-shield-blocked: approval_required`	Queued. Read `x-shield-ticket`; poll `/api/shield/approvals/{ticket_id}/status` until `"approved"` or `"denied"`; on approval, re-issue.
`x-shield-warn: <banner>`	Non-blocking warning. Show the banner; the response is still complete.
`x-shield-rule: <rule_id>`	Always present when Shield matched. Useful for telemetry and per-rule troubleshooting.
`x-shield-severity: Critical|High|Medium|Low`	Severity tier of the rule that fired.

Sample retry loop (Python)

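import time
import requests

def call_with_shield_retry(url, payload, proxy_base="https://your-proxy",
                           max_wait_seconds=600, poll_interval=2):
    """POST payload to url; if Shield queues it for approval, poll until resolved."""
    r = requests.post(url, json=payload, timeout=30)
    # Success, hard block (x-shield-blocked: block) and warn-only responses go straight back.
    if r.status_code != 403 or r.headers.get("x-shield-blocked") != "approval_required":
        return r

    ticket = r.headers["x-shield-ticket"]
    status_url = f"{proxy_base}/api/shield/approvals/{ticket}/status"
    deadline = time.monotonic() + max_wait_seconds
    while time.monotonic() < deadline:
        s = requests.get(status_url, timeout=10)
        s.raise_for_status()
        status = s.json()["status"]
        if status == "approved":
            return requests.post(url, json=payload, timeout=30)  # re-issue the original request
        if status == "denied":
            raise RuntimeError(f"Shield denied ticket {ticket}")
        time.sleep(poll_interval)  # still pending
    raise TimeoutError(f"Shield ticket {ticket} not resolved in {max_wait_seconds}s")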

Configuration reference (env vars)

Variable	Default	Purpose
`SHIELD_RULESET_PATH`	`/etc/smartflow/shieldset.yaml`	Path to the YAML rule file
`SHIELD_ENFORCE`	`false`	When false, demote Block / Approval to logged-only AllowWithWarn (Phase-1 dogfooding)
`SF_SHIELD_ADMIN_KEY`	unset	Bearer token for the approval-queue admin endpoints
`REDIS_URL`	unset	Required for cross-pod approval queue and anomaly counters; in-process fallback when missing

Audit log integration

Every actionable decision (Warn / Approval / Block) is appended to the existing tamper-evident audit chain with event_type: Custom("shield"). The entry's details map carries:

decision — block / approval / warn
rule_id — the rule that fired
severity — Critical / High / Medium / Low
surface — mcp_tool_call or llm_response
surface_target — tool name (MCP) or provider name (LLM)
enforce — "true" / "false" at decision time
reason — the rule's reason string
banner (Warn only) or ticket_id (Approval only)

Query directly:

GET /api/audit/logs?event_type=shield&limit=100

Spend report dimensions Paid · Enterprise

Tier: Spend tracking is part of the paid Smartflow proxy and requires the Smartflow billing/usage subsystem. The free standalone aperion-shield does not proxy LLM calls and therefore has nothing to bill — there is no equivalent on the OSS side.

For finance/compliance “what did Shield catch this month and at what cost” reports, three dimensions are exposed on /spend/report:

group_by=shield_rule — bucket spend by the rule that fired (or no_rule)
group_by=shield_severity — Critical / High / Medium / Low
group_by=shield_decision — block / approval / warn / allow

Dashboard panels Paid · Enterprise

Tier: the web dashboard is part of the paid Smartflow product. The free standalone aperion-shield has no web UI — every decision is logged to stderr (Cursor / Claude Code surface that in their tool-call panel) and High-severity approvals are handled via the local file inbox at ./.aperion-shield/inbox. To get the dashboard panels below across a fleet of standalone shields, enroll them in Shield Org Mode.

The admin dashboard exposes Shield at /dashboard/shield.html:

Stats banner — counts of Block, Approval, Warn, Audit-only and Shadow events in the last query window.
Shield Activity — real-time feed of every actionable decision with rule, severity, surface, actor, and reason. Filterable by severity / decision / rule substring; auto-refreshes every 30 seconds.
Pending Approvals — High-severity tickets with Approve / Deny buttons. Resolution propagates to the IDE polling loop within seconds.

Org Mode adds three more pages — shield_fleet.html, shield_policy.html, shield_settings.html — for managing enrolled standalone shields. See Shield Org Mode → Dashboard pages.

Rollout playbook

1. Deploy v1.7.37+ with SHIELD_ENFORCE=false. The default ruleset loads automatically.
2. Watch the Shield Activity panel for ≥24 hours. Every match is logged with the would-have-been decision.
3. Tune any false positives by editing shieldset.yaml: lower the severity, restrict the tool: list, or remove the rule. A pod restart picks up changes; ShieldEngine::replace_ruleset supports in-process hot-reload for advanced setups.
4. Flip SHIELD_ENFORCE=true on a single canary pod. Watch the same panel for 24-48h.
5. Roll out cluster-wide. The approval queue and IDE polling now provide the human-in-the-loop step for High-severity ops.

Fail-open semantics

Shield is fail-open by design. Any internal error inside Shield (regex panic, anomaly tracker failure, missing ruleset, Redis outage) results in Allow plus a warn! log line. The proxy never returns a 5xx because Shield broke. The trade-off is that Shield can't be a load-bearing authorisation layer, so keep MCP access control in front of it.

Concretely, the engine will fall back rather than fail in these scenarios:

YAML parse failure → load the embedded built-in default ruleset.
Built-in fallback parse failure → log CRITICAL, run with rules=0 (every decision is Allow).
Redis unreachable for the anomaly counter → switch to in-process counter for that pod.
Redis unreachable for the approval queue → log a warning, return Allow rather than block on a queue we can't write to.
Audit-log write failure → log warn!, return the original decision.
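The fallback chain and the fail-open rule can be sketched in Python like this. Everything here is hypothetical: load_rules and fail_open are illustrative names. The file reader and parser are injected so the sketch stays stdlib-only. json.loads stands in for a YAML parser. JSON is valid YAML 1.2, so this stand-in only handles JSON-shaped rule files.

```python
import json
import logging
from typing import Any, Callable, List

log = logging.getLogger("shield")

def load_rules(read_user_file: Callable[[], str], builtin_text: str,
               parse: Callable[[str], List[Any]]) -> List[Any]:
    """User shieldset -> embedded defaults -> empty ruleset, never an exception."""
    try:
        return parse(read_user_file())
    except Exception as exc:  # missing file, unreadable file or parse failure
        log.warning("shieldset unusable (%s); loading built-in defaults", exc)
    try:
        return parse(builtin_text)
    except Exception as exc:
        log.critical("built-in ruleset failed to parse (%s); running with rules=0", exc)
        return []

def fail_open(evaluate: Callable[[], str]) -> str:
    """Run one Shield evaluation; any internal error degrades to "allow"."""
    try:
        return evaluate()
    except Exception as exc:
        log.warning("shield internal error, failing open: %s", exc)
        return "allow"

def missing_file() -> str:
    raise FileNotFoundError("/etc/smartflow/shieldset.yaml")

print(load_rules(missing_file, '[{"id": "sql.drop_database"}]', json.loads))
```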

Shield Org Mode — fleet-managed standalone shields

Shield Org Mode extends the Smartflow control plane to manage every aperion-shield running on a developer laptop, CI runner, or production box across your organisation. The proxy-resident shield documented above and Org Mode are complementary: the proxy enforces at the gateway, Org Mode enforces at the developer's machine before the call ever leaves it.

An enrolled standalone shield pulls its policy from Smartflow, ships every actionable decision back as an audit event, delegates identity verification through the central dashboard, and respects a fleet-wide killswitch. Standalone mode (no enrollment) still works exactly the same; Org Mode is purely additive.

Control-plane API

All endpoints live under /api/enterprise/shield/*. Device-scoped routes authenticate with the HMAC vkey issued at enrollment; admin-scoped routes authenticate with Authorization: Bearer $SF_SHIELD_ADMIN_KEY.

Method	Path	Auth	Purpose
GET	`/api/enterprise/shield/info`	vkey	Provider catalog & readiness, killswitch state
GET	`/api/enterprise/shield/shieldset/{group}`	vkey	Pull the active shieldset YAML + hash + version
GET	`/api/enterprise/shield/shieldset/{group}/version`	vkey	Cheap version probe (poll target, no body)
PUT	`/api/enterprise/shield/shieldset/{group}`	admin	Publish a new shieldset version
POST	`/api/enterprise/shield/events`	vkey	Batched audit-event ingest
GET	`/api/enterprise/shield/events`	admin	Read the audit timeline (with filters)
POST	`/api/enterprise/shield/identity/check`	vkey	Cache check for a valid proof on `(subject, scope)`
POST	`/api/enterprise/shield/identity/begin`	vkey	Start an identity flow; returns a `verify_url` + challenge id
GET	`/api/enterprise/shield/identity/result/{id}`	vkey	Poll for the signed proof when the user finishes
GET / POST	`/api/enterprise/shield/killswitch`	admin (POST)	Get / flip the fleet killswitch

Enrollment, heartbeats, vkey issuance and fleet listing reuse the existing device API at /api/enterprise/devices/* — same primitives as the SDK device fleet.
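The following sketch shows one device-side poll tick and why the version probe exists: the device only pulls the full shieldset when the version has moved. The field names (version, hash, yaml), the SHA-256 hash and the injected probe_version / fetch callables are assumptions. The table above only says the probe is cheap and that the pull returns YAML, hash and version.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Policy:
    version: int
    yaml: str

def refresh_policy(current: Optional[Policy],
                   probe_version: Callable[[], int],
                   fetch: Callable[[], dict]) -> Policy:
    """One poll tick: cheap version probe first, full pull only when it moved."""
    remote = probe_version()   # hypothetical wrapper for GET .../shieldset/{group}/version
    if current is not None and remote == current.version:
        return current
    doc = fetch()              # hypothetical wrapper for GET .../shieldset/{group}
    # Assumption: "hash" is the hex SHA-256 of the YAML text.
    if hashlib.sha256(doc["yaml"].encode()).hexdigest() != doc["hash"]:
        raise ValueError("shieldset hash mismatch; refusing to apply")
    return Policy(version=doc["version"], yaml=doc["yaml"])

# A device would call refresh_policy on each 30 s poll, keeping its old policy if it raises.
```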

Dashboard pages

Three new pages render the Org Mode control plane. All ship with Smartflow v1.7.38+ and use the existing dark theme.

Page	Path	What it does
Shield Fleet	`/dashboard/shield_fleet.html`	List enrolled devices (active / stale / revoked), issue one-time enrollment tokens, flip the fleet killswitch.
Shield Policy	`/dashboard/shield_policy.html`	YAML editor for the active `shieldset.yaml` per policy group. Version + content hash displayed; saves trigger a version bump that enrolled devices pick up on the next 30 s poll.
Shield Settings	`/dashboard/shield_settings.html`	Identity-provider catalog + readiness, env-var setup hints, killswitch toggle, live filterable timeline of every Org Mode audit event.

Lifecycle on the developer's machine

# 1) Admin issues a one-time enrollment token from /dashboard/shield_fleet.html

# 2) Developer enrolls on their laptop:
aperion-shield --enroll \
  --smartflow-url https://smartflow.example.com \
  --token enroll_o7s9...3jk \
  --device-name "alice-laptop"

# 3) Subsequent runs pull policy, send audit, delegate identity verification:
aperion-shield -- npx -y @modelcontextprotocol/server-postgres "$PG_URL"

# 4) Disenroll when the laptop is decommissioned:
aperion-shield --disenroll --revoke   # also revokes the vkey server-side

Same YAML schema, two enforcement points. Author a shieldset once. The proxy enforces it at /v1/messages, /anthropic/v1/messages, /cursor/* and /api/mcp/*. Enrolled standalones enforce it at tools/call on every dev machine. Audit lands in the same chain.