Shield — AI Coding Agent Guardrails
Tiered destructive-operation guardrails for IDE-resident AI coding agents (Cursor, VS Code, Codex, Claude Code). Two intercept seams + a 4-tier severity decision model + a Redis-backed approval queue, all wired into the existing tamper-evident audit chain.
What Shield is
Shield is the third leg of Smartflow's request-time enforcement stack. Where access control answers “is this caller allowed to invoke this tool at all?” and compliance answers “does this content violate org policy (PII, jailbreak, prompt injection)?”, Shield answers a different question:
It is the second pair of eyes on every tools/call the agent makes and every assistant message the model produces. When a Cursor / Claude Code / Codex agent decides to run DROP DATABASE prod;, git push --force main, or rm -rf /, Shield catches it before the upstream tool ever executes — or before the IDE parses the assistant's plan into another tool call.
Why it exists
An AI coding agent is, in effect, an autonomous user with broad tool access — your database, your terminal, your git remotes, your filesystem, your cloud APIs. Pre-existing controls catch a lot but not this:
| Existing layer | Catches | Misses |
|---|---|---|
| Access control (allowlists) | Tools the key isn't allowed to call | Allowed-but-destructive calls |
| Compliance (PII, jailbreak) | Sensitive data, prompt-injection patterns | Perfectly compliant DROP DATABASE SQL |
| MAESTRO policy | Org-level allowlists, model selection | Operation semantics within an allowed call |
Shield closes that gap. The threat model is not a malicious agent — it's a competent agent operating on slightly stale or partially incorrect context, executing a perfectly valid command that happens to wipe customer data.
The two intercept seams
Shield runs at two points in the request lifecycle. Each fires independently — a request can be cleared by one and caught by the other.
1. MCP tools/call seam
Hooked into MCPProxyHandler::handle_request, immediately after tool-level access control passes and before the compliance scan. The Shield engine sees the full, decoded JSON-RPC request:
{
"method": "tools/call",
"params": {
"name": "execute_sql",
"arguments": { "query": "DROP DATABASE prod;" }
}
}
The matcher walks every string in params against each rule's pre-compiled regexes. SQL-aware extractors automatically pull from query/sql/statement keys so rules don't have to know each tool's individual parameter schema.
2. LLM response seam
Hooked into handle_provider_request immediately after MAESTRO response validation. The proxy decompresses the upstream body, extracts the assistant's content string(s) (OpenAI choices[].message.content and Anthropic content[].text shapes are both supported) and runs where: llm_response rules against the text.
git push --force…” — before the IDE has parsed them into a tool call. Catching the plan beats catching the action.
Severity tiers and outcomes
| Severity | Outcome | Returned to caller | Latency on hit |
|---|---|---|---|
| Critical | Hard block | HTTP 403 with structured shield_blocked error | None — instant deny |
| High | Approval queue | HTTP 403 with shield_approval_required + ticket_id | Until human approves or denies |
| Medium | Allow with warn | Pass through; x-shield-warn header set; audit-logged | None |
| Low | Audit only | Pass through silently; audit-logged | None |
Dogfooding mode (SHIELD_ENFORCE=false)
details.decision with details.enforce=false, so the dashboard's Shield Activity panel shows what would have happened.
Recommended path: run with SHIELD_ENFORCE=false for one week, tune any false positives by editing shieldset.yaml, then flip to true on a single canary pod and watch for 24-48h before rolling out cluster-wide.
Rule schema — shieldset.yaml
The shieldset is loaded at startup from SHIELD_RULESET_PATH (default /etc/smartflow/shieldset.yaml) or falls back to a built-in default ruleset bundled into the binary. Each rule is compiled once at load time — the hot path is just a tight loop over pre-built regex::Regex objects.
shieldset:
version: 1
rules:
# SQL — Critical
- id: sql.drop_database
severity: Critical
where: tool_call
match:
tool: ["execute_sql", "postgres.query", "mysql.query"]
sql_matches: ['(?i)\bDROP\s+DATABASE\b']
reason: "DROP DATABASE is never auto-allowed."
# Git — Critical
- id: git.force_push_protected
severity: Critical
where: tool_call
match:
tool: ["run_terminal", "bash", "git"]
any_param_matches:
- '\bgit\s+push\s+.*(--force|-f)\s+(origin\s+)?(main|master|prod)\b'
reason: "Force-push to a protected branch is forbidden."
# LLM response — Medium
- id: llm.suggests_force_push
severity: Medium
where: llm_response
match:
text_matches: ['(?i)git\s+push\s+.*--force\b.*\b(main|master|prod)\b']
reason: "Assistant plan suggests force-push to a protected branch."
# Anomaly — sliding-window destructive-verb burst
- id: anomaly.destructive_burst
severity: High
where: tool_call
anomaly:
kind: destructive_verb_burst
window_seconds: 300
threshold: 5
reason: "Destructive operation burst detected — pausing for review."
Built-in default ruleset — 15 starter rules
Smartflow Shield ships with 15 starter rules covering the most common destructive patterns. The defaults are embedded into the proxy binary via include_str! so the engine never starts ruleless even if shieldset.yaml is missing.
aperion-shield binary ships 45 rules across 12 categories — adding secrets, supply-chain, reverse-shells, sudo / privilege, cloud (AWS / GCP / Azure), Kubernetes and Docker — plus an adaptive scoring layer and an identity-gating subsystem. The same YAML schema works in both, so you can author once and run anywhere. See Aperion Shield → Built-in defaults.
| Category | Rule ID | Severity | What it catches |
|---|---|---|---|
| SQL | sql.drop_database | Critical | DROP DATABASE in any SQL tool |
sql.drop_table_or_schema | High | DROP TABLE / DROP SCHEMA / TRUNCATE TABLE | |
sql.unscoped_delete | High | DELETE FROM <t>; with no WHERE clause | |
sql.unscoped_update | High | UPDATE <t> SET …; with no WHERE clause | |
sql.grant_or_revoke_all | Medium | GRANT ALL / REVOKE ALL on schemas or roles | |
| Git | git.force_push_protected | Critical | git push --force to main/master/prod |
git.history_rewrite | High | git filter-repo, filter-branch, reset --hard HEAD~ | |
git.branch_force_delete | Medium | git branch -D | |
| Filesystem | fs.recursive_delete_root | Critical | rm -rf /, rm -rf $HOME, rm -rf ~ |
fs.dd_to_block_device | Critical | dd if=… of=/dev/sd* | |
fs.delete_production_path | High | Tool-driven delete of /etc, /var, /usr, /opt | |
| LLM response | llm.suggests_drop_database | High | Assistant plans DROP DATABASE / TRUNCATE TABLE |
llm.suggests_force_push | Medium | Assistant plans force-push to a protected branch | |
llm.suggests_rm_rf | Medium | Assistant plans rm -rf / or similar | |
| Anomaly | anomaly.destructive_burst | High | 5+ destructive tool calls per (actor, server, tool) in 5 minutes |
Match keys
tool: […]— optional whitelist of tool names. Omit to match every tool.any_param_matches: ['regex', …]— match any string inparams, recursively.sql_matches: ['regex', …]— match against extracted SQL strings only (recognisesquery/sql/statement).text_matches: ['regex', …]— match the LLM response body. Only valid withwhere: llm_response.
regex crate, which does not support lookbehind ((?<!…)) or arbitrary lookahead ((?=…)). Rules using those constructs are rejected at load time with a warning. If you need either, express the constraint with anchors and alternation, or split the rule into multiple narrower rules.
Example 1 — DROP DATABASE via SQL tool (MCP seam)
The agent decides to clean up a stale environment and issues:
{
"method": "tools/call",
"params": {
"name": "execute_sql",
"arguments": { "query": "DROP DATABASE prod;" }
}
}
Rule sql.drop_database matches with severity Critical. With enforce on, the proxy returns:
HTTP/1.1 403 Forbidden
x-shield-blocked: block
x-shield-rule: sql.drop_database
x-shield-severity: Critical
{
"error": {
"type": "shield_blocked",
"rule_id": "sql.drop_database",
"severity": "Critical",
"reason": "DROP DATABASE is never auto-allowed."
}
}
Example 2 — git push --force main (MCP seam)
{
"method": "tools/call",
"params": {
"name": "run_terminal",
"arguments": { "command": "git push origin main --force" }
}
}
Rule git.force_push_protected matches with severity Critical. The pattern is anchored to the protected branch list (main/master/prod), so a force-push to feature/widgets is fine.
Example 3 — rm -rf / (MCP seam)
{
"method": "tools/call",
"params": {
"name": "bash",
"arguments": { "command": "rm -rf $HOME" }
}
}
Rule fs.recursive_delete_root matches with severity Critical. Variants caught by the same rule include: rm -rf /, rm -rf ~, rm -rf $HOME, rm -rf $PWD.
A more targeted variant — deleting /etc/nginx/nginx.conf via a dedicated filesystem.delete_file tool — is caught by fs.delete_production_path (severity High) and queued for approval rather than hard-blocked.
Example 4 — assistant plans destruction (LLM seam)
Even when no tool is called, the assistant's text gets scanned. A response like:
// Assistant says, embedded in chat completion:
“Let me clean this up by running:
DROP DATABASE customer_archive;
TRUNCATE TABLE orders;
Then we'll re-import from the backup.”
matches rule llm.suggests_drop_database (severity High). In dogfooding mode the response passes through with these added headers:
x-shield-rule: llm.suggests_drop_database
x-shield-severity: High
x-shield-warn: [shield-shadow] would have approval: Assistant plan contains
destructive SQL — confirm before executing.
With enforce on, the proxy returns 403 shield_approval_required and the IDE polls until a human approves or denies.
Example 5 — destructive-verb burst (anomaly seam)
Each destructive tool call (anything matching the SQL/git/fs destructive verb set) increments a Redis sorted-set counter keyed by (actor, mcp_server, tool). When the count crosses the rule's threshold, the rule fires — even if no individual call would have on its own.
Default rule: 5 destructive ops in 300 seconds → anomaly.destructive_burst (severity High). Storage uses a Redis ZSET with timestamp scores so the window slides cleanly without cron sweeps. When Redis is unreachable, an in-process fallback keeps single-pod deployments working.
Approval workflow — end to end Paid · Enterprise
aperion-shield implements the same severity model with a local file-based approval inbox (./.aperion-shield/inbox) — same Critical / High / Medium / Low tiers, no cross-machine queue, no Approve / Deny buttons. See Aperion Shield → Local approval inbox.
- Agent issues a request that matches a High-severity rule.
- Shield creates a Redis ticket:
smartflow:shield:approvals:{ticket_id}(24h TTL) and pushes the id onto the pending zset. - Proxy returns HTTP 403 with
shield_approval_requiredbody and thex-shield-ticketheader. - The dashboard's Shield Approvals panel renders the pending ticket with Approve / Deny buttons.
- The agent's retry loop polls
GET /api/shield/approvals/{ticket_id}/statuson a tight interval (this endpoint is unauthenticated and ultra-cheap — single Redis GET). - When a human clicks Approve, the ticket transitions to
Approved; the next poll returns"approved". - The agent re-issues the original request, which now passes Shield (the rule re-evaluates with the ticket consumed).
Approval queue REST API Paid · Enterprise
| Method | Path | Auth | Purpose |
|---|---|---|---|
| GET | /api/shield/approvals?status=pending&limit=N | Admin | List pending tickets |
| GET | /api/shield/approvals/{ticket_id} | Admin | Fetch a single ticket |
| GET | /api/shield/approvals/{ticket_id}/status | None | Cheap polling for the IDE retry loop |
| POST | /api/shield/approvals/{ticket_id}/approve | Admin | Approve |
| POST | /api/shield/approvals/{ticket_id}/deny | Admin | Deny |
Admin endpoints require Authorization: Bearer $SF_SHIELD_ADMIN_KEY. Set the SF_SHIELD_ADMIN_KEY env var on api-server in production. Leave it unset in dev for open access.
IDE-side error contract
The agent's retry path should treat the headers as the source of truth — no body parsing required:
| Header | Meaning |
|---|---|
x-shield-blocked: block | Hard block. Surface the error to the user; do not retry. |
x-shield-blocked: approval_required | Queued. Read x-shield-ticket; poll /api/shield/approvals/{ticket_id}/status until "approved" or "denied"; on approval, re-issue. |
x-shield-warn: <banner> | Non-blocking warning. Show the banner; the response is still complete. |
x-shield-rule: <rule_id> | Always present when Shield matched. Useful for telemetry and per-rule troubleshooting. |
x-shield-severity: Critical|High|Medium|Low | Tier the rule fired at. |
Sample retry loop (Python)
import time, requests
def call_with_shield_retry(url, json, max_wait_seconds=600):
r = requests.post(url, json=json)
if r.status_code != 403 or r.headers.get("x-shield-blocked") != "approval_required":
return r
ticket = r.headers["x-shield-ticket"]
deadline = time.time() + max_wait_seconds
while time.time() < deadline:
s = requests.get(f"https://your-proxy/api/shield/approvals/{ticket}/status").json()
if s["status"] == "approved":
return requests.post(url, json=json) # re-issue
if s["status"] == "denied":
raise RuntimeError(f"Shield denied {ticket}")
time.sleep(2)
raise TimeoutError(f"Shield ticket {ticket} not resolved in {max_wait_seconds}s")
Configuration reference (env vars)
| Variable | Default | Purpose |
|---|---|---|
SHIELD_RULESET_PATH | /etc/smartflow/shieldset.yaml | Path to the YAML rule file |
SHIELD_ENFORCE | false | When false, demote Block / Approval to logged-only AllowWithWarn (Phase-1 dogfooding) |
SF_SHIELD_ADMIN_KEY | unset | Bearer token for the approval-queue admin endpoints |
REDIS_URL | unset | Required for cross-pod approval queue and anomaly counters; in-process fallback when missing |
Audit log integration
Every actionable decision (Warn / Approval / Block) is appended to the existing tamper-evident audit chain with event_type: Custom("shield"). The entry's details map carries:
decision—block/approval/warnrule_id— the rule that firedseverity— Critical / High / Medium / Lowsurface—mcp_tool_callorllm_responsesurface_target— tool name (MCP) or provider name (LLM)enforce—"true"/"false"at decision timereason— the rule's reason stringbanner(Warn only) orticket_id(Approval only)
Query directly:
GET /api/audit/logs?event_type=shield&limit=100
Spend report dimensions Paid · Enterprise
aperion-shield does not proxy LLM calls and therefore has nothing to bill — there is no equivalent on the OSS side.
For finance/compliance “what did Shield catch this month and at what cost” reports, three dimensions are exposed on /spend/report:
group_by=shield_rule— bucket spend by the rule that fired (orno_rule)group_by=shield_severity— Critical / High / Medium / Lowgroup_by=shield_decision— block / approval / warn / allow
Dashboard panels Paid · Enterprise
aperion-shield has no web UI — every decision is logged to stderr (Cursor / Claude Code surface that in their tool-call panel) and High-severity approvals are handled via the local file inbox at ./.aperion-shield/inbox. To get the dashboard panels below across a fleet of standalone shields, enroll them in Shield Org Mode.
The admin dashboard exposes Shield at /dashboard/shield.html:
- Stats banner — counts of Block, Approval, Warn, Audit-only and Shadow events in the last query window.
- Shield Activity — real-time feed of every actionable decision with rule, severity, surface, actor, and reason. Filterable by severity / decision / rule substring; auto-refreshes every 30 seconds.
- Pending Approvals — High-severity tickets with Approve / Deny buttons. Resolution propagates to the IDE polling loop within seconds.
Org Mode adds three more pages — shield_fleet.html, shield_policy.html, shield_settings.html — for managing enrolled standalone shields. See Shield Org Mode → Dashboard pages.
Rollout playbook
- Deploy v1.7.37+ with
SHIELD_ENFORCE=false. The default ruleset loads automatically. - Watch the Shield Activity panel for ≥24 hours. Every match is logged with the would-have-been decision.
- Tune any false positives by editing
shieldset.yaml: drop the severity, restrict thetool:list, or remove the rule. Pod restart picks up changes;ShieldEngine::replace_rulesetsupports in-process hot-reload for advanced setups. - Flip
SHIELD_ENFORCE=trueon a single canary pod. Watch the same panel for 24-48h. - Roll out cluster-wide. The approval queue + IDE polling now provides the human-in-the-loop step for High-severity ops.
Fail-open semantics
Allow and logs a warn!. The proxy will never 5xx because Shield broke; the trade-off is that Shield can't be a load-bearing authorisation layer — keep MCP access control in front of it.
Concretely, the engine will fall back rather than fail in these scenarios:
- YAML parse failure → load the embedded built-in default ruleset.
- Built-in fallback parse failure → log
CRITICAL, run withrules=0(every decision is Allow). - Redis unreachable for the anomaly counter → switch to in-process counter for that pod.
- Redis unreachable for the approval queue → log a warning, return Allow rather than block on a queue we can't write to.
- Audit-log write failure → log
warn!, return the original decision.
Shield Org Mode — fleet-managed standalone shields
Shield Org Mode extends the Smartflow control plane to manage every aperion-shield running on a developer laptop, CI runner, or production box across your organisation. The proxy-resident shield documented above and Org Mode are complementary: the proxy enforces at the gateway, Org Mode enforces at the developer's machine before the call ever leaves it.
An enrolled standalone shield pulls its policy from Smartflow, ships every actionable decision back as an audit event, delegates identity verification through the central dashboard, and respects a fleet-wide killswitch. Standalone mode (no enrollment) still works exactly the same; Org Mode is purely additive.
Control-plane API
All endpoints live under /api/enterprise/shield/*. Device-scoped routes auth with the HMAC vkey issued at enrollment; admin-scoped routes auth with Authorization: Bearer $SF_SHIELD_ADMIN_KEY.
| Method | Path | Auth | Purpose |
|---|---|---|---|
| GET | /api/enterprise/shield/info | vkey | Provider catalog & readiness, killswitch state |
| GET | /api/enterprise/shield/shieldset/{group} | vkey | Pull the active shieldset YAML + hash + version |
| GET | /api/enterprise/shield/shieldset/{group}/version | vkey | Cheap version probe (poll target, no body) |
| PUT | /api/enterprise/shield/shieldset/{group} | admin | Publish a new shieldset version |
| POST | /api/enterprise/shield/events | vkey | Batched audit-event ingest |
| GET | /api/enterprise/shield/events | admin | Read the audit timeline (with filters) |
| POST | /api/enterprise/shield/identity/check | vkey | Cache check for a valid proof on (subject, scope) |
| POST | /api/enterprise/shield/identity/begin | vkey | Start an identity flow; returns a verify_url + challenge id |
| GET | /api/enterprise/shield/identity/result/{id} | vkey | Poll for the signed proof when the user finishes |
| GET / POST | /api/enterprise/shield/killswitch | admin (POST) | Get / flip the fleet killswitch |
Enrollment, heartbeats, vkey issuance and fleet listing reuse the existing device API at /api/enterprise/devices/* — same primitives as the SDK device fleet.
Dashboard pages
Three new pages render the Org Mode control plane. All ship with Smartflow v1.7.38+ and use the existing dark theme.
| Page | Path | What it does |
|---|---|---|
| Shield Fleet | /dashboard/shield_fleet.html | List enrolled devices (active / stale / revoked), issue one-time enrollment tokens, flip the fleet killswitch. |
| Shield Policy | /dashboard/shield_policy.html | YAML editor for the active shieldset.yaml per policy group. Version + content hash displayed; saves trigger a version bump that enrolled devices pick up on the next 30 s poll. |
| Shield Settings | /dashboard/shield_settings.html | Identity-provider catalog + readiness, env-var setup hints, killswitch toggle, live filterable timeline of every Org Mode audit event. |
Lifecycle on the developer's machine
# 1) Admin issues a one-time enrollment token from /dashboard/shield_fleet.html
# 2) Developer enrolls on their laptop:
aperion-shield --enroll \
--smartflow-url https://smartflow.example.com \
--token enroll_o7s9...3jk \
--device-name "alice-laptop"
# 3) Subsequent runs pull policy, send audit, delegate identity verification:
aperion-shield -- npx -y @modelcontextprotocol/server-postgres "$PG_URL"
# 4) Disenroll when the laptop is decommissioned:
aperion-shield --disenroll --revoke # also revokes the vkey server-side
/v1/messages / /anthropic/v1/messages / /cursor/* / /api/mcp/*; enrolled standalones enforce it at tools/call on every dev machine. Audit lands in the same chain.