CC Safety Net sits between an untrusted command source (an AI coding agent) and a trusted execution environment (the host shell). This page documents the trust boundaries, how fail-closed behavior is enforced, how secrets are protected, and the attack surface. To report a vulnerability, see the security policy instead.

Trust boundaries

Primary boundary: AI agent to shell

The core trust boundary sits between the AI coding agent and the host shell. CC Safety Net is the gatekeeper.
  • Untrusted side — command strings generated by AI agents. These are treated as potentially hostile because agents can be manipulated via prompt injection, confused context, or adversarial instructions into producing destructive commands.
  • Trusted side — the host shell where commands would execute.
Every command that reaches the shell tool on a supported platform flows through the analysis engine before it is allowed to run. If analysis returns a block reason, the command is denied.

Secondary boundaries

Four secondary boundaries carry input into CC Safety Net from external sources. Each is validated before it can influence analysis.
| Boundary | Source | How it is validated |
|---|---|---|
| User configuration | Custom rules in JSON files on disk | Parsed and schema-validated; malformed rules fail closed |
| Rulebook sources | Rulebooks fetched from GitHub or local directories | Remote rulebooks integrity-checked via SHA-256 digests in the lockfile |
| Hook input JSON | Each agent’s JSON payload on stdin | Parsed defensively; malformed JSON triggers a deny |
| Environment variables | Mode flags and path overrides (CC_SAFETY_NET_*, TMPDIR, etc.) | Read explicitly; security-critical values treated as untrusted |
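
To make the rulebook row concrete, here is a minimal TypeScript sketch of a digest check, assuming the lockfile pins each remote rulebook to a hex SHA-256 digest of its raw bytes. The `LockEntry` shape and the `verifyRulebook` and `loadPinnedRulebook` functions are hypothetical illustrations, not CC Safety Net's actual lockfile format or API.

```typescript
import { createHash } from "node:crypto";

// Hypothetical lockfile entry; the real lockfile format may differ.
interface LockEntry {
  source: string; // where the rulebook was fetched from
  sha256: string; // hex digest pinned when the rulebook was locked
}

// Returns true only when the fetched bytes hash to the pinned digest.
function verifyRulebook(content: string | Buffer, entry: LockEntry): boolean {
  const actual = createHash("sha256").update(content).digest("hex");
  return actual === entry.sha256.toLowerCase();
}

// Refuses to hand back rulebook text whose digest does not match, so a
// tampered or substituted rulebook never reaches parsing.
function loadPinnedRulebook(content: string, entry: LockEntry): string {
  if (!verifyRulebook(content, entry)) {
    throw new Error(`digest mismatch for rulebook ${entry.source}; failing closed`);
  }
  return content; // schema validation would run next, on verified bytes
}
```

Checking the digest before parsing means a swapped rulebook is rejected even when it is perfectly valid JSON.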

Fail-closed enforcement

Fail-closed is the foundational safety property: when analysis fails, config is invalid, or input cannot be parsed, the command is blocked rather than allowed. It is enforced at every entry point.

Hook entry points

The hook adapter wraps the analysis call in a try/catch. If analysis throws, the hook emits a deny decision with a “failed closed” reason instead of letting the command proceed. This applies to every stdin-based hook agent (Claude Code, Gemini CLI, Copilot CLI, Kimi Code).
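
The hook pattern can be sketched as a single decision function: a block reason from analysis becomes a deny, and so does every parse failure or thrown error. This is a minimal sketch assuming a simplified payload with a `command` field; `decide`, `HookInput`, `Decision`, `Analyzer`, and the reason strings are hypothetical, and each agent's real hook payload and response format differ.

```typescript
// Hypothetical payload and response shapes; real agents use their own formats.
interface HookInput {
  command: string;
}
type Decision = { decision: "allow" } | { decision: "deny"; reason: string };

// Analysis returns a block reason, or null when the command may run.
type Analyzer = (command: string) => string | null;

function decide(stdin: string, analyze: Analyzer): Decision {
  let parsed: unknown;
  try {
    parsed = JSON.parse(stdin);
  } catch {
    return { decision: "deny", reason: "failed closed: hook input is not valid JSON" };
  }
  const command = (parsed as Partial<HookInput> | null)?.command;
  if (typeof command !== "string") {
    return { decision: "deny", reason: "failed closed: hook input has no command string" };
  }
  try {
    const blockReason = analyze(command);
    return blockReason === null ? { decision: "allow" } : { decision: "deny", reason: blockReason };
  } catch (err) {
    // An exception inside analysis becomes a deny, never a pass-through.
    return { decision: "deny", reason: `failed closed: ${String(err)}` };
  }
}
```

Because every failure path returns a deny, the caller never needs a fallback branch that could let a command through.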

Plugin and extension entry points

The OpenCode plugin and Pi extension apply the same pattern — analysis errors are caught and re-surfaced as block messages so the platform treats them as denied commands.

Strict mode

Strict mode extends fail-closed to commands the shell parser cannot safely tokenize, so unparseable input is blocked rather than passed through.
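
The strict-mode rule can be illustrated with a deliberately simplified quote scanner. This sketch assumes an unclosed quote is the only parse failure worth detecting; `quotesBalanced` and `strictGate` are hypothetical names, and the real parser handles far more shell syntax. In non-strict mode the sketch returns no block reason, mirroring the unclosed-quote guard listed under Attack surface, which passes the raw string on as one segment.

```typescript
// Returns true when every single or double quote is closed. A simplified
// POSIX-like scan: no escapes inside single quotes, backslash escapes elsewhere.
function quotesBalanced(command: string): boolean {
  let quote: string | null = null;
  for (let i = 0; i < command.length; i++) {
    const ch = command.charAt(i);
    if (quote === "'") {
      if (ch === "'") quote = null;
    } else if (ch === "\\") {
      i++; // skip the escaped character
    } else if (quote === '"') {
      if (ch === '"') quote = null;
    } else if (ch === "'" || ch === '"') {
      quote = ch;
    }
  }
  return quote === null;
}

// Returns a block reason, or null to continue with normal analysis.
function strictGate(command: string, strict: boolean): string | null {
  if (quotesBalanced(command)) return null;
  // Strict mode: an unparseable command is blocked outright.
  if (strict) return "strict mode: command could not be parsed safely";
  // Non-strict: analysis inspects the raw string as a single segment.
  return null;
}
```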

Config validation

When rulebook loading or validation produces errors, a fail-closed reason is attached to the config. Downstream analysis consults that field, so a broken rulebook state results in blocking rather than silent rule skipping.
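
A minimal sketch of the config-validation step, assuming rules are simple regex patterns paired with a block reason. `Rule`, `LoadedConfig`, `failClosedReason`, `loadConfig`, and `analyzeWithConfig` are hypothetical names, and the real rule schema is richer. In this sketch any invalid rule marks the whole config as failed, and analysis then blocks every command instead of quietly running with only the rules that did load.

```typescript
// Hypothetical rule and config shapes; the real schema is richer.
interface Rule {
  name: string;
  pattern: RegExp;
  reason: string;
}
interface LoadedConfig {
  rules: Rule[];
  failClosedReason?: string; // set when any rule failed validation
}

function loadConfig(entries: unknown[]): LoadedConfig {
  const rules: Rule[] = [];
  const errors: string[] = [];
  entries.forEach((entry, i) => {
    const obj = (typeof entry === "object" && entry !== null ? entry : {}) as Record<string, unknown>;
    const { name, pattern, reason } = obj;
    if (typeof name !== "string" || typeof pattern !== "string" || typeof reason !== "string") {
      errors.push(`rule ${i}: name, pattern, and reason must be strings`);
      return;
    }
    try {
      rules.push({ name, pattern: new RegExp(pattern), reason });
    } catch {
      errors.push(`rule ${i} (${name}): invalid pattern`);
    }
  });
  return errors.length > 0
    ? { rules, failClosedReason: `invalid config, failing closed: ${errors.join("; ")}` }
    : { rules };
}

// Returns a block reason, or null when the command may run.
function analyzeWithConfig(command: string, config: LoadedConfig): string | null {
  // A broken config blocks instead of silently skipping the bad rules.
  if (config.failClosedReason !== undefined) return config.failClosedReason;
  for (const rule of config.rules) {
    if (rule.pattern.test(command)) return rule.reason;
  }
  return null;
}
```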
See Design Principles for the rationale.

Secret redaction

Before any command or segment text is written to the audit log or returned to the agent, it passes through automatic secret redaction. The redactor scrubs PEM private keys, database URL environment variables, generic secret-bearing env assignments, common secret HTTP headers, URL credentials, and known provider token prefixes (GitHub, Slack, npm, Stripe, PyPI), plus JWTs and AWS access key IDs. Each matched value is replaced with <redacted>. Redaction is conservative and pattern-based — it reduces the risk of leaking secrets that happen to appear in a command’s arguments, but it is not exhaustive. New secret formats emerge regularly, so avoid piping real credentials through commands an agent runs. See the Audit Log reference for the full redaction scope.
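
A pattern-based redactor can be sketched as an ordered list of regex replacements. The function name `redactSecrets` appears in the Attack surface table below, but the patterns here are an illustrative subset chosen for this sketch (PEM private keys, URL credentials, GitHub token prefixes, AWS access key IDs, and secret-looking env assignments), not the project's actual list.

```typescript
// Illustrative subset of redaction patterns; the real list is longer.
const REDACTIONS: Array<[RegExp, string]> = [
  // PEM private key blocks (RSA, EC, OPENSSH, or plain PRIVATE KEY)
  [/-----BEGIN [A-Z ]*PRIVATE KEY-----[\s\S]*?-----END [A-Z ]*PRIVATE KEY-----/g, "<redacted>"],
  // Credentials embedded in URLs: scheme://user:password@host
  [/(\b[a-z][a-z0-9+.-]*:\/\/[^\s:/@]+:)[^\s@/]+@/gi, "$1<redacted>@"],
  // GitHub tokens such as ghp_..., gho_..., ghs_...
  [/\bgh[pousr]_[A-Za-z0-9]{20,}\b/g, "<redacted>"],
  // AWS access key IDs
  [/\b(?:AKIA|ASIA)[A-Z0-9]{16}\b/g, "<redacted>"],
  // Env assignments whose name suggests a secret, e.g. API_TOKEN=...
  [/\b([A-Z0-9_]*(?:SECRET|TOKEN|PASSWORD|API_KEY)[A-Z0-9_]*=)("[^"]*"|'[^']*'|\S+)/g, "$1<redacted>"],
];

// Applies every pattern in order and returns the scrubbed text.
function redactSecrets(text: string): string {
  return REDACTIONS.reduce((out, [pattern, replacement]) => out.replace(pattern, replacement), text);
}
```

Anything the patterns miss passes through unchanged, which is why real credentials should stay out of commands an agent runs.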

Attack surface

The threat model enumerates the main attack surfaces and their mitigations.
| Attack surface | What an attacker tries | Mitigation |
|---|---|---|
| Shell command parser | Craft a command string that exploits a parser edge case (unusual quoting, nested substitution, operator ambiguity) to hide a destructive payload | Unclosed-quote guard returns the raw string as one segment; variable references are preserved (not expanded) so dynamic substitutions can be detected; strict mode blocks unparseable commands; parser errors trigger fail-closed |
| Wrapper and interpreter stripping | Hide a destructive command behind sudo, env, bash -c, or an interpreter one-liner | Wrappers are stripped iteratively (with an iteration cap); shell wrappers and interpreter code are recursively re-analyzed up to 10 levels |
| Path traversal in rm analysis | Slip a dangerous rm -rf target past classification using symlinks or path tricks | Targets are resolved to canonical paths; $TMPDIR overrides pointing outside known temp dirs are detected; a residual TOCTOU window remains (see Known Limitations) |
| Rulebook supply chain | Serve a malicious rulebook from a GitHub source | Remote rulebooks are SHA-256-verified against the lockfile and schema-validated; a malicious rulebook can add rules but cannot remove built-in blocking |
| Secret leakage in audit logs | Get a secret written to the on-disk audit log | redactSecrets runs before any log write; the pattern list is maintained incrementally |
| Hook input parsing | Crash the hook with malformed JSON | JSON.parse failures trigger a deny rather than a crash; platform adapters perform additional validation |
| Audit log path traversal | Craft a session ID that writes outside the logs directory | The session ID is sanitized to a filesystem-safe form, length-capped, and rejects . and .. |
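
The audit-log row can be sketched as follows, assuming log files are named after the session ID inside a single logs directory. The 128-character cap, the `.jsonl` extension, and the names `sanitizeSessionId` and `auditLogPath` are hypothetical; throwing on `.` or `..` is this sketch's choice, and the real implementation may fall back to a default name instead.

```typescript
import * as path from "node:path";

const MAX_SESSION_ID_LENGTH = 128; // hypothetical cap

// Maps an untrusted session ID to a single safe path component.
function sanitizeSessionId(raw: string): string {
  // Replace anything outside a conservative character set, including "/" and "\".
  const safe = raw.replace(/[^A-Za-z0-9._-]/g, "_").slice(0, MAX_SESSION_ID_LENGTH);
  // "." and ".." would still refer to the current or parent directory.
  if (safe === "" || safe === "." || safe === "..") {
    throw new Error(`unsafe session id: ${JSON.stringify(raw)}`);
  }
  return safe;
}

// Defense in depth: confirm the resolved file sits directly in logsDir.
function auditLogPath(logsDir: string, sessionId: string): string {
  const dir = path.resolve(logsDir);
  const file = path.resolve(dir, `${sanitizeSessionId(sessionId)}.jsonl`);
  if (path.dirname(file) !== dir) {
    throw new Error("audit log path escapes the logs directory");
  }
  return file;
}
```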
Network-level attacks, denial of service via resource exhaustion, and attacks on the agent platform itself are out of scope — CC Safety Net makes no network requests during command analysis.
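
The wrapper-stripping row in the Attack surface table can be illustrated with a toy analyzer. The wrapper set, the 20-iteration strip cap, the naive quote-aware tokenizer, and the single `rm -rf /` rule are all hypothetical simplifications; only the 10-level recursion limit comes from this page. Wrapper options such as `sudo -u root` are not handled, and failing closed when the depth limit is exceeded is this sketch's choice.

```typescript
const MAX_WRAPPER_STRIPS = 20; // hypothetical iteration cap
const MAX_RECURSION_DEPTH = 10; // nesting limit stated on this page
const WRAPPERS = new Set(["sudo", "env", "nice", "nohup", "time"]);

// Naive tokenizer: a quoted span becomes one token; escapes are not handled.
function tokenize(command: string): string[] {
  const tokens: string[] = [];
  const re = /"([^"]*)"|'([^']*)'|(\S+)/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(command)) !== null) {
    tokens.push(m[1] ?? m[2] ?? m[3] ?? "");
  }
  return tokens;
}

// Drops leading wrappers and VAR=value assignments, at most MAX_WRAPPER_STRIPS times.
function stripWrappers(tokens: string[]): string[] {
  let rest = tokens;
  for (let i = 0; i < MAX_WRAPPER_STRIPS && rest.length > 0; i++) {
    const head = rest[0];
    if (WRAPPERS.has(head) || /^[A-Za-z_][A-Za-z0-9_]*=/.test(head)) {
      rest = rest.slice(1);
    } else {
      break;
    }
  }
  return rest;
}

// Returns a block reason, or null when the command looks safe.
function analyzeCommand(command: string, depth = 0): string | null {
  if (depth > MAX_RECURSION_DEPTH) return "nesting limit exceeded: failing closed";
  const rest = stripWrappers(tokenize(command));
  // Shell wrappers: re-analyze the inner script one level deeper.
  if ((rest[0] === "bash" || rest[0] === "sh") && rest[1] === "-c" && rest[2] !== undefined) {
    return analyzeCommand(rest[2], depth + 1);
  }
  if (rest[0] === "rm" && rest.includes("-rf") && rest.includes("/")) {
    return "rm -rf targeting the filesystem root";
  }
  return null;
}
```

Stripping in a loop catches stacked wrappers such as `sudo env sudo`, while the cap bounds how much work an adversarial command can force.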

Severity calibration

Findings are classified to keep the response proportional to impact:
  • Critical — a vulnerability that lets a destructive command bypass all analysis and execute (for example a shell-parser bypass for rm -rf /).
  • High — a vulnerability that weakens the boundary or allows partial bypass under specific conditions (for example a git-analysis bypass for unusual option ordering).
  • Medium — limited impact on the core function or secondary features (for example audit-log path traversal or a redaction bypass for a specific format).
  • Low — minimal security impact (for example information disclosure in diagnostic output).
Last modified on June 22, 2026