Our Response to the Claude Code Source Leak

MARCH 31, 2026 • GHOSTPORT TECHNOLOGIES • INCIDENT RESPONSE

Anthropic's Claude Code — the AI development tool that helps manage GhostPort's infrastructure — had its source code leaked to a public repository. The internal architecture that governs how AI agents process instructions, execute tools, and communicate with each other is now available to anyone with an internet connection.

This isn't a breach of our systems. It's a breach of the tooling our systems depend on. And the distinction matters, because most companies using AI agents right now have no plan for this scenario.

We did. Here's what we deployed — same day.

SECURITY LAYERS

TAGS BLOCKED

AUDIT RULES

CUSTOMER IMPACT

What the Leak Actually Exposed

Claude Code uses XML-like tags internally to manage agent behavior. With the source code public, attackers now know the exact tag names and formats that:

Override agent instructions — system reminder tags can inject new behavioral directives mid-conversation
Trigger tool execution — function call tags tell the agent to run commands, read files, or write code
Manipulate context — parameter and output tags shape what the agent believes happened
Influence memory — configuration files and memory directories are read at session start and treated as trusted instructions

The Real Risk

Any data source the AI agent reads becomes a potential injection vector. Device logs, API responses, database records, file contents — if an attacker can put text into something the agent reads, and they know the exact control tags the agent obeys, they can hijack the agent's behavior.

This isn't a theoretical concern. Prompt injection has been demonstrated in research settings. The source code leak gave attackers a recipe book.

What We Did — Hour by Hour

HOUR 0 — ASSESSMENT

Identified which Claude Code internal tags could be weaponized against our AI agents. Mapped all data ingestion points where external input reaches agent context: bridge messages, device logs, registration data, affiliate signups, webhook payloads.

HOUR 1 — BRIDGE ENCRYPTION

Deployed AES encryption on all AI-to-AI bridge messages. Database now stores ciphertext only. Even with full database access, an attacker cannot read the coordination between our agents.

HOUR 2 — CONTROL TAG FILTERING

Built and deployed a filter that blocks 11 known Claude Code control tags at the API level. Any message — bridge, device log, registration — containing these tags is rejected or stripped before it enters the system. Blocked attempts are logged with full forensic detail.

HOUR 3 — REPLAY PROTECTION

Added cryptographic nonces and a 5-minute timestamp window to all bridge messages. Captured messages cannot be replayed. Each message is mathematically single-use.

HOUR 4 — FILE HARDENING + AUDIT

Locked all AI configuration files to owner-only permissions (mode 600). Added auditd rules that alert on any read, write, or execute attempt against Claude's memory files, settings, and credentials. Any tampering attempt generates an immediate forensic record.

HOUR 5 — INJECTION DETECTION

Deployed pattern matching across all external data inputs. Known prompt injection phrases are flagged to the security audit trail in real time. The system doesn't block the data (false positives would break legitimate input) but creates an alert chain for investigation.

HOUR 6 — PI-SIDE COORDINATION

Transmitted the full hardening specification to the Pi-side AI agent over the now-encrypted bridge. Both sides of the system are being hardened in parallel with identical defenses.

The Five-Layer Defense

No single defense is enough. We deployed five independent layers, each addressing a different attack vector. An attacker would need to defeat all five simultaneously to inject a malicious instruction.

Layer 1: HMAC-SHA256 Message Signing

Every message between AI agents is cryptographically signed with a shared secret. The secret is separate from all API authentication tokens. Messages with missing or invalid signatures are rejected at the API level — they never reach the agent.

Layer 2: AES Encrypted Storage

Bridge message bodies are encrypted with AES before database storage. The database contains only ciphertext. A database dump — the most common post-compromise technique — yields nothing readable. Decryption only happens at read time, in memory, using a key derived from the signing secret.

Layer 3: Nonce + Replay Protection

Each message includes a unique cryptographic nonce. Messages with timestamps outside a 5-minute window are rejected. Messages with previously-seen nonces are rejected. Every message is single-use — capture and replay is mathematically impossible.

Layer 4: Control Tag Filtering

11 Claude Code internal XML tags are blocked at the API level. These are the exact tags the source code leak revealed as control mechanisms. Any message containing these tags — whether via the bridge, device logs, or user registration — is rejected or stripped before storage. Blocked attempts are logged with the tag names and a preview of the payload for forensic analysis.

Layer 5: Prompt Injection Detection

All external data is scanned for known prompt injection phrases: “ignore previous instructions,” “you are now,” “new instructions,” and variants. Matches are logged to a dedicated security audit trail consumed by fail2ban. This layer detects social engineering attempts targeting the AI agents themselves.

What Most Companies Are Missing

We've talked to other teams building on AI agent frameworks. Here's what we consistently see:

No separation between auth and signing. Most systems protect AI endpoints with API keys but don't verify message origin. If the key leaks, everything is compromised. We use separate secrets for authentication, message signing, and encryption.
No encryption at rest for AI communication. Agent messages sit in databases in plaintext. Any database breach exposes the entire coordination history — which for AI agents often includes security findings, infrastructure details, and operational state.
No input sanitization boundary. External data flows directly into agent context without inspection. A malicious device name, a crafted log message, or a webhook payload can become an instruction the agent follows.
No file integrity monitoring. AI configuration files (system prompts, memory, settings) are readable by any process on the machine. No alerting on tampering. An attacker who gains filesystem access can silently rewrite the agent's instructions.
No response plan for tooling leaks. When the AI tool's source code is exposed, most teams wait for the vendor to issue guidance. We hardened same-day because we'd already built the architecture to do it.

The lesson isn't that Claude Code was leaked. Tools get leaked. Source code gets exposed. Dependencies have vulnerabilities. The lesson is that your security posture can't depend on the secrecy of your tools. If your defenses break when an attacker reads the source code, you never had defenses — you had obscurity.

What We're Doing Next

Same-day hardening closes the immediate attack surface. But the source code leak changes the long-term threat model for any company using AI agents. Here's what we're building toward:

Mutual TLS for bridge communication — Certificate-pinned authentication between agents, independent of shared secrets
Behavioral anomaly detection — Baseline normal agent behavior patterns and alert on deviations that suggest successful injection
Message content hashing chain — Each message references the hash of the previous message, creating a tamper-evident chain similar to a blockchain. Missing or altered messages become detectable.
Rotating secrets with automated key exchange — Bridge and signing secrets automatically rotate on a schedule, limiting the window of compromise
Agent sandboxing — Limit which system operations each agent can perform, enforced at the OS level, not just in the agent's instructions

Why We're Publishing This

We're a two-person startup building a privacy router on a Raspberry Pi. We don't have a dedicated security team. We don't have a SOC. What we have is the conviction that if you're going to run autonomous AI agents on infrastructure that people depend on for their privacy, you have to treat the AI communication channel with the same seriousness as the encrypted tunnel it rides on.

We're publishing our approach because the industry needs it. The AI agent ecosystem is growing faster than the security practices around it. Every company deploying autonomous agents — from code assistants to infrastructure management to customer service — faces the same risks we just hardened against.

The Claude Code source leak is a wake-up call. Not because Anthropic did something wrong — but because it proved that AI agent internals will become public knowledge, sooner or later. Build your defenses assuming the attacker has read the source code. Because now they have.

Our Response to the Claude Code Source Leak

What the Leak Actually Exposed

The Real Risk

What We Did — Hour by Hour

The Five-Layer Defense

Layer 1: HMAC-SHA256 Message Signing

Layer 2: AES Encrypted Storage

Layer 3: Nonce + Replay Protection

Layer 4: Control Tag Filtering

Layer 5: Prompt Injection Detection

What Most Companies Are Missing

What We're Doing Next

Why We're Publishing This

RELATED ARTICLES