Our Response to the Claude Code Source Leak
Anthropic's Claude Code — the AI development tool that helps manage GhostPort's infrastructure — had its source code leaked to a public repository. The internal architecture that governs how AI agents process instructions, execute tools, and communicate with each other is now available to anyone with an internet connection.
This isn't a breach of our systems. It's a breach of the tooling our systems depend on. And the distinction matters, because most companies using AI agents right now have no plan for this scenario.
We did. Here's what we deployed — same day.
What the Leak Actually Exposed
Claude Code uses XML-like tags internally to manage agent behavior. With the source code public, attackers now know the exact tag names and formats that:
- Override agent instructions — system reminder tags can inject new behavioral directives mid-conversation
- Trigger tool execution — function call tags tell the agent to run commands, read files, or write code
- Manipulate context — parameter and output tags shape what the agent believes happened
- Influence memory — configuration files and memory directories are read at session start and treated as trusted instructions
The Real Risk
Any data source the AI agent reads becomes a potential injection vector. Device logs, API responses, database records, file contents — if an attacker can put text into something the agent reads, and they know the exact control tags the agent obeys, they can hijack the agent's behavior.
This isn't a theoretical concern. Prompt injection has been demonstrated in research settings. The source code leak gave attackers a recipe book.
What We Did — Hour by Hour
The Five-Layer Defense
No single defense is enough. We deployed five independent layers, each addressing a different attack vector. An attacker would need to defeat all five simultaneously to inject a malicious instruction.
Layer 1: HMAC-SHA256 Message Signing
Every message between AI agents is cryptographically signed with a shared secret. The secret is separate from all API authentication tokens. Messages with missing or invalid signatures are rejected at the API level — they never reach the agent.
Layer 2: AES Encrypted Storage
Bridge message bodies are encrypted with AES before database storage. The database contains only ciphertext. A database dump — the most common post-compromise technique — yields nothing readable. Decryption only happens at read time, in memory, using a key derived from the signing secret.
Layer 3: Nonce + Replay Protection
Each message includes a unique cryptographic nonce. Messages with timestamps outside a 5-minute window are rejected. Messages with previously-seen nonces are rejected. Every message is single-use — capture and replay is mathematically impossible.
Layer 4: Control Tag Filtering
11 Claude Code internal XML tags are blocked at the API level. These are the exact tags the source code leak revealed as control mechanisms. Any message containing these tags — whether via the bridge, device logs, or user registration — is rejected or stripped before storage. Blocked attempts are logged with the tag names and a preview of the payload for forensic analysis.
Layer 5: Prompt Injection Detection
All external data is scanned for known prompt injection phrases: “ignore previous instructions,” “you are now,” “new instructions,” and variants. Matches are logged to a dedicated security audit trail consumed by fail2ban. This layer detects social engineering attempts targeting the AI agents themselves.
What Most Companies Are Missing
We've talked to other teams building on AI agent frameworks. Here's what we consistently see:
- No separation between auth and signing. Most systems protect AI endpoints with API keys but don't verify message origin. If the key leaks, everything is compromised. We use separate secrets for authentication, message signing, and encryption.
- No encryption at rest for AI communication. Agent messages sit in databases in plaintext. Any database breach exposes the entire coordination history — which for AI agents often includes security findings, infrastructure details, and operational state.
- No input sanitization boundary. External data flows directly into agent context without inspection. A malicious device name, a crafted log message, or a webhook payload can become an instruction the agent follows.
- No file integrity monitoring. AI configuration files (system prompts, memory, settings) are readable by any process on the machine. No alerting on tampering. An attacker who gains filesystem access can silently rewrite the agent's instructions.
- No response plan for tooling leaks. When the AI tool's source code is exposed, most teams wait for the vendor to issue guidance. We hardened same-day because we'd already built the architecture to do it.
What We're Doing Next
Same-day hardening closes the immediate attack surface. But the source code leak changes the long-term threat model for any company using AI agents. Here's what we're building toward:
- Mutual TLS for bridge communication — Certificate-pinned authentication between agents, independent of shared secrets
- Behavioral anomaly detection — Baseline normal agent behavior patterns and alert on deviations that suggest successful injection
- Message content hashing chain — Each message references the hash of the previous message, creating a tamper-evident chain similar to a blockchain. Missing or altered messages become detectable.
- Rotating secrets with automated key exchange — Bridge and signing secrets automatically rotate on a schedule, limiting the window of compromise
- Agent sandboxing — Limit which system operations each agent can perform, enforced at the OS level, not just in the agent's instructions
Why We're Publishing This
We're a two-person startup building a privacy router on a Raspberry Pi. We don't have a dedicated security team. We don't have a SOC. What we have is the conviction that if you're going to run autonomous AI agents on infrastructure that people depend on for their privacy, you have to treat the AI communication channel with the same seriousness as the encrypted tunnel it rides on.
We're publishing our approach because the industry needs it. The AI agent ecosystem is growing faster than the security practices around it. Every company deploying autonomous agents — from code assistants to infrastructure management to customer service — faces the same risks we just hardened against.
The Claude Code source leak is a wake-up call. Not because Anthropic did something wrong — but because it proved that AI agent internals will become public knowledge, sooner or later. Build your defenses assuming the attacker has read the source code. Because now they have.