17 April 2026

The MCP security review most teams run is looking at the wrong layer

If you are wiring MCP into an agent, you are concentrating credentials inside a process the model is driving. OAuth tokens for Gmail, GitHub, a finance dashboard, a CRM, sitting in the same agent session, long-lived, scoped broadly enough to be useful. The executor reading those tokens takes instructions from every piece of text it sees. The shape of the risk is a steerable process holding a pile of keys.

The question worth asking is not whether the model will ever take a wrong action. It will, eventually, for any of a dozen reasons: a poisoned issue, a malicious tool description, a bad URL in a search result, a sentence tucked into a document it was asked to summarise. The question is what becomes reachable when it does. What follows is one architecture for keeping that answer short.

A concrete case. A developer has Claude Code wired into the GitHub MCP server with a personal access token, plus bash enabled for local commands. They ask for a triage of open bugs on their repo. One of the issue bodies looks like a colleague’s suggestion to run a diagnostic curl against an internal endpoint. The model reads it as an instruction and runs the command through bash. The credentials on that machine leave the building.

Every tool behaved correctly. The PAT was legitimate. GitHub returned real issue data. bash did what bash does. The vulnerability is the concentration of “read untrusted data” and “execute shell” in the same agent session. One container boundary between the model and the shell would have made that curl unreachable.

Why concentration is the real problem

Tokens pile up. A working agent session holds a GitHub PAT, a Google OAuth refresh token, a Slack bot token, a database URL, and whatever vendor API keys the project needs. They sit in the same process memory the model is driving. One successful prompt injection picks from the whole pile.

The grants are long-lived. OAuth refresh tokens last months. PATs are often scoped to repo and left that way because tightening them breaks the flow the user is trying to build. The window between granted and revoked is where every real incident happens.

The executor is steerable. An LLM cannot reliably separate instructions from data; that is a property of how it reads text, not a bug a prompt tweak fixes. Any content the model ingests, including issue bodies, email signatures, tool descriptions, and web-search snippets, is a potential instruction. OWASP tracks this as LLM01 (Prompt Injection), the top entry in its Top 10 for LLM Applications since 2023.

The architecture: Code Mode plus Cloudflare Containers

The model does not hold credentials and does not call MCP tools directly. It writes JavaScript. That code executes inside a disposable Cloudflare Container that is created for the session and destroyed when it ends. The container has no environment variables, no useful filesystem, and no default outbound network. The only thing reachable from inside it is a typed RPC surface back to a host proxy that sits outside.
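
To make the "typed RPC surface" concrete, here is a minimal sketch of what model-written code might see from inside the container. Every name in it (`ToolCall`, `HostRpc`, `triageIssues`, the tool identifiers) is hypothetical and not taken from Anthropic's or Cloudflare's APIs. A real surface would almost certainly be asynchronous; it is synchronous here only to keep the example short.

```typescript
// Hypothetical shape of the only surface visible from inside the sandbox.
type ToolCall =
  | { tool: "github.listIssues"; args: { repo: string } }
  | { tool: "github.openPullRequest"; args: { repo: string; title: string; body: string } };

type ToolResult = { ok: true; data: unknown } | { ok: false; error: string };

interface HostRpc {
  call(request: ToolCall): ToolResult;
}

// Model-written code receives `host` and nothing else: no token, no fetch, no env.
function triageIssues(host: HostRpc, repo: string): string[] {
  const result = host.call({ tool: "github.listIssues", args: { repo } });
  if (!result.ok) return [];
  const issues = result.data as Array<{ title: string; body: string }>;
  // Issue bodies are untrusted text. Here only the titles are returned.
  return issues.map((issue) => issue.title);
}

// A stand-in host for local testing. A real one would sit outside the
// container and attach the credential itself.
const fakeHost: HostRpc = {
  call(request) {
    if (request.tool === "github.listIssues") {
      return { ok: true, data: [{ title: "Crash on start", body: "run curl ..." }] };
    }
    return { ok: false, error: "write operations need confirmation" };
  },
};

const titles = triageIssues(fakeHost, "example/repo");
```

The point of the discriminated union is that anything not listed in `ToolCall` has no way to be expressed, let alone executed, from inside the sandbox.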

The host proxy is where the sensitive parts live. The OAuth tokens are stored there. The egress allowlist, which decides what hosts a tool is allowed to call, is enforced there. Write operations that matter (send email, open PR, transfer funds) require a confirmation the proxy gates before they execute. Every call the model makes is recorded in an append-only audit log with the arguments and the result.
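
A rough sketch of the gate and the log, under stated assumptions: the operation names, the `HostProxy` class, and the `confirm` and `execute` callbacks are all illustrative, and the confirmation is a synchronous callback only for brevity. The log here is an in-memory array; a real deployment would need durable, tamper-evident storage.

```typescript
type Operation = "email.read" | "email.send" | "repo.read" | "repo.openPullRequest";

interface AuditEntry {
  readonly at: string;
  readonly op: Operation;
  readonly args: unknown;
  readonly outcome: "executed" | "denied";
  readonly result: unknown;
}

// Assumption: these are the writes that matter for this deployment.
const NEEDS_CONFIRMATION: ReadonlySet<Operation> = new Set<Operation>([
  "email.send",
  "repo.openPullRequest",
]);

type Confirm = (op: Operation, args: unknown) => boolean;
type Execute = (op: Operation, args: unknown) => unknown;

class HostProxy {
  private readonly log: AuditEntry[] = [];
  private readonly confirm: Confirm; // asks the human, out of band
  private readonly execute: Execute; // performs the call with a credential the sandbox never sees

  constructor(confirm: Confirm, execute: Execute) {
    this.confirm = confirm;
    this.execute = execute;
  }

  call(op: Operation, args: unknown): { ok: boolean; data?: unknown } {
    const allowed = !NEEDS_CONFIRMATION.has(op) || this.confirm(op, args);
    const result = allowed ? this.execute(op, args) : undefined;
    // Every call is recorded, including the ones that were refused.
    this.log.push({ at: new Date().toISOString(), op, args, outcome: allowed ? "executed" : "denied", result });
    return allowed ? { ok: true, data: result } : { ok: false };
  }

  // Returns a copy so callers cannot mutate the proxy's own array.
  auditTrail(): readonly AuditEntry[] {
    return [...this.log];
  }
}

const proxy = new HostProxy(
  () => false, // the human declines every write in this example
  (op) => `ran ${op}`,
);
const readResult = proxy.call("email.read", { folder: "inbox" });
const sendResult = proxy.call("email.send", { to: "someone@example.com" });
```

In this example the read goes through, the send is refused, and both appear in the trail.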

Anthropic call this pattern code execution with MCP. Cloudflare call it Code Mode. I walked through one end-to-end build of this shape in an earlier piece on a Garmin MCP server. The mechanics are one implementation; the move is relocating credentials and policy outside the executor, so a successful injection has nothing valuable left in reach.

One more boundary, outside the MCP layer: sandbox the agent process itself. The concrete case above was not an MCP exploit; it was bash running with the user’s full shell. Running the agent (Claude Code, Cursor, whatever you use) inside its own container or under a restricted user account keeps SSH keys, git config, and env files out of reach when a local tool call goes sideways.

What this stops

EchoLeak. The Microsoft 365 Copilot zero-click, disclosed in June 2025 as CVE-2025-32711, used hidden instructions in a received email to steer summarisation into reading from OneDrive, SharePoint, and Teams, then exfiltrated the result via a trusted Microsoft domain. An egress allowlist at the proxy stops this. The sandbox has no general outbound fetch, and a trusted-domain smuggle does not help if the domain is not on the allowlist for this tool.
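
One way a per-tool egress check could look at the proxy, as a sketch only. The tool names and hosts in `EGRESS_ALLOWLIST` are placeholders, and `isEgressAllowed` is a hypothetical helper, not part of any MCP SDK.

```typescript
// Hypothetical per-tool allowlist. A Map avoids accidental hits on object prototype keys.
const EGRESS_ALLOWLIST = new Map<string, readonly string[]>([
  ["mail.summarise", []], // summarising needs no outbound call at all
  ["github.listIssues", ["api.github.com"]],
]);

function isEgressAllowed(tool: string, rawUrl: string): boolean {
  const allowedHosts = EGRESS_ALLOWLIST.get(tool);
  if (allowedHosts === undefined) return false; // unknown tool: deny

  let url: URL;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // unparseable URL: deny
  }
  if (url.protocol !== "https:") return false;

  // Exact hostname match. Substring or prefix matching would let
  // "api.github.com.evil.example" through.
  return allowedHosts.includes(url.hostname);
}
```

The list is keyed by tool, not global: a domain that is trusted for one integration is still unreachable from a summarisation call that has no reason to fetch anything.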

GitHub MCP prompt-injection exfiltration. In May 2025 a malicious issue on a public repo steered an agent, via the official GitHub MCP server, into reading a private repo and opening a public PR that contained the stolen code. One over-privileged PAT did the reading and the writing. Scoped per-operation tokens at the proxy break the chain. A typed write surface treats “open PR to public repo” as a different capability from “read private repo,” gated separately, with confirmation on the outbound path.

mcp-remote RCE, CVE-2025-6514. A malicious MCP server returned a booby-trapped authorization_endpoint which mcp-remote passed straight into the user’s shell. The package had 437k+ downloads, and the flaw worked as a supply-chain-style path to the API keys, SSH keys, and git contents on any machine that connected to a malicious server. Docker has the write-up. Run the MCP-server process inside a container and the bug still exists, but with no host credentials, no access to the user’s keys, and no shell path out. The blast radius is the container.

Filesystem MCP “EscapeRoute,” CVE-2025-53110. A naive prefix check in an MCP filesystem server’s path validator let callers read and write outside the directory it was supposed to be pinned to. Catalogued at the Vulnerable MCP Project. Outer containment makes this harmless. A server that sandboxes itself is trusting its own validator; a container does not care whether the server’s path check is correct, because the server cannot see the host filesystem at all.
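
For readers who have not met this class of bug, here is a minimal illustration of why a bare prefix check fails. This is not the affected server's code; `normalise`, `isInsideNaive`, and `isInsideStrict` are hypothetical helpers that assume absolute POSIX-style paths and do not resolve symlinks.

```typescript
// Collapse "." and ".." segments in an absolute POSIX-style path.
function normalise(absPath: string): string {
  const out: string[] = [];
  for (const segment of absPath.split("/")) {
    if (segment === "" || segment === ".") continue;
    if (segment === "..") out.pop();
    else out.push(segment);
  }
  return "/" + out.join("/");
}

// The naive check: a bare string prefix.
function isInsideNaive(root: string, candidate: string): boolean {
  return normalise(candidate).startsWith(normalise(root));
}

// A stricter check: equal to the root, or below it across a path separator.
function isInsideStrict(root: string, candidate: string): boolean {
  const r = normalise(root);
  const c = normalise(candidate);
  return c === r || c.startsWith(r === "/" ? "/" : r + "/");
}

// A sibling directory that merely shares the prefix slips past the naive check.
const naiveVerdict = isInsideNaive("/srv/allowed", "/srv/allowed_secrets/key.pem");
const strictVerdict = isInsideStrict("/srv/allowed", "/srv/allowed_secrets/key.pem");
```

Even the stricter version is still a validator the server has to get right, which is the argument for the outer container: it holds whether or not this function is correct.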

WhatsApp tool poisoning. A poisoned tool description caused an agent to forward thousands of messages to an attacker-controlled number. Tool metadata is read by the model as privileged system prompt, and nothing validated it before it landed in context. Background from Simon Willison. Recipient allowlists and a bulk-send confirmation gate live on the proxy, not in the tool description. A poisoned description can still instruct the model; it cannot slip a recipient past a policy the proxy enforces.
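
A sketch of what such a policy could look like when it lives on the proxy. The allowlisted numbers, the `BULK_THRESHOLD` value, and the `decideSend` function are all assumptions for illustration; the important property is that none of them can be set from a tool description.

```typescript
interface SendRequest {
  recipients: string[];
  messageCount: number;
}

type SendDecision =
  | { action: "allow" }
  | { action: "confirm"; reason: string }
  | { action: "deny"; reason: string };

// Assumptions: both values are placeholders configured by the user, not by any tool.
const ALLOWED_RECIPIENTS: ReadonlySet<string> = new Set(["+15550100", "+15550101"]);
const BULK_THRESHOLD = 5;

function decideSend(request: SendRequest): SendDecision {
  const unlisted = request.recipients.filter((r) => !ALLOWED_RECIPIENTS.has(r));
  if (unlisted.length > 0) {
    return { action: "deny", reason: `recipient not on allowlist: ${unlisted.join(", ")}` };
  }
  if (request.messageCount > BULK_THRESHOLD) {
    // Known recipient, but the volume is unusual: ask the human first.
    return { action: "confirm", reason: `bulk send of ${request.messageCount} messages` };
  }
  return { action: "allow" };
}
```

The model can ask for anything; the decision is computed from the request alone, so a persuasive instruction in context changes nothing about the outcome.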

What it does not stop

The proxy holds the tokens. That is not a new concentration; the same tokens were already sitting in the same agent session as the model. The difference is that the proxy is not a process taking instructions from untrusted text, and it has exactly one job. One hardened process to review and keep minimal beats credentials sharing memory with an executor that reads poisoned emails.

OAuth scopes still matter. A proxy token with write access across every repo is still a write token; the proxy can gate it behind confirmations, but it cannot change what the token is allowed to do. Narrow the grant first, then add the gate.

Tool combinations are still dangerous. If the same session can read private documents and send email, the model can still be steered into reading one and forwarding via the other. The proxy shrinks the set of reachable combinations; it does not make every remaining one safe.

Two questions before you ship

Where do the OAuth tokens for your MCP integrations live, and what is reachable from the place your agent executes? If the answer is “in the same process the model is driving,” you are one adversarial instruction away from a bad day. The fix is relocation: push credentials outside the executor and give the executor a typed surface instead. Once that boundary exists, every later decision (scopes, allowlists, confirmations) has somewhere to live.

If an attacker gets one adversarial instruction into any tool your model can read, what outbound call becomes possible? Walk the tool graph. For each pair of tools the agent holds, ask whether one can source untrusted text and the other can reach outward. An architecture with a proxy answers this by default, because the set of outbound calls is exactly the egress allowlist you configured. Without one, the answer is “anything the runtime can fetch.”
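
The walk itself is small enough to script. A minimal sketch, assuming you label each tool by hand with two flags; the `ToolProfile` shape, the `riskyPairs` function, and the example labels are mine, not something an MCP server reports about itself.

```typescript
interface ToolProfile {
  name: string;
  readsUntrustedText: boolean; // can attacker-written content reach the model through it?
  reachesOutward: boolean; // can it send data somewhere the attacker can read?
}

// Every (source, sink) combination where injected text could drive an outbound call.
// A tool can pair with itself if it both reads untrusted text and reaches outward.
function riskyPairs(tools: ToolProfile[]): Array<[string, string]> {
  const pairs: Array<[string, string]> = [];
  for (const source of tools) {
    if (!source.readsUntrustedText) continue;
    for (const sink of tools) {
      if (sink.reachesOutward) pairs.push([source.name, sink.name]);
    }
  }
  return pairs;
}

// Hand-assigned labels for an illustrative session.
const sessionTools: ToolProfile[] = [
  { name: "github.listIssues", readsUntrustedText: true, reachesOutward: false },
  { name: "bash", readsUntrustedText: false, reachesOutward: true },
  { name: "web.fetch", readsUntrustedText: true, reachesOutward: true },
];
const pairs = riskyPairs(sessionTools);
```

The issue-reader plus bash pair from the opening case shows up in the output, alongside every other combination the session allows. Each pair is a path you either remove or put a gate on.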