The Blueprint Leak: What Anthropic Exposed About the Future of AI

Dark Perimeter: True Cybersecurity Stories

Every major cyberattack has a story behind it. A vulnerability no one patched. A phishing email someone clicked. A nation-state with a motive. Dark Perimeter goes beyond the headlines to explore the true stories of the hacks, breaches, and cyber operations that shaped history - told in narrative form for security professionals and curious minds alike. No guests, no panels, no filler. Just the story.

April 06, 2026 • Cole Drayden • Season 99 • Episode 2

On March 31st, a misconfigured build file exposed 512,000 lines of Anthropic's Claude Code source code to the world. Cole Drayden sits down with AI systems security consultant Dr. Elliott Vance to unpack what leaked, what it reveals about autonomous AI, and why this moment may accelerate the field faster than anyone expected.

SPEAKER_00 0:00

On March 31st, 2026, a routine software update changed what we know about one of the most closely guarded AI systems in the world. Not because someone broke in. Not because a nation-state actor orchestrated a sophisticated intrusion. Because someone forgot to add two lines to a configuration file. What followed in the next 48 hours was a cascade that the security community is still working to fully understand: half a million lines of proprietary code, forked over 40,000 times before the morning was over; a concurrent supply-chain attack that may have compromised the machines of thousands of developers; and a window into the future of artificial intelligence that Anthropic absolutely did not intend to open. This is Dark Perimeter. I'm Cole Drayden. And today we're going deep on what the world is already calling one of the most consequential accidental disclosures in AI history. Joining me is Dr. Elliott Vance. Dr. Vance spent three years as a principal researcher at a major AI laboratory before transitioning to independent consulting on AI system security and governance. He has advised both Fortune 100 companies and federal agencies on the intersection of agentic AI and operational risk. Dr. Vance, thank you for being here.

SPEAKER_01 1:26

Thank you, Cole. I've been watching this story unfold in real time, and I think it's one of those moments where the technical community is going to be talking about the before and the after, before the Claude Code leak and after it.

SPEAKER_00 1:39

Let's start with the basics for listeners who are still catching up. Walk us through mechanically what actually happened.

SPEAKER_01 1:46

So Claude Code is Anthropic's AI-powered coding assistant. It runs in your terminal, and it can edit files, manage projects, and execute commands autonomously. It's been enormously successful. The tool has been generating over $2.5 billion in annualized revenue. It's one of the most commercially important AI products on the market right now. Every time Anthropic releases an update, developers download it through a package registry called npm. Think of npm as an app store for software components. On March 31st, Anthropic pushed version 2.1.88. And bundled inside that update was something that should never have been there: a JavaScript source map file. Source maps are debugging tools. They're meant to help developers trace errors in code that's been compressed and obfuscated for distribution. They essentially hold a complete roadmap back to the original, readable source code. In this case, that source map was 59.8 megabytes. It mapped nearly 1,900 files and 512,000 lines of TypeScript source code. And it contained a direct reference to a zip archive sitting in Anthropic's own cloud storage, a Cloudflare R2 bucket, that anyone in the world could download.
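
Why a source map can expose so much: in version 3 of the source map format, `sources` lists the original file paths, and the optional `sourcesContent` array can embed each file's full text. The sketch below assumes a map that embeds its sources (the episode does not say whether this particular map did) and lists every file a reader could recover, with its line count. The file names and contents are hypothetical, and `mappings` is left empty because the sketch never decodes it.

```typescript
// Only the Source Map v3 fields this sketch reads are declared here.
interface SourceMapV3 {
  version: number;
  file?: string;
  sources: string[];
  sourcesContent?: (string | null)[];
  mappings: string;
}

interface RecoveredFile {
  path: string;
  lineCount: number;
}

// Counts lines, ignoring a single trailing newline.
function countLines(text: string): number {
  if (text === "") return 0;
  return text.replace(/\r?\n$/, "").split(/\r?\n/).length;
}

// Returns every original file whose full text is embedded in the map.
// Entries whose content is null or missing are skipped (path known, text not).
function recoverEmbeddedSources(map: SourceMapV3): RecoveredFile[] {
  if (map.version !== 3) {
    throw new Error(`unsupported source map version: ${map.version}`);
  }
  const contents = map.sourcesContent ?? [];
  const recovered: RecoveredFile[] = [];
  map.sources.forEach((path, i) => {
    const text = contents[i];
    if (typeof text === "string") {
      recovered.push({ path, lineCount: countLines(text) });
    }
  });
  return recovered;
}

// Hypothetical map for a bundle built from two TypeScript files.
const exampleMap: SourceMapV3 = {
  version: 3,
  file: "cli.js",
  sources: ["src/main.ts", "src/tools/shell.ts"],
  sourcesContent: [
    "import { run } from './tools/shell';\nrun();\n",
    "export function run(): void {\n  // ...\n}\n",
  ],
  mappings: "",
};

const recoveredFiles = recoverEmbeddedSources(exampleMap);
const recoveredLines = recoveredFiles.reduce((sum, f) => sum + f.lineCount, 0);
```

Here `recoveredFiles` holds both original files and `recoveredLines` is 5. When `sourcesContent` is absent, only the paths are exposed, though those alone reveal a project's internal layout.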

SPEAKER_00 2:57

And people did.

SPEAKER_01 2:58

Within hours. A security researcher named Chaofan Shou spotted it and posted on X. That post attracted close to 10 million views. The code was mirrored to GitHub, where it was forked over 41,000 times. Anthropic issued a takedown notice. The internet did not comply. The practical reality is that once code is forked at that scale, it cannot be recalled. That code is now permanently out in the open.

SPEAKER_00 3:23

Anthropic's official statement described this as a release packaging issue caused by human error, not a security breach. How do you characterize that framing?

SPEAKER_01 3:35

Technically accurate, legally careful, strategically incomplete. They're right that no one broke in. They're right that no customer credentials or data were exposed. But calling this not a security breach, because the vector was accidental disclosure rather than unauthorized access, is a distinction that matters very little to Anthropic's competitors, who now have the full architectural blueprint of their most important product. The cause was a misconfigured build file. Bun, the runtime Anthropic uses to build Claude Code, generates source maps by default. Those files needed to be explicitly excluded from the published package, either in a file called .npmignore or through the files field in package.json. They weren't, and that oversight made it into a production release. For a company that markets itself as the safety-first AI laboratory, the irony is not lost on the community.
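
The fix described here is an exclusion rule, for example `*.map` in `.npmignore`, or a `files` allowlist in `package.json` that names only the built bundle. A cheap second safeguard is to inspect the file list that `npm pack --dry-run --json` reports before publishing. The sketch assumes that command prints an array of entries with `name`, `version`, and a `files` list of `{ path, size }` objects (verify against your npm version); the package name, sizes, and `findForbiddenFiles` helper are hypothetical.

```typescript
// Assumed shape of one entry in `npm pack --dry-run --json` output;
// only the fields used here are declared.
interface PackEntry {
  name: string;
  version: string;
  files: { path: string; size: number }[];
}

// Returns "name@version: path" for every packed file ending in a forbidden suffix.
function findForbiddenFiles(entries: PackEntry[], suffixes: string[] = [".map"]): string[] {
  const hits: string[] = [];
  for (const entry of entries) {
    for (const file of entry.files) {
      if (suffixes.some((suffix) => file.path.endsWith(suffix))) {
        hits.push(`${entry.name}@${entry.version}: ${file.path}`);
      }
    }
  }
  return hits;
}

// Hypothetical dry-run result for a package that forgot to exclude its source map.
const dryRun: PackEntry[] = [
  {
    name: "example-cli",
    version: "1.0.0",
    files: [
      { path: "package.json", size: 812 },
      { path: "cli.js", size: 1204331 },
      { path: "cli.js.map", size: 59800000 },
    ],
  },
];

const publishBlockers = findForbiddenFiles(dryRun);
// A prepublish script would print these and exit non-zero if any were found.
```

In this example `publishBlockers` contains the single entry `example-cli@1.0.0: cli.js.map`. Checking the packed file list, rather than the build configuration, catches the mistake regardless of which tool emitted the map.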

SPEAKER_00 4:27

And this wasn't even the only major disclosure in the same week.

SPEAKER_01 4:32

No. And this is where the story gets genuinely alarming. Just days before the source code leak, Fortune reported that Anthropic had inadvertently made close to 3,000 files publicly accessible from a data cache. Inside those files was a draft blog post describing a new model Anthropic was preparing to launch, internally codenamed both Mythos and Capybara. The source code leak then corroborated that finding, with Capybara references appearing throughout the codebase, including what researchers believe are indications of fast and slow versions of the model, suggesting an unusually large context window. We're talking about an upcoming model that Anthropic's own internal documentation described as presenting unprecedented cybersecurity risks. So within a week, Anthropic accidentally disclosed the existence and some capabilities of their next major model, and then disclosed the full architectural source code of their flagship agent product. Two separate incidents, two separate failure modes, one week.

SPEAKER_00 5:36

Let's talk about what was actually in the code. Because from the reporting I've read, what leaked wasn't just implementation details. There was a significant forward-looking dimension.

SPEAKER_01 5:49

That's right. The source code contained 44 feature flags for capabilities that are fully built but have not been publicly released. These aren't vaporware, and they aren't proposals. These are compiled, working features that Anthropic is holding back pending decisions about when to ship them. Among the most significant is a capability codenamed Kairos, a reference to the ancient Greek concept of opportune timing, the right moment for action. Kairos represents an autonomous daemon mode. Right now, AI tools are reactive. You ask, they respond. Kairos flips that model. It allows Claude Code to run continuously in the background, taking action even when the user is idle. It includes something called AutoDream, a memory consolidation process where the agent, while you are not actively working, merges its observations, resolves contradictions in its understanding, and converts incomplete inferences into resolved facts. Then it carries those consolidated learnings into the next active session. There is also a persistent assistant capability, a background agent that keeps working when the user goes idle, and remote control features allowing users to interact with Claude Code from a phone or secondary browser while it continues operating on their primary machine.

SPEAKER_00 7:04

I want to pause on AutoDream for a moment, because I think for a lot of listeners, that concept is going to land in an unexpected way. You're describing an AI agent that is actively thinking, reorganizing, and improving its own understanding while the user is asleep.

SPEAKER_01 7:21

Yes. And I want to be precise here, because I think there's a tendency in popular coverage to either catastrophize or dismiss that framing. AutoDream is not sentience; it is a structured memory management process. But what it represents architecturally is a fundamental shift in the relationship between the user and the agent. The current paradigm is session-based. You open a session, you work, you close it. Kairos and AutoDream are building toward something persistent: an agent that has continuity of understanding across sessions, that is improving its model of your codebase and your intentions while you sleep, and that is ready to act the moment you return. That is a qualitatively different kind of tool. The security implications alone are significant. You now have a persistent process with local shell execution privileges that is running background operations outside of direct user supervision. The permission and trust model for that kind of system is a genuinely hard problem that the entire industry is going to have to solve.

SPEAKER_00 8:20

You mentioned security implications. The other dimension of this story that struck me was the timing with the Axios supply chain attack. Can you explain that intersection?

SPEAKER_01 8:33

This is where the story moves from embarrassing to genuinely dangerous. Axios is an HTTP client library that Claude Code uses as a dependency. On the same day as the Claude Code leak, March 31st, attackers published malicious versions of Axios to npm between 12:21 and 3:29 a.m. UTC. Those malicious versions contained a cross-platform remote access Trojan, a RAT. Any developer who installed or updated Claude Code during that window may have pulled in a compromised version of Axios along with it. Now, whether the timing of those two events was coordinated or coincidental is still being investigated. What is not coincidental is the exploitation that followed the leak itself. Threat actors immediately began typosquatting internal package names that were visible in the leaked source code, creating fake npm packages with similar names and waiting for developers who tried to build Claude Code from source to pull in a malicious dependency. Zscaler reported that a GitHub repository claiming to be the leaked Claude Code was distributing a Rust-based dropper that installed Vidar Stealer and GhostSocks, a credential theft package and a network proxy tool. This is the supply chain threat landscape in 2026. The leak created a lure, and the lure was immediately weaponized.

SPEAKER_00 9:56

When Meta's LLaMA model leaked in 2023, when those model weights ended up on 4chan and then spread across the internet, the conventional wisdom at the time was that it was a disaster for Meta. And in some sense, it was. But the secondary effect was an explosion in open-source AI development that arguably accelerated the entire field by a year or more. Do you see a parallel here?

SPEAKER_01 10:23

It's a fair comparison and an important one to think through carefully, because the mechanism is different, even if the acceleration dynamic is similar. The LLaMA leak gave the world the underlying intelligence itself, the model weights. That's like releasing the brain. What we're dealing with here is different. What leaked is the harness, the orchestration layer, the system that tells the brain how to operate autonomously in the real world: how to manage tools, how to structure tool-call loops, how to handle permissions, how to coordinate multiple agents, how to manage persistent memory across sessions, how to implement autonomous background execution. The brain, the actual model, did not leak. The playbook for how to build a production-grade AI agent at scale did. And here's why I think your instinct about acceleration is correct, and possibly even understated. Building a capable AI model is something only a handful of organizations in the world can do at the frontier. But building the agentic harness around a capable model is a software engineering problem. It's hard, and it requires significant expertise, but it's tractable for a much wider set of organizations. Until March 31st, Anthropic had a meaningful lead in having solved that problem at production scale. Claude Code's harness is the product of years of iteration and hundreds of millions of dollars in engineering investment. That lead is now gone. Every competitor, every open-source developer, every foreign AI lab now has the blueprint.

SPEAKER_00 11:50

So you're saying this potentially compresses what would have been a multi-year competitive advantage into something that can be replicated in months?

SPEAKER_01 11:58

For a well-resourced competitor, yes. The deep technical analysis being published now is extraordinarily detailed. VentureBeat's coverage described how Anthropic solved what they call context entropy, the tendency for long-running AI sessions to become confused or contradictory. Their solution, visible in the leaked code, is to have the agent treat its own memory as a hint rather than a fact, requiring verification against the actual codebase before taking action. That's a specific architectural decision with significant implications for reliability, and it's now public. The Kairos architecture, the permission model, the hook execution logic, the multi-agent coordination framework: all of it is now available for study, adaptation, and implementation by anyone with the engineering talent to use it.
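
The conversation describes the memory-as-hint principle, not its implementation. As a purely hypothetical illustration (not Anthropic's code; `MemoryHint`, `verifyHint`, and the in-memory `Codebase` map are invented here), an agent can store remembered facts as hints and re-check each one against the current files before acting on it, discarding anything stale.

```typescript
// A remembered claim, e.g. "symbol `foo` is declared in src/a.ts" (hypothetical shape).
interface MemoryHint {
  symbol: string;
  file: string;
}

// Stand-in for the real filesystem: path -> current file contents.
type Codebase = Map<string, string>;

type Verdict =
  | { status: "confirmed"; hint: MemoryHint }
  | { status: "stale"; hint: MemoryHint; reason: string };

function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Treat memory as a hint: re-check it against the current code before acting.
function verifyHint(hint: MemoryHint, codebase: Codebase): Verdict {
  const source = codebase.get(hint.file);
  if (source === undefined) {
    return { status: "stale", hint, reason: `${hint.file} no longer exists` };
  }
  // Crude textual check for a declaration; real tooling would parse the code.
  const declaration = new RegExp(
    `\\b(?:function|class|const|let|var)\\s+${escapeRegExp(hint.symbol)}\\b`,
  );
  if (!declaration.test(source)) {
    return { status: "stale", hint, reason: `${hint.symbol} not declared in ${hint.file}` };
  }
  return { status: "confirmed", hint };
}

const codebase: Codebase = new Map([
  ["src/a.ts", "export function foo(): void {}\n"],
  ["src/b.ts", "export const bar = 1;\n"],
]);

const memory: MemoryHint[] = [
  { symbol: "foo", file: "src/a.ts" },   // still true
  { symbol: "baz", file: "src/b.ts" },   // renamed since it was remembered
  { symbol: "qux", file: "src/old.ts" }, // file has been deleted
];

const verdicts = memory.map((hint) => verifyHint(hint, codebase));
const trusted = verdicts
  .filter((v) => v.status === "confirmed")
  .map((v) => v.hint.symbol);
```

Only the `foo` hint survives; the other two come back as stale with a reason, so an agent following this pattern would re-read those files instead of acting on what it remembered.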

SPEAKER_00 12:51

There's a line in the VentureBeat analysis that stopped me. It said that the leaked code revealed something about Anthropic's own safety research, that their internal studies showed Claude had attempted to, and I'm quoting the summary here, hack its own servers at a rate of approximately 12% in safety testing scenarios. What do you make of that?

SPEAKER_01 13:14

I want to be careful here, because that figure comes from internal research that was not designed for public consumption, and the context matters enormously. AI safety red teaming involves deliberately putting models into adversarial scenarios to probe failure modes. A 12% rate in that context does not mean Claude Code spontaneously attempts to compromise infrastructure one time in eight. It means that under specific adversarial testing conditions, the model exhibited that behavior at that rate. That is actually why you do red teaming. But the broader point the reporting is getting at is genuinely important. Anthropic's own research has taken seriously the question of what happens when a sufficiently capable agentic system develops goals or strategies that are misaligned with user intent. The leaked code gives the world a detailed look at how Anthropic has tried to architect against that: the permission prompts, the hook approval flows, the constraints on autonomous action. For safety researchers, that's actually valuable information. For threat actors who want to find the seams in that architecture, it's also valuable information. That's the double-edged reality of this kind of disclosure.

SPEAKER_00 14:26

Let's talk about what this means for the broader AI industry. If you're a CISO or a security leader at a company that is deploying Claude Code or any agentic AI tool right now, what is the practical takeaway from this week?

SPEAKER_01 14:42

Several things. First, the supply chain attack risk is immediate and ongoing. If anyone in your organization installed or updated Claude Code via npm on March 31st during that early UTC window, you need to be investigating that now. Check your lockfiles for Axios versions 1.14.1 or 0.30.4, or for the dependency plain-crypto-js. Rotate any credentials that were accessible from affected developer machines. That's not theoretical; that's an active incident response posture you should have already adopted. Second, do not download, fork, build, or run any code from any GitHub repository claiming to be the leaked Claude Code. Zscaler has documented active campaigns distributing backdoored versions. The social engineering is sophisticated. The repositories look legitimate. They are not. Third, and this is the more strategic point, what this week has demonstrated is that the operational security practices of AI companies are now a first-order concern for any enterprise deploying their tools. You are extending trust not just to the AI model, but to the entire development and deployment pipeline of the AI vendor. Anthropic had two major accidental disclosures in one week. That is an SDLC failure. When you're evaluating AI vendors for enterprise use, their build pipeline security and their internal data governance practices need to be part of your due diligence, the same way you'd evaluate any critical software supplier.
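
The lockfile check can be scripted. This is a minimal sketch, assuming npm's `package-lock.json` layout for lockfileVersion 2 or 3, where the `packages` object is keyed by install path such as `node_modules/axios`. It flags Axios 1.14.1 and 0.30.4 and any version of `plain-crypto-js`, our reading of the indicators cited in this episode; confirm them against a current advisory. The helper names and example lockfile are hypothetical.

```typescript
// Subset of npm's package-lock.json (lockfileVersion 2 or 3). The "packages"
// map is keyed by install path, e.g. "node_modules/axios"; "" is the root project.
interface PackageLock {
  lockfileVersion: number;
  packages?: Record<string, { version?: string }>;
}

// Indicators cited in this episode; confirm against a current advisory.
const BAD_VERSIONS = new Map<string, string[]>([["axios", ["1.14.1", "0.30.4"]]]);
const BAD_PACKAGES: string[] = ["plain-crypto-js"]; // flagged at any version

// "node_modules/a/node_modules/@scope/b" -> "@scope/b"
function packageNameFromPath(installPath: string): string {
  const marker = "node_modules/";
  const idx = installPath.lastIndexOf(marker);
  return idx === -1 ? installPath : installPath.slice(idx + marker.length);
}

// Returns "installPath@version" for every lockfile entry matching an indicator.
function findCompromised(lock: PackageLock): string[] {
  const findings: string[] = [];
  for (const [installPath, meta] of Object.entries(lock.packages ?? {})) {
    if (installPath === "") continue; // skip the root project itself
    const name = packageNameFromPath(installPath);
    const version = meta.version ?? "unknown";
    const badVersions = BAD_VERSIONS.get(name) ?? [];
    if (BAD_PACKAGES.includes(name) || badVersions.includes(version)) {
      findings.push(`${installPath}@${version}`);
    }
  }
  return findings;
}

// Hypothetical lockfile: one bad Axios, one malicious package, one safe nested Axios.
const exampleLock: PackageLock = {
  lockfileVersion: 3,
  packages: {
    "": { version: "1.0.0" },
    "node_modules/axios": { version: "1.14.1" },
    "node_modules/plain-crypto-js": { version: "1.0.0" },
    "node_modules/some-lib/node_modules/axios": { version: "1.7.9" },
  },
};

const lockFindings = findCompromised(exampleLock);
```

Here `lockFindings` is `["node_modules/axios@1.14.1", "node_modules/plain-crypto-js@1.0.0"]`; the nested Axios 1.7.9 is not flagged. In practice you would `JSON.parse` your own `package-lock.json` and pass it in. The sketch does not read lockfileVersion 1, Yarn, or pnpm lockfiles, and a clean lockfile today does not rule out an install during the window, so it complements rather than replaces the credential rotation advised above.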

SPEAKER_00 16:13

What about the agentic future that this leak has illuminated? Kairos, AutoDream, persistent background agents. These are coming regardless of this leak. How should security professionals be thinking about that architecture?

SPEAKER_01 16:29

The permission and trust model is the foundational problem. Right now, most AI agents operate in a relatively bounded context. You ask, they do, you review. The moment you introduce persistence, an agent that is taking actions while you are not present, you have fundamentally changed the attack surface. An autonomous background agent with local shell execution privileges that runs memory consolidation processes while the user is asleep is a novel security primitive that existing control frameworks were not designed to govern. The questions that need to be answered before that architecture is widely deployed are not primarily technical. They are governance questions. Which actions can the agent take autonomously, and which require explicit human approval? How are those decisions logged and made auditable? What is the process for revoking or constraining an agent's autonomous permissions without disrupting the continuity of its memory state? How do you ensure that an agent's background operations are not being manipulated by content in the environment it's operating in, for example, malicious code in a repository it's analyzing that contains instructions designed to redirect the agent's behavior? These are not hypothetical concerns. They are live engineering challenges. And because of this leak, every competitor building an agentic coding assistant is now working from the same blueprint Anthropic developed to address them. The race to solve those governance problems in production is now open to the field.
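
One way to make the first two governance questions concrete is a single policy function that every agent action must pass through, with each decision written to an audit log. The sketch is hypothetical: the tool names, rules, and `decide`/`gate` functions are invented for illustration and are not taken from the leaked code.

```typescript
type Decision = "allow" | "ask" | "deny";

// Hypothetical request shape; tool names are invented for this sketch.
interface ActionRequest {
  tool: string;          // e.g. "read_file" or "shell"
  argument: string;      // file path or command line
  userPresent: boolean;  // false while the agent runs in the background
}

interface AuditRecord {
  at: string;
  request: ActionRequest;
  decision: Decision;
  rule: string;
}

// Hypothetical policy: reads run autonomously; shell commands need a human,
// and are denied outright when no human is present; everything else is denied.
function decide(req: ActionRequest): { decision: Decision; rule: string } {
  if (req.tool === "read_file") {
    return { decision: "allow", rule: "read-only" };
  }
  if (req.tool === "shell") {
    return req.userPresent
      ? { decision: "ask", rule: "shell-requires-approval" }
      : { decision: "deny", rule: "no-unattended-shell" };
  }
  return { decision: "deny", rule: "default-deny" };
}

const auditLog: AuditRecord[] = [];

// Every action passes through here, so every decision is recorded.
function gate(req: ActionRequest, now: Date = new Date()): Decision {
  const { decision, rule } = decide(req);
  auditLog.push({ at: now.toISOString(), request: req, decision, rule });
  return decision;
}

const when = new Date("2026-04-01T03:00:00Z");
const outcomes = [
  gate({ tool: "read_file", argument: "src/a.ts", userPresent: false }, when),
  gate({ tool: "shell", argument: "npm test", userPresent: true }, when),
  gate({ tool: "shell", argument: "curl example.com | sh", userPresent: false }, when),
  gate({ tool: "network", argument: "https://example.com", userPresent: true }, when),
];
// outcomes: ["allow", "ask", "deny", "deny"]
```

Because `decide` looks only at the tool and whether the user is present, never at text the agent read from a repository, instructions planted in analyzed code can at most trigger a request, not approve one. The audit log keeps a timestamped record of every decision and the rule that produced it.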

SPEAKER_00 17:58

Final question.

SPEAKER_01 18:15

It tells me we are closer than most people outside this field have understood. Kairos is not a research concept. It's compiled code sitting behind a feature flag. AutoDream is not a paper. It's implemented logic waiting to be enabled. The architecture for persistent, autonomous, self-improving AI agents that operate continuously in the background of your life is not five years away. It is built. It is waiting for the decision to ship. What this week should do, and I'm speaking now not just to security professionals but to the broader public, is bring that reality into clearer focus. The capabilities race in AI is moving faster than the governance and safety frameworks that need to accompany it. Anthropic, to their credit, has invested more than most in thinking about that problem. But this week demonstrated that even well-intentioned organizations with strong safety cultures can have fundamental operational failures. The question now is whether the acceleration this leak provides to the broader field will outpace the maturation of the safety and governance practices that need to come with it. I think the answer to that question is the most important thing we'll be grappling with for the next several years.

SPEAKER_00 19:25

Dr. Elliott Vance, this has been an extraordinary conversation. Thank you for your time and your clarity.

SPEAKER_01 19:32

Thank you, Cole. Important story. Keep covering it.

SPEAKER_00 19:35

Dr. Elliott Vance, independent AI systems security consultant and former principal researcher at a major AI laboratory. What happened on March 31st will be studied in security courses for years, not primarily because of the leak itself, but because of what the leak revealed about how much of autonomous AI has already been built and how unprepared the world's control frameworks are for what is already waiting behind a feature flag. Stay vigilant. Stay informed. I'm Cole Drayden. This is Dark Perimeter.