Dark Perimeter: True Cybersecurity Stories

The State of AI: Where It Is, Where It's Going, and What Tomorrow Looks Like

AI is no longer a question mark. The models are real, the adoption numbers are real, and the productivity gains in narrow domains are real. What is also real: hallucination rates between 15 and 70 percent depending on the task, AI agents stuck at 17 percent actual deployment despite 60 percent of organizations planning to use them, and a threat landscape where attackers are already running circles around defenders using the same tools everyone else is still evaluating.

In this episode, Dr. Elliott Vance and Marcus Hale take a clear-eyed look at where AI actually stands in 2026. What the Stanford AI Index tells us. Why benchmark performance does not translate cleanly to production. What the Arup incident, a $25.6 million wire transfer executed after a fully AI-generated video call, tells us about where social engineering is heading. And why the most dangerous scenario for security teams may not be AGI, but the long, grinding middle period where AI capability keeps climbing while reliability keeps lagging.

Topics covered: AI capability vs. hype in 2026. Hallucination rates and architectural limits. AI agents: the gap between demo and deployment. Offensive AI, breakout times, behavioral phishing, and deepfake-as-a-service. Data poisoning and AI supply chain integrity. Agentic SOC platforms and the risk of prompt-injecting your own defenses. The AGI debate: Amodei, Altman, LeCun, and what Hinton's revised timeline actually signals. Practical guidance for security programs operating in this environment.

SPEAKER_00

Two years ago, the conversation was about whether AI was real, whether the benchmarks meant anything, whether the hype cycle would collapse the way every previous one did.

SPEAKER_01

It didn't collapse.

SPEAKER_00

It didn't collapse. But it also didn't deliver what people said it would, at least not on the timeline they said it would. And I think we're at a point now where it's worth stepping back and asking the harder question. Not is AI real, but which parts of it are real and which parts are still a story we're telling ourselves.

SPEAKER_01

The story's doing a lot of work right now, especially in security.

SPEAKER_00

Let's start with what's actually working because I think the skeptics sometimes overcorrect. The Stanford AI Index came out last month, and the numbers are genuinely interesting. Top models are now clearing 50% on expert-level benchmark questions across medicine, law, mathematics. Three years ago that number was closer to 15.

SPEAKER_01

Benchmarks aren't operations.

SPEAKER_00

No, they're not. But they're not nothing either. And the adoption numbers tell a similar story. Somewhere around 88% of organizations are using AI in some form. That's not pilot programs. That's deployed in workflow.

SPEAKER_01

Using it and relying on it are different things. I can use a compass and still get lost.

SPEAKER_00

That's exactly the distinction I want to push on, because the productivity gains are real in narrow domains. Code generation, alert triage, first-pass document review, customer-facing automation. The data is consistent across those areas. But the moment you move outside the narrow domain, the performance curve drops off a cliff.

SPEAKER_01

And nobody tells you where the cliff is.

SPEAKER_00

Which is one of the central problems. The model doesn't know what it doesn't know. It doesn't experience uncertainty the way a human analyst does. It produces output with the same confident tone, whether it's right or it's completely fabricated.

SPEAKER_01

Hallucination rates?

SPEAKER_00

Still significant. 15 to 50 percent on general tasks, depending on the model and the methodology. And in high-stakes domains, legal, medical, anything requiring precise citation, the numbers climb. Some studies are putting global hallucination rates above 70% for certain legal query types.

SPEAKER_01

Which means if you're using AI to research case law or interpret a contract without a human reviewing the output, you are operating on fiction some percentage of the time.

SPEAKER_00

And the percentage is not small enough to ignore. The underlying architecture hasn't solved this. These models are trained to produce the most statistically likely next token. They're not assessing their own confidence. They're not flagging uncertainty. They're completing the pattern. And the organizations that are doing this well have built review workflows that treat AI output as a draft, not a deliverable.

SPEAKER_01

Which slows things down, which is the opposite of the pitch.

SPEAKER_00

Right. The pitch is speed. The reality is that you need a human in the loop to catch the errors, which puts you back on the human's timeline. The value is still real. You're getting a faster first pass, better coverage across large data volumes. But it's not autonomous, not reliably.

SPEAKER_01

What about agents? Because that's where the hype is right now.

SPEAKER_00

AI agents are at what Gartner is calling the peak of inflated expectations. Something like 60% of organizations are planning to deploy them within two years. The number that have actually deployed them to date is around 17%. And of that 17%, it's narrow scope, supervised, nothing like the demos. Most of them are doing single-task automation in a constrained environment. The fully autonomous agent that plans, executes, adapts, and closes a loop without human authorization, that's not what's shipping.

SPEAKER_01

I've seen the demos. They're impressive. They're also staging environments.

SPEAKER_00

The benchmark versus real-world gap is one of the most important things to understand right now. AI is rarely tested the way it's actually used. The conditions are cleaner, the tasks are better defined, the failure modes aren't present. You put it in a production environment with messy data and ambiguous inputs, and the performance degrades in ways that don't show up in the literature.

SPEAKER_01

Alright, let's talk about offense because that's where I think the gap between hype and reality is narrowest. The attackers are not waiting for the enterprise to figure this out.

SPEAKER_00

No, and the numbers are striking. Average e-crime breakout time, the window from initial access to lateral movement across the network, dropped to 29 minutes last year. That's a 65% increase in speed from the year before.

SPEAKER_01

29 minutes is nothing. If you don't have automated detection and you're relying on a human to catch the initial alert and escalate it, they're already inside before you've opened your ticketing system.

SPEAKER_00

And AI is compressing that timeline on the attacker side. Automated reconnaissance, rapid vulnerability identification. Exploit chain construction that would have taken a skilled human operator hours is now being done programmatically.

SPEAKER_01

We're not talking about mass blast campaigns with bad grammar anymore. The new generation is behavioral. They're pulling data from LinkedIn, from public social media, from breach databases, from company websites. They know who you report to, what projects you're working on, what your communication style looks like. The email arrives and it reads like something your actual manager would write.

SPEAKER_00

And increasingly it's not email, it's voice, it's video.

SPEAKER_01

The Arup incident is the one I keep coming back to. A finance employee at a major engineering firm was on a video call. The call appeared to include the CFO and several other company executives. Everyone else on the call was AI-generated. Every participant: not just the CFO, but every other face in the room, their voices, their mannerisms. The employee wired $25.6 million.

SPEAKER_00

That's the thing about deepfakes that gets underappreciated. It's not just the impersonation of the primary target, it's the social proof around them. Humans are wired to trust consensus. If ten people on a call are all telling you the same thing, the social pressure to comply is enormous. And if all ten are fabricated, that pressure is manufactured from nothing.

SPEAKER_01

85% of organizations reported at least one deepfake-related incident in the past year. That's not a niche threat anymore.

SPEAKER_00

The infrastructure for this is now a service. Deepfake-as-a-service platforms lower the technical bar to near zero. You don't need a machine learning background. You need a credit card.

SPEAKER_01

Nation-state actors are combining this with long-term infrastructure targeting: energy grids, healthcare systems, financial networks. And the timing isn't random. There are specific windows, FIFA this year, the Winter Olympics, the U.S. midterms, where the noise floor is higher and the targeting opportunities increase. Defenders are stretched, attention is split, and the attackers know the calendar.

SPEAKER_00

The new frontier that I find particularly troubling is data poisoning. Instead of attacking the AI system's outputs, you attack the training data upstream. You corrupt the data set that the model is trained on, which means the model learns the wrong things, and you don't necessarily know what's happened until the model's making decisions in production.

SPEAKER_01

And that's a hard problem because by the time you see the anomalous behavior, the model has been deployed, it's been trusted, it may have been making decisions for months.

SPEAKER_00

The integrity of training data is a security problem that most organizations haven't operationalized yet. They're thinking about model output security. They're not thinking about supply chain integrity for the training pipeline.

SPEAKER_01

Let's flip to defense because I don't want to leave the impression that it's all running in one direction.

SPEAKER_00

The adoption numbers on the defensive side are actually higher than I expected. Around 95% of organizations say they're deploying AI in some security workflow. Threat detection, alert triage, incident response.

SPEAKER_01

The triage use case is real. The volume of alerts that a modern SOC has to process is not humanly manageable. If AI can close 60 or 70% of the noise, the obvious false positives, the known good signatures, the duplicate alerts, you're freeing analysts for the investigations that actually require judgment.

SPEAKER_00

The agentic SOC platforms are starting to push beyond triage into active response. Quarantining hosts, isolating sessions, executing containment actions based on real-time risk assessment. Some of them are doing this without waiting for human authorization.

SPEAKER_01

Which makes me nervous in a different way.

SPEAKER_00

Tell me.

SPEAKER_01

An autonomous system that can quarantine infrastructure is also an autonomous system that an attacker can manipulate into quarantining the wrong things. If I can feed false signals to your detection engine, I can make your AI take your own systems down for you. That's not hypothetical. That's a logical extension of how these systems work.

SPEAKER_00

Adversarial inputs. You're essentially prompt injecting the security platform.

SPEAKER_01

And the platforms aren't uniformly robust against that. The ones that are doing this well have kept humans in the authorization loop for high-consequence actions. The ones that are fully autonomous for cost-reduction reasons are carrying a risk they may not have fully modeled.
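
A minimal sketch of what keeping a human in the authorization loop could look like in code. The action names, the `ProposedAction` fields, and the `route` function are all hypothetical and not taken from any particular SOC platform; the point is only that a high-consequence action is held regardless of how confident the model claims to be.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Decision(Enum):
    EXECUTE = "execute"
    HOLD_FOR_HUMAN = "hold_for_human"


# Hypothetical action names; a real platform would define its own catalogue.
HIGH_CONSEQUENCE = {"quarantine_host", "isolate_session", "disable_account"}


@dataclass(frozen=True)
class ProposedAction:
    name: str
    target: str
    model_confidence: float  # 0.0 to 1.0, as reported by the detection engine


def route(action: ProposedAction, approved_by: Optional[str] = None) -> Decision:
    """Decide whether a proposed response action may run.

    model_confidence is deliberately ignored for high-consequence actions:
    an attacker who can feed false signals can also inflate confidence.
    """
    if action.name in HIGH_CONSEQUENCE and approved_by is None:
        return Decision.HOLD_FOR_HUMAN
    return Decision.EXECUTE


if __name__ == "__main__":
    quarantine = ProposedAction("quarantine_host", "db-01", 0.99)
    print(route(quarantine))                                 # held for a human
    print(route(quarantine, approved_by="analyst-on-call"))  # runs once approved
```

Low-consequence actions, such as closing a duplicate alert, pass straight through, so the speed benefit of triage is kept while containment stays gated.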

SPEAKER_00

The asymmetry problem on the defensive side is real, and I don't think it's solvable in the near term. Attackers iterate fast. They have no compliance requirements, no change management process, no liability for false positives. Defenders are operating under all of those constraints simultaneously.

SPEAKER_01

The organizations I've seen navigate this best, and there are a handful, they're not the ones with the most sophisticated AI. They're the ones who have done the harder work of mapping which problems AI is actually solving versus which problems it's performing the appearance of solving.

SPEAKER_00

That distinction matters more than the tooling.

SPEAKER_01

It does, because if you buy an AI security platform and you don't change the underlying process, you've just added complexity. The AI is doing something, but it's not integrated into how decisions get made. It's decorative.

SPEAKER_00

There's a version of this conversation we haven't had yet, which is where all of this is heading. And I want to be careful about how I frame this because the expert disagreement here is wider than most people realize.

SPEAKER_01

And then you have Yann LeCun, who arguably knows as much about this as anyone alive, saying current architectures fundamentally cannot get there, that large language models are a dead end for general intelligence, that we need something structurally different.

SPEAKER_00

Geoffrey Hinton is the one I find most interesting to watch. A few years ago he was saying 30 to 50 years to AGI. He has since revised that to five to twenty years. The fact that someone with his depth of understanding moved that much in that direction is worth paying attention to.

SPEAKER_01

And on the defensive side, presumably the same capability is available.

SPEAKER_00

Presumably. But there's the asymmetry again. Attack has the initiative, defense has to cover everything, and if the system on the offensive side is more capable, even marginally, the defender's position degrades over time.

SPEAKER_01

What does the skeptical camp say? If LeCun is right, if we hit architectural limits before we get there?

SPEAKER_00

We're still looking at a sustained period of AI systems that are capable enough to transform threat actors' operations without being capable enough to fully transform defense. The incremental capability gains keep going. The reliability gaps keep being exploited. And we're in this extended middle period where AI is powerful enough to change everything about how attacks are executed, but not reliable enough to be fully trusted in automated defense.

SPEAKER_01

Which is actually the harder scenario to plan for.

SPEAKER_00

Why do you say that?

SPEAKER_01

Because if AGI arrives, everything changes and everyone knows it. You adapt or you don't. But the slow grind, capability keeps increasing, reliability keeps lagging, the threat surface keeps expanding, and you never get a clean moment where you can say this is the new normal. That's operationally exhausting. You never get to stop adjusting.

SPEAKER_00

The organizations that are positioning well for that scenario are building institutional knowledge about AI's failure modes rather than just its capabilities. They know where it hallucinates. They know what inputs make it unreliable. They know which decisions to hand off and which to keep human-centered.

SPEAKER_01

They're treating it like a junior analyst.

SPEAKER_00

That's exactly the right frame. Fast, useful, broad coverage, occasionally wrong, sometimes confidently wrong, and never the last word on anything that matters.

SPEAKER_01

The problem is that the vendors aren't selling it that way.

SPEAKER_00

No, they're selling certainty. And certainty is the wrong relationship to have with a system that produces statistically likely outputs and calls them facts.

SPEAKER_01

If you're running a security program right now and someone asks you what to do with all of this, what's the actual answer?

SPEAKER_00

Narrow your use cases to where the model's failure modes don't matter. Alert triage: a false positive from the AI costs you time, not data. Threat intelligence summarization: first pass only, and a human reviews the output. Code review assistance: useful, but the human signs off. Don't automate decisions where the cost of a hallucination is high.

SPEAKER_01

And on the threat side?

SPEAKER_00

Assume deepfakes. Assume that voice and video are no longer reliable identity signals. Build verification into your processes that doesn't depend on recognizing someone. Out-of-band confirmation for anything involving money, credentials, or access.

SPEAKER_01

Two-person integrity for high-value transactions through channels that weren't introduced in the session you're trying to verify.
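
The rule described here, two approvers reached through channels that did not originate in the request itself, can be sketched roughly as follows. The threshold, the field names, and the channel labels are illustrative assumptions, not a description of any real payment system.

```python
from dataclasses import dataclass, field

# Hypothetical threshold above which two-person integrity applies.
THRESHOLD_USD = 10_000


@dataclass(frozen=True)
class Confirmation:
    approver: str
    channel: str  # how the approver was reached, e.g. a directory phone number


@dataclass
class TransferRequest:
    amount_usd: float
    # Every contact detail that was supplied inside the requesting session
    # (meeting links, phone numbers read out on the call, reply-to addresses).
    session_channels: frozenset
    confirmations: list = field(default_factory=list)


def may_release(req: TransferRequest) -> bool:
    """Release only with two distinct approvers reached out of band."""
    if req.amount_usd < THRESHOLD_USD:
        return True
    independent = {
        c.approver
        for c in req.confirmations
        if c.channel not in req.session_channels
    }
    return len(independent) >= 2


if __name__ == "__main__":
    req = TransferRequest(
        amount_usd=25_600_000,
        session_channels=frozenset({"video-call-link", "phone-given-on-call"}),
    )
    req.confirmations.append(Confirmation("alice", "phone-given-on-call"))
    req.confirmations.append(Confirmation("bob", "directory:bob"))
    print(may_release(req))  # only one approver was reached independently
```

A confirmation obtained over a number or link that the caller supplied does not count, which is the property that matters when every face on the call may be fabricated.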

SPEAKER_00

And watch your training data if you're using AI in production. The supply chain for AI is a supply chain. It has the same integrity requirements as everything else.
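
One basic supply-chain control that carries over directly is a hash manifest for the training set. The sketch below assumes the data lives as files in a directory and that a manifest was recorded at a point when the data was trusted; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large datasets do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict:
    """Map each file's relative path to its SHA-256 digest."""
    return {
        p.relative_to(data_dir).as_posix(): sha256_of(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def changed_files(data_dir: Path, manifest: dict) -> list:
    """Return files that were added, removed, or modified since the manifest."""
    current = build_manifest(data_dir)
    return sorted(
        name
        for name in current.keys() | manifest.keys()
        if current.get(name) != manifest.get(name)
    )
```

This only detects changes made after the manifest was recorded. Data that was already poisoned when the snapshot was taken will hash cleanly, so it complements provenance checks on the sources rather than replacing them.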

SPEAKER_01

Tomorrow doesn't look like science fiction. It looks like this, but more so. The capability curve keeps climbing, the reliability gap stays uncomfortable, the attackers keep moving faster than the defenders can institutionalize. And the question isn't whether AI is real. That argument is over. The question is whether the organizations defending against it are developing the discipline to know when to trust it and when not to.

SPEAKER_00

And the answer to that question right now is that most of them are not. Not because they can't, but because it's harder than buying the platform.

SPEAKER_01

It's always harder than buying the platform.

SPEAKER_00

It is. Good night.