
DEV Community

AttractivePenguin

Cybersecurity Looks Like Proof of Work Now — And That Should Scare You

There's a chart making the rounds from the UK's AI Security Institute that I can't stop thinking about.

It shows different AI models attempting a 32-step corporate network attack simulation — reconnaissance through full network takeover, estimated at 20 hours of human effort. Most models, given a 100 million token budget, failed to complete it. Claude Mythos, Anthropic's not-publicly-released security model, completed it in 3 out of 10 attempts.

One hundred million tokens. Per attempt. At current pricing, that's ~$12,500 a run, $125k for the full batch. And here's the kicker: none of the models showed signs of diminishing returns. Keep throwing tokens at it, keep finding exploits.
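A quick back-of-envelope check on those figures. The per-token price here is an assumption, reverse-engineered from the article's ~$12,500-per-run number — roughly a blended $125 per million tokens:

```python
# Sanity-check the cost figures. PRICE_PER_MILLION is an assumed
# blended rate inferred from the ~$12,500/run estimate above.
TOKENS_PER_ATTEMPT = 100_000_000
PRICE_PER_MILLION = 125.0  # $/1M tokens (assumption)
ATTEMPTS = 10

cost_per_run = TOKENS_PER_ATTEMPT / 1_000_000 * PRICE_PER_MILLION
batch_cost = cost_per_run * ATTEMPTS
print(f"per attempt: ${cost_per_run:,.0f}")  # per attempt: $12,500
print(f"full batch:  ${batch_cost:,.0f}")    # full batch:  $125,000
```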

That detail should keep every security team awake at night.

The Proof of Work Analogy

If you've ever tried to understand Bitcoin, you know what proof of work means: to win, you don't have to be smarter than everyone else. You just have to compute harder. The difficulty scales with the total computational power on the network.
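For the unfamiliar, here's a toy sketch of the idea — not real Bitcoin mining, just the core mechanic: there's no clever shortcut to a valid nonce, only more hashing, and each extra digit of difficulty multiplies the expected work:

```python
import hashlib

def mine(data: str, difficulty: int) -> int:
    """Toy proof of work: find a nonce such that sha256(data + nonce)
    starts with `difficulty` zero hex digits. The only strategy is
    brute force -- winning means computing more, not being smarter."""
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{data}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce
        nonce += 1

# Each +1 to difficulty multiplies expected work by 16 (one hex digit).
nonce = mine("block-42", 4)
assert hashlib.sha256(f"block-42{nonce}".encode()).hexdigest().startswith("0000")
```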

Drew Breunig coined the phrase for this new reality: cybersecurity is proof of work now.

Security is no longer (just) a battle of cleverness. It's a budget war. To harden a system, you need to spend more tokens discovering exploits than your attackers will spend exploiting them. You don't win on elegance. You win by buying more compute.

This reframes the entire threat model.

What "Cheap Code" Buys You — And What It Doesn't

The last 18 months have created a gold rush of vibe-coded apps, AI-generated backends, and "I just described it to Claude and it worked" microservices. Code is cheap now. Genuinely cheap. A weekend project that would've taken months in 2021 takes a Saturday afternoon.

But cheap code doesn't mean secure code.

If anything, it's the opposite. Vibe-coded applications are often:

  • Missing input validation that a battle-tested developer would add reflexively
  • Built on dependency chains the author has never read
  • Deployed before anyone's done a real threat model
  • Maintained by someone who's more "product visionary" than "systems programmer"

We're flooding the attack surface. Every LLM-generated CRUD app, every "I used Cursor to build my SaaS in a weekend" startup — they're all targets. And the bar to attack them just dropped dramatically.

Meanwhile, Mythos and models like it are available (with restrictions) to defenders. But make no mistake: equivalent capability will end up in attackers' hands too.

The Attacker's Advantage Is Asymmetric

Here's the uncomfortable math.

A defender has to secure everything. An attacker only needs one gap. That asymmetry has always existed, but AI tilts it further.

With a token-based attack model, the attacker sets a budget based on the value of what they're trying to steal. A supply chain attack against a widely-used open source library? Worth spending more tokens, because the payoff is enormous. A small SaaS startup? Maybe a few thousand dollars of tokens.

The defender doesn't get to choose how much the attacker spends. They only get to decide how much they spend. And here's the brutal part: if Mythos keeps finding exploits regardless of how many tokens you've already spent, there's no budget that guarantees safety. You're just making yourself a less attractive target than the person next to you.

It's not about being secure. It's about being more expensive to attack than you're worth.

Consider the economics from the attacker's side. If a corporate network contains $10M worth of sensitive data, spending $50k in token costs to compromise it is a rational investment. That math holds even for moderately well-secured targets. And as inference costs keep dropping — they've fallen roughly 10x every 18–24 months — that budget ceiling rises. An attack that cost $500k to run in 2024 costs $50k in 2026. By 2028? Maybe $5k.
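That curve is easy to extrapolate. The sketch below uses a 10x-per-24-months decay — an assumption chosen to reproduce the article's round numbers; the steeper 18-month rate makes the picture worse, not better:

```python
def projected_cost(base_cost: float, years_out: float,
                   months_per_10x: float = 24.0) -> float:
    """Projected attack cost after `years_out` years, assuming
    inference prices fall 10x every `months_per_10x` months.
    24 months is an assumption matching the round numbers above."""
    return base_cost / 10 ** (years_out * 12 / months_per_10x)

for year, years_out in ((2024, 0), (2026, 2), (2028, 4)):
    print(year, f"${projected_cost(500_000, years_out):,.0f}")
# 2024 $500,000
# 2026 $50,000
# 2028 $5,000
```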

This isn't speculation. It's extrapolating a curve that's already been moving for three years. The question isn't whether AI-powered attacks become economically viable at scale. They already are. The question is whether defenders can close the gap before the asymmetry becomes unmanageable.

For now, the answer is: maybe, if you start treating security as a continuous compute cost rather than a one-time audit.

The Three-Phase Development Loop

This isn't all doom. There's a practical response that's already starting to emerge among forward-thinking teams.

The old dev cycle: write code, review code, ship code.

The emerging cycle:

1. Development   → implement features, iterate fast, human-guided
2. Review        → refactor, document, apply best practices (often async, often AI-assisted)
3. Hardening     → throw token budget at your own system, find exploits before attackers do

Anthropic already sells a code review product ($15–20 per PR). The "hardening" phase is coming next. Tools that autonomously probe your codebase for security holes, running until your budget runs out.

The key insight Breunig identifies: human intuition gates Phase 1. Money gates Phase 3. That's why they're inherently separate — you don't want to burn $10k hardening a feature that's about to be redesigned anyway.

# This is the rough shape of where CI/CD pipelines are heading
git push origin feature/payment-flow

# Phase 1: Standard CI
→ tests pass, lint clean, types check

# Phase 2: AI review
→ Code review bot flags 3 issues, suggests refactor

# Phase 3: Hardening (on merge to main, or scheduled)
→ Security agent probes for 30 minutes on $200 budget
→ Report: 1 medium-severity finding (SSRF via unvalidated URL input)
→ Block merge until resolved
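For concreteness, the hypothetical SSRF finding above usually gets fixed with something like the sketch below: refuse to fetch any user-supplied URL that resolves to an internal address. The function name is mine, not from any real tool, and this is first-line mitigation only — it doesn't handle DNS rebinding, where the name re-resolves between check and fetch:

```python
import ipaddress
import socket
from urllib.parse import urlparse

def is_safe_fetch_url(url: str) -> bool:
    """Reject URLs pointing at internal/private addresses -- the
    standard first-line mitigation for SSRF via user-supplied URLs."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False  # block file://, gopher://, schemeless input, etc.
    try:
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False  # unresolvable hostname: fail closed
    for info in infos:
        ip = ipaddress.ip_address(info[4][0])
        if ip.is_private or ip.is_loopback or ip.is_link_local:
            return False  # 10.x, 192.168.x, 127.x, 169.254.x, ...
    return True

assert not is_safe_fetch_url("http://127.0.0.1/admin")
assert not is_safe_fetch_url("file:///etc/passwd")
```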

This isn't science fiction. It's roughly where things are headed in the next 12–18 months.

Open Source's Unexpected Second Wind

Here's a take that surprised me when I first read it: the proof-of-work model might actually be good for open source security.

Linus's law says: "given enough eyeballs, all bugs are shallow." What AI does is expand that to include tokens. If major corporations that depend on a widely-used OSS library pool resources to run security sweeps against it, that library might end up more secure than any proprietary equivalent — because the token budget is shared across the whole ecosystem, and so is the benefit.

It's the opposite of the "everyone uses it so everyone attacks it" doom spiral. Instead: everyone uses it, everyone has incentive to harden it, and the hardening cost is amortizable.

That's not guaranteed — a popular target is also more valuable to attackers, so they'll spend more too. But it's a more nuanced picture than "vibe-coded dependencies are going to kill us all."

Though also, maybe don't npm install left-pad-2026-edition without thinking about it.

The Dependency Problem Hasn't Gone Away

There's a separate thread worth pulling here. In the wake of the LiteLLM and Axios supply chain scares earlier this year, some developers — including Andrej Karpathy — started arguing for reimplementing dependencies rather than pulling them in. The logic: a custom implementation has a smaller, more auditable surface area, and you can use an LLM to write it in an afternoon anyway.

This is the "vibe-coding your way to security" argument, and it has real merit on small, bounded tasks. But it also has a serious failure mode: most developers are not security engineers. An LLM-generated implementation of JWT validation or cryptographic hashing written by someone who doesn't know what a timing attack is might be more dangerous than the well-audited library it replaced.
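The timing-attack point is worth making concrete. Here's a minimal sketch of the classic mistake: comparing an HMAC tag with `==`, which short-circuits at the first mismatched byte and so leaks, via response time, how much of a forged tag is correct. The constant-time fix is one line with the standard library:

```python
import hashlib
import hmac

SECRET = b"server-side-secret"  # placeholder key for illustration

def sign(message: bytes) -> str:
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()

def verify_naive(message: bytes, tag: str) -> bool:
    # BUG: `==` bails at the first differing character, so an
    # attacker can recover the tag byte-by-byte from timing.
    return sign(message) == tag

def verify_safe(message: bytes, tag: str) -> bool:
    # compare_digest runs in time independent of where the
    # strings differ, closing the timing side channel.
    return hmac.compare_digest(sign(message), tag)
```

Both functions return the same answers; the difference only shows up under measurement — which is exactly why an LLM-generated reimplementation can look correct and still be broken.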

The proof-of-work framing actually helps here too. A widely-used open source library has a massive collective interest in its security — and increasingly, the tools to act on that interest. A one-off reimplementation has only you. And if you haven't run Mythos-scale probing against your custom HMAC implementation... well. Hope you got it right.

The pragmatic take: use established libraries for anything security-critical. Reserve the "yoink and reimplement" approach for non-security functionality where the risk profile is manageable.

What This Means For You, Practically

If you're a solo dev or small team:

  • Treat security as a real line item, not an afterthought. "I'll deal with it later" is a bet that attackers won't find you first.
  • Use AI code review tools now — they're cheap and catch obvious issues.
  • Prioritize reducing your attack surface: fewer dependencies, simpler architecture, fewer exposed endpoints.
  • Run OWASP ZAP or equivalent against anything you deploy. Free, fast, catches the low-hanging fruit.

If you're on a security team:

  • Start budgeting for AI-assisted red teaming. The question isn't if this becomes standard practice — it's whether you do it before your attackers do.
  • Push for a formal "hardening" phase in your CI/CD. Even a modest token budget run on a schedule finds real issues.
  • Update your threat models. "A bored teenager with a SQLmap script" is no longer the ceiling for what you're defending against.

If you're a dev lead or engineering manager:

  • The cost of code is no longer the dominant cost. The cost of securing code is. Budget accordingly.
  • "We'll do a security audit before launch" is no longer good enough for anything beyond trivial internal tooling. Continuous hardening needs to be part of the process.

The Bottom Line

Security has always been a game of asymmetric risk. AI just turbocharged the asymmetry.

The good news: the same tools that make attackers more powerful are available to defenders. The bad news: defenders have to cover the whole surface, attackers only need one crack, and the cost of "enough" security is now directly tied to the market value of what you're protecting.

The proof-of-work framing is useful precisely because it strips away the illusion that we can think our way to security. Smart architecture matters. Defense in depth matters. But ultimately, you're competing in a compute arms race, and your security budget is a bid in that market.

Cheap code got us here. Expensive security is what keeps us standing.


Sources: Cybersecurity Looks Like Proof of Work Now (Drew Breunig) · AISI Evaluation of Claude Mythos · The Last Ones benchmark
