Security

The Security Debt Nobody's Pricing In

AI writes code faster than anyone can review it - and nearly half of that code ships with security flaws. Here's the bill quietly coming due.

Every team I talk to is shipping more code than they were a year ago. Almost none of them are reviewing more code than they were a year ago. Sit with that gap for a second, because that gap is the whole problem.

We've spent two years celebrating how much faster AI lets us write software. We have spent almost no time reckoning with the fact that writing was never the bottleneck. Reviewing was. Securing was. Understanding was. AI floored the accelerator on the one part of the process that was already fast, and the bill for that is accumulating in a place nobody's looking at: the security of the code itself.

The number you should not ignore

Veracode ran the test everyone should have run before betting their roadmap on this. They took over 100 large language models and had them generate code across Java, JavaScript, Python, and C#, then ran the output through security analysis. The 2025 GenAI Code Security Report found that in 45% of those tests, the generated code introduced a known security vulnerability.

Not "could theoretically be insecure." Failed a real security check. Nearly half the time.

The detail underneath the headline is worse. Java came in at a 72% failure rate. Tasks that called for cross-site scripting defenses failed 86% of the time; log injection, 88%. And the models have not been quietly fixing this in the background - Veracode's Spring 2026 update found that even as syntactic correctness climbed past 95%, the security pass rate sat flat around 55%. Two years of "revolutionary" model releases moved the security needle from roughly 55% to roughly 55%. The models got dramatically better at writing code that runs and barely moved on writing code that's safe. A separate CodeRabbit analysis of 470 real-world pull requests put AI-co-authored code at roughly 2.74 times the security-vulnerability rate of human-only code.

Read those two facts together: 95% correct, 55% secure. The code works. It demos. It passes the happy-path test. And it carries a vulnerability into your codebase almost half the time, wearing the disguise of working software.

Why this happens (and why it won't fix itself)

This isn't the models being dumb. It's the models doing exactly what they were built to do.

A language model is trained to produce the most probable code given the prompt - code that looks like the vast corpus of public code it learned from. And public code is, on average, insecure. It's full of tutorials that skip input validation "for clarity," Stack Overflow answers optimized for getting the thing working, and example snippets that were never meant to touch production. The model faithfully reproduces the average. The average has a 45% problem.

More fundamentally: the model has no threat model. When you ask for a function that handles user uploads, it's optimizing for "does this handle uploads," not "what happens when someone uploads a 2GB file named ../../etc/passwd." Security is the set of things that happen on the paths you didn't ask about. The model only writes the path you asked about. Absent explicit security instructions, safety is simply not in the objective.
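To make the upload example concrete, here is a minimal sketch of the paths nobody asked about: a size cap and a filename check that refuses anything trying to climb out of the upload directory. The names `MAX_UPLOAD_BYTES`, `UPLOAD_ROOT`, and `resolve_upload_path` are hypothetical, and the 10 MiB limit is an arbitrary assumption, not a recommendation.

```python
from pathlib import Path

MAX_UPLOAD_BYTES = 10 * 1024 * 1024  # hypothetical cap (10 MiB)
UPLOAD_ROOT = Path("/srv/uploads")   # hypothetical storage directory


def resolve_upload_path(filename: str, size_bytes: int, root: Path = UPLOAD_ROOT) -> Path:
    """Return a destination inside `root`, or raise ValueError."""
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError("upload too large")

    # Treat backslashes as separators too, then require a bare file name:
    # anything containing directory parts (including "..") is rejected.
    normalized = filename.replace("\\", "/")
    name = Path(normalized).name
    if name != normalized or name in ("", ".", ".."):
        raise ValueError("invalid filename")

    # Defence in depth: after resolving symlinks, the destination must
    # still sit under the upload root (is_relative_to needs Python 3.9+).
    root_resolved = root.resolve()
    dest = (root_resolved / name).resolve()
    if not dest.is_relative_to(root_resolved):
        raise ValueError("path escapes upload directory")
    return dest
```

With this in place, `resolve_upload_path("../../etc/passwd", 100)` raises `ValueError` instead of handing back a path outside the upload directory, and so does a 2GB upload. A real handler would also want content-type checks and generated storage names; this sketch covers only the two failure cases named above.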

And it does all of this with total fluency. There's no hesitation, no "I'm not sure this is safe," no junior-developer tell. It hands you an SQL query built with string concatenation in the same confident tone it uses for everything. The vulnerability doesn't look like a vulnerability. It looks like finished work.
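Here is roughly what that looks like side by side, as a small sketch using Python's built-in `sqlite3` module. The `users` table and the function names are made up for illustration. Both functions read like finished work; only one of them survives hostile input.

```python
import sqlite3


def find_user_unsafe(conn: sqlite3.Connection, username: str) -> list:
    # Vulnerable: the user's input becomes part of the SQL text itself.
    query = "SELECT id, username FROM users WHERE username = '" + username + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, username: str) -> list:
    # Parameterized: the value is bound separately and never parsed as SQL.
    query = "SELECT id, username FROM users WHERE username = ?"
    return conn.execute(query, (username,)).fetchall()


def demo() -> tuple:
    """Run both lookups against an in-memory table with a hostile input."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
    conn.executemany(
        "INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)]
    )
    payload = "nobody' OR '1'='1"  # classic injection string
    return find_user_unsafe(conn, payload), find_user_safe(conn, payload)
```

Calling `demo()` shows the difference: the concatenated version returns every row in the table for a username that doesn't exist, while the parameterized version returns nothing.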

The review bottleneck is where the debt lives

Here's how the debt actually compounds.

Code generation got something like ten times faster. Code review did not. A human still has to read the diff, understand the intent, trace the edge cases, and spot the missing validation - and humans read at human speed. So one of two things happens, and both are bad.

Either review becomes the bottleneck and all that vaunted velocity evaporates in a growing PR queue - or, far more common, review quietly degrades. Reviewers start skimming AI diffs the way you skim a contract you've decided to sign anyway. The volume is too high, the code looks plausible, and "looks plausible" is exactly the failure mode, because plausible-but-wrong is the house specialty.
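The first branch is easy to put numbers on. The sketch below is a back-of-envelope model, not data: the function name and every figure in the usage note are hypothetical, chosen only to show how quickly a queue grows when PRs are opened faster than they are reviewed.

```python
def review_backlog(weeks: int, prs_opened_per_week: int, prs_reviewed_per_week: int) -> list:
    """Return the number of unreviewed PRs at the end of each week."""
    backlog = 0
    history = []
    for _ in range(weeks):
        # New PRs join the queue, reviewers drain what they can,
        # and the queue can never go below zero.
        backlog = max(0, backlog + prs_opened_per_week - prs_reviewed_per_week)
        history.append(backlog)
    return history
```

If a team reviews 40 PRs a week and opens 40, `review_backlog(4, 40, 40)` stays at zero. Double the output to 80 with review held flat and `review_backlog(4, 80, 40)` gives `[40, 80, 120, 160]`: a month in, the queue holds four weeks of review work. That pressure is what pushes teams into the second branch.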

Stack Overflow's 2025 Developer Survey backs this up from the inside: 66% of developers say their biggest AI frustration is output that's almost right but not quite, and 45% say debugging AI-generated code takes more time than writing it themselves. Now apply that to security review, where the bug doesn't announce itself by crashing - it just sits there, working perfectly, until someone finds it. The 2025 DORA report's finding lands here too: AI adoption raised delivery throughput while lowering delivery stability. More out the door, less of it sound.

Security debt behaves like technical debt with one cruel difference. Tech debt slows you down. Security debt gets exploited. It doesn't sit politely on a backlog waiting for a refactor - it waits for someone who's looking for it.

Building the safety net

None of this is an argument against using AI to write code. I use it daily and I'm not stopping. It's an argument that the velocity is only a win if you pair it with a security process that can keep up. The good news: the controls are mostly things mature teams already know how to do. They just stopped being optional.

  • Treat AI as an untrusted contributor, not an oracle. It's a fast, tireless junior who's read everything and understood the security implications of none of it. You would not merge a stranger's PR unreviewed. Same rule. No exceptions for the robot.
  • Make security part of the prompt. This is the single cheapest improvement available. Add something like: "Parameterize all queries, validate and sanitize every input, never build SQL or shell commands by string concatenation, encode all output." When you ask for secure code explicitly, you get meaningfully more of it. The model will follow a threat model - it just won't supply one.
  • Automate the floor with SAST in CI. Static analysis on every pull request is no longer a nice-to-have; it's the only thing that scales at the speed code is now being produced. Let the tools catch the mechanical 80% - the injection flaws, the hardcoded secrets, the known-bad patterns - so human reviewers can spend their finite attention on the logic and the architecture.
  • Gate the consequential paths with humans. Auth, crypto, payment handling, anything touching personal data or raising privileges - this code gets read by a person who understands the threat model, every time, regardless of who or what wrote it.
  • Measure review capacity against generation volume. If you've doubled output and held review flat, you haven't doubled productivity. You've doubled your unreviewed surface area. Make that trade visible before it makes itself visible for you.
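The "gate the consequential paths" rule above is one of the easier ones to automate. As a rough sketch, a CI step could take the list of files changed in a pull request and flag any that fall under sensitive directories, so the merge blocks until a designated human has signed off. The directory patterns and the function name below are hypothetical; substitute whatever your repository actually uses.

```python
from fnmatch import fnmatchcase

# Hypothetical layout: adjust to wherever auth, crypto, payment,
# and personal-data code lives in your own repository.
SENSITIVE_PATTERNS = [
    "src/auth/*",
    "src/crypto/*",
    "src/payments/*",
    "src/users/pii/*",
]


def needs_human_security_review(changed_files, patterns=SENSITIVE_PATTERNS):
    """Return the changed files that match a sensitive path pattern."""
    # fnmatchcase's "*" also matches "/", so nested files are covered.
    return sorted(
        path
        for path in changed_files
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    )
```

A CI job would fail, or request a specific reviewer, whenever the returned list is non-empty. Who or what wrote the diff never enters into it, which is the point.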

The bottom line

AI didn't create insecure code - we were perfectly capable of that on our own. What it changed is the rate and the disguise. It produces vulnerabilities faster than ever, dressed as clean, confident, working software, and it pours them into pipelines whose review capacity hasn't moved.

The teams that come out of this fine won't be the ones who avoided AI, and they won't be the ones who embraced it uncritically. They'll be the ones who understood that "faster to write" and "ready to ship" are different claims - and who invested in the second one as seriously as the tools invested in the first. The velocity is real. So is the bill. You just get to choose whether you pay it on your schedule, in review, or on an attacker's schedule, in production.

Shipping AI-assisted code and not sure what's slipping through? Ironwright helps teams put a security process in place that keeps pace with how fast they're now writing - automated where it should be, human where it has to be.
