what's actually going on with AI code right now, and what's working

quick note, no vendors in here, that wasn't the point. all the numbers are public research, I just pulled them together.

this one keeps coming up, so I wrote down what's actually going on and what's working. short version, it's two problems not one. review can't keep up with what gets shipped now, and the tooling most teams already run was never built to read AI-generated code. I spent the time on the second half, because honestly that's the part most teams haven't really mapped yet.

the first half everyone gets. more code than review can handle, so you throw more reviewers and stricter gates at it. fine.

the second half is the one that bites quietly. it shows up 3 ways, and none of them get fixed by adding reviewers.

1. it looks right so it passes. Stanford ran a study, devs using AI wrote less secure code on 4 of 5 tasks, and felt MORE confident about it. Veracode tested 100+ models, 45% of the AI code failed a security test, and on the XSS tasks specifically 86% failed. the code reads clean, compiles, demos fine. the bug is sitting where a reviewer skimming for logic and readability just doesn't look.

2. the volume breaks the math. Apiiro's 2025 data, AI-assisted devs ship 3-4x more code and generate ~10x more security findings. so review headcount stays flat while the surface you're responsible for multiplies. that's not something you hire your way out of fast enough.

3. secrets leak more. GitGuardian looked at ~20k Copilot repos, 6.4% leaked a secret vs 4.6% baseline, so about 40% higher. makes sense honestly, training data is full of tutorials with keys hardcoded in, so that's what the model hands back.
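
for reference, here's the pattern in question and the boring fix, as a minimal Python sketch. the env var name PAYMENTS_API_KEY and the placeholder string are made up for illustration:

```python
import os

# what the tutorial-style suggestion tends to look like (placeholder, not a real key):
#   API_KEY = "EXAMPLE-NOT-A-REAL-KEY"

def load_api_key(env_var: str = "PAYMENTS_API_KEY") -> str:
    """Read the key from the environment and fail loudly if it is missing."""
    key = os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"{env_var} is not set")
    return key
```

failing loudly matters here. a silent fallback to a default key is the same bug with extra steps.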


the new one nobody had before: phantom dependencies

this is the one I'd flag hardest, because it straight up didn't exist for human-written code.

models hallucinate package names. like, plausible-sounding libraries that just don't exist. one study across 16 models and half a million prompts, 19.7% of suggested packages were fake, 205k+ unique made-up names. the part that makes it dangerous: 43% of the hallucinated ones came back on every single run for the same prompt. so they're predictable. attacker registers the fake name, drops malware in it, waits. a researcher tested this by registering an empty package under a hallucinated name, "huggingface-cli", and it pulled 30k+ installs in 3 months.

so a dev lets the assistant auto-install, or copies the snippet without checking, and they just pulled the attacker's code. nothing got hacked. the supply chain just has imaginary entries now, and some of them are loaded.


what's actually working

the teams ahead of this aren't doing anything fancy. it's like 4 moves that keep coming up.

  1. scan at the diff, every PR, not a quarterly SAST run. a scan that dumps 500 alerts nobody reads is theatre. catching a known pattern (SQLi from string concat, hardcoded key, SSL verify turned off, empty catch block) the moment it lands costs $200-800 in dev time. same bug in prod is $3-10k+. a breach averages $4.44M (IBM 2025). the math isn't close.
  2. catch secrets at commit, not at the audit. given the leak rate above, scanning commits and PRs for creds before they land is just the cheapest high-yield thing you can turn on.
  3. block unverified packages by default. allowlist plus lockfile/hash verification. a hallucinated package can't auto-install if the default is no. this is the specific fix for the phantom-dependency thing, and most teams just don't have it on.
  4. give AI PRs their own review lane. tag the AI-written ones and route them to a heavier review: deterministic rules for the known patterns, plus a security-specific pass for the logic stuff (auth bypass, SSRF). beats treating them like any other commit.

one thing that doesn't work on its own: prompting. Kaspersky tried security-focused prompts and STILL got 38 vulns out, 7 of them critical. you can't prompt your way to safe code, the control has to live in the pipeline, not the prompt.
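
as one example of a control living in the pipeline, here's a minimal Python sketch of the default-deny package check from move 3, run against a pip-style requirements file. the APPROVED set is hypothetical, and this only checks that a sha256 hash pin is present, it doesn't verify the hash. pip does the actual verification itself when you install with `--require-hashes`:

```python
import re

# hypothetical allowlist of vetted package names (lowercase, normalized)
APPROVED = {"requests", "flask"}

# matches "name==version" at the start of a requirement line
REQ = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*==\s*([^\s;\\]+)")

def check_requirements(text: str, approved: set = APPROVED) -> list:
    """Return a list of problems; an empty list means the file passes the gate."""
    problems = []
    # join backslash-continued lines so a hash on the next line stays with its package
    joined = text.replace("\\\n", " ")
    for line in joined.splitlines():
        line = re.sub(r"(^|\s)#.*$", "", line).strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        m = REQ.match(line)
        if not m:
            problems.append(f"not pinned with ==: {line}")
            continue
        name = re.sub(r"[-_.]+", "-", m.group(1)).lower()  # normalize the name
        if name not in approved:
            problems.append(f"not on allowlist: {name}")
        if "--hash=sha256:" not in line:
            problems.append(f"no hash pin: {name}")
    return problems
```

fail the build on a non-empty list and the default is no. a hallucinated name never reaches the install step.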


the rest of what's coming up

quick, in case any of these is live for you too:

the boring stuff still eats the week. patch latency, privileged access. not the AI headlines.

CISO, European bank

"AI detection is on an exponent, patch and deployment are at a lower velocity."

security architect, big platform vendor

SOC gets measured on alert volume, which kind of hides whether any of it is actually working.

project lead, big healthcare distributor


anyway

the through-line is just, AI moved the speed and volume of code, and the tools and metrics and review built for the old pace didn't catch up. the teams worth watching picked the one failure mode that actually bites them and put a control in the pipeline for it, instead of buying the generic "AI security" thing.


how this works, since you're probably wondering

I run a small research group that keeps a running read on the security vendor space, which tools are actually worth your time in each category and which are noise. that's where these numbers come from.

so if one of these is your live problem (the AI-code one or any of them), the part I can actually help with: I map your specific situation to the 2-3 vendors that genuinely fit, tradeoffs and all, and set up a short working session if it's useful. free to you. the vendors cover the cost, that's the whole model, no catch and nothing to buy from me. I only win when you find one that's actually useful.

private, vendor-neutral note shared with people who took part in the 2026 mapping. numbers cited from public research (Veracode, Stanford, Apiiro, GitGuardian, IBM, Kaspersky, and the package-hallucination studies). not indexed, not for redistribution.