
I wouldn’t really call that a “complete crack” (although it IS cool). There’s an _awful_ lot more firmware in a car or tractor than the display unit, and arguably it’s one of the less important modules in most architectures. Cracked versions of Deere Service Advisor are much more meaningful to the kinds of repairs farmers perform than firmware exploits are.

I think the point they were trying to make here was “Claude did better than a fuzzer because it found a bunch of OOB writes and was able to tell us they weren’t RCE,” not “Claude is awesome because it found a bunch of unreachable OOB writes.”

This is not how first-party vulnerability research with LLMs goes; they are incredibly valuable versus all prior tooling at triaging and producing only high-quality bugs, because they can be instructed to produce a PoC and prove that the bug is reachable. It's traditional research methods (fuzzing, static analysis, etc.) that are more prone to false-positive overload.

The reason open submission channels (PRs, bug bounties, etc.) are having issues with AI slop spam is that LLMs are also good at spamming, not that they are bad at programming or at vulnerability research in particular. When the incentives are aligned, LLMs are incredibly good at vulnerability research.


Your idea is much more accurate; see my sibling comment. It's basically using C or C++ as an intermediate representation for machine code, rather than trying to recreate the game's higher-order logic/structure or source code.

This vaguely reminds me of Futamura projections.

Normally with Futamura's first projection the input is source code, and you partially evaluate that source against an interpreter for it, resulting in a "compiled" binary that has the logic of the source inlined into the interpreter (and hopefully optimized). I believe this is similar to what Truffle does: you have an interpreter (written in Java) and at runtime Truffle JIT-optimizes the interpreted program's AST against the interpreter's logic. All of this can be considered a specialization of the JVM running an interpreter interpreting your program.

In this case with "recompilation" you have a binary made to run on certain hardware. You then take an emulator of the hardware (registers, PC, etc.) and then "partial evaluate" the binary against the hardware emulator, producing a new program that contains a software emulation of just that specific binary.

So while you're still conceptually emulating the underlying hardware, you both avoid the instruction dispatch overhead at runtime (it's statically compiled in) and also benefit from the optimization passes of modern compilers.


It's more nuanced than that; the approach you're describing is usually called "decompilation."

The difference is how far one goes in hoisting the "source code"; in this "recompilation" approach the source code, while C++, is basically an IR (intermediate representation) between the original game's assembly and a host platform, and the hardware itself is emulated (for example, the original architecture's CPU registers are represented as variables in the host architecture's memory). The machine code is translated to C++ using a custom tool.

In a "decompilation" approach the game logic is converted (using a decompiler, like IDA or Ghidra's) back into something which resembles the original source code to the game itself, and the source code is usually hand analyzed, marked up, rewritten, and then ported across platforms. The product is something that attempts to resemble the original game's source code.

Of course, they lie on a continuum and both approaches can be mixed, but, while they both involve C++ in the middle, the processes are starkly different. Recompilation is much more copyright-friendly: in many implementations only the modifications are distributed, and the original binary is translated by the end user (who owns the software or a license to it). Decompilation, by contrast, produces an artifact (source code) which is a derivative work encumbered by the original software's license and generally should not be distributed.


> In a "decompilation" approach the game logic is converted (using a decompiler, like IDA or Ghidra's) back into something which resembles the original source code to the game itself, and the source code is usually hand analyzed, marked up, rewritten, and then ported across platforms

There is definitely a lot of scope to apply LLMs here.


no one

absolutely no one

not a single soul on this Earth

LLM nut: OMG LLM!!!!!!!

Can’t you just drop it, please?


> Can linters find these? Perhaps fuzzing?

That's what syzbot / syzkaller does, as mentioned in the article, with somewhat similar results to the AI-fuzzing that they've been experiencing recently.

The issue that Linux maintainers have in general is that there are so many of these "strict correctness and safety" bugs in the Linux codebase that they can't fix them all at once, and they have no good mechanism to triage "which of these bugs is accessible to create an exploit."

This is also the argument by which most of their bugs become CVEs; lacking the capability to determine whether a correctness bug is reachable by an attacker, any bug could be an exploit, and their stance is that it's too much work to decide which is which.


It's a bigger deal than that.

Academically, syzkaller is just a very well orchestrated fuzzer, producing random pathological inputs to system calls, detecting crashes, and then producing reproductions. Syzkaller doesn't "know" what it's found, and a substantial fraction of what it finds are "just" crashers that won't ever be weaponizable.

An LLM agent finding vulnerabilities is an implicit search process over a corpus of inferred vulnerability patterns and inferred program structure. It's stochastic static program analysis (until you have the agent start testing). It's generating (and potentially verifying) hypotheses about actual vulnerabilities in the code.

That distinction is mostly academic. The bigger deal is: syzkaller crashes are part of the corpora of inputs agents will use to verify hypotheses about how to exploit Linux. It's an open secret that there are significant vulnerabilities encoded in the (mostly public!) corpus of syzbot crash reproductions; nobody has time to fish them out. But agents do, and have the added advantage of being able to quickly place a crash reproduction in the inferred context of kernel internals.


Yes, once we reach the broader conversation (I actually didn't initially grasp that the OP post was a sub-article under another one on LWN which then linked out to yet another article called "Vulnerability Research is Cooked"), I completely agree.

Modern LLMs are _exceptionally_ good at developing X-marks-the-spot vulnerabilities into working software; I fed an old RSA validation mistake in an ECU to someone in a GitHub comment the other day and they had Claude build them a working firmware reflashing tool within a matter of hours.

I think that the market for "using LLMs to triage bug-report inputs by asking them to produce working PoCs" is incredibly under-leveraged so far, and if I were more entrepreneurially minded at this juncture I would even consider starting a company in this space. I'm a little surprised that neither this article nor most of the discussion under it has gone that direction yet.


(I wrote the "Cooked" article, I'm not entirely sure why people are commenting on it on LWN.)

According to Anthropic's red team, not even the secret Claude stuff they're holding back is able to weaponize vulnerabilities without simplifying the target (disabling mitigations, etc.).

So we might be lucky that LLMs are able to find vulnerabilities before they are able to weaponize them, giving defense a time window.


Not really the same. There are proposals to require OEMs to install driver monitoring, but it's usually IR-camera based rather than blow-in-a-tube fuel-cell based. These systems are probably going to be a mess, but the technology isn't really comparable to DUI interlock devices, and the unreliability of those systems is orthogonal.


Irrelevant to this issue - the devices didn’t get bricked over the air, but rather they have a “calibration” time lock which must be reset at a service center and the service centers are ransomwared.


The issue here is not an OTA thing, for what it’s worth. That is to say, it’s not that these devices phoned home directly and a cloud server is down; rather, these devices require periodic “calibration” (due to a combination of regulation, legitimate technical need, and grift) at a service center and the service centers are out of commission, presumably due to ransomware.


It's not new. Fault injection as a vulnerability class has existed since the beginning of computing, as a security bypass mechanism (clock glitching) since at least the 1990s, and crowbar voltage glitching like this has been widespread since at least the early 2000s. It's extraordinarily hard to defend against, but mitigations are also improving rapidly; for example, this attack only works on early Xbox One revisions where more advanced glitch protection wasn't enabled (although the author speculates that, since the glitch protection can be disabled via software / a fuse state, one could glitch out the glitch protection).

