More

LiamPowell · 2026-02-14T01:34:03 1771032843

> Ars Technica wasn’t one of the ones that reached out to me, but I especially thought this piece from them was interesting (since taken down – here’s the archive link). They had some nice quotes from my blog post explaining what was going on. The problem is that these quotes were not written by me, never existed, and appear to be AI hallucinations themselves.

Once upon a time, completely falsifying a quote would be the death of a news source. This shouldn't be attributed to AI and instead should be called what it really is: A journalist actively lying about what their source says, and it should lead to no one trusting Ars Technica.

rectang · 2026-02-14T01:35:45 1771032945

When such things have happened in the past, they've led to an investigation and the appointment of a Public Editor or an Ombud. (e.g. Jayson Blair.)

I'm willing to weigh a post mortem from Ars Technica about what happened, and to see what they offer as a durable long term solution.

marscopter · 2026-02-14T01:59:44 1771034384

There is a post on their forum from what appears to Ars Technica staff saying that they're going to perform an investigation.[0]

[0] https://arstechnica.com/civis/threads/journalistic-standards...

Kye · 2026-02-14T16:57:28 1771088248

Rolling Stone, for example: https://en.wikipedia.org/wiki/A_Rape_on_Campus#Columbia_Univ...

dboreham · 2026-02-14T01:44:46 1771033486

Since we're all in a simulation, this is fine.

LiamPowell · 2026-02-11T10:25:46 1770805546

It's in fact the opposite. Browsers show a popup that asks if you really intended to click a link with a non http/https handler, notepad does not.

The actual RCE here would be in some other application that registers a URL handler. Java used to ship one that was literally designed to run arbitrary code.

fhd2 · 2026-02-11T11:42:55 1770810175

Ah, got it. Very different from where I suspected the issue then.

LiamPowell · 2026-02-11T04:33:49 1770784429

Netflix has a far smaller catalogue and can cache content in exchanges very close to the user, see [1]. Also YouTube pays their creators.

[1]: https://en.wikipedia.org/wiki/Open_Connect

danpalmer · 2026-02-11T04:38:59 1770784739

Google has its Global Cache: https://en.wikipedia.org/wiki/Google_Global_Cache

One might imagine that the cache-ability is lower than Netflix, I can't comment on this, but GGC is very significant.

LiamPowell · 2026-02-06T08:51:24 1770367884

Android does use snapshots: https://source.android.com/docs/core/ota/virtual_ab

dizhn · 2026-02-06T11:38:55 1770377935

Oh cool. I was a bit confused about not using snapshots and relying on symlinks but it couldn't be so simple. I guess it's just a simple userspace cow mount. https://source.android.com/docs/core/ota/virtual_ab#compress...

LiamPowell · 2026-02-06T04:49:03 1770353343

> Compilers will produce working output given working input literally 100% of my time in my career.

In my experience this isn't true. People just assume their code is wrong and mess with it until they inadvertently do something that works around the bug. I've personally reported 17 bugs in GCC over the last 2 years and there are currently 1241 open wrong-code bugs.

Here's an example of a simple to understand bug (not mine) in the C frontend that has existed since GCC 4.7: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105180

grey-area · 2026-02-06T07:35:30 1770363330

These are still deterministic bugs, which is the point the OP was making. They can be found and solved once. Most of those bugs are simply not that important, so they never get attention.

LLMS on the other hand are non-deterministic and unpredictable and fuzzy by design. That makes them not ideal when trying to produce output which is provably correct - sure you can output and then laboriously check the output - some people find that useful, some are yet to find it useful.

It's a little like using Bitcoin to replace currencies - sure you can do that, but it includes design flaws which make it fundamentally unsuited to doing so. 10 years ago we had rabid defenders of these currencies telling us they would soon take over the global monetary system and replace it, nowadays, not so much.

zx8080 · 2026-02-06T10:30:16 1770373816

> It's a little like using Bitcoin to replace currencies [...]

At least, Bitcoin transactions are deterministic.

Not many would want to use a AI currency (mostly works; always shows "Oh, you are 100% right" after losing one's money).

grey-area · 2026-02-06T15:17:10 1770391030

Sure bitcoin is at least deterministic, but IMO (an that of many in the finance industry) it's solving entirely the wrong problem - in practice people want trust and identity in transactions much more than they want distributed and trustless.

In a similar way LLMs seem to me to be solving the wrong problem - an elegant and interesting solution, but a solution to the wrong problem (how can I fool humans into thinking the bot is generally intelligent), rather than the right problem (how can I create a general intelligence with knowledge of the world). It's not clear to me we can jump from the first to the second.

zx8080 · 2026-02-07T04:35:33 1770438933

By eliminating the second one.

HappMacDonald · 2026-02-07T07:20:10 1770448810

Bitcoin transactions rely on mining to notarize, which is by design (due to the nature of the proof-of-work system) incredibly non-deterministic.

So when you submit a transaction, there is no hard and fast point in the future when it is "set in stone". Only a geometrically decreasing likelihood over time that a transaction might get overturned, improving by another geometric notch with every confirmed mined block that has notarized your transaction.

A lot of these design principles are compromises to help support an actually zero-trust ledger in contrast to the incumbent centralized-trust banking system, but they definitely disqualify bitcoin transactions as "deterministic" by any stretch of the imagination. They have quite a bit more in common with LLM text generation than one might have otherwise thought.

grey-area · 2026-02-11T08:28:29 1770798509

Not sure I agree, the only axis on which Bitcoin is non-deterministic is that of time - the time to confirmation is not set in stone. Outcomes are still predictable though and follow strict rules.

It’s a fundamentally different product, LLMs are fuzzy word matchers and produce different outcomes even for the same input every time, they inject variance to make them seem more human. I think we’re straying off topic here though.

throw10920 · 2026-02-06T14:22:28 1770387748

> I've personally reported 17 bugs in GCC over the last 2 years

You are an extreme outlier. I know about two dozen people who work with C(++) and not a single one of them has ever told me that they've found a compiler bug when we've talked about coding and debugging - it's been exclusively them describing PEBCAK.

usefulcat · 2026-02-06T16:23:32 1770395012

I've been using c++ for over 30 years. 20-30 years ago I was mostly using MSVC (including version 6), and it absolutely had bugs, sometimes in handling the language spec correctly and sometimes regarding code generation.

Today, I use gcc and clang. I would say that compiler bugs are not common in released versions of those (i.e. not alpha or beta), but they do still occur. Although I will say I don't recall the last time I came across a code generation bug.

arvyy · 2026-02-06T15:06:45 1770390405

I knew one person reporting gcc bugs, and iirc those were all niche scenarios where it generated slightly suboptimal machine code but not otherwise observable from behavior

throw10920 · 2026-02-06T15:10:33 1770390633

Right - I'm not saying that it doesn't happen, but that it's highly unusual for the majority of C(++) developers, and that some bugs are "just" suboptimal code generation (as opposed to functional correctness, which the GP was arguing).

rhubarbtree · 2026-02-06T07:43:42 1770363822

This argument is disingenuous and distracts rather than addresses the point.

Yes, it is possible for a compiler to have a bug. No, that is I’m mo way analogous to AI producing buggy code.

I’ve experienced maybe two compiler bugs in my twenty year career. I have experienced countless AI mistakes - hundreds? Thousands? Already.

These are not the same and it has the whiff of sales patter trying to address objections. Please stop.

LiamPowell · 2026-02-06T10:31:29 1770373889

I'm not arguing that LLMs are at a point today where we can blindly trust their outputs in most applications, I just don't think that 100% correct output is necessarily a requirement for that. What it needs to be is correct often enough that the cost of reviewing the output far outweighs the average cost of any errors in the output, just like with a compiler.

This even applies to human written code and human mistakes, as the expected cost of errors goes up we spend more time on having multiple people review the code and we worry more about carefully designing tests.

sampullman · 2026-02-06T11:41:43 1770378103

If natural language is used to specify work to the LLM, how can the output ever be trusted? You'll always need to make sure the program does what you want, rather than what you said.

ineedasername · 2026-02-06T18:26:57 1770402417

>"You'll always need to make sure the program does what you want, rather than what you said."

Yes, making sure the program does what you want. Which is already part of the existing software development life cycle. Just as using natural language to specify work already is: It's where things start and return to over and over throughout any project. Further: LLM's frequently understand what I want better than other developers. Sure, lots of times they don't. But they're a lot better at it than they were 6 months ago, and a year ago they barely did so at all save for scripts of a few dozen lines.

sampullman · 2026-02-07T09:16:12 1770455772

That's exactly my point, it's a nice tool in the toolbox, but for most tasks it's not fire-and-forget. You still have to do all the same verification you'd need to do with human written code.

rootnod3 · 2026-02-06T13:17:06 1770383826

Just create a very specific and very detailed prompt that is so specific that it starts including instructions and you came up with the most expensive programming language.

reverius42 · 2026-02-06T15:52:11 1770393131

It's not great that it's the most expensive (by far), but it's also by far the most expressive programming language.

rootnod3 · 2026-02-06T18:38:25 1770403105

How is it more expressive? What is more expressive than Turing completeness?

reverius42 · 2026-02-09T08:19:56 1770625196

This is a non-sequitur. Almost all programming languages are Turing complete, but I think we'd all agree they vary in expressivity (e.g. x64 assembly vs. TypeScript).

By expressivity I mean that you can say what you mean, and the more expressive the language is, the easier that is to do.

It turns out saying what you mean is quite easy in plain English! The hard part is that English allows a lot of ambiguity. So the tradeoffs of how you express things are very different.

I also want to note how remarkable it is that humans have built a machine that can effectively understand natural language.

signatoremo · 2026-02-06T14:43:39 1770389019

You trust your natural language instructions thousand times a day. If you ask for a large black coffee, you can trust that is more or less what you’ll get. Occasionally you may get something so atrocious that you don’t dare to drink, but generally speaking you trust the coffee shop knows what you want. It you insist on a specific amount of coffee brewed at a specific temperature, however, you need tools to measure.

AI tools are similar. You can trust them because they are good enough, and you need a way (testing) to make sure what is produced meet your specific requirements. Of course they may fail for you, doesn’t mean they aren’t useful in other cases.

All of that is simply common sense.

rhubarbtree · 2026-02-07T11:45:02 1770464702

More analogy.

What’s to stop the barista putting sulphuric acid in your coffee? Well, mainly they don’t because they need a job and don’t want to go to prison. AIs don’t go to prison, so you’re hoping they won’t do it because you’ve promoted them well enough.

rhubarbtree · 2026-02-07T15:04:41 1770476681

* prompted

oblio · 2026-02-06T18:21:45 1770402105

> All of that is simply common sense.

Is that why we have legal codes spanning millions of pages?

sampullman · 2026-02-07T09:18:49 1770455929

The person I'm replying to believes that there will be a point when you no longer need to test (or review) the output of LLMs, similar to how you don't think about the generated asm/bytecode/etc of a compiler.

That's what I disagree with - everything you said is obviously true, but I don't see how it's related to the discussion.

LiamPowell · 2026-02-07T11:42:43 1770464563

I don't necessarily think we'll ever reach that point and I'm pretty sure we'll never reach that point for some higher risk applications due to natural language being ambiguous.

There are however some applications where ambiguity is fine. For example, I might have a recipe website where I tell a LLM to "add a slider for the user to scale the number of servings". There's a ton of ambiguity there but if you don't care about the exact details then I can see a future where LLMs do something reasonable 99.9999% of the time and no one does more than glance at it and say it looks fine.

How long it is until we reach that point and if we'll ever reach that point is of course still up for debate, but I dnt think it's completely unrealistic.

sampullman · 2026-02-07T15:06:06 1770476766

That's true, and I more or less already use it that way for things like one off scripts, mock APIs, etc.

wtetzner · 2026-02-06T18:36:27 1770402987

I don't think the argument is that AI isn't useful. I think the argument is that it is qualitatively different from a compiler.

yourapostasy · 2026-02-06T11:39:14 1770377954

The challenge not addressed with this line of reasoning is the required sheer scale of output validation on the backend of LLM-generated code. Human hand-developed code was no great shakes at the validation front either, but the scale difference hid this problem.

I’m hopeful what used to be tedious about the software development process (like correctness proving or documentation) becomes tractable enough with LLM’s to make the scale more manageable for us. That’s exciting to contemplate; think of the complexity categories we can feasibly challenge now!

dbtablesorrows · 2026-02-06T12:54:35 1770382475

the fact that the bug tracker exists is proving GP's point.

eklavya · 2026-02-06T12:04:31 1770379471

Right, now what would you say is the probability of getting a bug in compiler output vs ai output?

It's a great tool, once it matures.

LiamPowell · 2026-02-06T00:36:03 1770338163

Unspecified behaviour is defined in the glossary at the start of the spec and the term "unspecified" appears over a hundred times...

LiamPowell · 2026-01-29T07:36:49 1769672209

Adblock continues to be just as effective as it ever was in Chrome.

Even before the removal of MV2, the claims that it would kill adblock were ridiculous as many adblockers had already switched to MV3 but it was at least understandable that people could be ignorant of that fact. Now that everything is on MV3 how can people still be claiming that Google killed adblock when Chrome users still have working adblockers?

LiamPowell · 2026-01-25T05:07:38 1769317658

You don't actually need your own driver, you can just use the CDC device class.

nanolith · 2026-01-25T05:17:56 1769318276

That's true. The only advantage of writing a driver in this case is if I wanted to add functions, such as a programmable level shifter.

LiamPowell · 2026-01-22T01:45:18 1769046318

You don't necessarily need any sort of electronic counting for quick results. Federal elections in Australia are usually called late on the voting day and I imagine the same is true for other countries that are paper-only.

lawtalkinghuman · 2026-01-22T08:07:03 1769069223

Same in the UK.

Votes close at 10pm. Might be a few stragglers left in the queue, so call it 10:15pm. (Exit poll results are embargoed until 10pm.)

Ballot boxes are transferred from individual polling station to the location of the count. The postal votes have been pre-checked (but the actual ballot envelope has not been opened or counted) and are there to be counted alongside the ballots from the polling stations.

Then a small army of vote counters go through the ballots and count them and stack together ballots by vote. There are observers - both independent and appointed by the candidates. The returning officer counts the batches up, adjudicates any unclear or challenged ballot, then declares the result.

The early results come out usually about 1 or 2. The bulk of the results come out about 4 or 5. Some constituencies might take a bit longer - it's a lot less effort to get ballot boxes a mile or two down the road in a city centre constituency than getting them from Scottish islands etc. - but it'll be clear who has the majority by 6 or 7 the next day.

I can appreciate that the US is significantly larger than the UK, but pencil-and-paper voting with prompt manual counts is eminently possible.

anon291 · 2026-01-22T05:25:43 1769059543

Oh but you see in America, it takes us more than three weeks to count ballots.

LiamPowell · 2026-01-19T07:05:16 1768806316

The +12 is to keep the number positive. The direction contains the movement so a +1 wouldn't make sense.