The ability to spin up a single page tool for converting timestamps or counting tokens or analyzing a few hardcoded weasel words is handy: nobody disputes that. But that is not the work most of us do.
In the real world, away from those whose salary depends on marketing these agentic tools, an LLM is a context shredder. It produces snippets that are locally plausible but globally incoherent and that don't fit the existing style. CONVENTIONS and RULES files are a kludge, a sloppy hack.
These tools flatten the deep, interconnected knowledge required to work on complex systems into a series of shallow, transactional loops that pretend to satisfy the user.
The skill being diminished is not the ability to write a single-page utility or single-purpose script. It is the ability to build and maintain a mental model of a complex machine. The ability to churn out a hundred disparate toy tools is not evidence of a superior learning method, it is evidence of a tool that excels at tasks with no deep interconnected context.
> The skill being diminished is not the ability to write a single-page utility or single-purpose script. It is the ability to build and maintain a mental model of a complex machine.
That's the thing that LLMs help me with 90% of the time. It's also why I don't think non-programmers armed with LLMs are a threat to my career.
You have compiled an interesting list of benchmarks and adjacent research. The implicit question is whether an established benchmark for building a full product exists.
After reviewing all this, what is your actual conclusion? Or are you genuinely asking? Is the takeaway that a comprehensive benchmark exists and we should be using it, or that the problem space is too multifaceted for any single benchmark to be meaningful?
Invoking post hoc ergo propter hoc is a textbook way to dismiss an inconvenience to the LLM industrial complex.
LLMs will tell users, "good, you're seeing the cracks", "you're right", the "fact you are calling it out means you are operating at a higher level of self awareness than most" (https://x.com/nearcyan/status/1916603586802597918).
Enabling the user in this way is not a passive variable. It is an active agent that validated paranoid ideation, reframed a break from reality as a virtue, and provided authoritative confirmation using all prior context about the user. LLMs are a bespoke engine for amplifying cognitive distortion, and to suggest their role is coincidental is to ignore the mechanism of action right in front of you.
Remember when "killer games" were sure to turn a whole generation of young men into mindless cop- and women-murderers a la GTA? People were absolutely convinced there was a clear connection between the two. After all, a computer telling you to kill a human-adjacent figurine in a game was basically a murder simulator, in the same sense that a flight simulator was for flying, so it would invariably desensitize the youth. Of course, they were the same people who were against gaming to begin with.
Can a person with a tendency toward psychosis be influenced by an LLM? Sure. But they can also be influenced to do pretty outrageous things by religion, 'spiritual healers', substances, or bad therapists. Throwing out the LLM with the bathwater is a bit premature. Maybe we need warning stickers ("Do not listen to the machine if you have a history of delusions and/or psychotic episodes.")