It's always come in handy for containers/VMs (and I assume compiling Rust, as it uses as much of every other resource as it can get its hands on), but yeah, being able to run actually useful local LLMs on my now >4-year-old machine has been fantastic.
You also have the "go" directive in go.mod, which sets this for the entire module. That's very similar to the Rust edition approach, except that each Go version acts as a small "edition".
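For the record, a minimal sketch of both sides (module/package names are placeholders):

    // go.mod: the "go" directive pins the language version for the whole module
    module example.com/hello

    go 1.21

    # Cargo.toml: the Rust analogue is the per-crate edition
    [package]
    name = "hello"
    version = "0.1.0"
    edition = "2021"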
This is the only way, because anything less would create a loophole where any abuse or slander could be blamed on an agent, with no way to conclusively prove whether it was actually written by one. (Its operator has access to the same account keys, etc.)
Once you have those real-world integrations, no spec is ever complete enough. You can have a whole industry agree on decades-old RFCs and still need "but actually" details diffused through every implementation that actually gets used. There is always more detail in implementations than can be specified by anything less than the code itself, and it's multiplied by the set of all possibly deployed versions, and again by all possible configurations.
Fun fact: The most costly incident I ever "caused" was because I fixed a bug in an API to make it match the spec. The affected client escalated and forced me to restore the defect, even though it also affected many other clients.
Working in almost any mature space requires you to be extremely conservative in ways that have always proven difficult to specify.
This sounds like all of the things you should be doing anyway in a team environment, only now you can't trust your own judgment about where to spend the effort or where to knowingly leave gaps, because agents with no judgment of their own will be the ones encountering the consequences.
Somehow I doubt the people who don't even read the code their own agent creates were saving that time to instead read the code of countless dependencies across all future updates.
It really bothers me how many comments on this topic (here and elsewhere) draw a false parallel between LLM-based coding as an abstraction and frameworks and compilers as abstractions. They're not the same thing, and it matters.
Frameworks and compilers are designed to be leak-proof abstractions. Any way in which they deviate from their abstract promise is a bug that can be found, filed, and permanently fixed. You get to spend your time and energy reasoning in terms of the abstraction because you can trust that the finished product works exactly the way you reasoned about at the abstract level.
LLMs cannot offer that promise by design, so it remains your job to find and fix any deviations from the abstraction you intended. Every one of those bugs you fail to find and fix is a potential crisis you've left yourself down the line.
[Aside: I get why that's acceptable in many domains, and I hope in return people can get why it's not acceptable in many other domains]
All of our decades of progress in programming languages, frameworks, libraries, etc. have gone into building up leak-proof abstractions, so that programmer intent can be focused only on the unique and interesting parts of a problem while the other details get the best available (or at least most widely applicable) implementation. In many ways we've succeeded, even though in many ways it looks like progress has stalled. LLMs have not solved this; they've just given up on the leak-proof part of the problem, trading it for exactly the costs and risks the industry was trying to avoid by solving it properly.
Your comment gets to the crux of my thinking about LLM coding. The way I see it, LLM coding decompresses your prompt into code according to the statistical likelihood of that decompression given the training data: "Build me an iOS app" -> some concrete implementation of an iOS app. The issue is that the user supplying the prompt needs to encode every variable the AI needs to work with into the prompt, or else the implementation will just be the "bog-standard" iOS app from the training corpus, perhaps nudged by whatever other tokens happen to be in the prompt.

Is natural language the right way to encode that information? Do we want to rely on input tokens in the model's context successfully making it into the output to guarantee accuracy? I think Kiro's spec-driven development starts to address the inherent issues in LLM-based coding assistance, but it's an early step.
Another way in which they're different is that, because LLMs are non-deterministic, we check in the output, not the input. It's the equivalent of checking machine code into source control instead of the source language. That's not abstraction, it's non-deterministic code generation.
> LLMs cannot offer that promise by design, so it remains your job to find and fix any deviations from the abstraction you intended.
LLMs are clumsy interns now, very leaky. But we know human experts can be leak-proof. Why can't LLMs get there, too, better at coding, understanding your intentions, reviewing automatically for deviations, etc.?
Thought experiment: could you work well with a team of human experts just below your level? Then you should be able to work well with future LLMs.
Well, sometimes your compiler will work out how to compile a thing more efficiently (e.g. vectorize a loop), and other times you'll rework some code into an equivalent formulation and suddenly it won't get vectorized, because you've tripped some invisible flag that prevents an inlining operation that was crucial to enabling that vectorization, and now that hot path runs at a quarter of the speed it did before.
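To make that concrete, here's a small sketch of the kind of "equivalent" rewrite that can change what the optimizer does (Rust here; the function names are made up, and whether either loop actually vectorizes depends on your compiler version and flags):

    // Direct call: the closure is visible to the optimizer, so at opt-level
    // 2/3 LLVM can usually inline it and vectorize the reduction.
    fn sum_squares_direct(a: &[i32]) -> i64 {
        a.iter().map(|&x| (x as i64) * (x as i64)).sum()
    }

    // "Equivalent" formulation through a trait object: the dynamic call
    // generally can't be inlined, which can keep the same loop from being
    // vectorized, even though the observable behavior is identical.
    fn sum_squares_dyn(a: &[i32], f: &dyn Fn(i32) -> i64) -> i64 {
        a.iter().map(|&x| f(x)).sum()
    }

Same results either way; the only reliable way to know which version you actually got is to look at the generated assembly.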
Technically it's (usually) deterministic for a given input, and you can follow various best practices to increase your odds of triggering the right optimizations.
But practically speaking, "will I get the good & fast code for this formulation" is a crapshoot, and something like 99% (99.99%?) of programmers live with that. (You have guidelines and best practices you can follow, kinda like steering agents, but you rarely get guarantees.)
Maybe in the future the vast majority of programmers put up with a non-deterministic & variable amount of correctness in their programs, similar to how most of us put up with a (in practice) non-deterministic & variable amount of performance in our programs now?