This is exactly what the article is about. The tradeoff is that you have to thoroughly review the plans and iterate on them, which is tiring. But the LLM will write good code faster than you, if you tell it what good code is.
Exactly; the original commenter seems determined to write off AI as "just not as good as me".
The original article doesn't feel that novel to me. Not because it's a trite example, but because I've begun to experience massive gains from following the same basic premise as the article. And I can't believe there are people who aren't using it like this.
I iterate the plan until it's seemingly deterministic, then I strip the plan of implementation and rewrite it following a TDD approach. Then I read all the specs and generate all the code to take the tests from red to green.
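That red-to-green loop can be sketched as a minimal example (the `slugify` function and its spec are hypothetical, just to illustrate the order of operations: spec first, implementation second):

```python
import re

# Red: the spec exists before the implementation does.
# Running this before slugify() is written fails with NameError.
def test_slugify():
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  spaced  out  ") == "spaced-out"

# Green: generate just enough implementation to satisfy the spec.
def slugify(text: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace runs into hyphens."""
    text = re.sub(r"[^a-z0-9\s-]", "", text.lower())
    return re.sub(r"[\s-]+", "-", text).strip("-")

test_slugify()  # passes once the implementation matches the spec
```

The point is that the plan document ends up reading like the test file: concrete inputs and expected outputs, no room for the model to improvise.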
If this commenter is too good for that, then it's that attitude that'll keep him stuck. I already feel like my project backlog is achievable this year.
Strongly agree about the deterministic part. Even more important than a good design, the plan must not show any doubt, whether in the form of open questions or weasel words. 95% of the time those vague words mean I didn't think something through, and the model will do something hideous in order to make the plan work.
My experience so far has been similar to the root commenter's: at the stage where you need a long planning cycle, it's just slower than doing the writing and theory-building on my own.
It's an okay mental-energy saver for simpler things, but for me the self-review in an actual production-code context is much more draining than writing is.
I guess we're seeing the split of people for whom reviewing is easy and writing is difficult and vice versa.
All the separate pieces seem to be working in fairly mundane and intended ways, but out in the wild they come together in unexpected ways. Which shouldn't be surprising if you have a million of these things out there. There are going to be more incidents, for sure.
Theoretically we could even still try banning AI agents; but realistically I don't think we can put that genie back into the bottle.
Nor can we legislate strict 1:1 liability. The situation is already more complicated than that.
Like with cars, I think we're going to need to come up with lessons learned, best practices, then safety regulations, and ultimately probably laws.
At the rate this is going... likely by this summer.
I'm updating my thinking. Where do we put the threshold for malice, and for negligence?
Because right now, a one in a million chance of things going wrong (this month) leads to a prediction of 2-3 incidents already. (anecdata across the HN discussions we've had suggests we're at that threshold already). And one in a million odds of trouble in itself isn't normally considered wildly irresponsible.
> And one in a million odds of trouble in itself isn't normally considered wildly irresponsible.
For humans, who are capable of perhaps a few dozen significant actions per day, that may be true. But if that same one-in-a-million rate applies to a bot that can perform 10 million actions in a day, you're looking at ten injuries per day. So perhaps you should be looking at mean time between failures rather than only the positive/negative outcome ratio?
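The scaling argument is simple arithmetic; a quick sketch using the commenter's own numbers (the action counts are their hypotheticals, not measured figures):

```python
failure_rate = 1 / 1_000_000  # probability of an incident per action

human_actions_per_day = 50          # "a few dozen significant actions"
bot_actions_per_day = 10_000_000    # the hypothetical high-throughput bot

# Expected incidents per day scale linearly with action volume.
human_failures_per_day = human_actions_per_day * failure_rate  # 0.00005
bot_failures_per_day = bot_actions_per_day * failure_rate      # 10.0

# Mean time between failures, in days: same per-action odds,
# wildly different experienced risk.
human_mtbf_days = 1 / human_failures_per_day  # 20,000 days, ~55 years
bot_mtbf_days = 1 / bot_failures_per_day      # 0.1 days, ~2.4 hours
```

Same per-action failure rate, but one actor sees an incident once in a working lifetime and the other sees ten a day, which is why MTBF is the more informative lens here.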
If you look at the bot framework used here, it's actually outright kind. Weird thing to say, but natural language has registers, and now we're programming in natural language, and that's the register that was chosen.
And... these bots tend to only perform a few dozen actions per day too; they're running on Pis and Mac Minis and NUCs and VPSes and such. (And API credits add up besides.)
It's just that last time I blinked there were 2 and a half million of them. I've blinked a few times since then, so it might be more now. I do think they're limited by operator resources. But when random friends start messaging me about why I don't have one yet, it gets weird.
Yes, we will certainly go that way; code already added to gcc has probably been developed with collaborative AI tools. Agreed that we don't call that "produced by AI".
I think compilers, though, are a rare case where large-scale automated verification is possible. My guess is that starting from gcc, all the existing documentation on compilers, etc., and putting ridiculous amounts of compute into this problem will yield a compiler that significantly improves on benchmarks.
Can't tell what this demo is streaming; it looks like a static line, but it's working hard on something. It can't seem to decide whether to display the top number in red or green, either.
> Asking whether VB6 was out of work is the wrong question.
Asking whether you would like to work on a cowboy-coded VB6 app for low pay is a better one. The companies with fewer cowboy-coded apps are the companies everyone wants to work at. The more companies there are with cowboy-coded apps, the harder it gets to land a job at a company with minimal cowboys, imo.
VB6 apps haven't disappeared any more than Cobol systems have.
"Third possibility negates the first two." It doesn't. Those three things don't all need to happen. Any one of them alone is enough to significantly worsen the pay or quality of life of your average dev. And the three things don't even need to happen at the same time, anyway.