This is exactly what the article is about. The tradeoff is that you have to thoroughly review the plans and iterate on them, which is tiring. But the LLM will write good code faster than you, if you tell it what good code is.
Exactly; the original commenter seems determined to write off AI as "just not as good as me".
The original article doesn't feel that novel to me. Not because it's a trite example, but because I've begun to experience massive gains from following the same basic premise as the article. And I can't believe there are people who aren't using it like this.
I iterate the plan until it's seemingly deterministic, then I strip the plan of implementation and rewrite it following a TDD approach. Then I read all the specs and generate all the code to take the tests from red to green.
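That red-to-green loop can be sketched as a minimal example (the `slugify` function and its spec are hypothetical, just to illustrate the order of operations: spec first, implementation second):

```python
import re

# Red: the spec exists before the implementation does.
# Running this before slugify() is written fails with NameError.
def test_slugify():
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("  spaced  out  ") == "spaced-out"

# Green: generate just enough implementation to satisfy the spec.
def slugify(text: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace runs into hyphens."""
    text = re.sub(r"[^a-z0-9\s-]", "", text.lower())
    return re.sub(r"[\s-]+", "-", text).strip("-")

test_slugify()  # passes once the implementation matches the spec
```

The point is that the plan document ends up reading like the test file: concrete inputs and expected outputs, no room for the model to improvise.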
If this commenter is too good for that, then it's that attitude that'll keep him stuck. I already feel like my project backlog is achievable this year.
Strongly agree about the deterministic part. Even more important than a good design, the plan must not show any doubt, whether in the form of open questions or weasel words. 95% of the time those vague words mean I didn't think something through, and the model will do something hideous in order to make the plan work.
My experience so far has been similar to the root commenter's: at the stage where you need a long planning cycle, it's just slower than doing the writing and theory-building on my own.
It's an okay mental-energy saver for simpler things, but for me the self-review in an actual production-code context is much more draining than writing is.
I guess we're seeing the split of people for whom reviewing is easy and writing is difficult and vice versa.
All the separate pieces seem to be working in fairly mundane and intended ways, but out in the wild they come together in unexpected ways. Which shouldn't be surprising if you have a million of these things out there. There are going to be more incidents, for sure.
Theoretically we could even still try banning AI agents; but realistically I don't think we can put that genie back into the bottle.
Nor can we legislate strict 1:1 liability. The situation is already more complicated than that.
Like with cars, I think we're going to need to come up with lessons learned, best practices, then safety regulations, and ultimately probably laws.
At the rate this is going... likely by this summer.
I'm updating my thinking. Where do we put the threshold for malice, and for negligence?
Because right now, a one in a million chance of things going wrong (this month) leads to a prediction of 2-3 incidents already. (anecdata across the HN discussions we've had suggests we're at that threshold already). And one in a million odds of trouble in itself isn't normally considered wildly irresponsible.
> And one in a million odds of trouble in itself isn't normally considered wildly irresponsible.
For humans, who are capable of perhaps a few dozen significant actions per day, that may be true. But if that same one-in-a-million rate applies to a bot that can perform 10 million actions in a day, you're looking at ten injuries per day. So perhaps you should be looking at mean time between failures rather than only the positive/negative outcome ratio?
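The scaling argument is simple arithmetic; a quick sketch using the commenter's own numbers (the action counts are their hypotheticals, not measured figures):

```python
failure_rate = 1 / 1_000_000  # probability of an incident per action

human_actions_per_day = 50          # "a few dozen significant actions"
bot_actions_per_day = 10_000_000    # the hypothetical high-throughput bot

# Expected incidents per day scale linearly with action volume.
human_failures_per_day = human_actions_per_day * failure_rate  # 0.00005
bot_failures_per_day = bot_actions_per_day * failure_rate      # 10.0

# Mean time between failures, in days: same per-action odds,
# wildly different experienced risk.
human_mtbf_days = 1 / human_failures_per_day  # 20,000 days, ~55 years
bot_mtbf_days = 1 / bot_failures_per_day      # 0.1 days, ~2.4 hours
```

Same per-action failure rate, but one actor sees an incident once in a working lifetime and the other sees ten a day, which is why MTBF is the more informative lens here.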
If you look at the bot framework used here, it's actually outright kind. Weird thing to say, but natural language has registers, and now we're programming in natural language, and that's the register that was chosen.
And... these bots tend to only perform a few dozen actions per day too; they're running on Pis and Mac Minis and NUCs and VPSes and such. (And API credits add up besides.)
It's just that last time I blinked there were 2 and a half million of them. I've blinked a few times since then, so it might be more now. I do think they're limited by operator resources. But when random friends start messaging me about why I don't have one yet, it gets weird.
Yes, we will certainly go that way; code already added to gcc has probably been developed with collaborative AI tools. Agreed that we don't call that "produced by AI".
I think compilers, though, are a rare case where large-scale automated verification is possible. My guess is that starting from gcc, all the existing documentation on compilers, etc., and putting ridiculous amounts of compute into this problem will yield a compiler that significantly improves on benchmarks.
Can't tell what this demo is streaming; it looks like a static line, but it's working hard on something. It can't seem to decide whether to display the top number in red or green, either.
> Asking whether VB6 was out of work is the wrong question.
Asking whether you would like to work on a cowboy-coded VB6 app for low pay is a better one. The companies with fewer cowboy-coded apps are the companies everyone wants to work at. The more companies there are with cowboy-coded apps, the harder it gets to land a job at a company with minimal cowboys, imo.
VB6 apps haven't disappeared any more than Cobol systems have.
"Third possibility negates the first two." It doesn't. Those three things don't all need to happen. Any one of them alone is enough to significantly worsen the pay or quality of life of your average dev. And the three things don't even need to happen at the same time, anyway.