Hacker News: trjordan's comments

I'd offer a different approach: think about how you're going to validate. An only-slightly-paraphrased Claude conversation I had yesterday:

> me: I want our agent to know how to invoke skills.

> Claude: [...]

> Claude: Done. That's the whole change. No MCP config, no new env vars, no caller changes needed.

> me: ok, test it.

> Claude: This is a big undertaking.

That's the hard part, right? Maybe Claude will come back with questions, or you'll have to kick it a few times. But eventually, it'll declare "I fixed the bug!" or summarize that the feature is implemented. Then what?

I get a ton of leverage out of figuring out what I need to see to trust the code. I work on that. Figure out if there's a script you can write that'll exercise everything and give you feedback (2nd Claude session!). Set up your dev env so Playwright will Just Work and you can ask Claude to click around and give you screenshots of it all working. Grep a bunch and make yourself a list of stuff to review, to make sure it didn't miss anything.


Amen. Making the checking painless and easy to do is a major boon. There's a spectrum of "checking is easy": the compiler telling you the code doesn't compile is the easiest, but doesn't capture "is this the program I want". Some checks are inherently not mechanically checkable, and some sort of written "testing protocol" is necessary.

I utterly love Hammerspoon.

It's fun to combine with qmk [0], which gives you a bunch more options for hotkeys on your keyboard via layers. I've ended up with a layer where half the keyboard is Hammerspoon shortcuts directly to apps (e.g. go to Slack, to Chrome, etc.) and half of it is in-app shortcuts (like putting cmd-number on the home row, for directly addressing chrome tabs).

Between this and one of the tiling window manager-adjacent tools (I use Sizeup), I can do all my OS-level navigation directly. "Oh I want to go to Slack and go to this DM" is a few keystrokes away, and not dependent on what else I was doing.

[0] https://qmk.fm/


My QMK Tmux "layer" is still one of my favourite customisations. Prepends Ctrl-B to everything I type.



Yeah, but now I wouldn't touch anything from that company with a ten foot pole, even if they made the best Slack replacement ever.


Considering their Palantir partnership, I'm not sure I'd touch an Anthropic-designed slack either.


Also true! The most important thing is that the NewSlacks commit to interoperability. I think Anthropic has a special opportunity to lead the way here, because they have a track record of standing by their principles to an extraordinary degree.


Why on earth would Anthropic commit to interoperability?

That is the company that doesn't interoperate with the standard LLM APIs that OpenAI developed, which everyone else in the industry has adopted and uses. Whether OpenAI's APIs are great or perfect or not, they are the standard that the industry has settled on.

That is the same company that refuses to add support for AGENTS.md that everyone else in the industry uses, despite over 3000 upvotes: https://github.com/anthropics/claude-code/issues/6235

Anthropic's Claude Code is also one of the only agentic coding CLI tools that isn't open source.

I'm not sure which principles you think Anthropic stands by... but interoperability is not one of their strong suits, from what I've seen.


You must be the only one who remembers this, because the rest of the comments are dumping on the idea. I don't think it's such a bad one. Presumably it's easier for their agents to knock out than a web browser or a compiler.


The push for simplicity can't be at the time of recognition. It has to be during the building, so that by the time the thing gets built, it's the simplest thing that met the need.

Can you actually imagine a promo committee evaluating the technical choices? "Ok, this looks pretty complex. I feel like you could have just used a DB table, though? Denied."

Absolutely not! That discussion happens in RFCs, in architecture reviews, in hallway meetings, and a dozen other places.

If you want simplicity, you need to encourage mentorship, peer review, and meaningful participation in consensus-building. You need to reward team-first behavior and discourage lone-wolf brilliance. This is primarily a leadership job, but everybody contributes.

But that's harder than complaining that everything seems complicated.


>Can you actually imagine a promo committee evaluating the technical choices? "Ok, this looks pretty complex. I feel like you could have just used a DB table, though? Denied."

A committee with no skin in the game, who knows? But a manager who actually needs stuff done, absolutely.


"Red lines" does not mean some philosophical line they will not cross.

"Redlines" are edits to a contract, sent by lawyers to the other party they're negotiating with. They show up in Word's Track Changes mode as red strikethrough for deleted content.

They are negotiating the specifics of a contract, and Anthropic's contract was overly limiting to the DoD, whereas OpenAI's was not.


That’s not how the term is being used here.

In this case, "red lines" is being used to mean "lines that cannot be crossed."

Anthropic wanted guardrails on how their tech was used. DOD was saying that wasn’t acceptable.


Having recently set up Sentry, I can say that at least one of the ways they use this is to auto-configure uptime monitoring.

Once they know what hosts you run, it'll ping that hostname periodically. If it stays up and stable for a couple of days, you'll get an in-product alert: "Set up uptime monitoring on <hostname>?"

Whether you think this is valid, useful, acceptable, etc. is left as an exercise to the reader.


Expansion opportunities


1. I absolutely agree there's a bubble. Everybody is shipping a code review agent.

2. What on earth is this defense of their product? I could see so many arguments for why their code reviewer is the best, and this contains none of them.

More broadly, though, if you've gotten to the point where you're relying on AI code review to catch bugs, you've lost the plot.

The point of a PR is to share knowledge and to catch structural gaps. Bug-finding is a bonus. Catching bugs, automated self-review, structuring your code to be sensible: that's _your_ job. Write the code to be as sensible as possible, either by yourself or with an AI. Get the review because you work on a team, not in a vacuum.


> More broadly, though, if you've gotten to the point where you're relying on AI code review to catch bugs, you've lost the plot.

> The point of a PR is to share knowledge and to catch structural gaps.

Well, it was to share knowledge and to catch structural gaps.

Now you have an idea, for better or for worse, that software needs to be developed AI-first. That's great for creating new code, but as we all know, it's almost guaranteed that you'll get some bad output from the AI you used to generate the code. And since it can generate code very fast, you have a lot of it to go through, especially if you're working on a monorepo that wasn't architected particularly well when it was written years ago.

PRs seem like an almost natural place to do this. The alternative is the industry finding a more appropriate place for this sort of thing in the SDLC, which is gonna take time, seeing as agentic-loop software development is so new.


2. There is plenty of evidence for this elsewhere on the site, and we do encourage people to try it because like with a lot of AI tools, YMMV.

You're totally right that PR reviews go a lot farther than catching issues and enforcing standards. Knowledge sharing is a very important part of it. However, there are processes you can create to enable better knowledge sharing and let AI handle the issue-catching (maybe not fully yet, but in time). Blocking code from merging because knowledge isn't shared yet seems unnecessary.


> 2. What on earth is this defense of their product?

I think the distribution channel is the only defensive moat for low-to-mid-complexity, fast-to-implement features like code-review agents. So in the case of Linear and Cursor's Bugbot it makes a lot of sense. I wonder when GitHub/GitLab/Atlassian or Xcode will release their own review agents.


This is going to sound sarcastic, but I mean this fully: why haven't they merged that PR?

The implied future here is _unreal cool_. Swarms of coding agents that can build anything, with little oversight. Long-running projects that converge on high-quality, complex projects.

But the examples feel thin. Web browsers, Excel, and Windows 7 exist, and they specifically exist in the LLMs' training sets. The closest to real code is what they've done with Cursor's codebase... but it's not merged yet.

I don't want to say "call me when it's merged." But I'm not worried about agents' ability to produce millions of lines of code. I'm worried about their ability to intersect with the humans in the real world, both as users of that code and as developers who want to build on top of it.


> This is going to sound sarcastic, but I mean this fully: why haven't they merged that PR.

I would go even further: why have they not created at least one less complex project that is working and ready to be checked out? To me it sounds like dangling a carrot in front of VC investors: 'Look, we are almost there to replace legions of software developers! Imagine the market size and potential cost reductions for companies.'

LLMs are definitely an exciting new tool, and they are going to change a lot. But are they worth billions for everything stamped 'AI'? The future will tell. Looking back, the dotcom boom hype felt exactly the same.

The difference with the dotcom boom is that at the time there was a lot more optimism to build a better future. The AI gold rush seems to be focused on getting giga-rich while fscking the bigger part of humanity.


>> why haven't they merged that PR.

because it is absolutely impossible to review that code, and there are a gazillion issues in it.

The only way it can get merged is to YOLO it and then spend months fixing issues in prod, which kind of defeats the purpose and brings the gains close to zero.


On the other hand, finding and fixing issues for months is still training data.


> Long-running projects that converge on high-quality, complex projects

In my experience agents don't converge on anything. They diverge into low-quality monstrosities which at some point become entirely unusable.


Yeah, I don't think they're built for that either. You need a human to steer the "convergence"; otherwise they indeed end up building monstrosities.


> Web browsers, Excel, and Windows 7 exist, and they specifically exist in the LLM's training sets.

There are just a bit over three browsers, one serious Excel-like, and a small part of the Windows user side. That's really not enough training data to replicate those specific tasks.


> Long-running projects that converge

This is how I think about it. I care about asymptotics. What initial conditions (model(s) x workflow/harness x input text artefacts) cause convergence to the best steady state? The number of lines of code doesn't have to grow; it could also shrink. It's about the best output.


Pretty much everything exists in the training sets. All non-research software is just a mishmash of various standard modules and algorithms.


Not everything, only the codebases of existing (open-source?) applications.

But what would be the point of re-creating existing applications? It would be useful if you can produce a better version of those applications. But the point in this experiment was to produce something "from scratch" I think. Impressive yes, but is it useful?

A more practically useful task would be for Mozilla Foundation and others to ask AI to fix all bugs in their application(s). And perhaps they are trying to do that, let's wait and see.


You have to be careful which codebase you try this on. I have a feeling that if someone unleashed agents on the Linux kernel to fix bugs, it'd lead to a ban on agents there.


Re-creating closed source applications as open source would have a clear benefit because people could use those applications in a bunch of new ways. (implied: same quality bar)


I'm interested in why Claude loses its mind here,

but also, getting shut down for safety reasons seems entirely foreseeable when the initial request is "how do I make a bomb?"


That wasn't the request, that's how Claude understood the Armenian when it short-circuited.


Does Google also not handle this well?

Copy-pasted from the chat: https://www.google.com/search?q=translate+%D5%AB%D5%B6%D5%B9...


There's something about this that's unsatisfying to me. Like it's just a trivia trick.

My first read of this was "this seems impossible." You're asked to move bits around without any working space, because you're not allowed to allocate memory. I guess you could interpret this pedantically in C/C++ land and decide that they mean no additional use of the heap, so there are other places (registers, the stack, etc.) to store bits. The title says "in constant memory," so I guess I'm allowed some constant memory, which is vaguely at odds with "can you do this without allocating additional memory?" in the text.

But even with that constraint ... std::rotate allocates memory! It'll throw std::bad_alloc when it can't. It's not using it for the core algorithm (... which only puts values on the heap ... which I guess is not memory ...), but that function can 100% allocate new memory in the right conditions.

It's cool you can do this simply with a couple rotates, but it feels like a party trick.


To be fair, it originates from a time when memory was tighter. It's discussed with some motivating text in Programming Pearls. I can't remember the context, but I think it was a text editor. I can look it up if folks want some of that context here.


I did something similar back in the day to support block-move for an editor running on a memory-constrained 8-bit micro (BBC Micro). It had to be done in place, since there was no guarantee you'd have enough spare memory for a temporary buffer, and it was also more efficient to move each byte once rather than twice (in/out of a temp buffer).


Also useful for cache locality, a more recent concern. But I guess that's just another slightly different case of tight memory, this time in the cache rather than RAM generally.


The problem seems less arbitrary if the chunks being rotated are large enough. Implicit in the problem is that any method that would require additional memory to be allocated would probably require memory proportional to the sizes of stuff being swapped. That could be unmanageable.

As for whether std::rotate() uses allocations, I can't say without looking. But I know it could be implemented without allocations. Maybe it's optimal in practice to use extra space. I don't think a method involving reversal of items is generally going to be the fastest. It might be the only practical one in some cases or else better for other reasons.


No - std::rotate is just doing this with in-place swaps.

Say you have "A1 A2 B1" and want to rotate (swap) the adjacent blocks A1-A2 and B1, where WLOG the smaller of these is B1, and A1 is the same size as B1.

What you do is first swap B1 with A1 (putting B1 into its final place).

B1 A2 A1

Now recurse to swap A2 and A1, giving the final result:

B1 A1 A2

Swapping same-size blocks (which is what this algorithm always chooses to do) is easy, since you can just iterate through both, swapping corresponding pairs of elements. Each block only gets moved once, since it goes straight into its final place.


You are thinking of std::swap; std::rotate does throw bad_alloc.


I see it says that it may throw bad_alloc, but it's not clear why, since the algorithm itself (e.g. see "Possible implementation" below) can easily be done in place.

https://en.cppreference.com/w/cpp/algorithm/rotate.html

I'm wondering if the bad_alloc might be because a single temporary element (of whatever type the iterators point to) is needed to swap each pair of elements, or maybe to allow for an inefficient implementation that chooses not to do it in place?


> But even with that constraint ... std::rotate allocates memory! It'll throw std::bad_alloc when it can't.

This feels kinda crazy. Is there a reason why this is the case?


That's only for the parallel overload. The ordinary sequential overload doesn't allocate: the only three ordinary STL algorithms that allocate are stable_sort, stable_partition, and (ironically) inplace_merge.

