Hacker News | bushido's comments

Agent teams are amazing, but burn a lot of tokens.

For folks without the Max plans, the 2x promotion from Anthropic this month is a very good time to see if you'd benefit from having agent teams in your workflow.


I've seen this come up in a few comments, so I'm just adding it to a separate one in case it helps folks.

Something I've seen a lot of people mention in the comments here, and do in practice at my company and among friends and family, is saying something rough and then letting Claude or GPT rephrase it into a prompt that they'll then use.

In my experience, this almost always produces worse results than communicating directly with the LLM, for a few reasons.

1. LLMs tend toward word inflation: they produce plausible-sounding prompts, but the words they introduce tend to elicit worse, cookie-cutter results from the downstream agent, coding assistant, writing assistant, or whatever else consumes the prompt.

2. By putting a layer between what we say and what the LLM interprets, we stop honing our own ability to articulate and prompt well, and depend wholly on the intermediary interpreting us better over time, which does not hold up in practice.

3. Anecdotal, but in my case, when I was doing this myself, it was because I assumed I was hard to understand and not articulate enough to get good results, so I tried to shortcut that with an intermediary. What I learned, though, was that training myself to be articulate, and to not doubt myself, was easier than getting good results out of an LLM interpreter.

Of course, as with anything, YMMV.


I have no issues with A/B tests.

I do have an issue with plan mode. Nine times out of ten, it is terrible for me. The only benefit I've seen from plan mode is that it retains more information across compactions than the vanilla, non-agent-team workflow.

Interestingly, though, if you ask it to maintain a running document of what you're discussing in a markdown file and make it create an evergreen task at the top of its todo list which references the markdown file and instructs itself to read it on every compaction, you get much better results.
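For anyone who wants to try this, here's a minimal sketch of the kind of standing instruction I mean. The file name and exact wording are just illustrative, not an official Claude Code convention:

```markdown
<!-- In CLAUDE.md, or pasted at the start of a session -->
Maintain a running log of our discussion in `docs/plan-notes.md`.
Keep a permanent task at the top of your todo list:
  "Re-read docs/plan-notes.md and resume from its 'Current state' section."
After every meaningful decision, append it to the markdown file
before continuing, so nothing is lost across compactions.
```

The point is that the markdown file, unlike the context window, survives compaction untouched, and the evergreen todo item is what nudges the agent to actually go read it again.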


Huh, very much not my experience with plan mode. I use plan mode for almost anything beyond a truly trivial task because I've found it to be far more efficient. I want a chance to see and discuss what Claude is planning to do before it races off and does the thing, because there are often different approaches and I only sometimes agree with the one Claude would pick on its own.

Planning is great. It's plan mode that is unpredictable in how it discusses it and what it remembers from the discussion.

I still have discussions with the agents and agent team members. I just force them to save the discussion in a document in the repo itself and refer back to that document. You still get the nice parts of clearing context, which plan mode offers, but with much better control.

At all times, I make the agents work within my workflow, not let them create their own. This comes with a whole lot of trial and error, and real-life experience.

There are times when you need a tiger team made up of seniors. And others when you want to give an overzealous but fast mid-level engineer a concrete plan to execute an important feature in a short amount of time.

I'm putting it in non-AI terms because what happened in real life pre-AI is very much what we need to replicate with AI to get the best results. Something I would once have given a bigger team two to eight sprints to deliver gets a different agent or agent-team workflow than something I would have given a small tiger team or a single engineer.

They all need a plan. For me, plan mode is insufficient 90% of the time.

I can appreciate that many people will not want to mess around with workflows as much as I enjoy doing.


> on every compaction

I've only hit the compaction limit a handful of times, and my experience degraded enough that I work quite hard to not hit it again.

One thing I like about the current implementation of plan mode is that it'll clear context -- so when I complete a plan, I can use the accumulated context to write the next plan, and the context never grows without bound.


Agreed. The only time I don't clear context after a plan has been agreed on is when I'm doing a long series of relatively small but very related changes, such as back-and-forth tweaking when I don't yet know what I really want the final result to be until I've tried stuff out. In those cases, it has very rarely been useful to compact the context, but usually I don't get close.

I really like this too - having the previous plan and implementation in place to create the next plan, but then clearing context once that next plan exists feels like a great way to have exactly the right context at the right time.

I often do follow-ups that would previously have been short message replies as plans, just so I can clear context once the plan is ready. I’m hitting the context limit much less often now too.


Before AI, as a head of product (who has always written code), I did this thing where when I was thinking through an idea or a product direction, I built the solution three or four times before I found the shape and direction that I liked. And once I liked it, I put it on a roadmap for one or more of my teams to execute on.

Candidly, saying "before AI" is a little disingenuous, because as AI has gotten better at coding over the last year, my workflow has gone back to exactly what it was when I had a 40-person team reporting to me.

I still go through three, four iterations before a final direction is picked. It still takes me two, three weeks to think through an idea. Three things have changed.

1. When I think of a possible direction, a new version gets spun up within minutes to a couple of hours, usually in a single shot.

2. I can work through more big ideas that require some amount of coding-based ideation than I could previously.

3. Once a direction is decided on, delivery happens at a much quicker pace. Previously it could be 1 month of ideation + 2-8 sprints; now it's 2-4 weeks of ideation and 1-2 days to final delivery.

All in all, while I can see where the author is coming from, the grief has been different for me.

I've had the privilege of helping a lot of good developers, product managers, product owners, and designers develop their skills in the past. That came from the necessity of developing talent who would then go on to produce good work on our teams.

And I'm at a stage now where a three-person team that I have can produce more than the 40 could, and I am likely never going to need to develop the skills the way I used to. The loss is not from coding, I thoroughly enjoy how that's evolved. The loss is from the white space around it.


The "Shall I implement it" behavior can go really really wrong with agent teams.

If you forget to tell a team who the builder is going to be, and forget to give them a workflow for how to proceed, what often happens is the team members ask each other if they can implement it, give each other confirmations, and then start editing code over each other.

Hilarious to watch, but also so frustrating.
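A cheap guardrail is to spell out roles in the kickoff prompt. The phrasing below is just an example of the kind of instruction I mean, not an official agent-teams setting:

```markdown
Team roles for this task:
- architect: reviews and comments only; never edits files.
- builder: the ONLY member allowed to edit code.
- tester: runs the test suite and reports failures; never edits files.
Workflow: architect approves the plan -> builder implements -> tester verifies.
If you are not the builder, do not ask "shall I implement it" -- hand off instead.
```

Making the edit permission explicit and exclusive is what stops the mutual "go ahead!" confirmations that lead to members clobbering each other's changes.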

aside: I love using agent teams, by the way. Extremely powerful if you know how to use them and set up the right guardrails. Complete game changer.


Huh. I’m missing out I guess. Is there a plugin you use for spinning them up? Heavy superpowers/CC user here.

I think they're talking about the Agent Teams feature in Claude Code: https://code.claude.com/docs/en/agent-teams

Cursor implemented something a while back where it started acting the way ChatGPT does in its auto mode.

Essentially, it chose on its own which model and reasoning effort to use, regardless of my preferences. It basically switched to dumber models mid-task while writing code, producing some really bad results for me.

Anecdotal, but the reason I will never talk about Cursor is that I will never use it again. I have barred the use of Cursor at my company. It just does random stuff at times, more egregiously than anything I see from Codex or Claude.

ps. I know many other people who feel the same way about Cursor, and others who love it. I'm just speaking for myself, though.

ps2. I hope they've fixed this behavior, but they lost my trust. And they're likely never winning it back.


Don’t use the “auto” model and you will be fine.

You just described their “auto” behavior, which I’m guessing uses grok.

Using it with specific models is great, though you can tell Anthropic is subsidizing Claude Code once you watch your API costs more directly. Some day the subsidy will end. Enjoy it now!

And cursor debugging is 10x better, oh my god.

I have switched to 70% Claude Code, 10% Copilot code reviews (non anthropic model), and 20% Cursor and switch the models a bit (sometimes have them compete — get four to implement the same thing at the same time, then review their choices, maybe choose one, or just get a better idea of what to ask for and try again).


> get four to implement the same thing at the same time, then review their choices

Why would you do that to yourself? Reviewing 4 different solutions instead of 1 is 4 times the amount of work.


You wouldn't do that for everything. I'd reserve it for work with higher uncertainty, where you're not sure which path is best. Different model families can make very different choices.

Yes, this exactly.

Also, if there is a ui design then they could look wildly different.

I rarely use this feature, but when appropriate, it is fantastic to see the different approaches.


Same here. Auto mode is NOT ok. Sadly, smaller models cannot be trusted with access to Bash.

I love the framing here.

However, I think what a lot of people don't realize is that the reason a lot of executives and business users are excited about AI, and don't mind developers being replaced, is that to them the product is already a black box.


They'll start minding when things start breaking. In the meantime, I'll work on stuff AI is still not so great at.


Some of us need a paycheck and have to work on whatever LLM project the CEO demands, and if it fails, the developer gets blamed.


I'm more and more convinced top execs are the most likely to be advantageously replaced by an LLM.

They navigate such complex decision spaces, full of compromises, tensions, political knots, that ultimately their important decisions are just made on gut feelings.

Replace the CEO with an LLM whose system prompt is carefully crafted and vetted by the board of directors, plus an adequate digital twin of the company to project its moves, and I'm sure it would serve the shareholders' interests much better.

Next up: apply the same recipe to government executive power. Couldn't be much worse than orange man.


The slow-burning problem is going to be adversarial input and poison data.


I've been doing the same. I don't mind SaaS subscription fees, but I often run into things where I need a niche feature that doesn't exist.

Incidentally, I ran into something like this with WhisperFlow last year. Used it for a few weeks, loved it, and basically hardly typed for a month, just spoke to my system/terminal, etc.

But I ran into a unique challenge. Barking orders at my computer for 8 hours a day made me realize I was changing how I communicated with people. Being nicer was easy to fix, but speech-to-text was making me less articulate. And I wasn't very articulate to begin with -- something I had wanted to fix for a while.

So I built my own STT app that works in a similar way to WhisperFlow, with a few notable exceptions. Minor: it has dictionaries, snippets, etc. on a per-app/website basis. Major: it has rubrics for how I want to communicate in different contexts, e.g. business exec over email, principal engineer in my IDE/terminal, and so on.

And it scores me on areas like conciseness, logical flow/ease of following, and clarity every time I say anything. Ten weeks in, I'm noticeably more articulate than I've ever been.
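The rubric mechanism is simpler than it sounds. Here's a rough Python sketch of how such per-context scoring could be wired up; the rubric areas, contexts, and prompt format are my invention for illustration, not the actual app's code:

```python
# Sketch: build a per-context scoring prompt for an LLM judge.
# The contexts and rubric areas below are illustrative placeholders.

RUBRICS = {
    "email": ["conciseness", "clarity", "tone"],
    "terminal": ["conciseness", "logical flow", "precision"],
}

def scoring_prompt(context: str, transcript: str) -> str:
    """Format a prompt asking an LLM to grade a dictated utterance per rubric area."""
    areas = RUBRICS[context]
    lines = [f"Score the following dictation for the '{context}' context."]
    lines += [f"- {area}: rate 1-10 with a one-line justification" for area in areas]
    lines += ["", "Dictation:", transcript]
    return "\n".join(lines)

prompt = scoring_prompt("email", "Circling back on the Q3 numbers we discussed.")
```

The returned string would then be sent to whatever model does the judging; the per-app dictionaries and snippets would hang off the same context key.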


This feels like a far better reason to code your own; when your use case is just a bit too niche to ever be prioritized but you otherwise need a similar tool.


Very surprised to learn that this is real https://www.volvocars.com/us/l/osd-tourist/

Pretty cool. Lots more info on reddit threads.


Audi, BMW and Mercedes did this as well until a few years ago.

https://www.capitalone.com/cars/learn/finding-the-right-car/...


looks like you have to pay VAT?


VAT is only levied if it doesn't get exported within a certain amount of time (6 months from the scheduled delivery date).

I knew someone who tooled around Europe for a month before dropping it off to be shipped to her without having VAT incurred (though it was a couple decades ago).


I'm building an ERP to encapsulate the whole "customer" journey. I'm building it to be a business operating system of sorts, with the goal of creating clear line-of-sight visibility into all activities along journeys like lead-gen to churn (in a variety of settings).

The goal is to make it easier for organizations to work with external parties that affect finances (customers, investors, vendors, etc.).

The idea was born out of frustrations I've faced in a variety of leadership roles: wasted effort, slower decision-making, and bad decisions made on equally unhygienic data.

I've solved this successfully in the past in the form of internal tools and a data governance layer (a data warehouse with much more authority).

