> For venue recommendations [...] we do not rely purely on the language model. We embed both user requirements and venues into vector representations and retrieve candidates using similarity search. Hard constraints such as capacity and dates are applied first, and results are ranked before being presented.
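For context, what they're describing is a standard retrieve-then-rank setup. A minimal sketch of the shape in Python (the embedding stand-in, venue records, and field names are all invented for illustration, not their code):

```python
# Sketch of the retrieve-then-rank flow described above. The embedding
# stand-in, venue records, and field names are invented for illustration.
import hashlib
import numpy as np

def embed(text: str) -> np.ndarray:
    # Stand-in for a real sentence-embedding model: deterministic unit vector.
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")
    v = np.random.default_rng(seed).normal(size=384)
    return v / np.linalg.norm(v)

venues = [
    {"name": "Harborview Hall", "capacity": 250, "dates": {"2025-06-12"},
     "blurb": "waterfront conference venue with breakout rooms and AV support"},
    {"name": "Cedar Loft", "capacity": 80, "dates": {"2025-06-12"},
     "blurb": "intimate loft space with on-site catering"},
]

def recommend(query: str, min_capacity: int, date: str, k: int = 5) -> list[str]:
    # 1. Hard constraints first: capacity and date filter, they don't score.
    candidates = [v for v in venues
                  if v["capacity"] >= min_capacity and date in v["dates"]]
    # 2. Similarity search: cosine similarity between the query embedding and
    #    each venue embedding (dot product of unit vectors).
    q = embed(query)
    scored = [(float(q @ embed(v["blurb"])), v) for v in candidates]
    # 3. Rank before presenting.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [v["name"] for _, v in scored[:k]]

print(recommend("offsite for an engineering team, needs AV", 100, "2025-06-12"))
```

Applying capacity and dates as filters rather than score terms means a too-small venue can never be "rescued" by a good semantic match.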
Huh, this surprised me as a forgone opportunity.
I heard second-hand about the process for organizing our last offsite. Searching for venues was not the time-consuming part.
The time-consuming part was actually engaging with the venues to confirm specific details not available online. Our teammate who did this engaged with _hundreds_ of venues. It was a lot of work on their part ... and probably not the most fun part of their job.
You are right, venue recommendation is only the first step of the process.
What is time-consuming is the communication with the venue to agree on terms. That is exactly why, if you click on "Request Quote", you get a real quote process in which the venue shares all the details and a cost estimate with the client. We also offer a direct call with the venue manager to talk through the final details and close the deal. That is where the value is: the end-to-end booking process, not just aggregating results.
Hey we're also a Vertex tuning customer in a similar spot. We're seeing other capacity issues, although not a leap in latency. Can you DM me? I'd love to trade notes. https://x.com/hellofromjames
Not a verified X user, but happy to exchange here or elsewhere. The latency leap is still the same for us. We're on us-west1, but reports are that it's similar on at least us-central1, if not elsewhere. We simply can't use the fine-tuned models in prod anymore because of this, but whenever we run our automated tests with them, including today, the latency is still there. We haven't seen issues on non-fine-tuned models.
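If anyone wants to compare numbers, the test doesn't need to be fancy: time the same fixed prompt against both endpoints. A rough sketch (`call_model` and the endpoint names are placeholders, not real Vertex client code):

```python
# Minimal latency probe: time the same prompt against two endpoints and
# compare p50/p95. call_model is a stand-in -- swap in your real serving
# client (e.g. a Vertex AI prediction call); endpoint names are made up.
import random
import statistics
import time

def call_model(endpoint: str, prompt: str) -> str:
    # Stand-in that simulates a network call. Replace with your real client.
    time.sleep(random.uniform(0.2, 0.4))
    return "ok"

def latency_profile(endpoint: str, prompt: str, n: int = 20):
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        call_model(endpoint, prompt)
        samples.append(time.perf_counter() - start)
    samples.sort()
    return statistics.median(samples), samples[int(0.95 * (n - 1))]

for endpoint in ("base-model", "finetuned-model"):
    p50, p95 = latency_profile(endpoint, "fixed short prompt")
    print(f"{endpoint}: p50={p50:.3f}s p95={p95:.3f}s")
```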
I love Cerebras. I also love that they've started to scale rate limits to useful levels (which is relatively new).
I still don't know how long they'll support our chosen model.
On Oct 22 I got an email saying:
```
- qwen-3-coder-480b will be available until Nov 5, 2025
- qwen-3-235b-a22b-thinking-2507 will be available until Nov 14, 2025
```
That's not a lot of notice!
I don't want to spend all my time benchmarking new models for features I already built. I don't want my users' experience to be disturbed every few months.
Google[1] also has a "long context" pricing structure. OpenAI may be considering offering something similar, since they do not offer their priority processing SLAs[2] for context >128K.
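For anyone estimating the impact: "long context" pricing means the per-token rate steps up once a prompt crosses a threshold. A toy calculation (the threshold and rates are illustrative, not any vendor's actual numbers; as I understand Gemini's tiers, the whole prompt is billed at the higher rate once it crosses the threshold):

```python
# Toy long-context pricing: a prompt over the threshold is billed entirely
# at the higher rate. Threshold and rates are illustrative only -- check
# the vendor's price page.
THRESHOLD = 128_000      # tokens
RATE_SHORT = 1.25 / 1e6  # $ per input token for prompts <= threshold
RATE_LONG = 2.50 / 1e6   # $ per input token for prompts > threshold

def input_cost(tokens: int) -> float:
    rate = RATE_SHORT if tokens <= THRESHOLD else RATE_LONG
    return tokens * rate

print(f"${input_cost(100_000):.4f}")  # $0.1250 -- short-context rate
print(f"${input_cost(200_000):.4f}")  # $0.5000 -- long-context rate
```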
No, it's definitely interesting. It suggests that Opus 4 actually failed to write proper syntax on the first attempt, but given feedback it absolutely nailed the second. My takeaway is that this is great for peer-coding workflows - less "FIX IT CLAUDE".
You and your friends should email me with your resume and anything you're proud to have built. I'll extend that to any MIT senior/recent grad who wants to discuss moving to SF and helping us apply LLMs to build product features that solve interesting customer problems.
I'm at james.peterson@fathom.video. Include "[responding to HN thread 43614795]" in the subject line. I'd love to chat.
I am grateful for GCP's quotas that help us prevent similar own-goals.
While this specific error is something we know to avoid, I'm sure quotas have helped us avoid the pain of other errors. So I'm somewhat sympathetic.
I think it's important to read the post's language and judgements in the context of someone who just got a large unexpected bill (an expensive lesson).
> The time-consuming part was actually engaging with the venues to confirm specific details not available online. Our teammate who did this engaged with _hundreds_ of venues.
That seems like an ideal agent scenario?