> For venue recommendations [...] we do not rely purely on the language model. We embed both user requirements and venues into vector representations and retrieve candidates using similarity search. Hard constraints such as capacity and dates are applied first, and results are ranked before being presented.
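For context, what they're describing is a standard retrieve-then-rank setup. A minimal sketch of the shape in Python (the embedding stand-in, venue records, and field names are all invented for illustration, not their code):

```python
# Sketch of the retrieve-then-rank flow described above. The embedding
# stand-in, venue records, and field names are invented for illustration.
import hashlib
import numpy as np

def embed(text: str) -> np.ndarray:
    # Stand-in for a real sentence-embedding model: deterministic unit vector.
    seed = int.from_bytes(hashlib.sha256(text.encode()).digest()[:4], "big")
    v = np.random.default_rng(seed).normal(size=384)
    return v / np.linalg.norm(v)

venues = [
    {"name": "Harborview Hall", "capacity": 250, "dates": {"2025-06-12"},
     "blurb": "waterfront conference venue with breakout rooms and AV support"},
    {"name": "Cedar Loft", "capacity": 80, "dates": {"2025-06-12"},
     "blurb": "intimate loft space with on-site catering"},
]

def recommend(query: str, min_capacity: int, date: str, k: int = 5) -> list[str]:
    # 1. Hard constraints first: capacity and date filter, they don't score.
    candidates = [v for v in venues
                  if v["capacity"] >= min_capacity and date in v["dates"]]
    # 2. Similarity search: cosine similarity between the query embedding and
    #    each venue embedding (dot product of unit vectors).
    q = embed(query)
    scored = [(float(q @ embed(v["blurb"])), v) for v in candidates]
    # 3. Rank before presenting.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [v["name"] for _, v in scored[:k]]

print(recommend("offsite for an engineering team, needs AV", 100, "2025-06-12"))
```

Applying capacity and dates as filters rather than score terms means a too-small venue can never be "rescued" by a good semantic match.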
Huh, this surprised me as a forgone opportunity.
I heard second-hand about the process for organizing our last offsite. Searching for venues was not the time-consuming part.
The time-consuming part was actually engaging with the venues to confirm specific details not available online. Our teammate who did this engaged with _hundreds_ of venues. It was a lot of work on their part ... and probably not the most fun part of their job.
You are right, venue recommendation is only the first step of the process.
What is time-consuming is the communication with the venue to agree on terms. That is exactly why, if you click on "Request Quote", you get a real quote process in which the venue shares all the details and a cost estimate with the client. We also offer a direct call with the venue manager to talk through the final details and close the deal. That is where the value is: the end-to-end booking process, not just aggregating results.
Hey we're also a Vertex tuning customer in a similar spot. We're seeing other capacity issues, although not a leap in latency. Can you DM me? I'd love to trade notes. https://x.com/hellofromjames
Not a verified X user, but happy to exchange here or elsewhere. The latency leap is still the same for us. We're on us-west1, but reports are that it's similar on at least us-central1, if not elsewhere. We simply can't use the fine-tuned models in prod anymore because of this, but whenever we run our automated tests with them, including today, the latency is still there. We haven't seen issues on non-fine-tuned models.
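If anyone wants to compare numbers, the test doesn't need to be fancy: time the same fixed prompt against both endpoints. A rough sketch (`call_model` and the endpoint names are placeholders, not real Vertex client code):

```python
# Minimal latency probe: time the same prompt against two endpoints and
# compare p50/p95. call_model is a stand-in -- swap in your real serving
# client (e.g. a Vertex AI prediction call); endpoint names are made up.
import random
import statistics
import time

def call_model(endpoint: str, prompt: str) -> str:
    # Stand-in that simulates a network call. Replace with your real client.
    time.sleep(random.uniform(0.2, 0.4))
    return "ok"

def latency_profile(endpoint: str, prompt: str, n: int = 20):
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        call_model(endpoint, prompt)
        samples.append(time.perf_counter() - start)
    samples.sort()
    return statistics.median(samples), samples[int(0.95 * (n - 1))]

for endpoint in ("base-model", "finetuned-model"):
    p50, p95 = latency_profile(endpoint, "fixed short prompt")
    print(f"{endpoint}: p50={p50:.3f}s p95={p95:.3f}s")
```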
I love Cerebras. I also love that they've started to scale rate limits to useful levels (which is relatively new).
I still don't know how long they'll support our chosen model.
On Oct 22 I got an email saying:
```
- qwen-3-coder-480b will be available until Nov 5, 2025
- qwen-3-235b-a22b-thinking-2507 will be available until Nov 14, 2025
```
That's not a lot of notice!
I don't want to spend all my time benchmarking new models for features I already built. I don't want my users' experience to be disturbed every few months.
Google[1] also has a "long context" pricing structure. OpenAI may be considering offering something similar, since they do not offer their priority processing SLAs[2] for context >128K.
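For anyone estimating the impact: "long context" pricing means the per-token rate steps up once a prompt crosses a threshold. A toy calculation (the threshold and rates are illustrative, not any vendor's actual numbers; as I understand Gemini's tiers, the whole prompt is billed at the higher rate once it crosses the threshold):

```python
# Toy long-context pricing: a prompt over the threshold is billed entirely
# at the higher rate. Threshold and rates are illustrative only -- check
# the vendor's price page.
THRESHOLD = 128_000      # tokens
RATE_SHORT = 1.25 / 1e6  # $ per input token for prompts <= threshold
RATE_LONG = 2.50 / 1e6   # $ per input token for prompts > threshold

def input_cost(tokens: int) -> float:
    rate = RATE_SHORT if tokens <= THRESHOLD else RATE_LONG
    return tokens * rate

print(f"${input_cost(100_000):.4f}")  # $0.1250 -- short-context rate
print(f"${input_cost(200_000):.4f}")  # $0.5000 -- long-context rate
```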
No, it's definitely interesting. It suggests that Opus 4 actually failed to write proper syntax on the first attempt, but given feedback it absolutely nailed the second. My takeaway is that this is great for peer-coding workflows - less "FIX IT CLAUDE".
You and your friends should email me with your resume and anything you're proud to have built. I'll extend that to any MIT senior/recent grad who wants to discuss moving to SF and helping us apply LLMs to build product features that solve interesting customer problems.
I'm at james.peterson@fathom.video. Include "[responding to HN thread 43614795]" in the subject line. I'd love to chat.
I am grateful for GCP's quotas that help us prevent similar own-goals.
While this specific error is something we know to avoid, I'm sure quotas have helped us avoid the pain of other errors. So I'm somewhat sympathetic.
I think it's important to read the post's language and judgements in the context of someone who just got a large unexpected bill (an expensive lesson).
> The time-consuming part was actually engaging with the venues to confirm specific details not available online. Our teammate who did this engaged with _hundreds_ of venues.
That seems like an ideal agent scenario?