
Technically it doesn't; it's just really hard to implement leastconn correctly.

If you had perfect information and could always pick the server with the provably lowest load, that would probably work. But keeping that information up to date takes effort too, and if it goes stale it's easy to overload a server you think has little to do, or to underload one that finished its work long ago. Picking the less-loaded of two random servers introduces some randomness without letting the spread become huge.
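
As a sketch of the idea (hypothetical server names and connection counts, plain Python, not any particular proxy's implementation), power-of-two-choices looks roughly like this:

    import random

    # Hypothetical per-server connection counts. In a real proxy this view
    # can be stale, which is exactly why plain leastconn misbehaves: you
    # pile work onto a server that only *looked* idle.
    conn_counts = {"srv-a": 12, "srv-b": 3, "srv-c": 7, "srv-d": 9}

    def pick_power_of_two(counts):
        """Sample two distinct servers at random and route to whichever
        currently looks less loaded. Even with stale counts, a bad pick
        only loses to one other random server, so the spread between the
        busiest and idlest server stays bounded."""
        a, b = random.sample(list(counts), 2)
        return a if counts[a] <= counts[b] else b

    server = pick_power_of_two(conn_counts)
    conn_counts[server] += 1  # optimistic local bookkeeping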



When the cost of different requests varies widely, it's difficult to get this right. When we rolled out Docker I saw a regression in p95 latency. I countered it by doubling our instance size and halving the instance count, which made the number of processes per machine slightly greater than, rather than far less than, the number of machines. I reasoned that the local load balancing would be a bit fairer, and that bore out in the results.
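
A toy simulation of that reasoning (the Pareto cost distribution, the machine counts, and using cumulative load as a crude stand-in for latency are all assumptions, not the actual setup):

    import random

    random.seed(0)

    def p95_load(num_machines, procs_per_machine, num_requests=100_000):
        """Round-robin requests across machines; within each machine, hand
        the request to the least-loaded local process. Costs are
        heavy-tailed, so a few expensive requests dominate the p95."""
        machines = [[0.0] * procs_per_machine for _ in range(num_machines)]
        samples = []
        for i in range(num_requests):
            cost = random.paretovariate(1.5)    # heavy-tailed request cost
            procs = machines[i % num_machines]  # round-robin machine pick
            j = min(range(len(procs)), key=procs.__getitem__)  # local leastconn
            procs[j] += cost
            samples.append(procs[j])  # cumulative load as a rough latency proxy
        samples.sort()
        return samples[int(0.95 * len(samples))]

    # Same total process count, different shapes: many small boxes versus
    # half as many boxes with twice the processes each.
    print("32 machines x 2 procs:", p95_load(32, 2))
    print("16 machines x 4 procs:", p95_load(16, 4))

With the same total capacity, the fewer-but-bigger shape gives the local balancer more choices per machine, which is the fairness effect described above.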


I'm not 100% sure it's just the load balancing. It depends on the details of the setup, but that configuration also lets you throw more resources at each individual request.

I mean, obviously there's a point where splitting the work across more instances stops helping, because you're just leaving more instances completely idle, or giving each too few resources to be useful.



