Some of the buffer-bloat problem is due to SO_SNDBUF being statically set to some value, which also hides the actual bytes-in-flight from the application.
I think it would be much better to allow the auto-detected TCP window to be exposed to the application level as the "send-buf" size (with perhaps some 10% buffer bloat to allow filling in the gaps when the window grows or acks return prematurely).
Also, it would be good for high-bandwidth-high-latency situations, where the default Linux 0.5MB send-buf size is not enough. Allow the send-buf to grow if the TCP window needs to grow beyond 0.5MB.
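For reference, this is what a statically pinned send buffer looks like in practice. A minimal Python sketch, assuming Linux: setting SO_SNDBUF disables the kernel's send-buffer auto-tuning for that socket, and Linux reports back double the requested value (the extra half is reserved for internal bookkeeping) up to the net.core.wmem_max limit:

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# The send-buffer size the kernel picked by default (auto-tuning active).
default = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)

# Statically pin the send buffer to 64 KiB. From this point on the
# kernel stops auto-tuning it, which is the behaviour complained
# about above.
s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 64 * 1024)

# On Linux this reads back as roughly double the requested value.
tuned = s.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
print(default, tuned)
```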
Well, exposing it per application somewhat defeats the purpose. The point is that you get rid of the buffer so you get packet loss when the link to the other peer(s) is saturated, which lets the TCP congestion-control algorithms work properly.
Even in a high-latency, high-bandwidth situation, having huge buffers is self-defeating. Anybody who has used BitTorrent and assumed their ISP was doing some sort of throttling, where you get weird latency spikes and reduced performance... in the majority of cases this is a direct result of buffer bloat.
Large buffers only help in the case of a single TCP connection using as much of the bandwidth as possible, which helps network routers and similar gear look good in benchmarks.
What you are talking about is some sort of Quality-of-Service mechanism, probably most usefully applied at the edge of networks for prioritizing traffic, and at the ISP level so they can route traffic over different internet links based on requirements.
ISPs have to deal with choosing different links to other networks and what each costs. They can choose to do things like use main backbones versus secondary links, with different trade-offs between cost, latency, and things of that nature. So ideally there should be some sort of flag you can set in the TCP packet to indicate latency importance, or internet routing equipment could be made application-protocol-aware.
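Something close to that flag already exists at the IP layer: the TOS/DSCP byte, which an application can set per socket. Whether routers along the path honour it is another matter. A minimal sketch; the 0x10 value is the classic IPTOS_LOWDELAY bit from RFC 1349, and what (if anything) your network does with it is an assumption:

```python
import socket

IPTOS_LOWDELAY = 0x10  # classic "minimize delay" TOS bit (RFC 1349)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Mark outgoing packets on this socket as latency-sensitive. Routers
# may use this for queueing/link decisions -- or ignore it entirely.
s.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, IPTOS_LOWDELAY)
tos = s.getsockopt(socket.IPPROTO_IP, socket.IP_TOS)
print(hex(tos))
```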
By "expose it to the application", I mean in the flow-control sense: that send() (or select/epoll/etc.) will block until there is room in the TCP window, rather than in a pre-determined buffer that will always be too small or too big.
The way it works now, the kernel basically forces applications to either accept buffer bloat (fill a 0.5MB socket buffer), auto-detect the RTT themselves, or manually select a proper buffer size. All are bad options.
Also, in high-latency, high-bandwidth situations, the default 0.5MB buffer will simply fail to make use of the available bandwidth, so increasing the buffer size does not defeat the purpose. Latency spikes are a different situation; there are also cases of constant high latency (e.g. inter-continental 1Gbps links).
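For what it's worth, Linux later grew a knob in roughly this direction: TCP_NOTSENT_LOWAT (since 3.12) caps how much *not-yet-sent* data may sit in the socket buffer, so the socket stops polling as writable once the application is that far ahead of the window. A minimal sketch, assuming Linux; the constant is defined by hand because the Python socket module may not export it:

```python
import socket

TCP_NOTSENT_LOWAT = 25  # from linux/tcp.h; may be missing from the socket module

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Allow at most 128 KiB of unsent data to queue in the kernel; beyond
# that, send() blocks and epoll/select stop reporting the socket
# writable, so the application effectively tracks the edge of the TCP
# window plus a small cushion.
s.setsockopt(socket.IPPROTO_TCP, TCP_NOTSENT_LOWAT, 128 * 1024)
lowat = s.getsockopt(socket.IPPROTO_TCP, TCP_NOTSENT_LOWAT)
print(lowat)
```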
Regarding BitTorrent, I'm wondering if that's not just as much the fault of crappy NAT modems. Around here everyone gets a Zyxel, and BitTorrent totally chokes it periodically. The hash table used for NAT grows, the CPU can't handle it, to the point where you can't even log in to the modem, and new TCP sessions get dropped until the NAT clears out the old TCP connections. (With a proper network device this shouldn't be an issue, but it's one I see all the time with the cheap modems we get from the ISPs.)
The socket send buffer from applications is not very relevant to buffer bloat: it does not cause retransmissions, nor does it contribute much visible latency.
True. But if you have 0.5MB of data to send right now, it will take the same amount of time regardless of whether it is queued in the socket buffer or in your application. Applications that need to care about this are typically not sitting on top of TCP, and would usually need to control the send-buffer size anyway.
If you have 0.5MB of data to send right now, you could queue it all in your application layer; then you could still decide to cancel it, or schedule your sends based on your own priorities.
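One way to sketch that on Linux today: keep the queue in the application, and use the SIOCOUTQ ioctl to see how many bytes are still pending in the kernel before handing over the next chunk. The drain function and the backlog threshold here are illustrative assumptions, not an established API:

```python
import fcntl
import socket
import struct
import termios
from collections import deque

SIOCOUTQ = termios.TIOCOUTQ  # on Linux, SIOCOUTQ == TIOCOUTQ (0x5411)

def unsent_bytes(sock):
    """Bytes queued in the kernel for this socket, not yet acked."""
    buf = fcntl.ioctl(sock.fileno(), SIOCOUTQ, struct.pack('i', 0))
    return struct.unpack('i', buf)[0]

def drain(sock, queue, max_kernel_backlog=64 * 1024):
    """Hand chunks to the kernel only while its backlog is small, so
    the bulk of the data stays in `queue`, where it can still be
    reordered or cancelled."""
    while queue and unsent_bytes(sock) < max_kernel_backlog:
        sock.sendall(queue.popleft())

# Demo over loopback: a blocking connect to a listening socket
# completes via the accept backlog, so no accept() call is needed.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(('127.0.0.1', 0))
server.listen(1)

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(server.getsockname())

queue = deque(b'x' * 1024 for _ in range(8))
drain(client, queue)
print(len(queue), unsent_bytes(client))
```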
I agree that applications that care about this don't use TCP, but one of the primary reasons for this is exactly this problem: that you don't get to send at the edge of the TCP window. There are other reasons, of course, each of which is fixable (and should be fixed!)