
This looks unbelievably simple, along the lines of "why hasn't this been done before?"

So you ping 10.3.4.16 and your host automatically 'knows' to just send it to 17.16.4.16, where the receiving host, lying in wait, simply forwards it on to 10.3.4.16. I like it.

This is a vexing problem for container and even VM networking. If they are behind NAT you need to create a mesh of tunnels across hosts, or else create a flat network so they are all on the same subnet. But you can't do that for containers in the cloud, where each host gets a single IP and you have limited control of the networking layer.

Current solutions include L2 overlays, L3 overlays, a big mishmash of GRE and other kinds of tunnels, VXLAN with multicast (unavailable in most cloud networks), or proprietary unicast implementations. It's a big hassle.
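To give a sense of the tunnel-mesh approach: a single GRE leg between two hosts can be set up with iproute2 roughly like this (all addresses here are placeholders, and a full mesh needs one such tunnel per pair of hosts):

    # on host A (public 203.0.113.1), pointing at host B (203.0.113.2)
    ip tunnel add gre-b mode gre local 203.0.113.1 remote 203.0.113.2 ttl 255
    ip addr add 10.0.1.1/30 dev gre-b
    ip link set gre-b up
    # repeat with the roles reversed on host B, then route the
    # container subnets over the tunnel interfaces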

Ubuntu has taken a simple approach: there's no per-node database to maintain state, and it uses commonly available networking tools. More importantly, it seems fast, and it's here now. That 6 Gbps figure suggests it doesn't compromise performance the way a lot of other solutions tend to. It won't solve every multi-host container networking use case, but it will address many.



You can use any method to program the VXLAN forwarding table; you don't need to use multicast.

This can even be done on the command line using iproute2 utilities: https://www.kernel.org/doc/Documentation/networking/vxlan.tx...
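A minimal sketch of that, assuming eth0 as the underlay device and placeholder addresses: create the VXLAN device without a multicast group, then append all-zero-MAC "default" entries to the FDB, one per remote VTEP, for head-end replication:

    ip link add vxlan0 type vxlan id 42 dev eth0 dstport 4789 nolearning
    ip link set vxlan0 up
    # one default (all-zero MAC) entry per remote endpoint; unknown and
    # broadcast traffic is replicated to each of them
    bridge fdb append 00:00:00:00:00:00 dev vxlan0 dst 192.0.2.20
    bridge fdb append 00:00:00:00:00:00 dev vxlan0 dst 192.0.2.21
    # known MACs can be pinned directly to the host that owns them
    bridge fdb add 52:54:00:aa:bb:cc dev vxlan0 dst 192.0.2.20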

Though you should probably use netlink to do it programmatically. Personally I like to combine netlink + ZooKeeper or similar to trigger edge updates via watches.


Are you referring to the FDB tables? I tried that some months ago but it didn't seem to work. Maybe it's changed now. I will give it a shot. Any tips?

I remember seeing a patch floating around that added support for multiple default destinations in VXLAN unicast, but I think some objections were raised and it hasn't made it through. At least it's not there in 4.1-rc7. That would be quite nice to have.

http://www.spinics.net/lists/netdev/msg238046.html#.VNs3rIdZ...


Oh, multiple default destinations -- that would be very cool!

Right now I am using netlink to manage FDB entries; last time I tried, the iproute2 utility worked too.

The only tricky thing about doing it with netlink is that the FDB uses the same API as the ARP table, specifically RTM_NEWNEIGH/RTM_DELNEIGH/RTM_GETNEIGH. Apart from that it's pretty simple.
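In other words (a sketch with placeholder addresses and MACs), the same RTM_NEWNEIGH message type carries both kinds of entries, and the address family is what distinguishes them:

    # ordinary ARP/neighbour entry: RTM_NEWNEIGH with family AF_INET
    ip neigh replace 10.3.4.16 lladdr 52:54:00:aa:bb:cc dev eth0
    # VXLAN FDB entry: also RTM_NEWNEIGH, but family AF_BRIDGE,
    # with the remote VTEP carried in an NDA_DST attribute
    bridge fdb replace 52:54:00:aa:bb:cc dev vxlan0 dst 192.0.2.20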


Yes, the only thing that's lost is live migration of IPs between hosts, which may or may not be a big deal for containers, depending on how things are clustered.


Why does NAT require a mesh of tunnels? Are you trying to separate the containers onto securely separate networks?

Why isn't DHCP used? What does it not do that this service does?



