My understanding is that they basically "fast flux" IPs to funnel traffic from a targeted attack into a specific data center. So, while you may normally be sharing IPs, if an enterprise customer's website example.com starts getting attacked, they will put it on dedicated IPs and then announce those IPs from only one or two data centers. They then reroute all other enterprise traffic away from those data centers, minimizing the attack's effect on other customers. If these websites were all on the same IP, it would be impossible to distribute traffic selectively between data centers like this.
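A toy sketch of that isolation step, to make the mechanics concrete. Everything here is illustrative (the site names, data center names, and IPs are made up, and this is not Cloudflare's actual API): the attacked site gets its own IPs, announced only at a couple of "scrubbing" data centers, while other customers stay on the shared IPs.

```python
# Hypothetical model of "put the attacked site on dedicated IPs and
# announce them from only one or two data centers". All names are made up.

SHARED_IPS = ["198.51.100.10"]          # many customers share these
SCRUB_CENTERS = {"dc-ams", "dc-sjc"}    # where attack traffic gets funneled

site_to_ips = {"example.com": SHARED_IPS, "other.com": SHARED_IPS}
ip_announced_at = {"198.51.100.10": {"dc-ams", "dc-sjc", "dc-nrt", "dc-iad"}}

def isolate(site, dedicated_ips):
    """Give the attacked site its own IPs, announced only at scrub centers."""
    site_to_ips[site] = dedicated_ips
    for ip in dedicated_ips:
        ip_announced_at[ip] = set(SCRUB_CENTERS)

isolate("example.com", ["203.0.113.7"])
# Other customers are untouched; the attack is now confined to two sites.
assert site_to_ips["other.com"] == SHARED_IPS
assert ip_announced_at["203.0.113.7"] == SCRUB_CENTERS
```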
Another thing they can do is use anycast to load-balance across data centers. If a data center rather than a website is the target, the attackers need to know which IPs to attack, and they can start flooding the IPs announced along a particular route. But if that happens, Cloudflare could hypothetically just stop announcing those IPs at that data center, re-announce them at all the surrounding data centers, and basically spread the attack load across multiple sites. If the attackers change the IPs they target based on the new routes, Cloudflare can keep fast-fluxing the IPs every five minutes and mitigate the attack.
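The withdraw-and-respread move above can be modeled with a few lines of arithmetic. This is purely a toy (the PoP names and the 400 Gbps figure are invented, and real anycast load depends on routing topology, not an even split), but it shows why withdrawing the prefix at the targeted site dilutes the attack:

```python
# Toy model of "stop announcing here, re-announce at the neighbors":
# withdrawing the prefix at the targeted PoP splits the attack volume
# across surrounding data centers. Numbers and names are illustrative,
# and an even split is an idealization of what anycast actually does.

def respread(load_gbps, announced_at, withdrawn, neighbors):
    """Withdraw the prefix at one site and split its load across neighbors."""
    sites = (set(announced_at) - {withdrawn}) | set(neighbors)
    share = load_gbps / len(sites)
    return {site: share for site in sites}

# 400 Gbps all landing on one PoP...
after = respread(400.0, {"dc-lhr"}, "dc-lhr",
                 ["dc-ams", "dc-fra", "dc-cdg", "dc-mad"])
print(after)  # ...becomes 100 Gbps at each of four neighbors
```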
It's a pretty cool use of BGP and anycast; being able to change a website's IPs, and where they are announced, in real time is core to Cloudflare's security.
Thanks for this comment. I guess, along with jgrahamc's sibling comment, that if you have a fixed IP you have to make a routing decision based on (source, port) at most, since HTTPS ports are stupidly fixed. That is at most 32 + 16 bits of info, so an Ethernet MAC's worth. So now I can clarify my question as follows: with X bits of data, what is the present state-of-the-art latency with respect to routing T Gbps of traffic? And it's not just that; you also need good latency for updating that routing table.
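A minimal sketch of routing on that 48-bit key, assuming IPv4 and a hash-based path choice (the hash function and path count are arbitrary choices here, not anything the comment above specifies): pack source IP and source port into one integer, then hash it to pick one of N paths, so the same flow always lands on the same path.

```python
# Sketch of routing on the (source IP, source port) key discussed above:
# 32 + 16 = 48 bits, folded into one integer and hashed to pick a path.
# SHA-256 is an arbitrary illustrative choice of hash.

import hashlib
import ipaddress

def route_key(src_ip: str, src_port: int) -> int:
    """Fold (source, port) into a single 48-bit key."""
    return (int(ipaddress.IPv4Address(src_ip)) << 16) | src_port

def pick_path(src_ip: str, src_port: int, n_paths: int) -> int:
    """Hash the 48-bit key to a stable path index in [0, n_paths)."""
    key = route_key(src_ip, src_port).to_bytes(6, "big")
    return int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % n_paths

# Same flow always hits the same path; different flows spread out.
assert pick_path("192.0.2.1", 50123, 8) == pick_path("192.0.2.1", 50123, 8)
```

The update-latency half of the question is exactly what this sketch hides: in real hardware that lookup lives in TCAM or a hash table that has to be rewritten while traffic is in flight.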
Is there any research on the real entropy of (source, port) on the Internet? There are also practical issues: the distribution of (source, port) is hardly uniform, and it gets especially nasty under attack, i.e. you want to manage latency based on both the distribution and the authenticity of traffic.
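One rough way to ask that question of real traffic: compute the empirical Shannon entropy of observed (source, port) pairs. This is a sketch with fabricated sample data, but it shows the effect being worried about above: a uniform 48-bit key would give 48 bits, while attack traffic dominated by a few flows collapses to far less.

```python
# Empirical Shannon entropy of observed (source, port) pairs.
# Sample data below is made up purely to illustrate the skew under attack.

import math
from collections import Counter

def empirical_entropy_bits(samples):
    """Shannon entropy (bits) of the observed sample distribution."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

normal = [("198.51.100.%d" % i, 40000 + i) for i in range(1000)]  # diverse flows
attack = [("203.0.113.9", 31337)] * 900 + normal[:100]            # one flow dominates

print(empirical_entropy_bits(normal))  # ~log2(1000) ≈ 9.97 bits
print(empirical_entropy_bits(attack))  # much lower: traffic is concentrated
```

Note the caveat that spoofed floods can fake high entropy, which is why the distribution alone isn't enough and authenticity matters too.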
This is a very interesting mathematical problem. I have to work on expressing it a bit better before I can hope to formulate a solution, but yes, I can totally see now how BGP, anycast, and DNS TTLs are all knobs for solving this problem heuristically, instead of some crazy genius use of router TCAM silicon.
As a further observation, it makes the GitHub attack an interesting case study. You now have to route further on the GET target, and if the traffic is encrypted, the routing decision is pushed to a later stage.
In order to protect latency to other GET targets, you're going to have to start doing interesting things.
One future solution I can see is to multipath-TCP the anomalous traffic and close the original connection. But at that point you have to re-filter genuine vs. malicious traffic, and then there's the encrypted state you have to share for a proper stream handover. Ooof... what a nightmare.