Just in case a robots.txt kills that: http://pastebin.com/rcPSyRnR

snsr · on Aug 11, 2015

It's also on seclists.org:

http://seclists.org/isn/2015/Aug/4

hughw · on Aug 11, 2015

Would archive.org typically honor a robots.txt for a resource it already retrieved? I never understood the intent of a robots.txt to be retroactive.

mikeash · on Aug 11, 2015

Apparently yes, it would: https://archive.org/about/exclude.php

syncsynchalt · on Aug 11, 2015

My understanding is that sites like archive.org honor robots.txt retroactively not because they are required to, but to best honor the wishes of the content provider.

X-Istence · on Aug 11, 2015

Yes, it simply hides the content. The content is still kept in their database, so if the robots.txt disappears, it pops back from their archive.

New pages won't be archived though.
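
For illustration, here is a rough sketch of the "hidden, not deleted" behavior described above. It is not archive.org's actual code: the `archive` dict, the `lookup` helper, and the example URL are all hypothetical, and the `ia_archiver` user agent is assumed to be the name the crawler matches on. It only uses Python's standard `urllib.robotparser`, checking the site's current robots.txt at lookup time rather than at crawl time.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical stored snapshots; nothing is ever removed from this dict.
archive = {"http://example.com/page": "snapshot from 2014"}


def is_visible(robots_txt: str, url: str, agent: str = "ia_archiver") -> bool:
    """Return True if the current robots.txt allows `agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def lookup(url: str, current_robots_txt: str):
    """Serve a stored snapshot only while the live robots.txt permits it."""
    if not is_visible(current_robots_txt, url):
        return None  # hidden from view, but still present in `archive`
    return archive.get(url)


blocking = "User-agent: ia_archiver\nDisallow: /\n"
removed = ""  # robots.txt deleted, or no longer has any rules

print(lookup("http://example.com/page", blocking))
print(lookup("http://example.com/page", removed))
```

With the blocking rules in place the lookup returns nothing, and once the rules are gone the same stored snapshot is served again, which matches the "pops back" behavior.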