
Just in case a robots.txt kills that:

http://pastebin.com/rcPSyRnR



It's also on seclists.org:

http://seclists.org/isn/2015/Aug/4


Would archive.org typically honor a robots.txt for a resource it already retrieved? I never understood the intent of a robots.txt to be retroactive.


Apparently yes, it would: https://archive.org/about/exclude.php


My understanding is that sites like archive.org honor robots.txt retroactively not because they are required to, but to best honor the wishes of the content provider.


Yes, it simply hides the content. The content is still kept in their database, so if the robots.txt disappears, it pops back from their archive.

New pages won't be archived though.
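
To make the "hidden, not deleted" behaviour concrete, here is a rough sketch of how a display-time check like that could work. This is only an illustration under assumptions, not archive.org's actual code: the function name visible_snapshots, the example.com URLs, and the "ia_archiver" user agent default are all made up for the example. It uses Python's standard urllib.robotparser to filter stored snapshots against whatever robots.txt the site serves right now.

```python
from urllib.robotparser import RobotFileParser


def visible_snapshots(snapshots, robots_txt, user_agent="ia_archiver"):
    """Return the stored snapshot URLs that the *current* robots.txt allows.

    Hypothetical helper: nothing is deleted here, disallowed snapshots are
    only filtered out of what gets shown.
    """
    parser = RobotFileParser()
    # Parse the robots.txt text directly instead of fetching it over HTTP.
    parser.parse(robots_txt.splitlines())
    return [url for url in snapshots if parser.can_fetch(user_agent, url)]


# Snapshots already sitting in the (hypothetical) archive database.
stored = [
    "http://example.com/",
    "http://example.com/private/report.html",
]

# The site later adds a robots.txt that blocks /private/ for all crawlers.
blocking = "User-agent: *\nDisallow: /private/\n"

hidden_view = visible_snapshots(stored, blocking)

# If the robots.txt disappears again, the same stored data becomes visible.
restored_view = visible_snapshots(stored, "")
```

Because the filter runs against the current robots.txt rather than the one in place at crawl time, the exclusion is retroactive for display, and removing the file brings the old snapshots back without any re-crawl.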



