
That doesn't make sense.

The point of blocking a link with robots.txt is to say "Hey, web crawlers, please don't load and index this page". It does not mean "Hey, users, please don't come and read this page".
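
For example, a robots.txt rule like this (the path is made up for illustration) only asks well-behaved crawlers to stay away; it does nothing to stop a human with a browser:

    User-agent: *
    Disallow: /private/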

So the script as written is, for all intents and purposes, the same as a regular old user clicking the link, reading the page, and keeping a list of which links work and which don't. It's not a crawler; it's an automated user. A minimal sketch of such a script follows.
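
Something like this, say, in Python with only the standard library (the URLs are hypothetical):

    import urllib.request

    # Hypothetical links to check, loaded one at a time just as a user clicking them would.
    links = ["https://example.com/some-page", "https://example.com/another-page"]

    working, broken = [], []
    for url in links:
        try:
            # Fetch the single page; no following of links, so no crawling.
            with urllib.request.urlopen(url, timeout=10) as resp:
                (working if resp.status == 200 else broken).append(url)
        except Exception:
            broken.append(url)

    print("working:", working)
    print("broken:", broken)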

If you are a webmaster who wants to stop people from posting links to your pages around the web for others to come and read, make the page return a 403 instead.
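
That is, actually refuse the request at the server, rather than relying on robots.txt politeness. A toy sketch with Python's standard http.server (port and body are made up for illustration):

    from http.server import BaseHTTPRequestHandler, HTTPServer

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Reject every request outright with 403 Forbidden.
            self.send_response(403)
            self.end_headers()
            self.wfile.write(b"Forbidden")

    HTTPServer(("", 8000), Handler).serve_forever()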


