
It's a pity that robots.txt doesn't let you specify what the crawler can do with the resources it's allowed to fetch. I think that if we had standardized such a feature (or something similar, like a "License" header) early enough, a few issues regarding crawling and search engines would be moot, or at least easier to solve automatically.


True, but then all the commercial websites would use it to ban scraping.



