
Hmmm, not sure if I agree with that or not... but if so, it has a very limited usage quota before you get banned. :)


What API do you want? Automated submission of posts is not a good idea, so that leaves data scraping.

If you're looking to extract data from HN, use HNSearch. http://www.hnsearch.com/

There's no reason to scrape data from HN itself.


HNSearch is great and I use it for my Wayback Letter project, but it hasn't been around that long, so it wasn't an option earlier. When I started Hacker Newsletter about 2.5 years ago there was nothing, so scraping was the only solution. One of these days I will convert the application I built for putting together each issue to use it, but there is also a risk of HNSearch going away just like the last search engine did.


I'm curious why you disagree, aside from the quota thing? I have written many API consumers over the years and there is basically no difference between parsing HTML and parsing any other format; XML especially, for obvious reasons.
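
To make the comparison concrete, here is a minimal sketch in Python using only the standard library. The markup and the JSON payload are made up for illustration and are not HN's actual page structure or any real API response; the names TitleParser, stories_from_html and stories_from_json are hypothetical too. It only shows that both inputs can be reduced to the same list of stories.

```python
import json
from html.parser import HTMLParser

# Hypothetical markup for illustration; NOT HN's actual page structure.
SAMPLE_HTML = """
<table>
  <tr><td class="title"><a href="http://example.com/a">First story</a></td></tr>
  <tr><td class="title"><a href="http://example.com/b">Second story</a></td></tr>
</table>
"""

# The same data as a hypothetical JSON payload.
SAMPLE_JSON = (
    '[{"title": "First story", "url": "http://example.com/a"},'
    ' {"title": "Second story", "url": "http://example.com/b"}]'
)


class TitleParser(HTMLParser):
    """Collects title/url pairs from links inside <td class="title">."""

    def __init__(self):
        super().__init__()
        self.in_title_cell = False
        self.current_url = None
        self.stories = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "td" and attrs.get("class") == "title":
            self.in_title_cell = True
        elif tag == "a" and self.in_title_cell:
            self.current_url = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_title_cell = False
        elif tag == "a":
            self.current_url = None

    def handle_data(self, data):
        # Only text inside a link in a title cell counts as a story title.
        if self.current_url is not None:
            self.stories.append({"title": data.strip(), "url": self.current_url})


def stories_from_html(html):
    parser = TitleParser()
    parser.feed(html)
    return parser.stories


def stories_from_json(payload):
    return json.loads(payload)
```

The HTML side takes more lines, but once it is wrapped in a function the calling code does not care which format the data arrived in.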

HTML APIs do come with the added benefit of being less prone to change, which I realize goes against conventional wisdom, but it seems to hold in practice.


I agree that parsing can be easily done, although I don't think it is necessarily equal... but I guess that depends on the case, since a lot of APIs are terrible. What makes it not an API IMHO is that you can't consume it when needed, but rather you have to consume it all the time. I guess you will call that a streaming API though. :)

> HTML APIs do come with the added benefit of being less prone to change

Really? I don't think that holds in a lot of cases. Using HN as an example, scraping it broke a year or so ago when PG changed how the job postings were listed. Again, the quality of an API can vary, but with a real API you would at least know what changed.



