
FWIW, the distinct proxy business can be solved in one of two ways, depending on how complex your proxy needs are.

First, you can just set ``request.meta["proxy"] = "http://..."`` at any point in the downloader middleware chain before the request is actually transmitted.
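To make that concrete, here is a minimal sketch of a downloader middleware that picks a proxy per host. The ``PROXY_BY_HOST`` mapping, the proxy URLs, and the class name are all made up for illustration, and the ``SimpleNamespace`` object at the bottom only stands in for a real Scrapy request so the snippet runs on its own.

```python
from types import SimpleNamespace
from urllib.parse import urlparse

# Hypothetical host -> proxy mapping; these URLs are placeholders, not real proxies.
PROXY_BY_HOST = {
    "example.com": "http://proxy-a.example:8080",
    "example.org": "http://proxy-b.example:8080",
}


class PerHostProxyMiddleware:
    """Downloader middleware that sets request.meta["proxy"] before download."""

    def process_request(self, request, spider):
        host = urlparse(request.url).hostname
        proxy = PROXY_BY_HOST.get(host)
        if proxy is not None:
            # setdefault keeps any proxy the spider already chose for this request.
            request.meta.setdefault("proxy", proxy)
        # Returning None tells Scrapy to keep processing the request as usual.
        return None


# Stand-in for a Scrapy Request: only .url and .meta are used above.
demo_request = SimpleNamespace(url="https://example.com/page", meta={})
PerHostProxyMiddleware().process_request(demo_request, spider=None)

unmapped_request = SimpleNamespace(url="https://unmapped.example/", meta={})
PerHostProxyMiddleware().process_request(unmapped_request, spider=None)
```

In a real project you would enable the class through the ``DOWNLOADER_MIDDLEWARES`` setting; the module path and priority depend on your layout.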

Second, you could package up that "more control over requests" you described as a downloader middleware of its own. AFAIK, the first middleware in the chain whose ``process_request`` returns a populated Response object wins, so you could short-circuit the built-in downloading mechanism entirely.
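A rough sketch of that short-circuit, under some assumptions: the ``use_custom_fetch`` meta key, the ``fetch`` hook, and the class names are hypothetical, and ``fetch`` is where your own requests-based code would go. The demo subclass swaps in stand-ins so the snippet runs without Scrapy installed; the default ``build_response`` is the part that needs it.

```python
from types import SimpleNamespace


class CustomFetchMiddleware:
    """Downloader middleware that bypasses Scrapy's downloader for opted-in requests."""

    def fetch(self, request):
        """Hypothetical hook: return (status, body_bytes) from your own HTTP client."""
        raise NotImplementedError

    def build_response(self, request, status, body):
        # Imported lazily so the rest of the sketch runs without Scrapy installed.
        from scrapy.http import HtmlResponse

        return HtmlResponse(
            url=request.url, status=status, body=body,
            encoding="utf-8", request=request,
        )

    def process_request(self, request, spider):
        # "use_custom_fetch" is an assumed opt-in flag, not a Scrapy built-in.
        if not request.meta.get("use_custom_fetch"):
            return None  # fall through to the normal download path
        status, body = self.fetch(request)
        # Returning a response here skips the built-in download for this request.
        return self.build_response(request, status, body)


class DemoMiddleware(CustomFetchMiddleware):
    """Stand-in subclass so the example can run without a crawler or network."""

    def fetch(self, request):
        return 200, b"<html>stub</html>"

    def build_response(self, request, status, body):
        return SimpleNamespace(url=request.url, status=status, body=body)


opted_in = SimpleNamespace(url="https://example.com/", meta={"use_custom_fetch": True})
plain = SimpleNamespace(url="https://example.com/", meta={})

demo_response = DemoMiddleware().process_request(opted_in, spider=None)
passthrough = DemoMiddleware().process_request(plain, spider=None)
```

Requests without the flag still go through the regular downloader, so you can mix both paths in one spider.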



Thanks, good to know regarding the proxy. There were a couple of other little things that just didn't work the way I wanted though (I honestly don't remember them now).

I've since built private libraries on top of requests that let me do everything in a trivial amount of time, so I prefer this approach and the extra control it gives me.

I think if I were going to write a long-running spider, I'd probably look into Scrapy again beforehand.



