Depends on the intent. If it is user-initiated (say, a mobile-formatted version of the site), it wouldn't have to obey robots.txt, since it is not a crawler, just another web browser.
No, not tried to download it yet.
Regarding your question, if you try to use a start > 999 you get this error: "Validation error: max limit is 100, max start+limit is 1000", which is why I avoided that parameter.
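For anyone paging through results under that constraint, here is a rough sketch of how the (start, limit) pairs could be generated so no request trips the validation error. The function name and defaults are my own invention; only the two bounds (limit at most 100, start+limit at most 1000) come from the error message quoted above.

```python
def page_windows(max_limit=100, max_total=1000):
    """Yield (start, limit) pairs that respect both bounds.

    max_limit and max_total mirror the numbers in the quoted error
    message; the function itself is a hypothetical helper, not part
    of any API client.
    """
    start = 0
    while start < max_total:
        # Shrink the final window if it would push start+limit past the cap.
        limit = min(max_limit, max_total - start)
        yield start, limit
        start += limit


windows = list(page_windows())
```

Each pair can then be passed as the start and limit parameters of a request; anything beyond the first 1000 results is out of reach with this parameter alone.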
I'm not sure what you're trying to do though. I used beautifulsoup because I couldn't get lxml working on BB10, but if it was switched to using lxml it would be much faster.
Depending on what you're trying to do with the data, you may find http://diffbot.com/products/automatic/ helpful for getting the clean article text and categorization in JSON format. It can be used as a complement/augmentation to the great suggestions here for getting the links.
I saw your GitHub repo. Wonderful work, but your API doesn't seem to be working: I got errors when I tried the link http://api.ihackernews.com/by/kaushikfrnd. Can you confirm it will work if I run it on my own server?
Ah, looks like there's an issue with the iHackerNews API itself, which I don't have a hand in. You'll want to hit up @ronnieroller on Twitter. Sorry I can't be of more help. :/
I bet it's just a cost/benefit analysis. An API is a way to get more eyeballs by motivating 3rd party developers to integrate and publicise your service. HN does not need that: it has enough traffic as it is, and given the target audience, you would see an instant proliferation of half-assed apps hammering its endpoints. So it would be an additional cost for no real benefit.
The current situation (PG and friends optimise a basic but very accessible website, and a handful of third parties build APIs on top) is much more manageable.
I don't believe that HN restricts or discourages the scraping of HN content in any way, other than the restrictions here: https://news.ycombinator.com/robots.txt
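If you want a crawler to honour a robots.txt programmatically, Python's standard library can do the check. This is only a sketch: the rules in SAMPLE_ROBOTS, the "my-crawler" user agent, and the example.com URLs are made up for illustration and are not the contents of HN's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; not HN's real robots.txt.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
Crawl-delay: 30
"""


def build_parser(robots_text):
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS)

# Ask whether a given user agent may fetch a given URL.
allowed = parser.can_fetch("my-crawler", "https://example.com/news")
blocked = parser.can_fetch("my-crawler", "https://example.com/private/page")

# Crawl-delay, if the file declares one for this agent (None otherwise).
delay = parser.crawl_delay("my-crawler")
```

Against a live site you would instead call set_url() with the robots.txt address and then read() before can_fetch(), and sleep for the crawl delay between requests.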
If you have a fabulous idea for how to use the data contained on this site, I'm sure everyone will be impressed and interested to see it.
Haha, good to see someone link it! I am the author of Scrape.it, currently on Mashape. I also wrote http://scrape.ly for crawling web pages and extracting data.