If the service is exposed publicly on the web, it can be crawled regardless of whatever guards the service provider has in place. Browser emulation would be a good start.
I know we can crawl them. There are no technical issues with crawling them. The issue is that I want to respect their TOS. There are multiple ways to circumvent their anti-crawling code, but using them means any new search engine would have its roots in "shady" crawling tactics.
IMHO crawling should be allowed if the purpose of the crawler is to show results that drive traffic back to the sites, rather than to mash up the content and deliver it as "original content" on the site that crawled these domains. However, Yelp's TOS, for example, do not allow these types of crawlers, which essentially drive traffic back to Yelp.
Agreed, but there is no harm in tapping the vast knowledge of the HN community to exhaust alternatives before tightening my hacker cap and plunging in head first.