	by marban 753 days ago
	Nice list, but what would be the arguments for switching over from other libraries? I’ve built my own crawler over time, but from what I see, there’s nothing truly unique.

jancurn 753 days ago

The main advantage (for now) is that the library has a single interface for both HTTP and headless browsers, plus bundled autoscaling. You can write your crawlers using the same base abstraction, and the framework takes care of the heavy lifting. Developers of scrapers shouldn't need to reinvent the wheel; they should be able to focus on building the "business" logic of their scrapers. Having said that, if you wrote your own crawling library, the motivation to use Crawlee might be lower, and that's fair enough.

Please note that this is the first release, and we'll keep adding many more features as we go, including anti-blocking, adaptive crawling, etc. To see where this might go, check https://github.com/apify/crawlee
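
To illustrate the "single interface" idea mentioned above, here is a minimal sketch in Python. It is not Crawlee's API: `Fetcher`, `HttpFetcher`, `BrowserFetcher`, `Page`, and `crawl` are hypothetical names, and both backends are stubs that return canned HTML rather than touching the network. The point is only that the handler (the "business" logic) stays the same whichever backend is plugged in.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


@dataclass
class Page:
    """What a handler receives, regardless of how the page was fetched."""
    url: str
    html: str
    backend: str


class Fetcher(ABC):
    """Hypothetical base abstraction shared by all backends."""

    @abstractmethod
    def fetch(self, url: str) -> Page:
        ...


class HttpFetcher(Fetcher):
    """Stub standing in for a plain HTTP client (no real request is made)."""

    def fetch(self, url: str) -> Page:
        return Page(url=url, html=f"<html>raw {url}</html>", backend="http")


class BrowserFetcher(Fetcher):
    """Stub standing in for a headless browser (no real browser is started)."""

    def fetch(self, url: str) -> Page:
        return Page(url=url, html=f"<html>rendered {url}</html>", backend="browser")


def crawl(urls: Sequence[str], fetcher: Fetcher, handler: Callable[[Page], T]) -> List[T]:
    """Fetch each URL with the given backend and pass the page to the handler."""
    return [handler(fetcher.fetch(url)) for url in urls]


def describe(page: Page) -> str:
    """Example handler: the scraper's own logic, unaware of the backend."""
    return f"{page.backend}:{page.url}"


urls = ["https://example.com/a", "https://example.com/b"]
http_results = crawl(urls, HttpFetcher(), describe)
browser_results = crawl(urls, BrowserFetcher(), describe)
```

Swapping `HttpFetcher()` for `BrowserFetcher()` changes how pages are obtained without changing `describe`, which is the property the comment describes.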

robertlagrant 753 days ago

Can I ask - what is anti-blocking?

fullspectrumdev 753 days ago

Usually refers to “evading bot detection”.

In practice, that means detecting when you have been blocked and switching to a different proxy or “browser fingerprint”.
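
As a rough sketch of the "detect a block, then switch proxy" loop described above, assuming a block shows up as an HTTP 403 or 429 status (real detection is usually more involved). The names `fetch_with_rotation`, `send`, and `BLOCK_STATUSES` are hypothetical and not taken from any library; `send` is an injected function so the example runs without a network.

```python
from typing import Callable, Optional, Sequence, Tuple

# Assumption: these status codes are treated as "we were blocked".
BLOCK_STATUSES = {403, 429}

# send(url, proxy) -> (status_code, body); injected so no real request is needed.
SendFn = Callable[[str, str], Tuple[int, str]]


def fetch_with_rotation(
    url: str,
    proxies: Sequence[str],
    send: SendFn,
    max_attempts: Optional[int] = None,
) -> Optional[Tuple[str, int, str]]:
    """Try proxies in order, moving to the next one whenever a block is detected.

    Returns (proxy, status, body) for the first non-blocked response,
    or None if every attempt was blocked or no proxies were given.
    """
    if not proxies:
        return None
    attempts = len(proxies) if max_attempts is None else max_attempts
    for i in range(attempts):
        proxy = proxies[i % len(proxies)]
        status, body = send(url, proxy)
        if status not in BLOCK_STATUSES:
            return proxy, status, body
    return None


def fake_send(url: str, proxy: str) -> Tuple[int, str]:
    """Stand-in transport: the first proxy is 'blocked', the others succeed."""
    if proxy == "proxy-1":
        return 403, "blocked"
    return 200, "ok"


result = fetch_with_rotation("https://example.com", ["proxy-1", "proxy-2"], fake_send)
all_blocked = fetch_with_rotation("https://example.com", ["proxy-1"], fake_send)
```

Here the first attempt through `proxy-1` is treated as blocked, so the loop falls through to `proxy-2`; when only the blocked proxy is available, the function gives up and returns `None`.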

robertlagrant 753 days ago

Is this a good feature to include? Shouldn't we respect the host's settings on this?

nlh 753 days ago

It’s a fair and totally reasonable question, but it clashes with reality. Many hosts have data that others want to scrape (eBay, Amazon, Google, airlines, etc.), and they set up anti-scraping mechanisms to try to prevent that. Whether or not to respect those wishes is a bigger question, but not one for the scraping library - it’s one for those doing the scraping and their lawyers.

The fact is - many, many people want to scrape these sites and there is massive demand for tools to help them do that, so if Apify/Crawlee decides to take the moral high ground and not offer a way around bot detection, someone else will.

thebytefairy 753 days ago

Ah yes, the old 'if I don't build the bombs for them, someone else will'. I don't think this is taking the moral high ground; this is saying "we don't care whether it's moral, there's demand and we'll build it."
