by mnmkng
706 days ago
In one word: nothing. But I personally think it makes some things a little easier, a little faster, and a little more convenient than the other libraries and tools out there. There’s one thing the JS version of Crawlee has that unfortunately isn’t in Python yet, but it will be there soon. AFAIK it’s unique among all libraries: it automatically detects whether a headless browser is needed or whether plain HTTP will suffice, and uses the more performant option.
I find that some dynamic sites purposefully make themselves extremely difficult to parse and obfuscate the XHR calls to their APIs.
I've also seen some websites pollute the data when they detect scraping, which results in garbage data, but you don't know until it's verified.