Hacker News
by jarek83 1224 days ago
Basic question before I can recommend this to my boss: can it scrape G2? (or any other page behind CF)
3 comments

It could, depending on the proxy I get from my provider. I'm currently working on adding the ability to select a higher-quality proxy for difficult-to-scrape websites, and on adding captcha solvers as well.

But to be honest, I cannot guarantee it at present.

This app is a side project I started 2 months ago. It's evolving fast, but I still need to add some key features for enterprise customers.

The scrapeninja.net /scrape-js endpoint scrapes G2 company pages without much trouble (with the "us"/"eu" proxy geo in their online sandbox: https://scrapeninja.net/scraper-sandbox ). They also have /scrape, which is much faster because it does not bootstrap a real browser, and it bypasses the Cloudflare TLS fingerprint check: https://pixeljets.com/blog/bypass-cloudflare/

What are G2 and CF?

G2 is a software (as a service) comparison website: https://www.g2.com/

CF is Cloudflare, which offers anti-scraping protection for websites (among other things): https://www.cloudflare.com/

Thank you!