by d--b 2472 days ago
Sorry to be a pain here, but I very much doubt your ML thing is working at all. Opening a website and finding a DOM element is trivial, so the only thing I'd get when buying from you is the promise that this will be resilient to website updates.

But at the same time, for $500/month, you can definitely have people updating the selectors manually...

3 comments

I've been using Dashblock in a production environment and it's super easy to create and use APIs for sites. We'd previously written our own scripts to do this at scale, but it was difficult to keep them all up to date. You're right that fixing a single page's DOM changes is trivial, but it's a real pain to scale that. Regarding the ML aspect, I've tested changing the DOM for a page in Dashblock and it seems to work: it didn't break the scraping I had set up. The price might not make sense for everybody, but for me it's definitely worth it.
Save your money. Create robust regression scripts and hire a freelancer to fix them when the schema changes. Anyone serious about crawling/scraping data from websites won't leave it up to some automated ML magic to extract the right data for them.
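For what it's worth, a regression check of this kind can be quite small. Here is a minimal sketch, assuming each scraper returns one dict per record; the field names, the patterns, and the `find_regressions` helper are all hypothetical and only illustrate the idea of flagging a schema change before bad data gets through.

```python
import re

# Hypothetical expectations for one scraped record. The field names and
# patterns are illustrative and not taken from any real site.
EXPECTED = {
    "title": re.compile(r".+"),
    "price": re.compile(r"\d+(\.\d{2})?"),
}


def find_regressions(record, expected=EXPECTED):
    """Return a list of problems; an empty list means the record looks sane."""
    problems = []
    for field, pattern in expected.items():
        value = record.get(field)
        if value is None:
            # The selector probably stopped matching after a site update.
            problems.append(f"missing field: {field}")
        elif not pattern.fullmatch(str(value).strip()):
            # Something was extracted, but not in the shape we expect.
            problems.append(f"unexpected value for {field}: {value!r}")
    return problems
```

Run it over the output of every scheduled crawl and alert whoever maintains the selectors as soon as the list is non-empty.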
The freelancer approach does not work if the data you extract is time-critical. But then again, why would anyone rely on DOM parsing for such data instead of trying to find an API? So OP's product is worth it if you have validated and trust their ML model to work correctly, you believe they can guarantee a certain uptime, the data you extract is time-sensitive or mission-critical, and the data cannot be sourced from an API. People with such needs may be willing to pay good money, but good luck finding early adopters, as well as the data niche where there is demand for something like this.
Have you tried it?