by miki123211 2818 days ago
I think we are confusing two things here.

The OP probably built this tool because it was useful in his situation and shared it in case someone else needs something similar. He was probably thinking along the lines of "If someone needs that, fine, let them discover that it exists and use it". People here are treating it as a one-size-fits-all solution for web scraping and pointing out valid reasons why using it that way is a bad idea. I think we should accept that this tool might be good for some and completely unnecessary for others, and we shouldn't criticize it for not being useful for our own purposes.

One situation where this tool has clear advantages over other solutions is client-side scraping. If we made an app for iOS/Android/Windows/whatever that runs on devices owned by end users and crawls data on request, perhaps from multiple websites, having the crawlers written as external scripts would be extremely useful. That would allow you to push updates to the crawlers separately, immediately after a website changes its layout, without having to update your app. Making a gallery of downloadable crawlers for more sources would probably also be possible. The limitations of the language are an advantage in this situation, as crawlers are mostly sandboxed and can't destroy your filesystem, steal your data, etc.

This tool also allows keeping the crawlers separate from the app. That would allow people to create a global npm-like repository of crawlers that work regardless of what programming language you use (provided someone has written an implementation of this tool in your language). Imagine a use case like building a book price comparison app in, let's say, Java for Android and Swift for iOS, and maybe even C++ because some libraries still run Windows XP and would like the app to be available, and being able to download crawlers for Amazon and dozens of local bookselling websites that would work in all of those apps, without having to write them yourself. Used right, this tool could let programmers pretend that the semantic web as originally imagined actually exists, and write services that interact with various websites in surprising ways, without thinking about how the interactions are done at a lower level.
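
To make the "crawlers as external scripts" idea a bit more concrete, here is a rough sketch in Python. The JSON crawler format, the `run_crawler` function and the sample page are all made up for illustration; this is not the syntax of the tool being discussed. The point is only that the crawler is plain data the app could download on its own schedule, and the host decides which operations it is allowed to perform (here, nothing but regex extraction).

```python
import json
import re

# Hypothetical crawler definition. In a real app this JSON would be downloaded
# separately from the app itself; the format is invented for this sketch.
CRAWLER_JSON = """
{
  "name": "example-bookshop",
  "fields": {
    "title": "<h1 class=\\"title\\">(.*?)</h1>",
    "price": "<span class=\\"price\\">([0-9.]+)</span>"
  }
}
"""


def run_crawler(definition: dict, html: str) -> dict:
    """Apply a data-only crawler definition to a page's HTML.

    The definition can only name regexes to extract. It has no way to run
    arbitrary code, touch files, or open network connections.
    """
    result = {}
    for field, pattern in definition["fields"].items():
        match = re.search(pattern, html, re.DOTALL)
        # A missing field becomes None instead of raising, so a site layout
        # change degrades gracefully until an updated crawler is pushed.
        result[field] = match.group(1).strip() if match else None
    return result


# Stand-in for a page the app fetched on the user's device.
SAMPLE_HTML = '<h1 class="title">Dune</h1><span class="price">9.99</span>'

definition = json.loads(CRAWLER_JSON)
print(run_crawler(definition, SAMPLE_HTML))
# prints {'title': 'Dune', 'price': '9.99'}
```

When the site changes its layout, only the JSON needs to be replaced. An implementation of the same small interpreter in Java, Swift or C++ could consume the same definition, which is what would make a shared repository of crawlers possible.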

1 comment

Thank you very much for your valuable feedback and I'm glad that someone has finally got the idea :)