Hacker News
by ziflex 2818 days ago
That's true. The difference is how much effort is needed to do that using the API.

What it brings is just a higher-level abstraction over that API, which lets you get the work done more easily.

2 comments

Do you have a more involved example where Ferret really shines, as opposed to a library with a similar API in JS or another common language? I really don't mean to be negative, but I just don't see how Ferret is any easier to use than something like Nightmare[0]. That said, I'm wondering if it's an issue of communication more than anything, so maybe a different example than the one in the readme would help.

[0]: https://github.com/segmentio/nightmare

You are fine, I totally understand your scepticism. And you are right, there are definitely issues in communication.

First of all, I built it for myself. I needed a high-level representation of scraping logic that would run in an isolated and safe environment. Second, I needed to be able to easily scrape dynamic pages.

So, what I got is:
- A high-level, declarative-ish language that hides all the infrastructural details. That helps you focus on the logic itself and describe what you want without worrying about the underlying technology. Today I'm using headless Chrome, tomorrow I will use something else, but the change should not affect your code.
- Full support for dynamic pages. You can get data from a dynamically rendered page, emulate user actions, etc. Heck, you can even write bots with it.
- Embeddable. Right now I have only a CLI, but there are plans to write a web server where you can save your scripts, schedule them and set up output streams.
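To make the first point concrete, here is a minimal Python sketch of that kind of decoupling. It is not Ferret's API or implementation: `Driver`, `StaticDriver`, `DynamicDriver` and `scrape_titles` are hypothetical names invented for illustration, and both drivers return canned data instead of talking to a real browser. The only point is that the scraping logic depends on an interface, so the backend can change without touching it.

```python
from typing import Protocol


class Driver(Protocol):
    """Hypothetical backend interface; not part of Ferret."""

    def load(self, url: str) -> list[str]:
        """Return the text of every heading found at `url`."""
        ...


class StaticDriver:
    """Stand-in for a plain HTTP backend (returns canned data)."""

    def load(self, url: str) -> list[str]:
        return ["  First result ", "Second result"]


class DynamicDriver:
    """Stand-in for a headless-browser backend (returns canned data)."""

    def load(self, url: str) -> list[str]:
        return ["First result", "Second result  "]


def scrape_titles(driver: Driver, url: str) -> list[str]:
    """Scraping logic: it states what it wants, not how pages are fetched."""
    return [title.strip() for title in driver.load(url)]


if __name__ == "__main__":
    # Swapping the backend does not change the scraping logic or its result.
    for driver in (StaticDriver(), DynamicDriver()):
        print(scrape_titles(driver, "https://example.com"))
```

A dedicated language goes one step further than this sketch, since the script never names a driver at all, but the separation is the same idea.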

But the main idea is to provide a high-level, declarative way of scraping the web. I'm not saying you can't do that with other tools. I'm just trying to come up with something easier to work with.

Regarding examples, the project is still a WIP, so the examples will get more complex as the features do. Here is a more or less complex one that gets data from Google Search. It's not that difficult, but it showcases the core feature: working with dynamic pages.

https://github.com/MontFerret/ferret/blob/master/docs/exampl...

"Much more effort"? Right now it implements a library and a language on top of that. Making it just be a library would cut the work in half.

The idea is to create a high-level abstraction that represents your web scraping logic. The project is still a WIP. I will create a web server that will help you store your queries, schedule them and set up output streams to other systems like Spark and Flink.