by geuis
3710 days ago
I built something almost identical in 2011. In practice it doesn't have as much utility as you'd initially think. CSS selectors are an interesting idea for extracting data from pages, but they're extremely fragile. You either have to parse the page's raw HTML using something like jsdom, or run it through a headless browser like Phantom. In the first case, it completely fails for any modern SPA (Angular, React, etc.). In the second case, Phantom is painfully slow and difficult to interact with, and often doesn't run or render an SPA the way a regular browser does. You can write tests around whether your selectors are returning data, but even simple refactors by a site's dev team will break your selector profiles multiple times a week or month. It just wasn't worth the hassle.
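To make the fragility concrete, here is a rough sketch (not the tool I built) of class-based extraction over raw HTML, using only Python's standard-library `html.parser`. The names `ClassTextExtractor` and `extract` and the three sample pages are made up for illustration. The same "selector" that works on the static page returns nothing once a class is renamed, and nothing for an SPA shell whose content only exists after JavaScript runs.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ClassTextExtractor(HTMLParser):
    """Collect the text inside elements carrying a given class name (hypothetical helper)."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0      # > 0 while we are inside a matching element
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.depth or self.class_name in classes:
            self.depth += 1
            if self.depth == 1:
                self.results.append("")  # start a new match

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.results[-1] += data


def extract(html, class_name):
    parser = ClassTextExtractor(class_name)
    parser.feed(html)
    parser.close()
    return [text.strip() for text in parser.results]


# Illustrative inputs: a static page, the same page after a refactor, and an SPA shell.
STATIC_PAGE = '<ul><li class="price">$10</li><li class="price">$12</li></ul>'
RENAMED_PAGE = '<ul><li class="cost">$10</li><li class="cost">$12</li></ul>'
SPA_SHELL = '<div id="root"></div><script src="app.js"></script>'

print(extract(STATIC_PAGE, "price"))   # data comes back
print(extract(RENAMED_PAGE, "price"))  # empty after a class rename
print(extract(SPA_SHELL, "price"))     # empty: content is rendered client-side
```

A test that only asserts "the selector returned something" will catch the last two cases, but only after the breakage has already happened.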
The trick is to run a real browser on a virtual X11 display with Xvfb and control it with Selenium WebDriver.
Phantom isn't the only choice, just the one most people talk about.
As for sites that aren't JS-heavy, it's fairly trivial to find a library that will parse the DOM for you; pretty much every language has one.
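For the static case, a minimal sketch with nothing but Python's standard library might look like the following. `LinkCollector`, `collect_links`, and the sample page are hypothetical names chosen for the example; a real scraper would more likely use a third-party DOM library, but the idea is the same.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Gather absolute URLs from every <a href="..."> in a document (hypothetical helper)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, href))


def collect_links(html, base_url):
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


# Illustrative input: one relative link, one absolute link, one anchor without href.
PAGE = ('<p><a href="/about">About</a> '
        '<a href="https://example.org/x">X</a> '
        '<a name="top">no href</a></p>')

print(collect_links(PAGE, "https://example.com/index.html"))
```

This only sees what is in the served HTML, so it is the right tool for server-rendered pages and the wrong one for anything that builds its DOM in the browser.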