by dabeeeenster
4463 days ago
The problem with these sorts of solutions is that they work perfectly for 'simple' sites like The Register, but fail hard on 'modern' sites such as ASOS.com. I just tried ASOS, and the web front end failed to request a product page correctly. All the dynamic JS and whatnot plays havoc with these projects. In my experience you have to run the page through WebDriver or something like PhantomJS so the JS actually executes before you parse the result.
In the meantime, I've written Tampermonkey scripts that scrape and embed multiple pages, all hack-like, but at least I get a good CSV of the data I need.
To me, the value of this tool is the user interface for creating the scrape logic. If it ran as an embeddable JS app that you could drop into any page, using local storage to keep the results, you could scrape these dynamic sites by viewing each page first and still get all of the cool gadgetry this tool provides.
In essence, the valuable part of this tool could be built as a bookmarklet. THAT, sir, I would use every single day.
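A rough sketch of the bookmarklet/userscript idea, in TypeScript: read rows from the page after its JS has already rendered, append them to anything saved from earlier page views in a localStorage-shaped store, and export everything as one CSV at the end. All names here (`KeyValueStore`, `scrapeRenderedPage`, `appendRows`, `exportCsv`, `MemoryStore`) are hypothetical and not part of the tool being discussed; the CSV quoting follows the common RFC 4180 convention, and the DOM is described with small structural interfaces so the sketch compiles without the browser typings.

```typescript
// Hypothetical sketch: accumulate rows scraped from several rendered pages and
// export them as one CSV. None of these names come from the tool discussed above.

// Same method shape as the browser's Storage, so window.localStorage fits here.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Minimal structural stand-ins for the DOM types, so this compiles without lib.dom.
interface QueryableElement {
  querySelector(selector: string): { textContent: string | null } | null;
}
interface QueryableRoot {
  querySelectorAll(selector: string): ArrayLike<QueryableElement>;
}

type Row = string[];

// Quote a field when it contains a comma, double quote or line break,
// doubling any embedded quotes (the usual RFC 4180 convention).
function csvField(value: string): string {
  return /[",\r\n]/.test(value) ? `"${value.replace(/"/g, '""')}"` : value;
}

function toCsv(rows: Row[]): string {
  return rows.map((row) => row.map(csvField).join(",")).join("\r\n");
}

// Read one row per matching element from the page as rendered, i.e. after the
// site's own JS has run. The selectors are placeholders for a real target page.
function scrapeRenderedPage(root: QueryableRoot, rowSelector: string, cellSelectors: string[]): Row[] {
  return Array.from(root.querySelectorAll(rowSelector)).map((el) =>
    cellSelectors.map((sel) => el.querySelector(sel)?.textContent?.trim() ?? ""),
  );
}

// Append this page's rows to those saved from earlier page views; returns the total.
function appendRows(store: KeyValueStore, key: string, rows: Row[]): number {
  const saved: Row[] = JSON.parse(store.getItem(key) ?? "[]");
  const all = saved.concat(rows);
  store.setItem(key, JSON.stringify(all));
  return all.length;
}

// Turn everything collected so far under `key` into a single CSV string.
function exportCsv(store: KeyValueStore, key: string): string {
  const saved: Row[] = JSON.parse(store.getItem(key) ?? "[]");
  return toCsv(saved);
}

// Map-backed store for trying the logic outside a browser.
class MemoryStore implements KeyValueStore {
  private data = new Map<string, string>();
  getItem(key: string): string | null {
    return this.data.get(key) ?? null;
  }
  setItem(key: string, value: string): void {
    this.data.set(key, value);
  }
}
```

In a userscript or bookmarklet you would pass the real `localStorage` and `document`, e.g. `appendRows(localStorage, "asos", scrapeRenderedPage(document, ".product", [".name", ".price"]))` on each page you visit, then `exportCsv(localStorage, "asos")` once you're done; those selectors are made up and would have to match the actual site. Because the browser has already run the site's JS by the time this executes, there's no need for a headless browser to render the page.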