by ianbicking
2818 days ago
I think this could go further in terms of making it declarative. A simple declarative approach could be taking this:

  LET google = DOCUMENT("https://www.google.com/", true)

and instead of thinking about it as an action (get this page), think about it as giving you an object. The result is a tuple of the URL, the time fetched, and maybe other information (like User-Agent). This helps with exploratory scraping, where you want to be able to repeat actions without always re-fetching the documents. You'd also be constructing a program, unlike in a REPL, where you always write the program top to bottom, including all your intermediate bugs.

Changing DOCUMENT() is easy enough. Things like CLICK() are a bit harder, though if you extend the data structures you can have a document that is the result of clicking a certain element in a certain previous document. Again, the first time you have to actually DO the action, but later on perhaps not. And you'll be constructing interstitial objects that are great for debugging.

Then what could make it feel really declarative is having more than one presentation of an execution. You can package up a scraping run, and then you can answer questions about WHY you ended up with certain results.
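To make the idea concrete, here is a rough Python sketch of "documents as values" rather than actions. It is not Ferret's API or implementation: the names `Fetch`, `Click`, `Snapshot`, `Store`, and `fake_perform` are all hypothetical, and the actual fetching and clicking is replaced by a stand-in function so the example runs without a browser or network.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Union


@dataclass(frozen=True)
class Fetch:
    """Describes a document obtained by fetching a URL (hypothetical type)."""
    url: str
    user_agent: str = "sketch/0.1"


@dataclass(frozen=True)
class Click:
    """Describes a document obtained by clicking `selector` in `parent`."""
    parent: "Step"
    selector: str


Step = Union[Fetch, Click]


@dataclass(frozen=True)
class Snapshot:
    """What we keep after performing a step once: content plus fetch time."""
    html: str
    fetched_at: float


class Store:
    """Performs each described step at most once and remembers the result."""

    def __init__(self, perform: Callable[[Step, Optional[str]], str]):
        self._perform = perform
        self._cache: Dict[Step, Snapshot] = {}
        self.performed: List[Step] = []  # log of steps that were really executed

    def resolve(self, step: Step) -> Snapshot:
        if step not in self._cache:
            # A click needs its parent document first; a fetch has no parent.
            parent_html = (
                self.resolve(step.parent).html if isinstance(step, Click) else None
            )
            html = self._perform(step, parent_html)
            self._cache[step] = Snapshot(html, time.time())
            self.performed.append(step)
        return self._cache[step]

    def explain(self, step: Step) -> List[Step]:
        """Return the chain of steps, root first, that leads to `step`."""
        chain: List[Step] = []
        while isinstance(step, Click):
            chain.append(step)
            step = step.parent
        chain.append(step)
        return list(reversed(chain))


def fake_perform(step: Step, parent_html: Optional[str]) -> str:
    """Stand-in for a real browser: fabricates content instead of fetching."""
    if isinstance(step, Fetch):
        return f"<html>{step.url}</html>"
    return f"{parent_html}<clicked {step.selector}>"


store = Store(fake_perform)
google = Fetch("https://www.google.com/")
results = Click(google, "#search")  # "#search" is a made-up selector

store.resolve(results)  # performs the fetch, then the click
store.resolve(results)  # served from the cache, nothing is re-executed
print(store.explain(results))
```

Because the step descriptions are immutable and hashable, they double as cache keys, and the parent links give you the "why did I end up here" chain for free. A real version would also need some notion of staleness, which this sketch ignores.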
https://github.com/MontFerret/ferret/blob/master/docs/exampl...
The Document returned from the DOCUMENT() function represents an open browser tab, through which you do all interactions with the page.