I was expecting an ML-driven framework where you write the HTML you want to scrape, and the framework diffs that against the page's tree and extracts whatever part of the target tree best matches your input tree. That's what pops into mind when I think of "declarative" scraping. Instead, the example given is:

    LET google = DOCUMENT("https://www.google.com/", true)

    INPUT(google, 'input[name="q"]', "ferret")
    CLICK(google, 'input[name="btnK"]')
    WAIT_NAVIGATION(google)

    LET result = (
        FOR result IN ELEMENTS(google, '.g')
            RETURN {
                title: ELEMENT(result, 'h3 > a'),
                description: ELEMENT(result, '.st'),
                url: ELEMENT(result, 'cite')
            }
    )

    RETURN (
        FOR page IN result
            FILTER page.title != NONE
            RETURN page
    )
Looks an awful lot like:

    const { document, input, click, elements, waitNavigation } = require("your-library")

    const scrape = () => {
        // same steps as the FQL: load, type, click, wait, extract, filter
        let google = document("...", true)
        input(google, "...", "...")
        click(google, "...")
        waitNavigation(google)
        return elements(google, ".g")
            .map(r => {...})
            .filter(p => {..})
    }

    scrape();
Am I missing something here? I don't see what makes the first one more declarative than the second; they look nearly identical, and both read as imperative to me: load the page, type, click, wait, then extract. Is "declarative" just becoming a buzzword, maybe thanks to React?
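For contrast, here's a toy sketch of the "write the HTML you want" idea from the top of this comment, with no ML at all. It's plain JavaScript over hypothetical `{ tag, text, children }` objects rather than a real DOM, and `matchNode`, `extract`, and the `$name` slot syntax are made up for illustration. The point is where the work lives: the only page-specific thing you write is the template, and a generic matcher decides how to walk the tree.

```javascript
// Toy, non-ML sketch: describe the *shape* of the data as a template tree and let a
// generic matcher find every subtree that fits. Trees are hypothetical plain objects
// of the form { tag, text?, children? }; no real DOM or scraping library is used.

// Try to match template node `tpl` against target node `node`.
// A template text of "$name" captures the target node's text into bindings[name];
// any other template text must match exactly.
function matchNode(tpl, node, bindings) {
  if (tpl.tag !== node.tag) return false;
  if (typeof tpl.text === "string") {
    if (tpl.text.startsWith("$")) {
      bindings[tpl.text.slice(1)] = node.text ?? "";
    } else if (tpl.text !== node.text) {
      return false;
    }
  }
  // Template children must appear among the target's children in order;
  // extra target children are skipped. Matching is greedy, not optimal.
  const kids = node.children ?? [];
  let i = 0;
  for (const tplChild of tpl.children ?? []) {
    let found = false;
    while (i < kids.length && !found) {
      const trial = { ...bindings }; // discard captures from failed attempts
      if (matchNode(tplChild, kids[i++], trial)) {
        Object.assign(bindings, trial);
        found = true;
      }
    }
    if (!found) return false;
  }
  return true;
}

// Visit every node in document order and collect bindings for each match.
function extract(template, root) {
  const results = [];
  const stack = [root];
  while (stack.length > 0) {
    const node = stack.pop();
    const bindings = {};
    if (matchNode(template, node, bindings)) results.push(bindings);
    // Push children in reverse so the first child is popped first.
    for (const child of [...(node.children ?? [])].reverse()) stack.push(child);
  }
  return results;
}

// The "HTML you want": one result block with three named slots.
const template = {
  tag: "div",
  children: [
    { tag: "h3", text: "$title" },
    { tag: "span", text: "$description" },
    { tag: "cite", text: "$url" },
  ],
};

// Hypothetical page: two result blocks plus a block with no heading.
const page = {
  tag: "body",
  children: [
    { tag: "div", children: [
      { tag: "h3", text: "Ferret" },
      { tag: "span", text: "Declarative web scraping" },
      { tag: "cite", text: "example.com/ferret" },
    ] },
    { tag: "div", children: [{ tag: "span", text: "A block with no heading" }] },
    { tag: "div", children: [
      { tag: "h3", text: "Example" },
      { tag: "img" },
      { tag: "span", text: "Another result" },
      { tag: "cite", text: "example.org" },
    ] },
  ],
};

const results = extract(template, page);
console.log(results); // two objects; the block without an <h3> is skipped
```

Notice there's no equivalent of the `FILTER page.title != NONE` step: a block without an `<h3>` simply doesn't fit the template. A real version would need fuzzy matching (that's where I'd expect the ML to come in), but even this greedy toy says what to extract rather than how to get to it.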