Hacker News
by suanmeiguo 3479 days ago
Oh, interesting. I've used Diffbot and never thought it relied on AI. Could you elaborate? I thought it was a simple crawling and parsing task, but I might be naive about this.

Here's a slightly more detailed description: https://www.quora.com/What-is-the-algorithm-used-by-Diffbot-...

All identification and extraction in our APIs is based on our ML models, which have been trained on hundreds of thousands of example data points from annotated web pages. Basically: our back end has reviewed millions of web pages to learn what the various components of a page are (and even what "type" of page it is), and uses that to make judgments about pages submitted via the API.
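
To make the "learn from annotated pages, then judge new ones" idea concrete, here is a toy sketch. It is not Diffbot's actual model or feature set, which this thread doesn't describe. It assumes a plain bag-of-words Naive Bayes classifier, and the training snippets, labels, and function names (`train`, `classify`, `ANNOTATED`) are all made up for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Crude whitespace tokenizer; a real system would likely use far
    # richer signals (this is an assumption made for the sketch).
    return text.lower().split()


def train(examples):
    """Count words per page type from (page_text, page_type) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def classify(model, text):
    """Return the page type with the highest Naive Bayes log-score."""
    word_counts, label_counts, vocab = model
    total_pages = sum(label_counts.values())
    tokens = tokenize(text)
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        # Log prior: how common this page type was in the annotations.
        score = math.log(label_counts[label] / total_pages)
        # Add-one (Laplace) smoothing so unseen words don't zero the score.
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in tokens:
            score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical hand-annotated examples standing in for real labeled pages.
ANNOTATED = [
    ("add to cart price shipping in stock reviews", "product"),
    ("buy now price discount add to cart", "product"),
    ("by author published opinion read more comments", "article"),
    ("published by staff writer story continues read more", "article"),
]

model = train(ANNOTATED)
print(classify(model, "price add to cart free shipping"))
print(classify(model, "published by reporter read more"))
```

With only four training snippets this is obviously nowhere near production scale, but the shape is the same as described above: annotated examples go in, and the model scores pages it hasn't seen.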