by karlicoss
1509 days ago
Oh nice, I like it! So it basically automates detecting the useful bits of a particular URL, but it's kind of time-consuming and flaky. It could be very helpful for populating the 'rules' database though, and that database could then be shared with other people so they don't have to scrape. I guess when I said ML (or preferably some fuzzy algorithm/heuristic), I was referring to generalizing the rules so they also work on sites that aren't in the rules database. If humans can detect garbage in a URL by looking at a few examples, a computer can too :)
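
To make the heuristic idea a bit more concrete, here's a rough sketch of one way it might work. It is an illustration, not how any existing tool does it. It assumes you already have a few URLs per site that are known to show the same page; any query parameter whose value changes (or goes missing) across them is treated as garbage. Parameter names that turn out to be garbage on several sites get promoted to a global rule, so they also apply to sites that aren't in the database. The function names (`garbage_params`, `generalize`, `clean`), the `min_sites` threshold, and the example domains are all made up.

```python
from collections import Counter
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def garbage_params(same_page_urls):
    """Query parameter names that vary across URLs assumed to show the same page."""
    queries = [
        dict(parse_qsl(urlsplit(url).query, keep_blank_values=True))
        for url in same_page_urls
    ]
    names = set().union(*queries)
    # A parameter that changes value, or is absent in some examples,
    # cannot be what identifies the page.
    return {n for n in names if len({q.get(n) for q in queries}) > 1}


def generalize(rules, min_sites=2):
    """Promote names that are garbage on at least min_sites sites to a global rule."""
    counts = Counter(name for names in rules.values() for name in names)
    return {name for name, count in counts.items() if count >= min_sites}


def clean(url, rules, global_rules):
    """Drop parameters covered by the site's own rule or by the global rule."""
    parts = urlsplit(url)
    drop = rules.get(parts.netloc, set()) | global_rules
    kept = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in drop
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Hypothetical per-site rules database, built from a few examples per site.
rules = {
    "a.example": garbage_params([
        "https://a.example/post?id=1&utm_source=x",
        "https://a.example/post?id=1&utm_source=y",
    ]),
    "b.example": garbage_params([
        "https://b.example/p?q=cats&utm_source=m",
        "https://b.example/p?q=cats",
    ]),
}
global_rules = generalize(rules)

# c.example is not in the database, but the global rule still applies.
print(clean("https://c.example/item?id=7&utm_source=z", rules, global_rules))
```

The `rules` dict here is the shareable part: other people could reuse it without scraping anything themselves, and only `generalize` needs to guess.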