by logn
3270 days ago
> supply an url and get back plain text from the page

I would like to have someone (a Java or Kotlin developer) take over https://github.com/MachinePublishers/ScreenSlicer and rework it. It's a project that intends to do exactly this. You enter a URL, it finds a search box, enters the user's search query, uses single-layer neural nets and text munging to extract the search results, and then separates each result into a few fields (url, date, title, summary). But it was written as a complete app, so it is of limited use to most people. If it were remade as a library it would have much more utility.

Beware: there are lots of ugly regexes and terrible hacks. This is HTML after all, the Cthulhu way. It was recently re-licensed from AGPL to Apache 2.0. Also, I'm busy with other things. I would be able to answer some questions occasionally, but largely I can't provide much help.
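For anyone wondering what "remade as a library" could look like in practice, here is a rough Java sketch of one possible API shape for the pipeline described above (site URL and query in, results split into url, date, title, summary out). Every name here (`SearchLibrarySketch`, `SiteSearcher`, `SearchResult`, `StubSearcher`) is hypothetical and does not come from the ScreenSlicer codebase; the stub returns placeholder data instead of doing any real fetching or extraction.

```java
import java.util.ArrayList;
import java.util.List;

public class SearchLibrarySketch {

    /** One extracted search result, holding the four fields named in the comment. */
    public static final class SearchResult {
        public final String url;
        public final String date;
        public final String title;
        public final String summary;

        public SearchResult(String url, String date, String title, String summary) {
            this.url = url;
            this.date = date;
            this.title = title;
            this.summary = summary;
        }
    }

    /** Hypothetical library entry point: a site URL and a query go in, structured results come out. */
    public interface SiteSearcher {
        List<SearchResult> search(String siteUrl, String query);
    }

    /**
     * Stand-in implementation so the sketch runs without a network or browser.
     * A real implementation would locate the search box, submit the query,
     * and extract the results; this one only returns placeholder values.
     */
    public static final class StubSearcher implements SiteSearcher {
        @Override
        public List<SearchResult> search(String siteUrl, String query) {
            List<SearchResult> results = new ArrayList<>();
            results.add(new SearchResult(
                    siteUrl + "/item/1",        // placeholder URL
                    "(date placeholder)",       // placeholder date
                    "Result for " + query,      // placeholder title
                    "(summary placeholder)"));  // placeholder summary
            return results;
        }
    }

    public static void main(String[] args) {
        SiteSearcher searcher = new StubSearcher();
        for (SearchResult r : searcher.search("https://example.com", "kotlin")) {
            System.out.println(r.title + " -> " + r.url);
        }
    }
}
```

The point of the interface is that callers depend only on `search(siteUrl, query)` and the four result fields, so the regexes and hacks could stay hidden behind it.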