| HN Mirror


by logn 3270 days ago

> supply an url and get back plain text from the page

I would like to have someone (a Java or Kotlin developer) take over https://github.com/MachinePublishers/ScreenSlicer and rework it. It's a project that intends to do exactly this. You enter a URL; it finds a search box, enters the user's search query, uses single-layer neural nets and text munging to extract the search results, and then separates each result into a few fields (url, date, title, summary).

But it was written as a complete app, so it is of limited use to most people. If it were remade as a library it would have much more utility.
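
As a rough illustration of what a library-shaped version could look like, here is a minimal Java sketch. Everything in it is hypothetical: the names `SearchResultSketch`, `Result`, `Extractor`, and `AnchorExtractor` are made up for this example and are not ScreenSlicer's actual API. The extraction step is a toy regex over anchor tags, standing in for the neural-net and text-munging logic, and it fills in only the url and title fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical library-style API; none of these names come from ScreenSlicer.
public class SearchResultSketch {

    // One extracted result with the four fields mentioned above.
    public static final class Result {
        public final String url;
        public final String date;
        public final String title;
        public final String summary;

        public Result(String url, String date, String title, String summary) {
            this.url = url;
            this.date = date;
            this.title = title;
            this.summary = summary;
        }
    }

    // The entry point a library might expose: HTML in, structured results out.
    public interface Extractor {
        List<Result> extract(String html);
    }

    // Toy stand-in for the real extraction step: a regex over anchor tags.
    public static final class AnchorExtractor implements Extractor {
        private static final Pattern LINK = Pattern.compile(
                "<a\\s+[^>]*href=\"([^\"]+)\"[^>]*>(.*?)</a>",
                Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

        @Override
        public List<Result> extract(String html) {
            List<Result> results = new ArrayList<>();
            Matcher m = LINK.matcher(html);
            while (m.find()) {
                // Strip any nested tags from the link text to get a plain title.
                String title = m.group(2).replaceAll("<[^>]+>", "").trim();
                // Date and summary are left empty; this sketch does not attempt them.
                results.add(new Result(m.group(1), "", title, ""));
            }
            return results;
        }
    }

    public static void main(String[] args) {
        String html = "<ul><li><a href=\"https://example.com/a\">First <b>hit</b></a></li></ul>";
        for (Result r : new AnchorExtractor().extract(html)) {
            System.out.println(r.url + " | " + r.title);
        }
    }
}
```

The point of the `Extractor` interface is that callers would depend only on "HTML in, list of results out", so the regex stand-in could be swapped for a smarter implementation without changing calling code.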

Beware: there are lots of ugly regexes and terrible hacks. This is HTML after all, the Cthulhu way.

Recently re-licensed from AGPL to Apache 2.0.

Also, I'm busy with other things. I can answer the occasional question, but for the most part I can't provide much help.