by krapp 4292 days ago
Umm... anything that uses XPath should work, I would think.

Apologies for blowing my own horn, but I've had some luck filtering HN and Reddit with this project I built (I used to have an example in progress online, but I've taken it down): https://github.com/kennethrapp/embedbug


The point is that I want some heuristic that would work "automagically" (like Readability, etc.), without requiring me to invent a tailor-made XPath for each and every such website in the world.
Try this:

http://fivefilters.org/content-only/

It has a default extractor, and site-specific recipes use the same format as Instapaper, so you can leverage the work Marco has done on different sites.

Oh, alright.

If there is such a thing, I'd be interested to learn about it myself. TBH, "tailor-make an XPath for every site" is the best solution I'm aware of.
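
For anyone wondering what "site-specific recipes plus a default extractor" might look like in practice, here is a rough Python sketch. It is not how fivefilters, Instapaper, or embedbug actually work; the `SITE_RECIPES` table, the `example.com` entry, and the "most paragraph text wins" fallback are all assumptions made up for illustration. It also assumes the page is well-formed XHTML so the standard library parser can read it; real-world HTML would need a more forgiving parser.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical per-site recipes: hostname -> XPath of the main content node.
# The entry below is invented for illustration, not a real recipe.
SITE_RECIPES = {
    "example.com": ".//div[@id='article']",
}


def text_of(node):
    """Return all text inside a node with whitespace collapsed."""
    return " ".join(" ".join(node.itertext()).split())


def default_extract(root):
    """Fallback heuristic (an assumption, far cruder than Readability):
    pick the element whose direct <p> children hold the most text."""
    best, best_len = None, 0
    for node in root.iter():
        length = sum(len(text_of(p)) for p in node.findall("p"))
        if length > best_len:
            best, best_len = node, length
    return best


def extract(url, markup):
    """Use the site's recipe if one exists, otherwise the default heuristic."""
    root = ET.fromstring(markup)  # assumes well-formed XHTML
    recipe = SITE_RECIPES.get(urlparse(url).hostname)
    node = root.find(recipe) if recipe else None
    if node is None:
        node = default_extract(root)
    return text_of(node) if node is not None else ""
```

Calling `extract(url, markup)` tries the hand-written XPath first, so a tailor-made recipe always wins where one exists, and the heuristic only has to be "good enough" for everything else.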