by choilive 555 days ago
Perhaps LLMs can solve this somewhat? Not for email summarization - but to intelligently strip away all the HTML fluff and return a plain text version of the contents.
3 comments

It is a solved problem. Here is a solution that requires on the order of one millionth of the resources of your proposed idea, needs no subscription, and runs so fast that you would not even notice it on a machine from 20 years ago:

    > grep text/html ~/.mailcap
    text/html; lynx -width 72 -assume_charset=%{charset} -display_charset=utf-8 -dump %s | sed 's|^   ||'; nametemplate=%s.html; copiousoutput
If you want something more modern:

    text/html; webdump -dli < %s | sed 's/^  //g'; needsterminal; copiousoutput
What's webdump?
FWIW, it's pretty straightforward to extract text from an HTML snippet without LLMs. I'm not actually sure there's anything they'd do better than a simple HTML parser.
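To make that concrete, here is a rough sketch of the "simple HTML parser" approach using only Python's standard-library `html.parser`. The names `TextExtractor` and `html_to_text` and the list of block-level tags are my own choices for illustration, and it makes no attempt to handle links, tables, or the layout tricks real marketing email uses:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}
    # Illustrative, incomplete set of tags that should start a new line.
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. for us
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            if self.skip_depth:
                self.skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        # Keep text only when we are not inside <script> or <style>.
        if not self.skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string, one block per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Collapse runs of whitespace within each line and drop empty lines.
    lines = [" ".join(line.split()) for line in text.splitlines()]
    return "\n".join(line for line in lines if line)
```

Calling `html_to_text(open("message.html").read())` (file name hypothetical) gives you the body text with tags, scripts, and styles removed.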
Apple Intelligence already does this in the line summary.