by thworp 583 days ago
As the other comment mentioned, the email body contains the entire quote chain. The way clients accomplish threaded display is a combination of:

- parsing the unstructured email body and looking for quote levels, HTML formatting and printed email headers

- parsing certain headers like Message-ID, In-Reply-To, DKIM signature

- looking for sections of the message body in the inbox
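To make the header and quote-level parts concrete, here is a minimal Python sketch using the standard library `email` package. The helper names `build_thread_index` and `quote_depth` and the sample messages are made up for illustration, and it assumes the headers are present and well-formed, which, as said, you can't count on in the wild.

```python
from collections import defaultdict
from email import message_from_string


def build_thread_index(raw_messages):
    """Map each parent Message-ID to the Message-IDs of its direct replies."""
    children = defaultdict(list)
    for raw in raw_messages:
        msg = message_from_string(raw)
        msg_id = (msg.get("Message-ID") or "").strip()
        parent = (msg.get("In-Reply-To") or "").strip()
        # Only messages that carry both headers can be linked this way.
        if msg_id and parent:
            children[parent].append(msg_id)
    return dict(children)


def quote_depth(line):
    """Count leading '>' markers on a body line, skipping spaces between them."""
    depth = 0
    for ch in line:
        if ch == ">":
            depth += 1
        elif ch != " ":
            break
    return depth


# Hypothetical sample data: an original message and one reply to it.
raw_messages = [
    "Message-ID: <a@example.com>\nSubject: hello\n\nfirst\n",
    "Message-ID: <b@example.com>\nIn-Reply-To: <a@example.com>\n"
    "Subject: Re: hello\n\n> first\nsecond\n",
]
thread_index = build_thread_index(raw_messages)
```

Anything missing an In-Reply-To just falls out of the index here, which is exactly where the body-matching heuristics have to take over.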

This is done because there is nothing in the protocol to cleanly accomplish what you want. Even if there were, you could not rely on it at all. Doing anything with email is a gigantic PITA: you sometimes get emails where the header declaring the message encoding doesn't match the body's actual encoding, HTML in the plaintext section, and other fun things.
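For the encoding mismatch, the usual workaround is to treat the declared charset as a hint only. A rough Python sketch, where `decode_body` is a hypothetical helper and the fallback order (declared charset, then UTF-8, then Latin-1) is my assumption, not anything a standard prescribes:

```python
from typing import Optional


def decode_body(payload: bytes, declared_charset: Optional[str]) -> str:
    """Decode a message body, trusting the declared charset only if it works."""
    for charset in (declared_charset, "utf-8"):
        if not charset:
            continue
        try:
            return payload.decode(charset)
        except (LookupError, UnicodeDecodeError):
            # Unknown charset name, or bytes that are invalid in that charset.
            continue
    # Latin-1 maps every byte to a character, so this never raises,
    # although the result may be mojibake.
    return payload.decode("latin-1")
```

This only papers over the problem: a body that decodes without error under the wrong charset still comes out garbled.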

Since nobody really cares about the RFC and just does their own thing, there is no chance at improvement.

1 comment

This is true. OTOH, I do think the problem is solvable.

I once wrote a routine to parse about 2-3 GB of saved emails and convert them into mbox format.

The official delimiter is unbelievable, IMHO.

« the exact character sequence of "From", followed by a single Space character (0x20), an email address of some kind, another Space character, a timestamp sequence of some kind, and an end-of-line marker. »

https://datatracker.ietf.org/doc/html/rfc4155

https://en.wikipedia.org/wiki/Mbox

That's it. An email is a section of text beginning with

From $something

That's the spec.

Because of this, certain software used to add a ">" before any line in an email body that starts with "From ".
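Roughly what that looks like in Python, as a sketch only. `split_mbox` and `escape_from_lines` are names I made up, the separator check is the loose "starts with From plus a space" reading rather than a full validation of address and timestamp, and the sample data is invented:

```python
def split_mbox(text):
    """Split mbox text into messages on lines that start with 'From '."""
    messages = []
    current = None
    for line in text.splitlines(keepends=True):
        if line.startswith("From "):
            # Separator line: close the previous message and start a new one.
            if current is not None:
                messages.append("".join(current))
            current = []
            continue
        if current is not None:
            current.append(line)
    if current is not None:
        messages.append("".join(current))
    return messages


def escape_from_lines(body):
    """Prefix '>' to body lines starting with 'From ' so they are not
    mistaken for separators when the message is written to an mbox file."""
    return "".join(
        ">" + line if line.startswith("From ") else line
        for line in body.splitlines(keepends=True)
    )


# Hypothetical two-message mbox; the first body contains an escaped line.
sample = (
    "From a@example.com Thu Jan  1 00:00:00 1970\n"
    "Subject: one\n\nbody one\n>From here on it is body text\n"
    "From b@example.com Thu Jan  1 00:00:01 1970\n"
    "Subject: two\n\nbody two\n"
)
messages = split_mbox(sample)
```

If a writer skips the escaping step, a body line like "From here on..." silently cuts the message in two on the next read. Python also ships a `mailbox.mbox` class in the standard library if you don't want to roll your own.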