by thworp
583 days ago
As the other comment mentioned, the email body contains the entire quote chain. The way clients accomplish threaded display is a combination of:
- parsing the unstructured email body and looking for quote levels, HTML formatting and printed email headers
- parsing certain headers like Message-ID, In-Reply-To, DKIM signature
- looking for sections of the message body in the inbox

This is done because there is nothing in the protocol to cleanly accomplish what you want. Even if there were, you could not rely on it at all. Doing anything with email is a gigantic PITA: you sometimes get emails where the msg-encoding header doesn't match the body's encoding, HTML in the plaintext section and other fun things. Since nobody really cares about the RFC and just does their own thing, there is no chance at improvement.
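To make the header-based part concrete, here is a rough Python sketch using only the standard library `email` package. It groups messages by `In-Reply-To`; the function name `build_thread_index` is made up for illustration, and real clients combine this with the body heuristics listed above because the headers are often missing or wrong.

```python
from collections import defaultdict
from email import message_from_string


def build_thread_index(raw_messages):
    """Map each parent Message-ID to the Message-IDs of its direct replies.

    Messages without an In-Reply-To header are filed under the key None.
    Hypothetical helper: it assumes the headers are present and sane,
    which real-world mail often violates.
    """
    children = defaultdict(list)
    for raw in raw_messages:
        msg = message_from_string(raw)
        msg_id = (msg.get("Message-ID") or "").strip()
        parent = (msg.get("In-Reply-To") or "").strip()
        if not msg_id:
            continue  # nothing to key on, so header-based threading gives up
        children[parent or None].append(msg_id)
    return dict(children)
```

Anything that falls through the `continue` branch is exactly where clients have to fall back to guessing from quote levels and subject lines.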
|
I once wrote a routine to parse and convert about 2-3 GB of saved emails into mbox format.
The official delimiter is unbelievable, IMHO.
« the exact character sequence of "From", followed by a single Space character (0x20), an email address of some kind, another Space character, a timestamp sequence of some kind, and an end-of-line marker. »
https://datatracker.ietf.org/doc/html/rfc4155
https://en.wikipedia.org/wiki/Mbox
That's it. An email is a section of text beginning with
From $something
That's the spec.
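For illustration, a naive splitter that takes that rule literally looks roughly like this in Python. The name `split_mbox` is mine, not from any library, and this is a sketch of the delimiter rule only, not a robust parser:

```python
def split_mbox(text):
    """Split mbox text into messages on lines starting with 'From '.

    Naive on purpose: any body line that happens to start with 'From '
    is treated as a new message, which is why writers usually escape
    such lines (commonly as '>From ').
    """
    messages = []
    current = None  # lines of the message being collected, if any
    for line in text.splitlines(keepends=True):
        if line.startswith("From "):
            if current is not None:
                messages.append("".join(current))
            current = []  # the separator line itself is not kept
            continue
        if current is not None:
            current.append(line)
    if current is not None:
        messages.append("".join(current))
    return messages
```

Python's standard library also ships a `mailbox.mbox` class that handles the file format, if you would rather not roll your own.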