by TeMPOraL 764 days ago
No, it's not reductionist and pedantic. It's a reminder that there is no magic. Building an abstraction layer that separates control and data doesn't win you anything if, like the people in the article, you then forget it's a thing and write directly to the level below it.
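A minimal sketch of what "writing directly to the level below" looks like for SMTP. It assumes a simplified receiver that implements only the end-of-data rule, where a line containing just "." ends the message. `read_data` is a hypothetical name used for illustration. Lines are modeled as strings without CRLF, and undoing dot-stuffing is skipped:

```python
def read_data(lines):
    """Simplified, hypothetical model of an SMTP server reading DATA.

    Collects lines until one consisting of a single "." (the end-of-data
    marker). For brevity it does not strip dot-stuffing; a real server
    would also remove one leading "." from other lines.
    """
    body = []
    for line in lines:
        if line == ".":
            break
        body.append(line)
    return body


# A client that sends the body verbatim, without escaping lines that
# start with a dot, then appends the terminator.
message = ["Agenda:", "1. intro", ".", "2. the rest of the meeting"]
wire = message + ["."]

received = read_data(wire)
print(received)  # ['Agenda:', '1. intro']
```

Everything after the lone period is silently dropped. In a real session, the leftover lines would typically be read as SMTP commands.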

> No, it's not reductionist and pedantic.

It's very reductionistic, because it intentionally ignores meaningful detail, and it's pedantic because it's making a meaningless distinction.

> It's a reminder that there is no magic.

This is irrelevant. Nobody is claiming that there's any magic. I'm pointing out the true fact that details about the abstraction layers matter.

In this case, the abstraction layer was poorly designed.

Good abstraction layer: length prefix, or JSON encoding.

Bad abstraction layer: "the body of the email is mostly plain text, except when there's a line that only contains a single period".

There are very, very few problems to which the latter is a good solution. It is a bad engineering decision, and it also obfuscates the fact that there even is an abstraction layer unless you carefully read the spec.
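For comparison, here is a minimal sketch of length-prefix framing. Each message is preceded by its byte length, so the receiver never scans the payload for a terminator and nothing needs escaping. The 4-byte big-endian header and the `frame`/`unframe` names are arbitrary choices for illustration, not any real protocol's wire format:

```python
import struct


def frame(payload: bytes) -> bytes:
    # Hypothetical wire format: 4-byte unsigned big-endian length, then the raw payload.
    return struct.pack(">I", len(payload)) + payload


def unframe(buf: bytes) -> tuple[bytes, bytes]:
    """Split one framed message off the front of buf.

    Returns (payload, remaining_bytes). Raises ValueError if buf is short.
    """
    if len(buf) < 4:
        raise ValueError("incomplete header")
    (length,) = struct.unpack(">I", buf[:4])
    end = 4 + length
    if len(buf) < end:
        raise ValueError("incomplete payload")
    return buf[4:end], buf[end:]


# A body containing a lone "." line passes through untouched; no escaping needed.
body = b"Agenda:\r\n1. intro\r\n.\r\n2. the rest\r\n"
stream = frame(body) + frame(b"second message")

first, rest = unframe(stream)
second, rest = unframe(rest)
assert first == body and second == b"second message" and rest == b""
```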

-------------

In fact, the underlying problem goes deeper than that - the design of SMTP is intrinsically flawed because it's a text-based ad-hoc protocol that has in-band signaling.
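For reference, SMTP's in-band escape is "dot-stuffing" (RFC 5321, section 4.5.2, "Transparency"). The client doubles a leading period on any line. The server ends the message at a line consisting of a single period, and otherwise strips one leading period. The minimal sketch below models lines as strings without their CRLF terminators. `dot_stuff` and `dot_unstuff` are hypothetical helper names:

```python
def dot_stuff(lines):
    """Client side: double any leading "." and append the lone "." end-of-data marker."""
    out = ["." + line if line.startswith(".") else line for line in lines]
    out.append(".")
    return out


def dot_unstuff(wire_lines):
    """Server side: stop at a line that is exactly ".", and strip one
    leading "." from any other line that starts with one."""
    body = []
    for line in wire_lines:
        if line == ".":
            return body
        body.append(line[1:] if line.startswith(".") else line)
    raise ValueError("missing end-of-data marker")


message = ["Agenda:", "1. intro", ".", "..tricky", "2. the rest"]
wire = dot_stuff(message)
print(wire)  # ['Agenda:', '1. intro', '..', '...tricky', '2. the rest', '.']
assert dot_unstuff(wire) == message
```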

There are very few good reasons to use a text-based data interchange format. One of them is to make the format self-documenting, such that people can easily read and write it without consulting the spec.

If the spec is complex enough that you get these ridiculous footguns, then it shouldn't be text-based in the first place. Instead, it should be binary - then you have to either read the spec or use someone else's implementation.

Failing that, use a standardized structured format like XML or JSON.

But there's no excuse for the brain-dead approach that SMTP took. They didn't even use length prefixing,

I don't disagree with your criticisms of SMTP, but reading those early RFCs (e.g. RFC 772) is a reminder of what a wildly different place the Internet was back then, and in that light I feel it's only fair to grant some grace.

MTP had one concern: getting mail over to a host that stood a better chance of delivering it, in a world where the total host pool was maybe a hundred nodes.

I speculate that Postel and Sluizer were aware of alternatives and rejected them in favor of things that were easily implemented on highly diverse, low-powered hardware. Not everyone had IBM-grade budgets, after all.

At one time there were alternative mail implementations that did follow the kinds of precepts you suggest. X.400 is the obvious example. If I recall correctly, it did have rigorous protocol spec definitions, message length tags for every entity sent on the wire, bounds and limits on each PDU, the whole hog. It was also crushed by SMTP, and this was in the era when you needed to understand sendmail and its notoriously arcane config to do anything. So sometimes the technically worse solution just wins, and we are stuck with it.

> or JSON encoding

JSON needs to escape backslashes (and quotes); SMTP needs to escape a newline followed by a period. If you've already accepted escaping, what's the issue?
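Here is a quick sketch of the JSON side of that comparison, using Python's standard `json` module. `json.dumps` escapes double quotes, backslashes, and control characters such as newline. As a result, the encoded form never contains a raw line break, let alone a lone "." line. The sample body is made up for illustration:

```python
import json

# A body that would need dot-stuffing in SMTP, plus characters JSON must escape.
body = 'Agenda:\n.\nsay "hi" to C:\\temp'

encoded = json.dumps(body)
print(encoded)  # "Agenda:\n.\nsay \"hi\" to C:\\temp"

# No raw newline survives encoding, and decoding restores the original exactly.
assert "\n" not in encoded
assert json.loads(encoded) == body
```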

Why not protobufs inside protobufs then?

> Good abstraction layer: length prefix, or JSON encoding.

> Bad abstraction layer: (...)

In this context, it shouldn't matter. Sure, "mostly plaintext except for some characters in some special positions..." is considered bad in modern engineering practice; however, it's not fundamentally different from, or more difficult than, printf and family. You wouldn't start calling printf without at least skimming the docs for the format-string language, would you?
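The printf comparison can be made concrete. Printf-style format strings also use in-band escaping: a literal percent sign has to be written as `%%`, much as SMTP requires a leading "." to be doubled. The minimal sketch below uses Python's printf-style `%` operator as a stand-in for C's printf. The strings are made up for illustration:

```python
discount = 50

# "%" starts a conversion spec, so a literal percent sign is escaped by doubling it.
msg = "Save %d%% today" % discount
print(msg)  # Save 50% today

# Forgetting the escape is the printf analogue of an unstuffed "." line:
# "% t" is parsed as a conversion spec instead of literal text, so formatting fails.
error = None
try:
    "Save 50% today" % ()
except (TypeError, ValueError) as exc:
    error = exc
print("format error:", error)
```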

> It is a bad engineering decision, and it also obfuscates the fact that there even is an abstraction layer unless you carefully read the spec.

There's the rub: you should have read the spec. You should always read the spec, at least if you're doing something serious like production-grade software. With a binary or JSON-based protocol, you wouldn't look at a few messages and assume you understand the encoding. I suppose we can blame SMTP for a design that didn't account for human nature: it looks simple enough to fool people into thinking they don't need to read the manual.

> There are very few good reasons to use a text-based data interchange format.

If you mean text without obvious and well-defined structure, then I completely agree.

> One of them is to make the format self-documenting, such that people can easily read and write it without consulting the spec.

"Self-documenting" is IMHO a fundamentally flawed idea, and expecting people to read and write code/markup without consulting the spec is a fool's errand.

> it should be binary - then you have to either read the spec or use someone else's implementation.

That's using protocol design to mitigate (and, in doing so, promote) bad engineering practice; see above. I'm not a fan of this, nor of the more general attitude of making tools "intuitive". I'd rather promote the practice of reading the goddamn manual.

> But there's no excuse for the brain-dead approach that SMTP took. They didn't even use length prefixing,

The protocol predates both JSON and XML by well over a decade. It was created in times when C was roaming the world; length prefixing fell out of favor then, and only recently seems to be back in vogue.

> No, it's not reductionist and pedantic. It's a reminder that there is no magic.

Exactly! This is an even better phrasing of my point.