by nicktelford 1026 days ago
ISO-8859-1 (a.k.a. Latin-1) is a superset of ASCII, so all ASCII strings are also valid Latin-1 strings.

The section you quoted actually suggests that implementations should support ISO-8859-1 to ensure compatibility with systems that use it.
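
The superset relationship is easy to check directly. This is a minimal sketch using Python's built-in codecs; the helper name `is_ascii` is only illustrative and does not come from the thread or from any RFC.

```python
# Every 7-bit value (0x00..0x7F) decodes to the same text under both codecs.
ascii_bytes = bytes(range(128))
same_under_both = ascii_bytes.decode("ascii") == ascii_bytes.decode("latin-1")


def is_ascii(data: bytes) -> bool:
    """Return True if no byte has the 8th bit set (hypothetical helper)."""
    return all(b < 0x80 for b in data)


# The reverse does not hold: a Latin-1 string may contain bytes >= 0x80,
# which are not valid ASCII.
latin1_value = "café".encode("latin-1")  # b'caf\xe9'
```

So an ASCII-only header value is always a valid Latin-1 value, while a Latin-1 value is only ASCII if it never uses the upper half of the byte range.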


You should read it again

> Newly defined header fields SHOULD limit their field values to US-ASCII octets

ASCII octets! That means you SHOULD NOT send Latin-1-encoded headers. The opposite of what pzmarzly was saying. I don't disagree that Latin-1 is a superset of ASCII or that it was chosen with backward compatibility in mind, but that's not relevant to my response.

SHOULD is a recommendation, not a requirement, and it refers only to newly-defined header fields, not existing ones. The text implies that 8-bit characters in existing fields are to be interpreted as ISO-8859-1.
There is an RFC (2119) that specifies what SHOULD means in RFCs:

> SHOULD This word, or the adjective "RECOMMENDED", mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.

https://datatracker.ietf.org/doc/html/rfc2119

Haven't you heard of Postel's Maxim?

Web servers need to be able to receive Latin-1 and decode it into UTF-8 regardless of what the RFC recommends people send. Because it's going to become rarer over time to have the 8th bit set in headers, you can write a simpler algorithm than what Lemire did, one that assumes an ASCII average case: https://github.com/jart/cosmopolitan/blob/755ae64e73ef5ef7d1... That goes 23 GB/s on my machine using just SSE2 (rather than AVX512). However, it goes much slower if the text is full of European diacritics. Lemire's algorithm is better at decoding those.
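
To make the "assume ASCII, fall back when the 8th bit is set" idea concrete, here is a rough scalar sketch in Python. It is not the linked cosmopolitan code and not Lemire's algorithm (both are SIMD implementations); the function name `latin1_to_utf8` is made up for illustration, and it only shows the shape of the fast path and the two-byte expansion.

```python
def latin1_to_utf8(data: bytes) -> bytes:
    """Transcode Latin-1 bytes to UTF-8 (illustrative, hypothetical helper)."""
    # Fast path: pure-ASCII input is already valid UTF-8, so nothing changes.
    if data.isascii():
        return data

    out = bytearray()
    for b in data:
        if b < 0x80:
            # ASCII bytes are copied through unchanged.
            out.append(b)
        else:
            # Latin-1 bytes 0x80..0xFF are code points U+0080..U+00FF,
            # which UTF-8 encodes as two bytes: 110000xx 10xxxxxx.
            out.append(0xC0 | (b >> 6))
            out.append(0x80 | (b & 0x3F))
    return bytes(out)
```

For a typical header such as `b"Host: example.com"` the function returns its input untouched, which is the common case the comment is optimizing for. Input full of diacritics takes the slow branch on many bytes, which matches the observation that an ASCII-biased approach degrades on such text.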

> Haven't you heard of Postel's Maxim?

Otherwise known as "Making other people's incompetence and inability to implement a specification your problem." Just because it's a widely quoted maxim doesn't make it good advice.