by julie1 3738 days ago

Lol, the biggest bug is developers ignoring that Latin-1 and Unicode encoded as UTF-8 can coexist in the same stream of data:

- HTTP/1.1 headers are ISO-8859-1 (CERN legacy) while the content can be UTF-8.
- SIP, being based on the HTTP RFC, has the same flaw.

The CTO of my last VoIP company is still wondering why some caller IDs break his nice Python program, which assumes everything is UTF-8, and he still does not understand this...
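To make the failure concrete, here is a minimal sketch of what probably happens, assuming the caller ID arrives as raw bytes. The function name `decode_header_value` and the sample name are made up for illustration, and the UTF-8-then-Latin-1 fallback is only a heuristic: some Latin-1 byte sequences happen to be valid UTF-8 too.

```python
def decode_header_value(raw: bytes) -> str:
    """Hypothetical helper: try UTF-8 first, fall back to ISO-8859-1."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte 0x00-0xFF to a code point,
        # so this fallback can never raise.
        return raw.decode("iso-8859-1")


# The same caller ID as two different byte sequences.
caller_id_latin1 = "José".encode("iso-8859-1")  # b'Jos\xe9'
caller_id_utf8 = "José".encode("utf-8")         # b'Jos\xc3\xa9'

# A program that assumes UTF-8 everywhere blows up on the Latin-1 bytes.
try:
    caller_id_latin1.decode("utf-8")
    strict_decode_failed = False
except UnicodeDecodeError:
    strict_decode_failed = True

print(strict_decode_failed)
print(decode_header_value(caller_id_latin1))
print(decode_header_value(caller_id_utf8))
```

Both byte strings decode to the same text with the fallback, while the strict UTF-8 decode raises on the Latin-1 one.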

Yes, the encoding can change; I also saw it in logs while using regionalisation with C# .NET.

guelo 3738 days ago

According to newer HTTP specs clients should ignore weird ISO-8859 characters. https://tools.ietf.org/html/rfc7230#section-3.2.4:

   Historically, HTTP has allowed field content with text in the
   ISO-8859-1 charset [ISO-8859-1], supporting other charsets only
   through use of [RFC2047] encoding.  In practice, most HTTP header
   field values use only a subset of the US-ASCII charset [USASCII].
   Newly defined header fields SHOULD limit their field values to
   US-ASCII octets.  A recipient SHOULD treat other octets in field
   content (obs-text) as opaque data.

Though I guess you'd still need to decode it correctly in order to ignore the right characters.
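One way to sidestep the decoding question, sketched here under the assumption that you have the raw header bytes: RFC 7230 defines obs-text as the octets 0x80-0xFF, so they can be spotted at the byte level without knowing the charset. The name `split_obs_text` is hypothetical, and replacing those octets with U+FFFD is just one possible way of treating them as opaque.

```python
from typing import Tuple


def split_obs_text(raw: bytes) -> Tuple[str, bool]:
    """Hypothetical helper: keep the US-ASCII part, flag any obs-text."""
    # obs-text is any octet in the range 0x80-0xFF, whatever the charset was.
    has_obs_text = any(octet >= 0x80 for octet in raw)
    # Each non-ASCII octet becomes one U+FFFD replacement character.
    text = raw.decode("ascii", errors="replace")
    return text, has_obs_text


print(split_obs_text(b"Jos\xe9"))      # Latin-1 encoded value
print(split_obs_text(b"Jos\xc3\xa9"))  # UTF-8 encoded value
print(split_obs_text(b"Jose"))         # plain US-ASCII value
```

The ASCII portion survives either way; only the number of replacement characters differs between the Latin-1 and UTF-8 inputs.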

AnthonyMouse 3738 days ago

IETF should just publish an RFC that says "all text without a field specifying its encoding shall be UTF-8, even if this conflicts with a previous RFC." The only real objection to doing this is that it would break things, but almost all of those things are already broken.

colejohnson66 3738 days ago

The problem is browser vendors wouldn't want to implement that spec because if updating your browser breaks the website, no matter how much you explain it to the user, it's your fault, not the website owner's. It's why we have Quirks Mode even after 15 years. It's why Linus is so adamant about patches breaking userspace;[1] if your update broke it, it's your fault, no matter how bad the truly broken thing is.

[1]: https://lkml.org/lkml/2012/12/23/75

AnthonyMouse 3735 days ago

There are cases where the status quo is already broken and you're already being blamed for it. A change that makes the brokenness 20% instead of 80% by inverting the set of weird websites it happens on is going to make userspace less broken on net.

julie1 3737 days ago

Well, there are still situations where coders mix different character sets / encodings, willingly or not. And SIP still exists.
