Hacker News
by mytailorisrich 1969 days ago
> A recipient SHOULD treat other octets in field content (obs-text) as opaque data.

This is not really a 'should', IMHO, because fields are defined as OCTETs, IIRC. Given that, a compliant and robust implementation must treat them as opaque data.


I still fight with case-sensitive header-name matching breaking HTTP/2 -> HTTP/1.1 proxies.
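Field names are case-insensitive in HTTP/1.1 (RFC 7230, section 3.2), and HTTP/2 sends them lowercased, so a proxy that compares names byte-for-byte can miss a header when bridging the two. A minimal sketch of a case-insensitive lookup, assuming a hypothetical `HeaderMap` helper (not any particular library's API):

```python
class HeaderMap:
    """Hypothetical header store: case-insensitive lookup, original spelling kept."""

    def __init__(self):
        # lowercased field name -> list of (name as received, value) pairs
        self._fields = {}

    def add(self, name, value):
        self._fields.setdefault(name.lower(), []).append((name, value))

    def get_all(self, name):
        # Field names are case-insensitive, so normalize before the lookup.
        return [value for _, value in self._fields.get(name.lower(), [])]

    def items(self):
        # Yield fields with the spelling they arrived with.
        for pairs in self._fields.values():
            yield from pairs


h2_headers = HeaderMap()
h2_headers.add("content-type", "text/html")   # HTTP/2 delivers lowercase names
print(h2_headers.get_all("Content-Type"))     # ['text/html']
```

Keeping the received spelling alongside the normalized key lets the proxy re-emit headers unchanged while still matching them case-insensitively.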
RFC 7230 makes a point of not making it a MUST, as that would make an unknown number of existing applications non-compliant with HTTP/1.1-as-redefined. They are free to treat incoming headers as 8-bit ISO-8859-1 instead of dropping to 7-bit US-ASCII.
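To illustrate the difference: ISO-8859-1 assigns a character to every octet from 0x00 to 0xFF, so a value containing obs-text always decodes (and round-trips), while US-ASCII covers only 0x00-0x7F and rejects it. A small sketch using Python's built-in codecs; the header value is a made-up example:

```python
raw_value = b"caf\xe9"  # made-up header value with one obs-text octet (0xE9)

# ISO-8859-1 maps every octet to a code point, so decoding never fails
# and re-encoding gives back the exact same bytes.
latin1_text = raw_value.decode("iso-8859-1")
print(ascii(latin1_text))  # 'caf\xe9'

# US-ASCII only covers 0x00-0x7F, so the same bytes are rejected.
try:
    raw_value.decode("ascii")
except UnicodeDecodeError as exc:
    print("ascii decode failed at byte", exc.start)  # ascii decode failed at byte 3
```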
RFC 2616 defined header fields as OCTETs, and regarding this change RFC 7230 states:

> Non-US-ASCII content in header fields and the reason phrase has been obsoleted and made opaque (the TEXT rule was removed).

RFC 2616:

      field-value    = *( field-content | LWS )
      field-content  = <the OCTETs making up the field-value
                        and consisting of either *TEXT or combinations
                        of token, separators, and quoted-string>

Hence to me fields must be treated as opaque data for backward compatibility and robustness. If anything, existing applications that are compliant with RFC 2616 already do that, right? ;)

RFC 2616 defines OCTET as "<any 8-bit sequence of data>", quote unquote; nothing is said about its value being opaque.

      TEXT           = <any OCTET except CTLs,
                        but including LWS>
The IETF rewrote the productions not to use TEXT, but stopped short of banning the old behaviour.

So, for instance, where 2616 states:

      Reason-Phrase  = *<TEXT, excluding CR, LF>

and 7230 has:

      reason-phrase  = *( HTAB / SP / VCHAR / obs-text )
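One way to check that the two reason-phrase rules admit the same octets is to spell each out as a predicate and compare them over all 256 values. This sketch assumes the RFC 2616 definitions CTL = octets 0-31 and 127, and LWS = [CRLF] 1*( SP | HT ), so with CR and LF excluded LWS only re-admits SP and HT; the function names are illustrative:

```python
def allowed_2616(octet):
    """RFC 2616: Reason-Phrase = *<TEXT, excluding CR, LF>.

    TEXT is any OCTET except CTLs (0-31 and 127), but including LWS;
    with CR and LF excluded, LWS contributes only SP and HT.
    """
    is_ctl = octet <= 31 or octet == 127
    return (not is_ctl) or octet == 0x09  # HT is a CTL, re-admitted via LWS


def allowed_7230(octet):
    """RFC 7230: reason-phrase = *( HTAB / SP / VCHAR / obs-text )."""
    return (
        octet == 0x09               # HTAB
        or octet == 0x20            # SP
        or 0x21 <= octet <= 0x7E    # VCHAR
        or 0x80 <= octet <= 0xFF    # obs-text
    )


mismatches = [b for b in range(256) if allowed_2616(b) != allowed_7230(b)]
print(mismatches)  # []
```

An empty list means every reason phrase valid under 2616 is still valid under 7230; the octets 0x80-0xFF simply moved under the obs-text name.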

It makes sure that any application that conforms to 2616 still conforms to 7230 by not making it illegal (MUST) to parse obs-text; it is just something you SHOULD NOT do. They are simply ensuring that any newly defined header uses only SP / VCHAR (possibly quoted).

Let's not argue semantics here. An arbitrary sequence of bytes is an opaque data type, it has no structure, no meaning, no assumption can be made, and it must simply be passed on as is because it can be anything.
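In code, "opaque" means the value stays a byte string end to end: it is never decoded, and only the whitespace around it is touched. A minimal sketch of such a forwarding step, assuming hypothetical helpers `parse_field_line` and `serialize_field_line` (not from any library):

```python
OWS = b" \t"  # optional whitespace: SP / HTAB


def parse_field_line(line):
    """Split b'Name: value' into (name, value) without decoding the value."""
    name, sep, value = line.partition(b":")
    # RFC 7230 allows no whitespace between the field name and the colon.
    if not sep or not name or name != name.strip(OWS):
        raise ValueError("malformed field line")
    # Only the surrounding OWS is removed; the octets in between are untouched.
    return name, value.strip(OWS)


def serialize_field_line(name, value):
    return name + b": " + value


incoming = b"X-Note: caf\xe9 \xff"   # made-up field with obs-text octets
name, value = parse_field_line(incoming)
forwarded = serialize_field_line(name, value)
print(forwarded == incoming)  # True
```

Because nothing assumes a charset, the obs-text octets pass through exactly as received.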

That's why they write that it should be treated as opaque data. My point (and the point of the comment I was replying to) is that 'should' is perhaps too weak a word in this context, given the previous history. In any case, for robustness it is a must to treat it that way.