Hacker News new | ask | show | jobs
by phyzome 1325 days ago
I can give a more detailed response later, but...

« Escaping is a function of the consumer, not the producer »

This is incorrect. The producer emits something in a language, be it HTML or JSON or HTTP headers or whatever. Data must be encoded properly for that language. The consumer must then decode, of course, so in a sense it is the job of both. But the onus is really on the producer.

3 comments

> This is incorrect. The producer emits something in a language, be it HTML or JSON or HTTP headers or whatever. Data must be encoded properly for that language.

Which is the consumption side. When you send data to an HTML template engine, it’s escaped as input, meaning with the template engine as consumer, not with the template engine as producer.

It may be a “pipeline” situation where the consumer also produces something (e.g. JSON or HTML), but it doesn’t have to be e.g. an SQL interface might have no production, but the data it consumes still needs to be properly escaped.

When your producer produces data, it has no idea how that data will be used, and that’s what determines the necessary transformations e.g. it’s of no help to you if your templating engine generates content escaped for MSSQL when you’re not going to put it in MSSQL.

> it’s of no help to you if your templating engine generates content escaped for MSSQL when you’re not going to put it in MSSQL.

Allow me to complain a bit about MSSQL.

When you're escaping a LIKE expression for MSSQL, you must also escape the "[" character, since it's a wildcard for MSSQL (and nowhere else except AFAIK Sybase). When you're escaping a LIKE expression for other databases, you must not escape the "[" character, since some databases reject escaping anything other than the % and _ wildcards. That is, your escaping code for a LIKE expression has to be database-specific, because MSSQL (and AFAIK Sybase, it seems both have a common ancestor) decided to be different.

> When you're escaping a LIKE expression for other databases, you must not escape the "[" character, since some databases reject escaping anything other than the % and _ wildcards. That is, your escaping code for a LIKE expression has to be database-specific, because MSSQL (and AFAIK Sybase, it seems both have a common ancestor) decided to be different.

TBF you may need custom codepaths because defaults diverge as well, IIRC postgres and sqlite default to ESCAPE '\' while mssql and oracle default to ESCAPE '' (the latter being the actual spec behaviour).

So in Postgres and SQLite you must always escape your LIKE parameter, while in mssql and oracle that's not the case.

> TBF you may need custom codepaths because defaults diverge as well, IIRC postgres and sqlite default to ESCAPE '\' while mssql and oracle default to ESCAPE '' (the latter being the actual spec behaviour).

The trick is to just avoid the default, and always use an explicit ESCAPE, which should work the same on every database (except mysql without NO_BACKSLASH_ESCAPES in which you also have to escape the backslash itself, otherwise it will escape the closing quote and get very confused, but that issue can be avoided by using a character other than backslash as the escape character).

The whole point is that the producer may be hostile, or buggy, and the consumer must handle that. Asserting that it “must” be encoded properly does not make it so.
Too bad if your consumer has to interact with anything that could be malicious in any circumstances!

Consumers must properly escape any input.

That doesn't make sense to me and I agree with GP. If I consume HTML and I escape all HTML input I'm given, I'm utterly useless.

Now when I consume text and convert that text into HTML for further treatment, I'm producing HTML, and I must properly escape my input in that conversion. The escaping is only needed because I produce HTML. In fact the only time escaping can be done is when producing data, because if unescaped data is ever produced, the cat's out of the bag.

Edit: Actually think that producer/consumer is a wrong way to talk about this. Escaping only ever occurs at a boundary when transforming between formats (eg from "text string" to "html string") which is always both producer (of the new format) and consumer (of the old format). But it can always be thought of as a type cast, with possible type confusions when input and output formats share the same machine representation (eg string).

> That doesn't make sense to me and I agree with GP. If I consume HTML and I escape all HTML input I'm given, I'm utterly useless. [...] Now when I consume text and convert that text into HTML for further treatment, I'm producing HTML, and I must properly escape my input in that conversion.

Which is my point, it's the consumption side which defines what the escaping should be.

> Escaping only ever occurs at a boundary when transforming between formats (eg from "text string" to "html string") which is always both producer (of the new format) and consumer (of the old format).

A database interface is not a transformer / producer, needs escaping. Globbing is not a transformer either. Still needs escaping.

I disagree, a database interface is a format boundary at which a transformation occurs (from text to SQL) and so is globbing (from text to pattern).
Whatever, call it a transformer if that makes you hard.

Point doesn't change: what escaping is needed is a function of the "transformer" and applied to the input (= consumption) side of it.

You don't apply an escaping because data comes from a database, you apply it because it goes into one. Same with template processors, regex engines, etc...

Sure, redefine the terms if you like.

The thing that accepts the input must make sure it is properly escaped. Think of SQL injection attacks - they are because the thing that accepts input hasn't properly escaped the input.

Cross site scripting attacks are exactly the same thing but occur when the input side doesn't properly escape HTML input.

I suspect you are having a vigorous debate about the ill defined “producer consumer” terminology and probably agree.