| HN Mirror


	by phyzome 1325 days ago
	It's not, though. It's the easiest thing in the world: Just use a library that never emits unescaped content by default, not even when you make a single-character typo. The problem is that most libraries aren't like that.

Joker_vD 1325 days ago

Different contexts require different escaping schemes, you know?

mrkeen 1325 days ago

Escaping isn't always a yes/no question.

If someone enters "foo bar" into your frontend, should the backend only see "foo%20bar"?

pavlov 1325 days ago

The back-end should see a 7-byte buffer with values [102 111 111 032 098 097 114], assume it's UTF-8 and convert that to its internal string representation?
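
A minimal Python sketch of that suggestion, assuming the request body arrives as raw bytes: the backend decodes once at the edge and never sees any URL escaping.

```python
# The seven bytes from the comment, as a backend might receive them.
raw = bytes([102, 111, 111, 32, 98, 97, 114])

# Decode once at the boundary (assuming UTF-8, as the comment does).
text = raw.decode("utf-8")
print(text)  # foo bar
```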

ezfe 1325 days ago

No, the backend has no reason to see `foo%20bar`: you escape when you're combining that string with other strings (e.g. into HTML, into a SQL query, etc.).
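
A minimal sketch of escaping at the point of combination, using only Python's standard library: the stored value stays raw, and each target language gets its own encoder. The value and the `example.com` URL are made up for illustration.

```python
import html
import urllib.parse

value = "foo bar <b>"  # what the backend stores: the raw string, unescaped

# Escape only where the string is combined into another language.
html_fragment = "<p>" + html.escape(value) + "</p>"
url = "https://example.com/search?q=" + urllib.parse.quote(value)

print(html_fragment)  # <p>foo bar &lt;b&gt;</p>
print(url)            # https://example.com/search?q=foo%20bar%20%3Cb%3E
```

The same input comes out as `&lt;b&gt;` in one context and `%3Cb%3E` in the other, which is the "different contexts require different escaping schemes" point from earlier in the thread.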

evilDagmar 1325 days ago

There's a whole library of CVEs from people who tried to escape things being sent to SQL by hand. Use parameterized queries already.
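
For reference, a parameterized query with Python's built-in `sqlite3` module looks like this; the table and the hostile input are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users VALUES ('alice')")

hostile = "alice' OR '1'='1"

# The value is bound as data and never spliced into the SQL text,
# so there is nothing to escape and nothing to get wrong.
rows = conn.execute("SELECT name FROM users WHERE name = ?", (hostile,)).fetchall()
print(rows)  # [] : the hostile string is just a name that matches no user
```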

8n4vidtmkvmk 1324 days ago

Everyone says "just use parameterized queries", but they don't handle arrays, which makes the idea rather useless.

jiggawatts 1324 days ago

Many database engines can handle arrays, or table-valued variables which are basically the same thing. Most ORMs will also abstract away arrays for you, so you as the developer never need to deal with escaping of data in arrays.
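
As one hedged illustration of passing a whole list as a single parameter: SQLite's `json_each` table-valued function can unpack a JSON array bound to one placeholder. This assumes a SQLite build with the JSON functions (built in by default since SQLite 3.38); the table is made up.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, label TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])

wanted = [1, 3]

# The list travels as ONE bound parameter (a JSON array); json_each unpacks
# it inside the database, so the SQL text never depends on the list length.
rows = conn.execute(
    "SELECT label FROM items WHERE id IN (SELECT value FROM json_each(?)) ORDER BY id",
    (json.dumps(wanted),),
).fetchall()
print(rows)  # [('a',), ('c',)]
```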

8n4vidtmkvmk 1322 days ago

Which relational DB supports this natively?

ORMs don't count. They're just editing the SQL.

orf 1324 days ago

Use one that does? Or build it yourself?

    ARRAY[?, ?, ?, ?]
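
Building it yourself amounts to generating one placeholder per element while assembling the query string. Here is a minimal sketch with Python's `sqlite3`. SQLite has no `ARRAY` type, so an `IN (...)` list stands in, and `in_clause` is a hypothetical helper.

```python
import sqlite3


def in_clause(values):
    """Hypothetical helper: build 'IN (?, ?, ...)' with one placeholder per value.

    Only the number of values shapes the SQL text; the values themselves
    are still passed separately as bound parameters.
    """
    if not values:
        # Standard SQL has no empty IN list, so refuse rather than emit 'IN ()'.
        raise ValueError("empty IN list")
    return "IN (" + ", ".join("?" for _ in values) + ")"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, label TEXT)")
conn.executemany("INSERT INTO items VALUES (?, ?)", [(1, "a"), (2, "b"), (3, "c")])

wanted = [1, 3]
sql = "SELECT label FROM items WHERE id " + in_clause(wanted) + " ORDER BY id"
rows = conn.execute(sql, wanted).fetchall()
print(sql)   # SELECT label FROM items WHERE id IN (?, ?) ORDER BY id
print(rows)  # [('a',), ('c',)]
```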

8n4vidtmkvmk 1322 days ago

I do, but I feel like it defeats the purpose. In order to insert those ?s you have to parse the query, which is exactly what we're trying to avoid.

masklinn 1325 days ago

> It's the easiest thing in the world: Just use a library that never emits unescaped content by default

That doesn't make any sense? Escaping is a function of the consumer, not the producer. Hell, most of the problematic content doesn't come from a library to start with.

And if your Markdown -> HTML converter produces escaped content... it's not a Markdown -> HTML converter, because the result is not HTML.

More broadly, I think one of the core issues is this:

> Escape user input

User input is a broad and complicated category, and it's easy for user input to be "laundered" as it moves through an application.

And then escaping is an explicit action, which means it can be missed or forgotten, which is also a problem.

This means the solution is really that APIs should default to escaping almost everything. Rather than having to mark "untrusted" content, it's trusted content which should be marked as such. Opt-in "escaping" is the wrong default.

But of course that doesn't solve every issue. Take Markdown: you want the output of the Markdown converter to be trusted (otherwise it won't be properly formatted on display), but you don't want the input to be trusted, which means you don't want the input to be laundered through the Markdown converter.

Which is an issue in most Markdown libraries, as they inherit the "trusted input" model from Gruber's original Markdown, where HTML passthrough was a feature.

In that sense one design I did enjoy is Jinja and Markupsafe in the Python ecosystem:

- Like most modern template libraries, Jinja escapes content by default.

- Also (though somewhat sadly), like most template libraries, Jinja allows marking a value as safe at the point of use. That's dangerous: content can be mixed, and it's easy for safe content to be swapped out for user input and become unsafe through seemingly unrelated changes.

- So a better method is to use `markupsafe.Markup` at the source. It's a string subclass which the library considers safe (because Jinja uses `markupsafe.escape` internally), and the neat thing is that any combination of a Markup instance and a non-Markup string will implicitly escape the non-Markup operand(s).
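
The "escape anything that isn't already marked safe" behaviour can be sketched without the library. `SafeHTML` and `escape` below are hypothetical, a toy reimplementation of the idea and not MarkupSafe's actual code. The real `Markup` also covers `%`, `format`, `join` and more, while this sketch only handles `+`.

```python
import html


class SafeHTML(str):
    """Toy stand-in for markupsafe.Markup: a str subclass meaning 'already safe HTML'.

    Any plain str combined with it via + is escaped first, so only trusted
    content has to be marked.
    """

    def __add__(self, other):
        return SafeHTML(str.__add__(self, escape(other)))

    def __radd__(self, other):
        # Called for `plain_str + SafeHTML(...)` because SafeHTML subclasses str.
        return SafeHTML(str.__add__(escape(other), self))


def escape(value):
    """Escape plain values; pass already-safe values through unchanged."""
    if isinstance(value, SafeHTML):
        return value
    return SafeHTML(html.escape(str(value)))


# Marked safe at the source, where it is easy to see these are literals.
open_tag = SafeHTML("<em>")
close_tag = SafeHTML("</em>")

user_input = "<script>alert(1)</script>"
page = open_tag + user_input + close_tag

print(page)  # <em>&lt;script&gt;alert(1)&lt;/script&gt;</em>
```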

This means you can mark safe content as safe at the source (where it's easy to prove it's safe because e.g. it's a literal), then most transformations will maintain the safety invariants. Though obviously it only works with content you know will ultimately be markup-injected.

And non-method APIs (e.g. re, or HTML/XML libraries) can't be overridden, so they're not Markup-aware: they treat Markup objects as regular strings, which complicates processing pipelines if you want to preserve safety invariants. At the same time, those are laundering opportunities, so care is warranted.

phyzome 1325 days ago

I can give a more detailed response later, but...

« Escaping is a function of the consumer, not the producer »

This is incorrect. The producer emits something in a language, be it HTML or JSON or HTTP headers or whatever. Data must be encoded properly for that language. The consumer must then decode, of course, so in a sense it is the job of both. But the onus is really on the producer.
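
Here is a small Python illustration of that point, with JSON as the target language. The crafted value is made up. Pasting data into JSON text lets it add fields, while encoding it with the serializer keeps it a single string.

```python
import json

# A made-up value crafted to break out of a JSON string.
name = 'Robert", "admin": true, "x": "'

# Naive: paste the data into JSON text without encoding it for JSON.
naive = '{"name": "' + name + '"}'

# Encoded: the serializer escapes the embedded quotes for the target language.
encoded = json.dumps({"name": name})

print(json.loads(naive))    # {'name': 'Robert', 'admin': True, 'x': ''}
print(json.loads(encoded) == {"name": name})  # True
```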

masklinn 1325 days ago

> This is incorrect. The producer emits something in a language, be it HTML or JSON or HTTP headers or whatever. Data must be encoded properly for that language.

Which is the consumption side. When you send data to an HTML template engine, it’s escaped as input, meaning with the template engine as consumer, not with the template engine as producer.

It may be a “pipeline” situation where the consumer also produces something (e.g. JSON or HTML), but it doesn’t have to be: e.g. an SQL interface might produce nothing, yet the data it consumes still needs to be properly escaped.

When your producer produces data, it has no idea how that data will be used, and that’s what determines the necessary transformations: e.g. it’s of no help to you if your templating engine generates content escaped for MSSQL when you’re not going to put it in MSSQL.

cesarb 1325 days ago

> it’s of no help to you if your templating engine generates content escaped for MSSQL when you’re not going to put it in MSSQL.

Allow me to complain a bit about MSSQL.

When you're escaping a LIKE expression for MSSQL, you must also escape the "[" character, since it's a wildcard for MSSQL (and nowhere else except AFAIK Sybase). When you're escaping a LIKE expression for other databases, you must not escape the "[" character, since some databases reject escaping anything other than the % and _ wildcards. That is, your escaping code for a LIKE expression has to be database-specific, because MSSQL (and AFAIK Sybase, it seems both have a common ancestor) decided to be different.

masklinn 1325 days ago

> When you're escaping a LIKE expression for other databases, you must not escape the "[" character, since some databases reject escaping anything other than the % and _ wildcards. That is, your escaping code for a LIKE expression has to be database-specific, because MSSQL (and AFAIK Sybase, it seems both have a common ancestor) decided to be different.

TBF you may need custom codepaths because defaults diverge as well, IIRC postgres and sqlite default to ESCAPE '\' while mssql and oracle default to ESCAPE '' (the latter being the actual spec behaviour).

So in Postgres and SQLite you must always escape your LIKE parameter, while in mssql and oracle that's not the case.

cesarb 1325 days ago

> TBF you may need custom codepaths because defaults diverge as well, IIRC postgres and sqlite default to ESCAPE '\' while mssql and oracle default to ESCAPE '' (the latter being the actual spec behaviour).

The trick is to just avoid the default, and always use an explicit ESCAPE, which should work the same on every database (except mysql without NO_BACKSLASH_ESCAPES in which you also have to escape the backslash itself, otherwise it will escape the closing quote and get very confused, but that issue can be avoided by using a character other than backslash as the escape character).
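
Here is a sketch of the explicit-ESCAPE approach in Python, using `sqlite3` to exercise it. The `escape_like` helper is hypothetical, and `!` is an arbitrary non-backslash escape character. The `mssql` flag covers the `[` case mentioned earlier in this subthread. Only the SQLite behaviour is actually exercised here.

```python
import sqlite3


def escape_like(value: str, esc: str = "!", mssql: bool = False) -> str:
    """Hypothetical helper: escape user text for LIKE with an explicit ESCAPE clause.

    % and _ are wildcards everywhere. [ is only a wildcard on MSSQL (and
    Sybase), and other engines may reject escaping it, so it is opt-in.
    The escape character itself is always escaped.
    """
    specials = {esc, "%", "_"}
    if mssql:
        specials.add("[")
    return "".join(esc + ch if ch in specials else ch for ch in value)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (s TEXT)")
conn.executemany("INSERT INTO t VALUES (?)", [("100%",), ("1000",), ("a_b",), ("axb",)])

# Explicit ESCAPE '!' instead of relying on the engine's default; per the
# comment above, a non-backslash character also sidesteps MySQL's quirk.
query = "SELECT s FROM t WHERE s LIKE ? ESCAPE '!' ORDER BY s"
hits = conn.execute(query, ("%" + escape_like("100%") + "%",)).fetchall()
print(hits)  # [('100%',)]
print(escape_like("[x]_", mssql=True))  # ![x]!_
```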

buttocks 1325 days ago

The whole point is that the producer may be hostile, or buggy, and the consumer must handle that. Asserting that it “must” be encoded properly does not make it so.

nl 1325 days ago

Too bad if your consumer has to interact with anything that could be malicious in any circumstances!

Consumers must properly escape any input.

deredede 1325 days ago

That doesn't make sense to me and I agree with GP. If I consume HTML and I escape all HTML input I'm given, I'm utterly useless.

Now when I consume text and convert that text into HTML for further treatment, I'm producing HTML, and I must properly escape my input in that conversion. The escaping is only needed because I produce HTML. In fact the only time escaping can be done is when producing data, because if unescaped data is ever produced, the cat's out of the bag.

Edit: Actually, I think producer/consumer is the wrong way to talk about this. Escaping only ever occurs at a boundary when transforming between formats (eg from "text string" to "html string") which is always both producer (of the new format) and consumer (of the old format). But it can always be thought of as a type cast, with possible type confusions when input and output formats share the same machine representation (eg string).
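
The "type cast" framing can be sketched by giving HTML its own type instead of sharing `str`, so the boundary becomes a single conversion function. `Html`, `text_to_html` and `render_paragraph` are hypothetical names. The static guarantee assumes a type checker such as mypy, since Python does not enforce annotations at runtime.

```python
import html
from dataclasses import dataclass


@dataclass(frozen=True)
class Html:
    """A distinct type for HTML, so it cannot be confused with plain text
    even though both are strings underneath."""
    source: str


def text_to_html(text: str) -> Html:
    """The format boundary: the only place plain text becomes Html."""
    return Html(html.escape(text))


def render_paragraph(body: Html) -> Html:
    # Annotated to accept Html only; a type checker flags a raw str here.
    return Html("<p>" + body.source + "</p>")


print(render_paragraph(text_to_html("1 < 2 & 3 > 2")).source)
# <p>1 &lt; 2 &amp; 3 &gt; 2</p>
```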

masklinn 1325 days ago

> That doesn't make sense to me and I agree with GP. If I consume HTML and I escape all HTML input I'm given, I'm utterly useless. [...] Now when I consume text and convert that text into HTML for further treatment, I'm producing HTML, and I must properly escape my input in that conversion.

Which is my point: it's the consumption side which defines what the escaping should be.

> Escaping only ever occurs at a boundary when transforming between formats (eg from "text string" to "html string") which is always both producer (of the new format) and consumer (of the old format).

A database interface is not a transformer / producer, yet it needs escaping. Globbing is not a transformer either, and it still needs escaping.
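
For the globbing case, Python's standard library happens to ship the escaping function: `glob.escape` wraps `*`, `?` and `[` in brackets so that a literal filename can be used as a pattern. Here is a small sketch; the filename is made up.

```python
import glob
from fnmatch import fnmatchcase

# A literal filename that happens to contain glob metacharacters.
filename = "report[final]*.txt"

pattern = glob.escape(filename)
print(pattern)  # report[[]final][*].txt

print(fnmatchcase(filename, pattern))       # True: matches only the literal name
print(fnmatchcase("reportf.txt", filename))  # True: unescaped, it acts as a pattern
print(fnmatchcase("reportf.txt", pattern))   # False
```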

deredede 1325 days ago

I disagree, a database interface is a format boundary at which a transformation occurs (from text to SQL) and so is globbing (from text to pattern).

nl 1324 days ago

Sure, redefine the terms if you like.

The thing that accepts the input must make sure it is properly escaped. Think of SQL injection attacks - they are because the thing that accepts input hasn't properly escaped the input.

Cross-site scripting attacks are exactly the same thing, but occur when the input side doesn't properly escape HTML input.

rileymat2 1325 days ago

I suspect you are having a vigorous debate about the ill-defined “producer consumer” terminology and probably agree.

neon_electro 1324 days ago

Is Rails' `html_safe` an example of what you're referring to? https://apidock.com/rails/String/html_safe

masklinn 1324 days ago

Might be, although it’s unclear how composition works.

Especially with Ruby having string interpolation.
