Hacker News
by bcrosby95 1329 days ago
Escape strings at the last possible moment, and ideally it's done by whatever library you're using so you never have to worry about it. It has never been unclear to me in our codebases whether I'm dealing with a raw string or a safe one. They're all unsafe, because you have no clue what context they're going to be used in.

If you're writing a web framework or a DB library things might be different though - in that case a different class probably makes sense. If you have a module for a certain communication medium, then yeah you might use it in that module. But if you're writing a webapp, passing around escaped strings is a bad idea 99% of the time. It creates code highly coupled to one aspect of your system.

Just imagine if you did this with networking. I'm glad we're not in a world where we're passing around TCPString or UDPString or IPString or EthernetString or TokenRingString or CarrierPigeonString because that happens to be a networking stack the app uses sometimes. It sounds like hell.


> They're all unsafe, because you have no clue what context they're going to be used in.

That's correct, but it leads to the opposite conclusion from the escape-late approach.

Because with escape-late, when you need to skip escaping you also skip it at the last possible moment, and that's a sure-fire way to launder attacker-controlled data.

Instead you should escape everything, and opt-out as early as possible.

> But if you're writing a webapp, passing around escaped strings is a bad idea 99% of the time. It creates code highly coupled to one aspect of your system.

That's why you do the reverse: most strings are unsafe to everything, but the strings which are safe are generally safe to one specific subsystem. So you say that.

> Just imagine if you did this with networking. I'm glad we're not in a world where we're passing around TCPString or UDPString or IPString or EthernetString or TokenRingString or CarrierPigeonString because that happens to be a networking stack the app uses sometimes. It sounds like hell.

It sounds like hell because it makes no sense: there's no such thing as a TCPString, because TCP is not string-based and TCP messages are not composed that way.

> Instead you should escape everything, and opt-out as early as possible.

That’s not even remotely workable for any system with more than one kind of “escaping”. What if I want to use a string as:

1. An IDNA-encoded domain name

2. An HTML text snippet

3. A shell command string argument

4. A string literal part of a regular expression

5. A part to be used in an XML CDATA section

6. A JSON string

I can’t escape the string beforehand, since the escaping rules are all different. No, the only sensible alternative is to use the same rule which we all use for character encoding: Encode and decode (and escape) at the edges.
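To make the "rules are all different" point concrete, here is a minimal sketch using only Python's standard library. It covers four of the six contexts listed above (HTML, shell, regex, JSON); the sample string `raw` is made up for illustration.

```python
import html
import json
import re
import shlex

# Made-up sample input containing characters that are special in several sinks.
raw = 'a<b> & "c" $(d).*'

# Each sink has its own escaping rules, so the same raw string
# ends up with a different encoded form per destination.
as_html = html.escape(raw)    # HTML text node or attribute value
as_shell = shlex.quote(raw)   # a single POSIX shell argument
as_regex = re.escape(raw)     # a literal inside a regular expression
as_json = json.dumps(raw)     # a JSON string literal

for name, value in [("html", as_html), ("shell", as_shell),
                    ("regex", as_regex), ("json", as_json)]:
    print(f"{name:5} {value}")
```

None of the four results is valid input for any of the other three sinks, which is why a single up-front "escaped" form cannot exist.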

> I can’t escape the string beforehand, since the escaping rules are all different.

You’re still misunderstanding. You shouldn’t escape at any point, instead you should mark things as safe as early as possible.

“Safe” almost always has a single context, you don’t care if it’s going to go somewhere else because it’s not safe for there.

Anything that’s not marked as safe is then automatically considered unsafe and processed as such by the sink.
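A rough sketch of what "mark as safe early, let the sink handle the rest" could look like. The `SafeHtml` type and `render_html` function are hypothetical names invented for this example, not any real library's API, and the sink is deliberately simplistic.

```python
import html


class SafeHtml(str):
    """Hypothetical marker type: a string already vetted as HTML markup.

    It says nothing about safety for SQL, shell, or any other sink.
    """


def render_html(template: str, **values: object) -> str:
    """Hypothetical HTML sink: escapes every value not explicitly SafeHtml."""
    prepared = {
        key: value if isinstance(value, SafeHtml) else html.escape(str(value))
        for key, value in values.items()
    }
    return template.format(**prepared)


# Marked safe at the point where the markup is created, not at the sink.
badge = SafeHtml("<b>admin</b>")
# Plain str: the sink assumes it may be attacker-controlled.
comment = "<script>alert(1)</script>"

page = render_html("<p>{badge} wrote: {comment}</p>", badge=badge, comment=comment)
print(page)
# <p><b>admin</b> wrote: &lt;script&gt;alert(1)&lt;/script&gt;</p>
```

One consequence of subclassing `str` this way: ordinary string methods and concatenation return a plain `str`, so the safe mark is dropped as soon as the value is modified, and the result falls back to being escaped.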

> What if I want to use a string as:

It’s not an issue, because by default nothing is safe anywhere, so all those APIs should treat the injected data thus.

There is no escaping, because everything is automatically internally escaped by default.

> It’s not an issue, because by default nothing is safe anywhere, so all those APIs should treat the injected data thus.

No library does this, since it does not know which strings I send it with their literal meaning intended, and which strings I send it with their escape characters intended to be interpreted. The escape characters are part of the API of that library. The library does not accept “strings” as such, it accepts “escaped” strings. And since my program deals with normal unescaped strings, I have to escape the strings before I send them to the API.

> There is no escaping, because everything is automatically internally escaped by default.

I have a feeling that you have a different meaning of the word “escaped” than me.

> No library does this

Most modern templates do exactly that. Jinja certainly does.

> The library does not accept “strings” as such, it accepts “escaped” strings. And since my program deals with normal unescaped strings, I have to escape the strings before I send them to the API.

That’s the problem with the library. That is what needs to be fixed.
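One way to read "fixing the library" is an API that takes the code and the data through separate channels, so the caller never escapes anything. As an illustration only (neither commenter mentions it), the standard library's `sqlite3` parameter binding behaves this way; the table and the `hostile` value are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")

# Made-up input that would break a query built by string concatenation.
hostile = "Robert'); DROP TABLE users;--"

# The ? placeholder keeps the query text and the data separate, so the
# driver treats `hostile` as a literal value and the caller escapes nothing.
conn.execute("INSERT INTO users (name) VALUES (?)", (hostile,))

rows = conn.execute("SELECT name FROM users").fetchall()
print(rows)
```

The value comes back byte-for-byte as it went in, and the table still exists, because the string was never interpreted as SQL.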

> I have a feeling that you have a different meaning of the word “escaped” than me.

Add “explicit” to the first occurrence if you don’t understand without it.