by spankalee 764 days ago
> [1] - Templating systems themselves are thus a mistake belonging to this class

This is not universally true.

JavaScript has an amazing feature called tagged template literals, which lets you tag a string containing interpolations with a function that receives the literal parts and the interpolated values separately. This lets the tag function treat the literals as trusted, developer-written HTML or SQL, and the interpolations as untrusted, user-provided values.
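
As a rough illustration of that split, here is a minimal sketch of a tag function. The names `safeHtml` and `escapeHtml` are made up for this example and do not come from any library. It escapes at the string level, which is the simplest possible approach and not how Lit does it.

```javascript
// Hypothetical helper: escape the characters that matter in HTML text
// and quoted attribute values.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Tag function: `strings` holds the literal parts written by the developer,
// `values` holds the interpolated expressions.
function safeHtml(strings, ...values) {
  let out = strings[0];
  for (let i = 0; i < values.length; i++) {
    // Only the interpolated values are escaped; literals pass through as-is.
    out += escapeHtml(values[i]) + strings[i + 1];
  }
  return out;
}

const userName = '<script>alert("xss")</script>';
const page = safeHtml`<h1>Hello, ${userName}</h1>`;
console.log(page);
// <h1>Hello, &lt;script&gt;alert(&quot;xss&quot;)&lt;/script&gt;</h1>
```

The tag never has to guess which parts are trusted: the language hands it the literals and the values as separate arguments.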

Lit's HTML template system[1] uses this to basically eliminate XSS (some HTML features, such as attributes that accept "javascript:" URLs, still require special handling).

ex:

    html`<h1>Hello, ${name}</h1>`

If `name` is a user-provided string, it can never insert a <script> or <img> tag, etc., because it's escaped.

There are similar tags for SQL, GraphQL, etc. Java added a similar String Templates feature in 21, as a preview feature.
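
For the SQL case, a tag can turn interpolations into query parameters instead of concatenating them. This is a minimal sketch, assuming PostgreSQL-style `$1` placeholders. The `sql` function and the `{ text, values }` result shape are illustrative, not a particular library's API.

```javascript
// Hypothetical `sql` tag: builds a parameterized query. The literal parts
// become the query text; the interpolations become bound parameters.
function sql(strings, ...values) {
  let text = strings[0];
  for (let i = 0; i < values.length; i++) {
    // Assumption: the target database uses numbered placeholders ($1, $2, ...).
    text += "$" + (i + 1) + strings[i + 1];
  }
  return { text, values };
}

const userInput = "Robert'); DROP TABLE students;--";
const query = sql`SELECT * FROM students WHERE name = ${userInput}`;
console.log(query.text);   // SELECT * FROM students WHERE name = $1
console.log(query.values); // the raw string, sent to the driver separately
```

The user-provided string never becomes part of the query text, so there is nothing to escape.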

[1]: https://lit.dev/docs/templates/overview/

2 comments

> If `name` is a user-provided string, it can never insert a <script> or <img> tag, etc., because it's escaped.

Be careful with that "never". A curious and persistent person might discover a bug in the implementation, leading to something like the Log4Shell issue.

Not sure why you are being downvoted here. It's a fair point: properly escaping your data is only one part of the overall security picture, and you should also be strictly validating data at the inputs to your system.

Luckily, for Lit specifically, the "escaping" is done by the browser by setting textContent, so the string literally never passes through the HTML parser. Any string is valid text content, and if you found a bug that permitted unsafe text to be parsed as HTML somehow, it would be a browser bug and a very, very serious one.

But it'd be similar with other template systems. If the interpolation should allow any string, there's really no validation to be done.

That's exactly the kind of hack that worries me. Your example is still (seemingly[0]) gluing text together at the serialized level, ignoring the actual structure of the HTML language. ${name} should never be able to insert any text that would end up being interpreted as markup. Not only when some code decides it's not user-provided; it's not even possible to make that test 100% accurate, and it doesn't protect you from mistakes in "trusted" strings (like a totally trusted `name` having a stray '>' in it).

The bulletproof way of doing this is working at the level of abstraction of your target language. With HTML, that would be a tree structure. For example, if your HTML generation looks more like:

    ["H1", "Hello, " + name]

and that is passed to code that actually builds up the tree and then serializes it down to HTML, then there is no way `name` could ever break the structure or inject anything.
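
A minimal sketch of that idea follows, assuming a node is either a string (text) or an array of the form [tagName, ...children]. The `serialize` and `escapeText` names are made up for this example. Attributes are left out, and tag names are assumed to be developer-written.

```javascript
// Escape text so it can only ever appear as character data.
function escapeText(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// A node is either a string (a text node) or an array: [tagName, ...children].
// Assumption: tag names come from the developer, never from user input.
function serialize(node) {
  if (typeof node === "string") {
    return escapeText(node);
  }
  const [tag, ...children] = node;
  const tagName = tag.toLowerCase();
  return "<" + tagName + ">" + children.map(serialize).join("") + "</" + tagName + ">";
}

const userName = "<img src=x onerror=alert(1)>";
console.log(serialize(["H1", "Hello, " + userName]));
// <h1>Hello, &lt;img src=x onerror=alert(1)&gt;</h1>
```

Because strings in the tree are always text nodes, the serializer is the only place that needs to know about escaping.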

--

[0] - I skimmed the docs of Lit, it seems there are restrictions on where interpolation can be placed, but I don't think they're actually building up the tree expressed by the static parts.

Your assumptions here are very, very wrong. Calling it a hack is only telling on yourself, honestly.

Lit is not working at the serialized level, at all. It parses the templates independently of any values, and the values are inserted into the already parsed tree structure. There is literally no way for values to be parsed as HTML.
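
To make the "parsed independently of any values" part concrete, here is a heavily simplified model. This is not Lit's actual code. The `prepare` step and the part objects are invented for illustration, and a plain array stands in for the parsed DOM tree. It relies on one real language guarantee: a given tagged template site passes the same `strings` array object on every call, so the static parts can be processed once and cached.

```javascript
// Cache of prepared templates, keyed by the `strings` array object.
const prepared = new WeakMap();
let prepareCount = 0;

// "Parse" step: sees only the static parts, never the values.
function prepare(strings) {
  prepareCount++;
  const parts = [];
  strings.forEach((s, i) => {
    parts.push({ kind: "static", markup: s });
    if (i < strings.length - 1) parts.push({ kind: "hole", index: i });
  });
  return parts;
}

function html(strings, ...values) {
  let parts = prepared.get(strings);
  if (!parts) {
    parts = prepare(strings);
    prepared.set(strings, parts);
  }
  // Values fill the holes as text nodes: they are data, never markup.
  return parts.map((p) =>
    p.kind === "hole" ? { kind: "text", data: String(values[p.index]) } : p
  );
}

const greet = (who) => html`<h1>Hello, ${who}</h1>`;
const first = greet("Ada");
const second = greet("<script>");
console.log(prepareCount); // 1
console.log(second[1]);    // { kind: 'text', data: '<script>' }
```

The template is prepared once, before any value is seen, and the second call only supplies a new text node.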