by njohnson41 3755 days ago
Sounds a lot like Python's "pickle" module (which is super-useful for prototyping), but with the same Achilles' heel: all of your serialized objects can now run arbitrary code when you deserialize them!
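To make that Achilles' heel concrete, here is a minimal Python sketch of pickle itself (not lave, which is JavaScript). An object can name any importable callable through `__reduce__`, and `pickle.loads` calls it. The `Payload` class and the harmless `eval` expression are illustrative stand-ins; a real attack would name something like `os.system` instead.

```python
import pickle


class Payload:
    """Illustrative object whose pickled form tells the loader to call eval()."""

    def __reduce__(self):
        # pickle stores "call builtins.eval with these arguments";
        # pickle.loads performs that call while deserializing.
        return (eval, ("[n * n for n in range(5)]", {}))


blob = pickle.dumps(Payload())
restored = pickle.loads(blob)  # eval() runs here, during deserialization

print(type(restored).__name__, restored)  # list [0, 1, 4, 9, 16]
```

Note that what comes back is not a `Payload` at all: the data decided which code ran and what object the caller received.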

I think the main thing Jed was going for, which is different from pickle and which most people aren't picking up on, is that the output is simply JavaScript. It doesn't output a storage format that must be turned back into live objects by some kind of parser, so using it doesn't require any additional scripts.

(edited to add that, of course, pickle is part of the python standard library, which makes this specific feature less interesting)

Not _quite_ arbitrary; the only code run is that generated by lave. Arbitrary code present in functions is parsed, but not run.
But if there is a persistence or network layer involved and that layer is compromised, the serialized output could function as an injection vector into the application, right?
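A Python analogue of the two points above, assuming a hypothetical "the output is just code" format whose loader executes the source and reads back a `data` variable (this is not lave's actual mechanism). Function bodies in the output are only defined, never called, on load; but nothing stops a tampered payload from adding top-level statements, and those run immediately.

```python
# Hypothetical loader: run the serialized source and return its namespace.
def load_source(source):
    namespace = {"log": []}
    exec(source, namespace)
    return namespace


# What a trusted serializer might emit: a function is defined, not called.
trusted = """
def on_save():
    log.append("on_save body ran")

data = {"name": "doc", "on_save": on_save}
"""

# The same output after tampering in storage or in transit.
tampered = trusted + '\nlog.append("injected statement ran")\n'

ns = load_source(trusted)
print(ns["log"])  # [] (the function body was parsed but not executed)

ns = load_source(tampered)
print(ns["log"])  # ['injected statement ran']
```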
Sure, as it could with any part of your app, including wherever your JSON.parse code lives.
But could JSON.parse() fed with malicious data fire off an XMLHttpRequest or delete all of your data?
No. That's the whole point of using JSON.parse() instead of eval(). JSON is defined as a non-executing subset of JavaScript syntax, one that contains only literal expressions. JSON.parse() will only parse valid JSON.
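Python has the same split, which makes for an easy sketch: `json.loads` plays the role of `JSON.parse()` and `eval` plays the role of JavaScript's `eval()`. The `hostile` string is an illustrative payload, not taken from any real attack.

```python
import json

# Text a compromised store or peer might hand to the loader.
hostile = '__import__("os").getpid()'


def load_with_eval(text):
    # eval() executes any expression, including calls and imports.
    return eval(text, {})


def load_with_json(text):
    # json.loads accepts only JSON literals (objects, arrays, strings,
    # numbers, true/false/null); anything else is a JSONDecodeError.
    return json.loads(text)


print(type(load_with_eval(hostile)).__name__)  # int: the call really ran

try:
    load_with_json(hostile)
except json.JSONDecodeError as err:
    print("rejected:", err.msg)
```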

This is why, for instance, there's no native Date format in JSON. Dates in JavaScript require running a constructor -- new Date() -- so they aren't in JSON.
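The usual workaround keeps that property: dates travel as plain strings, and the reading code, not the data, decides when a constructor runs. A minimal Python sketch of the convention; the `__datetime__` tag and the helper names are hypothetical choices, not a standard.

```python
import json
from datetime import datetime, timezone


def encode_extra(obj):
    # json.dumps calls this only for values it cannot encode natively.
    if isinstance(obj, datetime):
        return {"__datetime__": obj.isoformat()}
    raise TypeError(f"{type(obj).__name__} is not JSON serializable")


def decode_extra(obj):
    # json.loads calls this for every decoded JSON object; only the one
    # known tag is turned back into a live object.
    if set(obj) == {"__datetime__"}:
        return datetime.fromisoformat(obj["__datetime__"])
    return obj


created = datetime(2015, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
text = json.dumps({"created": created}, default=encode_extra)
print(text)  # {"created": {"__datetime__": "2015-01-02T03:04:05+00:00"}}
restored = json.loads(text, object_hook=decode_extra)
```

Without `default=`, `json.dumps` raises `TypeError` on the datetime, which is the "no native Date" gap in action.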

That would potentially concern me too, though if there isn't one already, it should be easy to implement a switch that turns this part of the behaviour off.

It looks very interesting to me from the point of view of dealing with certain data types better (at all, in fact), and handling circular references.
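For comparison on circular references, a small Python sketch: the standard `json` module refuses cyclic data, while `pickle` (the format from the top comment) preserves the cycle, at the cost described above.

```python
import json
import pickle

node = {"name": "root"}
node["self"] = node  # the dict now contains a reference to itself

try:
    json.dumps(node)
except ValueError as err:
    print("json:", err)  # json: Circular reference detected

copy = pickle.loads(pickle.dumps(node))
print(copy["self"] is copy)  # True: the cycle survives the round trip
```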

> it should be easy to implement a switch that would turn this part of the behaviour off

I think it's one of those "easy on the surface, but surprisingly complex" problems.

When you start allowing arbitrary code execution, it's a lot of work to prevent certain "functions" (I mean that in the non-programming way).
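Python's pickle documentation describes what such a switch can look like: subclass `pickle.Unpickler` and override `find_class` so that only allowlisted globals can be loaded. A minimal sketch follows; the allowlist contents are an illustrative choice. It also hints at why the switch is harder than it looks: the safety of the whole loader now rests on every entry in `ALLOWED` being harmless, and deciding that is the hard part.

```python
import builtins
import io
import pickle


class RestrictedUnpickler(pickle.Unpickler):
    # Globals this loader is willing to resolve; everything else is refused.
    ALLOWED = {("builtins", "range"), ("builtins", "complex")}

    def find_class(self, module, name):
        if (module, name) in self.ALLOWED:
            return getattr(builtins, name)
        raise pickle.UnpicklingError(f"global {module}.{name} is forbidden")


def restricted_loads(data):
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Payload:
    def __reduce__(self):
        # Same trick as a malicious pickle: ask the loader to call eval().
        return (eval, ("1 + 1", {}))


print(restricted_loads(pickle.dumps({"ok": [1, 2, range(3)]})))  # {'ok': [1, 2, range(0, 3)]}

try:
    restricted_loads(pickle.dumps(Payload()))
except pickle.UnpicklingError as err:
    print("blocked:", err)  # blocked: global builtins.eval is forbidden
```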

It seems like it theoretically shouldn't be too hard to add the ability to validate that the data is valid lave output, if you're concerned about that. That more or less leaves only the issue of anonymous functions in the data being replaced with malicious functions. Frankly, though, the only reason to serialize functions is if they're user input; otherwise you should serialize function names/key strings or some other well-defined form of function references and/or arguments instead.
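The "serialize a name, not the function" approach can be sketched in Python as a lookup table on the receiving side. `HANDLERS`, `serialize_task` and `run_task` are hypothetical names; the point is that a payload can only select from code the application already trusts.

```python
import json

# Hypothetical registry: the only callables a payload may refer to.
HANDLERS = {
    "upper": str.upper,
    "strip": str.strip,
}


def serialize_task(handler_name, arg):
    # Store the handler's key, never the function itself.
    if handler_name not in HANDLERS:
        raise ValueError(f"unknown handler: {handler_name!r}")
    return json.dumps({"handler": handler_name, "arg": arg})


def run_task(text):
    task = json.loads(text)
    # Unknown names are rejected instead of being resolved or executed.
    handler = HANDLERS.get(task["handler"])
    if handler is None:
        raise ValueError(f"unknown handler: {task['handler']!r}")
    return handler(task["arg"])


print(run_task(serialize_task("upper", "lave")))  # LAVE
```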
Any type of code-serialization tool will be vulnerable to injection. This is why use of pickle is often discouraged in Python, in favor of serialization formats which don't deserialize to code. Anything that marks "valid output of the tool" could just as easily be produced by an attacker who uses the tool to serialize their malicious code, and even signing/secret-token systems aren't a guarantee since it's so incredibly easy to build or use them the wrong way.
OK, so put a ring on it.

And by ring, I of course mean HMAC.
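A minimal sketch of that ring, using Python's standard `hmac` module: the writer tags the serialized bytes with an HMAC under a secret key, and the reader refuses to deserialize anything whose tag doesn't verify. The `tag.payload` layout and the key handling are assumptions for illustration.

```python
import hashlib
import hmac
import secrets

# Illustrative key; a real one would come from configuration, never from the payload.
KEY = secrets.token_bytes(32)


def sign(payload: bytes, key: bytes = KEY) -> bytes:
    # Prefix the payload with a hex HMAC-SHA256 tag: b"<tag>.<payload>".
    tag = hmac.new(key, payload, hashlib.sha256).hexdigest().encode("ascii")
    return tag + b"." + payload


def verify(blob: bytes, key: bytes = KEY) -> bytes:
    # Return the payload only if its tag matches; call this before deserializing.
    tag, sep, payload = blob.partition(b".")
    if not sep:
        raise ValueError("missing signature")
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest().encode("ascii")
    # compare_digest avoids short-circuiting on the first mismatching byte.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("bad signature")
    return payload


blob = sign(b'{"id": 1}')
print(verify(blob))  # b'{"id": 1}'
```

As the previous comment warns, this only proves the bytes came from someone holding the key; it helps only if the key stays secret and the check always runs before the payload is deserialized.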