

by armchairhacker 763 days ago

I don’t know. R promises are extremely powerful. Not only can they run arbitrary code (e.g. shell commands), but they also have arbitrary access to the caller’s environment (e.g. you can pass a lazy argument to a function, and that argument can list the names and values of all variables in the function’s body and mutate some of them).

I also don’t know if deserializing is 100% secure even now, because it only detects whether the root value is lazy, and I’m not sure whether certain values’ children can be lazy as well.

I think the larger issue is that most languages are insecure unless you go out of your way to be careful. Many package managers (including cargo) let dependencies run arbitrary build scripts. AFAIK reading a Python pickle file can invoke arbitrary code, which is arguably worse than deserializing an RDS file, because in R you at least have to read the malicious deserialized value. The problem of reading untrusted data isn’t new; see Log4j and SQL injection.
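To make the pickle point concrete, here is a minimal sketch of how loading alone can trigger a call. The `Payload` class is hypothetical and the payload is a harmless `eval` of an arithmetic expression; a real attacker would presumably substitute something like a shell command.

```python
import pickle


class Payload:
    # Hypothetical class for illustration. __reduce__ tells pickle how to
    # rebuild the object: "call this callable with these arguments".
    def __reduce__(self):
        # Harmless stand-in for an attacker's payload.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Loading calls eval("6 * 7") right away; no Payload instance is created,
# and the caller never has to touch the returned value for the code to run.
result = pickle.loads(blob)
print(result)  # 42
```

Note the contrast with the R case described above: here the code runs during the load itself, not when the value is later read.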

All input should be either a) trusted or b) handled carefully. Then the language doesn’t matter. The problem is that this isn’t easy. In R, for example, if `readRDS` really can still return promises, then “handling it carefully” means inspecting every nested value without reading it (this is possible in R with reflection) or, more likely (as with Python’s pickling), reading the data in a more constrained format.
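On the Python side, “handled carefully” could look roughly like the sketch below. It shows two options: an unpickler that refuses to resolve any global (the `find_class` override pattern described in the `pickle` docs), and simply using JSON, which can only produce plain data. The names `NoGlobalsUnpickler`, `restricted_loads` and `Evil` are made up for illustration, and refusing every global is deliberately stricter than most real uses would need.

```python
import io
import json
import pickle


class NoGlobalsUnpickler(pickle.Unpickler):
    """Unpickler that refuses to resolve any class or function by name."""

    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    # Plain containers, strings and numbers never need find_class,
    # so they still load; anything that references a callable is rejected.
    return NoGlobalsUnpickler(io.BytesIO(data)).load()


class Evil:
    # Hypothetical malicious object: asks pickle to call eval on load.
    def __reduce__(self):
        return (eval, ("6 * 7",))


# Option 1: restricted unpickling.
safe = restricted_loads(pickle.dumps({"a": [1, 2]}))

try:
    restricted_loads(pickle.dumps(Evil()))
    blocked = False
except pickle.UnpicklingError:
    blocked = True  # eval was never resolved, so it was never called

# Option 2: a constrained format. JSON decodes only to dicts, lists,
# strings, numbers, booleans and None.
parsed = json.loads('{"a": [1, 2]}')
```

The second option is usually the less error-prone one, since there is nothing executable in the format to begin with.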

1 comment

ethbr1 763 days ago

People whose day job is security probably have terms for this, but it seems important to distinguish theoretically-vulnerable and practically-vulnerable.

In the sense that for sufficiently complex ecosystems (read: all widely used programming ecosystems) each component may itself be theoretically secure... and yet the ways they are commonly used in practice are insecure.

>> Users should ensure that they only use R code and data from trusted sources and that the privileges of the account running R are appropriately limited.

IMHO, this is a cop-out. Abdicating responsibility for common use patterns in your ecosystem isn't how you make everyone more secure.

Better: 'What are our users actually doing?' -> 'Why are they doing that?' (usually: inconvenient UX around secure alternatives) -> 'How can we make it easier to use secure alternatives?'