| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by cmeacham98 810 days ago

This is uncharitable.

From what I can tell, these RDS files are a common way of sharing data among R users. I would be relatively surprised if reading someone else's dataset was able to execute arbitrary code.

I think this is more like if reading a CSV via numpy could execute code.

3 comments

steve_s 800 days ago

RDS files are a common way of sharing serialized R objects. Promises are valid R objects and supported by this serialization format. They always have been and I believe it is an intentional feature. The problem is that some people may think of RDS files as more convenient CSV files, but they are not.

greentxt 810 days ago

CSV is CSV. A serialized object is a serialized object. The main concern they cite, are supply chain attacks. So it’s like saying loading a package can… load a package. Supply chain attacks will always be a thing. I’m grateful for the work of the researchers in question but don’t feel this is much of a blemish when it comes to R itself being insecure.

fanf2 810 days ago

I think the researchers didn’t identify the main vulnerability. They should have talked about the risk of remote code execution from reading serialized objects from untrusted sources, when the R programmer thinks they are reading data but they are actually running code. This mistake has led to huge numbers of remote code execution vulnerabilities in all sorts of object deserialization libraries; it’s a much more common threat than supply chain attacks.

skybrian 810 days ago

It’s true that it’s always been that way, but there are other common but unsafe ways of doing things that people eventually stopped using. Some pressure to deprecate and migrate away from unsafe API’s seems good.

jojobas 810 days ago

Is there another way to load a saved dataset in R though, so that it can't execute anything?

vharuck 810 days ago

Save it in the usual text-based formats, like a CSV or JSON. Outside of packages, which use serialized data by default for good reasons, I haven't seen many people loading strangers' RDS or RData files.

If an attacker can control a package's rdb and rdx files, it's game over. They could just stick an `.onAttach` function in that does whatever they want when the package is loaded directly or imported by another package.

jojobas 809 days ago

The fact that they had to mess with unbounded promises, and that the bug got fixed suggests you normally can't run any code from load().

rfoo 810 days ago

.pkl files were, are, and will still be a a common way of sharing data among Python users. Despite it is known to be unsafe since forever and nobody claimed a CVE for this fact.

A few years back I have heard from a lot of people working in ML communities that they are surprised that `numpy.load` is able to execute arbitrary code.

fanf2 810 days ago

There are lots of Python pickle remote code execution CVEs https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=pickle

imurray 810 days ago

> A few years back I have heard from a lot of people working in ML communities that they are surprised that `numpy.load` is able to execute arbitrary code.

This is correct, before version 1.16.3 (April 2019) `numpy.load` was unsafe by default, unless explicitly specifying `allow_pickle=False`. However, to be clear, that unsafe default was then fortunately changed. Loading numpy arrays with `numpy.load` should now be safe (unless there are yet-to-be-found bugs in that code).

VWWHFSfQ 810 days ago

> Despite it is known to be unsafe since forever and nobody claimed a CVE for this fact.

There have been dozens, if not _hundreds_, of CVEs filed on issues related to pickle and RCE.

Here is a small sample:

https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=pickle

notachatbot1234 810 days ago

Those are CVEs on other software that use pickle in insecure ways. Not on pickle itself.

lyu07282 810 days ago

In applications using pickle on untrusted data, that's a big distinction. There are a huge number of similar java and c# object serializationg bugs as well.

neonsunset 810 days ago

There aren't in C#. Neither Newtonsoft.JSON (by default) nor System.Text.Json (at all) allow uncontrolled deserialization. Pretty much no code ever defaulted to Newtonsoft's TypeNameHandling.Auto and community has always been aware of its dangers, espcially in light of the incidents like Log4J.

And BinaryFormatter has been long ago deprecated (and now it got completely removed, in the form of a breaking change, something that pretty much never happens otherwise), and even when it was in use (more than a decade ago, popularity-wise), the use of type binding was heavily encouraged.

ethbr1 810 days ago

C# is pretty hard-nosed about serialization.

E.g. My discovery the other day that out of the box C# System.Text.Json can't serialize System.Exception without writing a custom serializer [0] (since 2020, because .NET fix speed...). NewtonSoft handles it fine. (Had wanted a quick-and-dirty debugging dump of properties)

[0] https://github.com/dotnet/runtime/issues/43026

neonsunset 809 days ago

A serializer cannot make a reasonable assumption about how an exception should be serialized on the user's behalf, let alone deserialized. Newtonsoft had and still has quite a few problematic defaults where people can inadvertently weaken privacy and security of the implementation, which System.Text.Json is opinionated in solving.

If you are okay with risks that come from including exception's message in data sent over the network (e.g. not publicly exposed), then defining a custom converter is trivial (it's like 10-15 lines and adding it to serializer options), or you could simply .ToString/.Message it and include that in the payload instead. It's a minute thing.

As for exception deserialization, that's a gross feature misuse and not something that should be done.

lyu07282 809 days ago

I was thinking of BinaryFormatter and NetDataContractSerializer, etc. unsafe .NET object deserialization. I'm sure the default JSON serializer in C# is safe (lmao language fanboys)

https://github.com/pwntester/ysoserial.net

lyu07282 810 days ago

Yes but the fact that R was apparently able to fix the issue at all is a bit strange then. You can't "fix" pickle code execution.

Twirrim 810 days ago

Weird. I don't think I've ever relied on pickle for sharing data. It's too version specific. I always dump to json, or similar.

greentxt 810 days ago

See it all the time.

Pinus 810 days ago

CVE-2019-6446 seems to be in the right ballpark.