| HN Mirror

	by rfoo 769 days ago
	tl;dr R has its own pickle.load and someone decided to milk a CVE [1] out of this fact. [1] and a blog post for bragging, thankfully they didn't do a name and a logo.

4 comments

cmeacham98 769 days ago

This is uncharitable.

From what I can tell, these RDS files are a common way of sharing data among R users. I would be relatively surprised if reading someone else's dataset was able to execute arbitrary code.

I think this is more like if reading a CSV via numpy could execute code.

steve_s 759 days ago

RDS files are a common way of sharing serialized R objects. Promises are valid R objects and supported by this serialization format. They always have been and I believe it is an intentional feature. The problem is that some people may think of RDS files as more convenient CSV files, but they are not.

greentxt 769 days ago

CSV is CSV. A serialized object is a serialized object. The main concern they cite is supply chain attacks. So it’s like saying loading a package can… load a package. Supply chain attacks will always be a thing. I’m grateful for the work of the researchers in question, but I don’t feel this is much of a blemish when it comes to R itself being insecure.

fanf2 769 days ago

I think the researchers didn’t identify the main vulnerability. They should have talked about the risk of remote code execution from reading serialized objects from untrusted sources, when the R programmer thinks they are reading data but they are actually running code. This mistake has led to huge numbers of remote code execution vulnerabilities in all sorts of object deserialization libraries; it’s a much more common threat than supply chain attacks.
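The same trap exists in Python's `pickle`, the analogue most often compared to RDS in this thread. A minimal sketch, assuming a hypothetical `Payload` class: an object can tell `pickle` through `__reduce__` which callable to invoke at load time, so merely loading the bytes runs code. Here the call is a harmless `eval("40 + 2")`, but it could be anything.

```python
import pickle


class Payload:
    """Hypothetical object whose pickled form tells the loader to call eval()."""

    def __reduce__(self):
        # Instead of saving state, pickle records "call builtins.eval with
        # this argument"; that call is executed inside pickle.loads().
        return (eval, ("40 + 2",))


blob = pickle.dumps(Payload())

# The recipient thinks it is loading data, but it is running eval("40 + 2").
result = pickle.loads(blob)
print(result)  # 42
```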

skybrian 769 days ago

It’s true that it’s always been that way, but there are other common but unsafe ways of doing things that people eventually stopped using. Some pressure to deprecate and migrate away from unsafe APIs seems good.

jojobas 769 days ago

Is there another way to load a saved dataset in R though, so that it can't execute anything?

vharuck 769 days ago

Save it in the usual text-based formats, like a CSV or JSON. Outside of packages, which use serialized data by default for good reasons, I haven't seen many people loading strangers' RDS or RData files.
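As a rough Python analogue of this advice, the sketch below round-trips a small hypothetical table through CSV and JSON using only the standard library. With default settings, both parsers only produce plain strings, numbers, lists, and dicts; unlike `pickle`, they have no mechanism for naming a callable to invoke during loading.

```python
import csv
import io
import json

rows = [{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}]

# CSV: every field comes back as a string; the reader never evaluates anything.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "name"])
writer.writeheader()
writer.writerows(rows)
buf.seek(0)
csv_rows = list(csv.DictReader(buf))

# JSON: only dicts, lists, strings, numbers, booleans and None are produced.
json_rows = json.loads(json.dumps(rows))

print(csv_rows)   # [{'id': '1', 'name': 'alpha'}, {'id': '2', 'name': 'beta'}]
print(json_rows)  # [{'id': 1, 'name': 'alpha'}, {'id': 2, 'name': 'beta'}]
```

The price is that types are not preserved (CSV returns `'1'`, not `1`), which is exactly the convenience that serialized-object formats trade safety for.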

If an attacker can control a package's rdb and rdx files, it's game over. They could just stick an `.onAttach` function in that does whatever they want when the package is loaded directly or imported by another package.

jojobas 769 days ago

The fact that they had to mess with unbounded promises, and that the bug got fixed, suggests you normally can't run any code from load().

rfoo 769 days ago

.pkl files were, are, and will still be a common way of sharing data among Python users. Despite it is known to be unsafe since forever and nobody claimed a CVE for this fact.

A few years back I have heard from a lot of people working in ML communities that they are surprised that `numpy.load` is able to execute arbitrary code.

fanf2 769 days ago

There are lots of Python pickle remote code execution CVEs https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=pickle

imurray 769 days ago

> A few years back I have heard from a lot of people working in ML communities that they are surprised that `numpy.load` is able to execute arbitrary code.

This is correct: before version 1.16.3 (April 2019), `numpy.load` was unsafe by default unless you explicitly specified `allow_pickle=False`. However, to be clear, that unsafe default was then fortunately changed. Loading numpy arrays with `numpy.load` should now be safe (unless there are yet-to-be-found bugs in that code).
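A small sketch of the current default, assuming NumPy 1.16.3 or later is installed (in-memory buffers stand in for `.npy` files). Plain numeric arrays are stored without pickle and load normally; object arrays can only be stored via pickle, so `numpy.load` refuses them unless the caller opts in with `allow_pickle=True`.

```python
import io

import numpy as np

# A plain numeric array: stored without pickle, loads fine under the default.
numeric = io.BytesIO()
np.save(numeric, np.arange(3))
numeric.seek(0)
loaded = np.load(numeric)  # allow_pickle defaults to False since NumPy 1.16.3
print(loaded)  # [0 1 2]

# An object array is written via pickle (np.save allows that by default),
# but loading it is refused unless allow_pickle=True is passed explicitly.
objects = io.BytesIO()
np.save(objects, np.array([{"a": 1}], dtype=object))
objects.seek(0)
try:
    np.load(objects)
    refused = False
except ValueError:
    refused = True
print(refused)  # True
```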

VWWHFSfQ 769 days ago

> Despite it is known to be unsafe since forever and nobody claimed a CVE for this fact.

There have been dozens, if not _hundreds_, of CVEs filed on issues related to pickle and RCE.

Here is a small sample:

https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=pickle

notachatbot1234 769 days ago

Those are CVEs on other software that use pickle in insecure ways. Not on pickle itself.

lyu07282 769 days ago

In applications using pickle on untrusted data; that's a big distinction. There are a huge number of similar Java and C# object serialization bugs as well.
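For applications that must unpickle less-trusted input, the Python `pickle` documentation suggests overriding `Unpickler.find_class` so that only allowlisted globals can be resolved. A minimal sketch modelled on that idea; the allowlist, `RestrictedUnpickler`, `restricted_loads`, and `Payload` are illustrative names, and this narrows the attack surface rather than making pickle safe.

```python
import io
import pickle

# Hypothetical allowlist of (module, name) pairs the application will resolve.
SAFE_GLOBALS = {("builtins", "set"), ("builtins", "frozenset")}


class RestrictedUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Called whenever the stream references a global (class or function).
        if (module, name) in SAFE_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Payload:
    def __reduce__(self):
        # Asks the loader to call builtins.eval, which is not allowlisted.
        return (eval, ("40 + 2",))


print(restricted_loads(pickle.dumps({1, 2, 3})))  # {1, 2, 3}

try:
    restricted_loads(pickle.dumps(Payload()))
    blocked = False
except pickle.UnpicklingError:
    blocked = True
print(blocked)  # True
```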

neonsunset 769 days ago

There aren't in C#. Neither Newtonsoft.JSON (by default) nor System.Text.Json (at all) allows uncontrolled deserialization. Pretty much no code ever defaulted to Newtonsoft's TypeNameHandling.Auto, and the community has always been aware of its dangers, especially in light of incidents like Log4j.

And BinaryFormatter was deprecated long ago (and has now been removed completely, as a breaking change, something that pretty much never happens otherwise), and even when it was in use (more than a decade ago, popularity-wise), the use of type binding was heavily encouraged.

ethbr1 769 days ago

C# is pretty hard-nosed about serialization.

E.g., my discovery the other day that, out of the box, C#'s System.Text.Json can't serialize System.Exception without writing a custom serializer [0] (since 2020, because .NET fix speed...). Newtonsoft handles it fine. (I had wanted a quick-and-dirty debugging dump of properties.)

[0] https://github.com/dotnet/runtime/issues/43026

lyu07282 769 days ago

I was thinking of BinaryFormatter and NetDataContractSerializer, etc. unsafe .NET object deserialization. I'm sure the default JSON serializer in C# is safe (lmao language fanboys)

https://github.com/pwntester/ysoserial.net

lyu07282 769 days ago

Yes but the fact that R was apparently able to fix the issue at all is a bit strange then. You can't "fix" pickle code execution.

Twirrim 769 days ago

Weird. I don't think I've ever relied on pickle for sharing data. It's too version specific. I always dump to json, or similar.

greentxt 769 days ago

See it all the time.

Pinus 769 days ago

CVE-2019-6446 seems to be in the right ballpark.

fanf2 769 days ago

I think a good response from the R authors should:

• Make clear the bug is due to unsafe deserialization (not serialization as their statement says). This is important because unsafe deserialization is a major source of remote code execution vulnerabilities.

• Update the documentation to make it clear that R’s serialization and deserialization functions are not safe to use for sharing data across the network. Serialized objects should be treated as code, not data.
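When serialized objects do have to travel between systems you control, one common precaution (mentioned in the Python `pickle` docs) is to authenticate the bytes with an HMAC and verify it before deserializing anything. A Python sketch; the key, `dump_signed`, and `load_signed` are hypothetical, and a valid tag only proves the bytes came from a key holder, not that their contents are harmless.

```python
import hashlib
import hmac
import pickle

# Hypothetical shared secret; in practice this would come from a secrets store.
KEY = b"example-secret-key"
TAG_LEN = hashlib.sha256().digest_size  # 32 bytes


def dump_signed(obj) -> bytes:
    payload = pickle.dumps(obj)
    tag = hmac.new(KEY, payload, hashlib.sha256).digest()
    return tag + payload


def load_signed(blob: bytes):
    tag, payload = blob[:TAG_LEN], blob[TAG_LEN:]
    expected = hmac.new(KEY, payload, hashlib.sha256).digest()
    # Verify before pickle ever sees the bytes; compare_digest is constant-time.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch: refusing to deserialize")
    return pickle.loads(payload)


blob = dump_signed({"rows": [1, 2, 3]})
print(load_signed(blob))  # {'rows': [1, 2, 3]}

# Flip the last byte to simulate tampering in transit.
tampered = blob[:-1] + bytes([blob[-1] ^ 0xFF])
try:
    load_signed(tampered)
    accepted = True
except ValueError:
    accepted = False
print(accepted)  # False
```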

phoe-krk 769 days ago

Blog post in question: https://hiddenlayer.com/research/r-bitrary-code-execution/

ziddoap 769 days ago

>and a blog post for bragging, thankfully they didn't do a name and a logo.

I am still amazed at how many people on HN seem to get worked up over vulnerability names. God forbid someone also slaps a piece of clip art or whatever on the blog post. Worse yet, if they buy a $5 domain... the horror!

Maybe it's just me, but I'd much rather remember "Heartbleed" over "CVE-2014-0160".

rfoo 769 days ago

It's fine when your bugs are (unanimously) cool, be it Heartbleed, Meltdown, Spectre or Load Value Injection (this one even got a hilarious video).

For less cool bugs a logo and a name seems rather... strange, because it happens all the time and it's not clear why it's special. Imagine a coworker fixed a random JIRA ticket which may be "switching to night mode does not work on a certain page" and then named it "Nightfall" and a logo and a landing page and a lot of bragging in the next periodic meeting.

DEADMINCE 768 days ago

> Imagine a coworker fixed a random JIRA ticket which may be "switching to night mode does not work on a certain page" and then named it "Nightfall" and a logo and a landing page and a lot of bragging in the next periodic meeting.

Well, that would be hilarious.
