Hacker News new | ask | show | jobs
by eridius 3336 days ago
Why do you say premature? Maps, for example, being such a crucial and widely-used data type, should be optimized as much as possible. There's nothing premature about using `unsafe` to build a highly-efficient Map.
1 comments

Let's see the benchmarks justifying the use of "unsafe" for maps. Maybe there's a better way to do it without much of a performance penalty. It's a good way to find out what optimizations the compiler is missing. It may even turn out that unsafe code written early no longer is a performance win, since the Rust compiler is getting better at optimizing out redundant subscript checks.

When you start looking through Rust libraries, "unsafe" turns up way too often.

(The hashmap that is used in the winning rust benchmarksgame entry is 100% safe, fwiw)

> When you start looking through Rust libraries, "unsafe" turns up way too often.

You keep making this claim without substantiation. Yes, there is some level of unnecessary unsafe, but certainly not "way too often". I recall going through all the crates in my .cargo and finding very little unnecessary unsafe, and showing you the audit: https://news.ycombinator.com/item?id=13280347

Please stop throwing around this claim without substantiation.

Do you think there is some room for education here? Would it be useful for Crates.io to show some "coverage" metric? eg. "unsafe lines = 5%" or "unsafe instructions = 0.01%"

I think that would be a pretty easy way to put this convo to bed yea?

Have there been talks about this before? Is it worth me opening an issue/RFC on Cargo?

I don't think it's worth making changes like that to satisfy random misconceptions unless they are widespread.

There are good reasons for having an unsafe-code flag on a crate. This is not one of those.

This has been discussed in the past and the main concern IIRC is that this may make people afraid to use crates that contain unsafe code even if it is well audited. There's a finer balance you want to strike here.

> winning rust benchmarksgame entry

C is winning that benchmark, around 2x faster than Rust.

http://benchmarksgame.alioth.debian.org/u64q/regexredux.html

Wrong benchmark. I'm talking about k-nucleotide, which tests hashmaps.

http://benchmarksgame.alioth.debian.org/u64q/knucleotide.htm...

(And I was only really talking of the comparison of rust v rust, not rust v c, i.e. that ordermap is faster than the built in hashmap despite not using unsafe)

Ah, sorry, you were writing about hashmaps and I was writing regexes in my IDE in background while reading this thread from time to time, probably that's why I've checked regex instead of hashmap benchmark. I even wrote this benchmark in language not listed on benchmarks game lately to compare the speed. Brain can work in interesting ways!
We've seen this unsubstantiated denial before on YC, at "Why I’m dropping Rust"[1] and "Rust sucks if I fail to write X".[2] I once started going through the Rust library packages and listed uses of "unsafe". Try doing that.

The previous discussions made a few things clear:

- The big design-level problems with data structure safety are 1) partially initialized arrays, and 2) backlinks. The first is needed for growing arrays in Vec, and the second is needed for doubly-linked lists and some kinds of trees. It's very hard to handle either of those in safe Rust. A very small number of packages need unsafe code for those functions, and those should be tightly controlled.

- Foreign code remains a problem, and is inherently unsafe when calling unsafe languages. The most downloaded Rust crate is "libc". What could possibly go wrong there? Was it really a good idea to import unsafe "strcpy" into Rust?

    pub fn strcpy(dst: *mut c_char, src: *const c_char) -> *mut c_char;
- There's a lot of use of "unsafe" code that's not strictly necessary. You find unsafe code like "from_utf8_unchecked", which then turns up in the JSON decoder at (https://github.com/rust-lang-deprecated/rustc-serialize/blob...). There are no comments on the safety of that. Is there some way to create bad JSON, get bad UTF-8 into a string, and cause trouble further upstream? I don't know, but somebody "optimized" there, and created a potential problem.

I could give many more examples.

Most of these problems are fixable. They're not inherent in Rust. Fixing them is important to Rust's credibility. If Rust is going to replace C++, which it should, the holes have to be plugged. It only takes one hole to create a security vulnerability.

Denial is not a river in Egypt.

[1] https://news.ycombinator.com/item?id=12474445 [2] https://news.ycombinator.com/item?id=13670366

> which then turns up in the JSON decoder at (https://github.com/rust-lang-deprecated/rustc-serialize/blob...). There are no comments on the safety of that.

Maybe there's a reason it's in a repository labeled "deprecated", namely that it's deprecated. The replacement serialization framework, serde, has exactly one appearance of "unsafe", in a function that is only compiled when you explicitly enable the "unstable" feature flag and that in fact appears to be a reasonably safe use of unsafe.

The thing is, you cannot decide which crate someone will depend on, regardless how it is labeled.
But you can read the dependency list or even the code and decide whether to use it or not?
> I once started going through the Rust library packages and listed uses of "unsafe". Try doing that.

That's ... exactly what I did? Talk about denial. I went through the libraries in my .cargo folder (which filters for libraries that actually get used, not just random libraries out there). I linked you to that audit in the comment.

Yes, libstd contains a lot more unsafe, but that's kind of the raison d'etre of libstd -- to contain OS-abstractions and very common internally-unsafe abstractions. It's better to have one unsafe implementation of Vec (in a library with a lot of eyeballs on it) than to have ten that the ecosystem relies on.

> which then turns up in the JSON decoder at

rustc-serialize is deprecated. It was deprecated before 1.0, and went into maintenance mode. It was still kinda-maintained because of the difficulty of using serde on stable, but now it's basically going to be full maintenance mode.

Also, that line is trivially safe to execute on untrusted input; `char` is utf8. Yes, a comment would be nice, but like I said, maintenance mode.

It is possible to do what that function does in safe Rust today, but it needs an API that probably didn't exist ~two years ago when that library was actually relevant. Fixing.

> The most downloaded Rust crate is "libc". What could possibly go wrong there? Was it really a good idea to import unsafe "strcpy" into Rust?

libc is just bindings (to everything in libc). You still need to use unsafe to call those functions (everything in `extern "C"` is unsafe to call even if not marked explicitly as such). This is not a valid example.

> A very small number of packages need unsafe code for those functions, and those should be tightly controlled.

which is pretty true already? The "partially initialized arrays" problem is generally handled by just using Vec. Yes, you need unsafe to write vec, but then you can just build things out of it.

Backlinks turn up rarely -- if perf isn't involved folks just use Weak safely, but otherwise people use petgraph or something.

> I could give many more examples.

Go ahead then. You literally never have, and none of these examples are valid "unnecessary unsafe" for reasons I gave above. I did substantiate with an audit. I'm genuinely interested in finding places where we have too much unsafe code, because I'd like to avoid depending on those crates and/or fix them.

> Most of these problems are fixable. They're not inherent in Rust. Fixing them is important to Rust's credibility.

Sure. And folks are always looking to improve this. But this doesn't mean that there's some widespread problem of unsafe being used too much in Rust. I'm not denying that this shouldn't be improved, I'm denying your allegations that `"unsafe" turns up way too often`.

rustc-serialize is deprecated

It's the fourth most downloaded crate.[1] Somebody may have "deprecated" it, but the users aren't paying attention. It's not listed as "deprecated" on its own Cargo page.[2] There's a weak note about deprecation on its Github page, but users don't look there.[3] It has 2,950,353 downloads, probably because other crates are pulling it in.

Will it be in the new set of "approved" packages? Or will all the crates that use it be fixed?

Again, denial. Not seeing many links from the "everything is OK, don't need to look at this" crowd.

[1] https://crates.io/ [2] https://crates.io/crates/rustc-serialize [3] https://github.com/rust-lang-deprecated/rustc-serialize

> It's the fourth most downloaded crate.[1] Somebody may have "deprecated" it, but the users aren't paying attention.

The most downloaded stat appears to be for the most downloaded of all time. Considering the difficulty of using serde on stable until recently, it's not surprising that it has been downloaded so much and would continue to be as projects move away from it.

> It's not listed as "deprecated" on its own Cargo page.

This is true, however as you said the GitHub page and the docs page both mention that. Unless one already knows how to use the crate, they are probably going to check the docs and see the deprecation. Although having a deprecation notice on crates.io does seem prudent.

> Again, denial. Not seeing many links from the "everything is OK, don't need to look at this" crowd.

You kind of skipped over the part where the parent poster explained why it wasn't even a big deal in the first place. It really feels like you're grasping at straws to make a point and I don't see why. No one seems to be denying that a lot of unsafe code is potentially bad, but there is really no solid proof that there is an unreasonable amount of unsafe usage. In fact, the parent has mentioned and linked to a list of such usages. If you're going to make such an extraordinary claim I think it behooves you to provide more than just a handful of examples.

Whether or not rustc-serialize is deprecated is kind of a red herring. It was already pointed out that the usage of there is trivially provably safe. The deprecation of rustc-serialize is really only relevant here because it explains why the crate is in maintenance mode and therefore why that particular unsafe block hasn't been replaced with a newer safe API.
Number of downloads is pretty terrible metric because nearly all downloads from a package repository are from automated/CI builds, not people. Depending on what's consuming a package, something which runs tests more frequently could easily inflate the number of downloads for any of it's dependencies.
> Foreign code remains a problem, and is inherently unsafe when calling unsafe languages. The most downloaded Rust crate is "libc". What could possibly go wrong there? Was it really a good idea to import unsafe "strcpy" into Rust?

And what would you prefer a tool like Corrode use? Or programmers manually doing an initial "no refactoring" conversion pass from C or C++ to Rust? Should they all use their own ad-hoc libc bindings when converting to Rust? That sounds harder to audit busywork with no upside. Or perhaps it should be impossible to incrementally convert a C codebase, instead being forced to do an up front monolithic rewrite? Although my experience has been that leads to more debugging, bugs, pain, and abandoned rewrites.

This isn't to say that improvements cannot or should not be made, but a reminder that perfect is the enemy of the good. Yes, you should really really really not use strcpy in new code if you can possibly avoid it - and you can. And you should refactor it away in existing code ASAP. But that argument probably extends to libc in Rust code in general. Being able to test-compile if your codebase builds yet when you remove libc sounds way better than trying to hunt down and remove all your ad-hoc defines.

And as a stepping stone, I'd suspect the good outweighs the bad here. Not because there will never be a security vulnerability in Rust code because of an unrefactored strcpy in Rust (there very well may be one if there hasn't already!) - but because it will encourage conversion, and thus subsequent cleanup and refactoring, that might not have otherwise happened.

> There are no comments on the safety of that. Is there some way to create bad JSON, get bad UTF-8 into a string, and cause trouble further upstream? I don't know, but somebody "optimized" there, and created a potential problem.

A quick look at blame shows this originally was applied to a buffer that was encode_utf8(...)ed. The refactoring (to remove a libunicode dependency) has made the safety less clear, yes. I could see the use for a method:

  let buf = v.to_str_with(&mut buf);
Ultimately, this is just shifting the "problem" from json.rs to str/mod.rs, but the latter was going to have to deal with unsafe code anyways, as we're getting into the details of "just how is str implemented anyways": https://doc.rust-lang.org/1.4.0/src/core/str/mod.rs.html#125 . Although perhaps there's something to be said for keeping that to the bare essentials as well...

> Most of these problems are fixable. They're not inherent in Rust. Fixing them is important to Rust's credibility. If Rust is going to replace C++, which it should, the holes have to be plugged. It only takes one hole to create a security vulnerability.

I encourage you to submit pull requests (at least for non-deprecated projects) if you haven't already.

I feel the need to restate the obvious, although I'm sure you're aware already: There will be one hole. In fact, there will be two holes. I dare say there will be no less than three security vulnerabilities, even - for mistakes are inherent to programming. Perfect safety is not an achievable goal - only better safety. Worse - some of the people using Rust are going to have a higher tolerance for unsafe than you. But at least they're no longer using C though, right?

(I'm afraid I'm still using C and C++.)

(I will mention that shifting unsafe to libstd is a good thing, this way you have more eyeballs on it and it's reusable, with less chance of people messing it up elsewhere. This is why a large fraction of what's in libstd is unsafe stuff so that people don't end up implementing their own.)
Unsafe rarely turns up in any of the libraries I've used. Where are you getting this from?