by fluffything 2238 days ago
Today, if you run `cargo add unicode-segmentation` for a Rust 1.0 program, you can use the latest version of Unicode. You can also add an older version of the library and use an older version of Unicode if you want.

Adding Unicode segmentation to the standard library and making all Rust binaries, most of which don't do Unicode segmentation, 20 MB larger by unnecessarily bundling the Unicode tables, makes no sense.
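For context, here is a minimal sketch of what the standard library alone gives you: iteration over `char`s (Unicode scalar values), not over user-perceived characters. The helper name `scalar_count` is made up for illustration; the crate call in the trailing comment is shown for comparison only and is not compiled here.

```rust
// Standard library only: `str::chars` walks Unicode scalar values,
// not grapheme clusters (user-perceived characters).
// `scalar_count` is a hypothetical helper for this illustration.
fn scalar_count(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    // "e" followed by U+0301 COMBINING ACUTE ACCENT displays as a single character.
    let s = "e\u{301}";
    println!("bytes:   {}", s.len());           // UTF-8 length in bytes
    println!("scalars: {}", scalar_count(s));   // number of `char`s
    // Counting grapheme clusters needs segmentation tables, which is what
    // the crate provides, e.g.:
    //   use unicode_segmentation::UnicodeSegmentation;
    //   s.graphemes(true).count()
}
```

The byte and scalar counts differ from what a reader would call "one character", which is the gap the crate fills without every binary having to carry the tables.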

As you can see in the other thread, the problem the parent poster has is that their organization doesn't let them use crates from crates.io.

That's a stupid policy for a language like Rust, and the solution isn't "move all crates from crates.io into the standard library". The solution there is for them to write their own unicode-segmentation code (and async executor, and HTTP stack, and linear algebra, and... the whole world), since that is what their organization wants them to do. That's a stupid policy, but it is their own stupid fault.

Most organizations either allow using crates from crates.io, or have a vetting policy, where you just submit a one-line email saying "I need to do unicode-segmentation and there is only one library for that, which is used by Firefox, here: ...". Somebody then checks licenses and stuff, and gives you approval in a couple of days. If their organization doesn't have such processes, then I'm sorry for them, but I don't see how this is in any way the standard library's fault.

Whatever their reasons are, "my org doesn't allow us to use cargo" isn't a good reason to move something into the standard library.


> Adding Unicode segmentation to the standard library and making all Rust binaries, most of which don't do Unicode segmentation, 20 MB larger by unnecessarily bundling the Unicode tables, makes no sense.

Isn't that what dead code stripping is for?

If you are shipping a binary library, like the standard library, you don't know which parts of your library users are going to use when they link it, so you need to ship binaries with all of it to all Rust users.

The only one who can strip the unused parts is the end user compiling a final binary, and the compiler often cannot do this for you because that requires whole program optimization and full LTO, which is super super slow.

Also, just because the final binary doesn't use a symbol, doesn't imply that the symbol isn't used. You can ship a library with a main function that can be run as an executable, or linked as a library. The linker doesn't know.

You would really need to go out of your way to strip your binary for your particular application. This is possible, and not that hard.
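As a rough sketch of what "going out of your way" can look like, here is a release profile for the final binary's `Cargo.toml`. The keys are standard Cargo profile settings, but this particular combination is an assumption for illustration, not a recommendation from this thread, and `strip` only exists in more recent Cargo releases.

```toml
# Hypothetical size-focused release profile for the final binary crate.
[profile.release]
opt-level = "z"     # optimize for size rather than speed
lto = true          # "fat" LTO across all crates in the dependency graph
codegen-units = 1   # fewer units gives the optimizer more to work with, at the cost of build time
strip = true        # remove symbols from the final binary
```

Build times go up noticeably with these settings, which is the trade-off described above.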

But the point remains: why should 99% of Rust users have to go through the trouble just so that those who need this don't have to write `cargo add unicode-segmentation`?

Rust's philosophy is "don't pay for what you don't use", so if your organization doesn't support third-party dependencies, they need a language with "batteries included", and not a language like Rust that comes without batteries by design.

Proposing to make Rust come with batteries included is proposing to change one of Rust's core values. Go ahead and write an RFC for that. I'll get my popcorn.

> The only one that can strip is the end user compiling a final binary, and the compiler often cannot do this for you because that requires whole program optimization and full LTO, which is super super slow.

Whole program dead code elimination doesn't have to be slow. Nim does that by default (it actually hasn't been possible to turn it off for a couple of releases) and it still compiles quite fast.

> Proposing to make Rust come with batteries included is proposing to change one of Rust's core values. Go ahead and write an RFC for that. I'll get my popcorn.

Again, my recommendation wasn't that. It was that the core team consider releasing "core" language features like Unicode support as first-party crates when they don't make sense as part of 'stdlib' (not that I think they will). It feels weird to me that I have to rely on the goodwill of third parties to provide core language functionality like complete string handling.

(We already do do this in some cases; these crates are authored by "The Rust Project Developers". For example, the regex crate is one of these.)

Yeah, but that makes them third-party already for many orgs' policies, even if they come from the same set of developers.

As soon as you have to add a crate, you are in for extra review and pain.

I don't understand, wasn't

> it was that the core team consider releasing "core" language features like Unicode support as first-party crates

what you were asking for?

You can definitely ship binary libraries and use only whatever is needed, and there is no need for LTO/WPO to achieve that.

In fact, LTO/WPO have nothing to do with the ability to link whatever is needed.

(as a maybe unneeded clarification, my previous comment agrees with you)

I agree that something like unicode-segmentation should not be in the bundled standard library, specifically because std should be forward compatible.

In my opinion, that is why foundational libraries should strive to stay on the Rust 2015 edition rather than the latest.