by arcticbull 2244 days ago
IMO it's not as clear-cut as you make it out to be. It's a pretty arbitrary line to exclude full Unicode support from the standard library. There's a ton of stuff in libstd that could be supported as third-party crates. I don't disagree with what the Rust team has done, and I think there could be a world in which the compiler team also releases first-party crates with "enhanced" functionality beyond just libstd. I consider proper Unicode support to be a "first party" thing, but I also don't think it has to be in libstd per se, necessarily.

For the record, I also disagree with your assertion that "easily done in rust" should be extended to include "...by importing a third-party framework." In that sense anything is easy to do in any language where a third-party framework exists. I'm confident it's just as easy in Go.


> For the record, I also disagree with your assertion that "easily done in rust" should be extended to include "...by importing a third-party framework." In that sense anything is easy to do in any language where a third-party framework exists. I'm confident it's just as easy in Go.

Have you tried doing that in C++? Doing that in a cross-platform way (or even in a single platform) is anything but easy, because you don't have a tool like cargo, you have to change your build system, do the dependency resolution manually, etc.

So no, such a library existing does not imply that using that is easy.

In Rust, you just need to write `cargo add unicode-segmentation` once in a project, and then you can directly use the library API. There is literally nothing else for you to do.
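To make concrete what the standard library does and does not cover here, a minimal std-only sketch (the helper names `byte_len` and `scalar_count` are made up for illustration): `str` can count bytes and Unicode scalar values, but it has no notion of user-perceived characters (grapheme clusters), which is the gap the crate fills.

```rust
// Std-only sketch: what `str` offers without any extra crate.
// `byte_len` and `scalar_count` are hypothetical helper names.

/// Length of the string in UTF-8 bytes.
fn byte_len(s: &str) -> usize {
    s.len()
}

/// Number of Unicode scalar values (`char`s) in the string.
fn scalar_count(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT: rendered as one
    // user-perceived character, but stored as two scalar values.
    let s = "e\u{301}";
    println!("bytes: {}", byte_len(s));
    println!("scalar values: {}", scalar_count(s));
    // Counting grapheme clusters is not available in std. With the
    // unicode-segmentation crate added, `s.graphemes(true).count()`
    // (from the `UnicodeSegmentation` trait) is the call for that.
}
```

Neither number matches what a reader would call the "length" of that string, which is why segmentation ends up as a separate dependency.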

That's a pretty low barrier to entry, and something you will need to do hundreds of times per project anyway, because the standard library is minimal by design.

If you prefer languages without a minimal standard library, then Rust isn't for you. Go try Python, where half of the standard library has a warning saying "deprecated: use this other better external dependency instead; adding this to the standard library for convenience was the worst idea ever and now we need to maintain all this code forever".

In C++ you do have cross-platform tools like cargo. For instance, vcpkg.

> In Rust, you just need to write `cargo add unicode-segmentation` once in a project,

In C++ I can write `vcpkg install whatever`. Yet that does not mean my organization will allow the library.

So no, adding libraries is quite a bit harder than installing them, unless you are working alone on your own projects. And even then, adding them is never a one-line effort.

> If you prefer languages without a minimal standard library, then Rust isn't for you.

There is nothing minimal about Rust's std.

The point would be that the standard library should be forward compatible, while crates should be backward compatible,

so that current crates work with old versions of compilers/toolchains.

This applies here, as each new Unicode standard requires an update of the Unicode crate. Ideally, the best case would be to make it so that in 20 years Rust 1.0 can still use the most up-to-date version of Unicode segmentation, similarly to how some C libraries insist on C89 compatibility to still work on older systems.

I guess Rust would like it if this never became indispensable, but it should also be possible.

Today, if you run `cargo add unicode-segmentation` in a Rust 1.0 program, you can use the latest version of Unicode. You can also add an older version of the library and use an older version of Unicode if you want.

Adding unicode segmentation to the standard library and making all Rust binaries, most of which don't do unicode segmentation, 20 MB larger by unnecessarily bundling the unicode tables, makes no sense.

As you can see in the other thread, the problem the parent poster has is that their organization doesn't let them use crates from crates.io.

That's a stupid policy for a language like Rust, and the solution isn't "move all crates from crates.io into the standard library". The solution there is for them to write their own unicode-segmentation code (and async executor, and http stack, and linear algebra, and... the whole world), since that is what their organization wants them to do. That's a stupid policy, but it is their own stupid fault.

Most organizations either allow using crates from crates.io, or have a vetting policy, where you just submit a one-line email saying "I need to do unicode-segmentation and there is only one library for that, which is used by Firefox, here: ...". Somebody then checks licenses and stuff, and gives you approval in a couple of days. If their organization doesn't have such processes, then I'm sorry for them, but I don't see how this is in any way the standard library's fault.

Whatever their reasons are, "my org doesn't allow us to use cargo" isn't a good reason to move something into the standard library.

> Adding unicode segmentation to the standard library and making all Rust binaries, most of which don't do unicode segmentation, 20 MB larger by unnecessarily bundling the unicode tables, makes no sense.

Isn't that what dead code stripping is for?

If you are shipping a binary library, like the standard library, you don't know which parts of your library users are going to use when they link it, so you need to ship binaries with all of it to all Rust users.

The only one who can strip is the end user compiling a final binary, and the compiler often cannot do this for you because that requires whole program optimization and full LTO, which is super super slow.

Also, just because the final binary doesn't use a symbol, doesn't imply that the symbol isn't used. You can ship a library with a main function that can be run as an executable, or linked as a library. The linker doesn't know.

You would really need to go out of your way to strip your binary for your particular application. This is possible, and not that hard.

But the point remains: why should 99% of Rust users have to go through the trouble just so that those who need this don't have to write `cargo add unicode-segmentation`?

Rust's philosophy is "don't pay for what you don't use", so if your organization doesn't support third-party dependencies, they need a language with "batteries included", and not a language like Rust that comes without batteries by design.

Proposing to make Rust come with batteries included is proposing to change one of Rust's core values. Go ahead and write an RFC for that. I'll get my popcorn.

> The only one that can strip is the end user compiling a final binary, and the compiler often cannot do this for you because that requires whole program optimization and full LTO, which is super super slow.

Whole program dead code elimination doesn't have to be slow. Nim does that by default (it's not possible to turn it off since a couple releases actually) and it still compiles quite fast.

> Proposing to make Rust come with batteries included is proposing to change one of Rust's core values. Go ahead and write an RFC for that. I'll get my popcorn.

Again, my recommendation wasn't that. It was that the core team consider releasing "core" language features like Unicode support as first-party crates when they don't make sense as part of 'stdlib' (not that I think they will). It feels weird to me that I have to rely on the goodwill of third parties to provide core language functionality like complete string handling.

(We already do do this in some cases; these crates are authored by "The Rust Project Developers". For example, the regex crate is one of these.)

You can definitely ship binary libraries and use only whatever is needed, and there is no need for LTO/WPO to achieve that.

In fact, LTO/WPO have nothing to do with the ability to link whatever is needed.

(as a maybe unneeded clarification, my previous comment agrees with you)

I agree that something like unicode-segmentation should not be in the bundled standard library, specifically because the std should be forward compatible.

In my opinion, that is why foundational libraries should strive to stay on the Rust 2015 edition rather than the latest.