by fluffything 2233 days ago
> It does not make sense either to expect someone to use bleeding edge libraries from cargo yet use an old rustc compiler.

Of course it does. Many software users are stuck on multiple-year-old toolchains for various reasons, yet these systems still need to be able to handle unicode properly.

> They can easily update it if needed.

No, they cannot. Many users are stuck on older Windows versions, Linux versions, LTS Linux versions, etc. because of their organization's or their clients' requirements.

Telling a client that you can't develop an app for them because their 2 year old Ubuntu is too old is often not a very successful business model.

> and keep updating the stdlib one.

These updates would only apply to newer Rust toolchains, that many users cannot use. Unless you are suggesting the release of patch versions for the soon to be 100 old Rust toolchains in existence every time the unicode standard is updated.

This is too much trouble and work for little gain, given that one can still use a Rust 1.0 compiler to compile the latest version of the unicode-segmentation crate without problems.
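To make the coupling concrete: std already reports which Unicode version its built-in character tables were generated from, and that value is fixed for a given toolchain. A minimal sketch, assuming a toolchain recent enough to provide `char::UNICODE_VERSION` (it does not exist on Rust 1.0); the helper name `std_unicode_version` is made up for illustration.

```rust
/// Formats the Unicode version whose tables are compiled into this
/// toolchain's standard library. (Hypothetical helper, not a std API.)
fn std_unicode_version() -> String {
    // A (major, minor, patch) tuple baked into std at toolchain build time.
    let (major, minor, patch) = char::UNICODE_VERSION;
    format!("{}.{}.{}", major, minor, patch)
}

fn main() {
    // The printed value only changes when the toolchain itself is upgraded.
    println!("std Unicode tables: {}", std_unicode_version());
}
```

A crate, by contrast, carries its own tables, so its Unicode version moves with the crate version rather than with rustc.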


IMO it's not as clear-cut as you make it out to be. It's a pretty arbitrary line to exclude full Unicode support from the standard library. There's a ton of stuff in libstd that could be supported as third-party crates. I don't disagree with what the Rust team has done, and I think there could be a world in which the compiler team also releases first-party crates with "enhanced" functionality beyond just libstd. I consider proper Unicode support to be a "first party" thing, but I also don't think it has to be in libstd per se, necessarily.

For the record, I also disagree with your assertion that "easily done in rust" should be extended to include "...by importing a third-party framework." In that sense anything is easy to do in any language where a third-party framework exists. I'm confident it's just as easy in Go.

> For the record, I also disagree with your assertion that "easily done in rust" should be extended to include "...by importing a third-party framework." In that sense anything is easy to do in any language where a third-party framework exists. I'm confident it's just as easy in Go.

Have you tried doing that in C++? Doing that in a cross-platform way (or even in a single platform) is anything but easy, because you don't have a tool like cargo, you have to change your build system, do the dependency resolution manually, etc.

So no, such a library existing does not imply that using that is easy.

In Rust, you just need to write `cargo add unicode-segmentation` once in a project, and then you can directly use the library API. There is literally nothing else for you to do.

That's a pretty low barrier to entry, and something you will need to do 100s of times per project anyway, because the standard library is minimal by design.

If you prefer languages without a minimal standard library, then Rust isn't for you. Go try Python, where half of the standard library has a warning saying "deprecated: use this other better external dependency instead; adding this to the standard library for convenience was the worst idea ever and now we need to maintain all this code forever".

In C++ you do have cross-platform tools like cargo. For instance, vcpkg.

> In Rust, you just need to write `cargo add unicode-segmentation` once in a project,

In C++ I can write `vcpkg install whatever`. Yet that does not mean my organization will allow the library.

So no, adding libraries is quite a bit harder than installing them unless you are working on your own projects alone. And even then adding them is never a one-line effort.

> If you prefer languages without a minimal standard library, then Rust isn't for you.

There is nothing minimal about Rust's std.

the point would be that the standard library should be forward compatible while crates should be backward compatible.

so that current crates work with old version of compilers/toolchains.

this applies here as each new Unicode standard requires an update of the Unicode crate. ideally the best case would be to make it so that in 20 years Rust 1.0 can still use the most updated version of unicode-segmentation. similarly to how some C libraries insist on C89 compatibility to still work on older systems.

I guess Rust would like it if this never became indispensable but also should be possible

Today, if you run `cargo add unicode-segmentation` in a Rust 1.0 project, you can use the latest version of Unicode. You can also add an older version of the library and use an older version of Unicode if you want.

Adding unicode segmentation to the standard library and making all Rust binaries, most of which don't do unicode segmentation, 20 MB larger by unnecessarily bundling the unicode tables, makes no sense.

As you see in the other thread, the problem the parent poster has is that their organization doesn't let them use crates from crates.io.

That's a stupid policy for a language like Rust, and the solution isn't "move all crates of crates.io into the standard library". The solution there is for them to write their own unicode-segmentation code (and async executor, and http stack, and linear algebra, and... the whole world), since that is what their organization wants them to do. That's a stupid policy, but it is their own stupid fault.

Most organizations either allow using crates from crates.io, or have a vetting policy, where you just submit a one-line email saying "I need to do unicode-segmentation and there is only one library for that that's used by Firefox here: ...". Somebody then checks licenses and stuff, and gives you approval in a couple of days. If their organization doesn't have such processes, then I'm sorry for them, but I don't see how this is in any way the standard library's fault.

Whatever their reasons are, "my org doesn't allow us to use cargo" isn't a good reason to move something into the standard library.

> Adding unicode segmentation to the standard library and making all Rust binaries, most of which don't do unicode segmentation, 20 MB larger by unnecessarily bundling the unicode tables, makes no sense.

Isn't that what dead code stripping is for?

If you are shipping a binary library, like the standard library, you don't know which parts of your library users are going to use when they link it, so you need to ship binaries with all of it to all Rust users.

The only one that can strip is the end user compiling a final binary, and the compiler often cannot do this for you because that requires whole program optimization and full LTO, which is super super slow.

Also, just because the final binary doesn't use a symbol, doesn't imply that the symbol isn't used. You can ship a library with a main function that can be run as an executable, or linked as a library. The linker doesn't know.

You would really need to go out of your way to strip your binary for your particular application. This is possible, and not that hard.

But the point remains: why should 99% of Rust users have to go through the trouble just so that those who need this don't have to write `cargo add unicode-segmentation`?

Rust philosophy is "don't pay for what you don't use", so if your organization doesn't support third-party dependencies, they need a language with "batteries included", and not a language like Rust that comes without batteries by design.

Proposing to make Rust come with batteries included is proposing to change one of Rust's core values. Go ahead and write an RFC for that. I'll get my popcorn.

> The only one that can strip is the end user compiling a final binary, and the compiler often cannot do this for you because that requires whole program optimization and full LTO, which is super super slow.

Whole program dead code elimination doesn't have to be slow. Nim does that by default (it's not possible to turn it off since a couple releases actually) and it still compiles quite fast.

> Proposing to make Rust come with batteries included is proposing to change one of Rust's core values. Go ahead and write an RFC for that. I'll get my popcorn.

Again, my recommendation wasn't that. It was that the core team consider releasing "core" language features like Unicode support as first-party crates when they don't make sense as part of 'stdlib', not that I think they will. Feels weird to me that I have to rely on the goodwill of third parties to provide core language functionality like complete string handling.

You can definitely ship binary libraries and use only whatever is needed, and there is no need for LTO/WPO to achieve that.

In fact, LTO/WPO have nothing to do with the ability to link whatever is needed.

(as a maybe unneeded clarification, my previous comment agrees with you)

I agree that something like unicode-segmentation should not be in the bundled standard library. specifically because the std should be forward compatible.

In my opinion it is why foundational libraries should strive to be on Rust 2015 edition rather than the latest.

> Of course it does. Many software users are stuck on multiple-year-old toolchains for various reasons, yet these systems still need to be able to handle unicode properly.

So? Use the external library then. One thing does not preclude the other.

> No, they cannot. Many users are stuck on older Windows versions, Linux versions, LTS Linux versions, etc. because of their organization's or their clients' requirements.

I work in such an organization and no, we cannot use third-party packages. The same way we cannot update our toolchain. So in most cases the point is moot.

> These updates would only apply to newer Rust toolchains, that many users cannot use. Unless you are suggesting the release of patch versions for the soon to be 100 old Rust toolchains in existence every time the unicode standard is updated.

You can provide standard Unicode handling that is good enough for 99% of software out there. If you need to be on the bleeding edge, then use the bleeding edge library or rustc.

It is pretty simple, actually!

> So? Use the external library then. One thing does not preclude the other.

That's what everybody already does? You are proposing to, instead of doing that, move that library into the standard library where it cannot ever change.

> You can provide standard Unicode handling that is good enough for 99% of software out there.

That's already in std? 99% of the code doesn't need to handle unicode grapheme clusters, because it doesn't deal with unicode at all.
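For anyone unsure where that line sits, here is a minimal std-only sketch. std gives you UTF-8 bytes and `char`s (Unicode scalar values), but nothing that groups them into user-perceived characters. The helper name `std_counts` is made up for illustration.

```rust
/// Returns (UTF-8 byte length, number of `char`s) using only std.
/// (Hypothetical helper for illustration.)
fn std_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // "e" followed by U+0301 COMBINING ACUTE ACCENT: displays as one "é".
    let decomposed = "e\u{301}";
    // The precomposed form U+00E9 displays the same way.
    let precomposed = "\u{e9}";

    println!("{:?}", std_counts(decomposed)); // (3, 2)
    println!("{:?}", std_counts(precomposed)); // (2, 1)

    // std compares strings byte-wise, so the two forms are not equal.
    assert_ne!(decomposed, precomposed);
}
```

Both strings are a single grapheme cluster to a reader, which is the count a segmentation crate such as unicode-segmentation is meant to give you. std stops at the `char` level.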

You are suggesting moving something into the standard library that would make unicode software harder to update, and would make the standard library huge (>20 MB larger) for all programs (the unicode tables take a lot of binary size), even those that don't use unicode, to try to solve a problem that does not exist.

> I work in such an organization and no, we cannot use third-party packages

If a Rust user cannot write `cargo add unicode-segmentation`, they have bigger problems than not being able to handle grapheme clusters. You can't run async code because you don't have an executor, you can't do http because the standard library doesn't support that, you can't solve partial differential equations, or do machine learning, or pretty much anything interesting with Rust.

That's bad for you, but the solution isn't to make Rust bad for everybody else instead.

If your organization doesn't let you use third-party packages, then write your own: that's what your organization wants you to do.

Some organizations want all code in CamelCase, so they can't use the standard library at all. But the solution isn't to make Rust case insensitive, or to provide a 2nd standard library API for those organizations.

> That's already in std? 99% of the code doesn't need to handle unicode grapheme clusters, because it doesn't deal with unicode at all.

99% of the software does not use the entirety of the std. Something is good to be in the std if for that domain it solves the majority of problems, not if everyone uses it.

> You are suggesting moving something into standard that would make unicode software harder to update

It is equally hard to update.

When people say that std libraries are harder to update they refer to changes in interfaces, not incremental updates to tables etc.

> and would make the standard library huge (>20 MB larger) for all programs (the unicode tables take a lot of binary size)

Including the tables in every executable even when not used is a broken implementation.

> If a Rust user cannot write `cargo add unicode-segmentation`, they have bigger problems than not being able to handle grapheme clusters.

It is not a "problem". In most commercial software, libraries and versions are vetted. Same applies for all languages. If something is in the std, then it is already in, that is why it is useful.

> That's bad for you, but the solution isn't to make Rust bad for everybody else instead.

I don't see why that makes Rust "bad". It sounds like the opposite to me!

> Some organizations want all code in CamelCase, they can't use the standard library at all.

You are going off-topic to support your point.

> I don't see why that makes Rust "bad".

And this is why people suggesting what you are suggesting never manage to achieve the change.

> 99% of the software does not use the entirety of the std

Most Rust software uses most of it.

> You are suggesting moving something into standard that would make unicode software harder to update

How do you update the unicode tables for those stuck with Rust 1.0? If you are going to make these claims, back them up.

> It is not a "problem". In most commercial software, libraries and versions are vetted. Same applies for all languages. If something is in the std, then it is already in, that is why it is useful.

So your organization does support third-party packages, you are just too lazy to ask for vetting? That's not what you claimed above (you claimed that your organization does not support third-party packages at all).

The answer to this is simple, ask your organization to vet this library. If that's too complicated and takes too much effort, improve your organization's process.

Suggesting that a library should be in the standard library only because you are too lazy to get it vetted is a laughable proposal. Think about the trade-offs, evaluate them, weigh them, and if you still think doing so is worth it, write an RFC. The process for putting things into the standard library is open.

But if your only argument is "me,me,me,me" that's not going to go anywhere.