by pcwalton 3620 days ago
I don't think most of these are applicable to Rust.

> - Conversions between our 5(!) string types are very common.

> - Standard library I/O modules cannot use new, de-facto standard string types (i.e. `Text` and `ByteString`) defined outside it because of dependency cycle.

We have one string type defined in std, and nobody is defining new ones (modulo special cases for legacy encodings which would not be worth polluting the default string type with).

> - Standard library cannot use containers, other than lists, for the same reason.

> - No standard traits for containers, like maps and sets, as those are defined outside the standard library. Result is that code is written against one concrete implementation.

Hash maps and trees are in the standard library already. Everyone uses them.
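Worth noting: as far as I know, std still has no shared map trait, so library code typically commits to `HashMap` or `BTreeMap` concretely. A common workaround is to be generic over iteration rather than over the container. Below is a minimal sketch. `sum_values` is a hypothetical helper, not anything in std:

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical helper: sums the values of any map-like collection by
/// borrowing it as an iterator of `(&K, &u32)` pairs, so it accepts
/// `&HashMap`, `&BTreeMap`, or anything else that iterates that way.
fn sum_values<'a, K: 'a, I>(entries: I) -> u32
where
    I: IntoIterator<Item = (&'a K, &'a u32)>,
{
    entries.into_iter().map(|(_, v)| *v).sum()
}

fn main() {
    let mut hashed: HashMap<String, u32> = HashMap::new();
    hashed.insert("a".to_string(), 1);
    hashed.insert("b".to_string(), 2);

    let mut ordered: BTreeMap<&str, u32> = BTreeMap::new();
    ordered.insert("x", 10);
    ordered.insert("y", 20);

    // Same function, two concrete map types.
    println!("{}", sum_values(&hashed));
    println!("{}", sum_values(&ordered));
}
```

This only covers read-only traversal. Generic lookups or inserts would still need a trait, which is the gap the quoted complaint is about.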

> - Newtype wrapping to avoid orphan instances. Having traits defined in packages other than the standard library makes it harder to write non-orphan instances.

This is true, but this hasn't been much of a problem in Rust thus far.
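Some context on the orphan point: Rust only allows a trait impl when the trait or the type is local to your crate. That means `impl fmt::Display for Vec<String>` is rejected, and people reach for a newtype instead. Below is a minimal sketch. `CommaList` is a made-up name:

```rust
use std::fmt;

// `Display` and `Vec` both live in std, so implementing one for the other
// directly is rejected by the orphan rule. A local newtype makes it legal.
struct CommaList(Vec<String>);

impl fmt::Display for CommaList {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "[{}]", self.0.join(", "))
    }
}

fn main() {
    let list = CommaList(vec!["hello".to_string(), "world".to_string()]);
    println!("{}", list); // prints "[hello, world]"
}
```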

> - It's too difficult to make larger changes as we cannot atomically update all the packages at once. Thus such changes don't happen.

That only matters if you're breaking public APIs, right? That seems orthogonal to the small-versus-large-standard-library debate. Even if you have a large standard library, if you promised it's stable you still can't break APIs.


But if you have a large standard library and want to break the API, you can.

If you have 100 different libs that are basically "standard" (who doesn't have `mtl` in their applications at this point), now you have to coordinate 100 different library updates roughly at the same time. If you forget even one of them, then you've broken everything.

I think the argument for a large Prelude/standard lib is similar to Google's "single repo" argument: it's easy to catch usages and fix them all at once. Plus, you're making the language more useful out of the box. People coming from Python know the feeling of opening a Python shell and being productive super quickly from the get-go.

Arguments for a small std lib exist, of course. But giant standard libraries are more useful than not.

EDIT: I think the failure of the Haskell Platform has a lot more to do with how Haskell deals with dependencies, and the difficulties it entails, than with the "batteries included" approach itself.

Standard libraries - types, in particular - are the lingua franca between unrelated libraries. The more that's in your standard library, the easier it is to integrate different libraries.

The higher level the library (e.g. containing content specific to an application domain), the more magical-seeming libraries can be added to the ecosystem. The counter-risk is the standard library growing in undesirable directions that you can never change because you can't remove stuff.

The interstitial glue that lets third party libraries integrate with one another and be usable by your app: that's the single biggest reason for having a bigger standard library than a smaller one. It has very little to do with including the batteries in the box.

If you think it has something to do with including the batteries in the box, you'll be lured into the trap of making it easy to fetch the batteries from across the internet (that's almost the same, right?). The trouble is, the internet has 100 different batteries to choose from, and not only have you offloaded the choice onto the user, but the batteries use mutually incompatible terminals and you have to jerry-rig interfaces between them. Let a thousand flowers bloom, say some people: trouble is, waiting for the biggest flower can take years, and people pick different ones in the early days. A bad choice is better than indecision.

Large standard libraries are even less about low-effort updates. They are much harder to update, not easier: there's much more surface area, so it's far easier to break an application, and since every application uses the standard library, you could potentially break them all. Easier versioning and updates are a strong argument for extracting things out into third-party libraries.

But even then, languages that have great, thriving, easy-to-use dependency systems and package managers with small standard libraries still run into problems.

see: JavaScript

The issue with comments written this way is that there are no details to support the claim.

Writing "see: JavaScript" doesn't really help without context. Without context, one does not know if you meant "JavaScript in the browser" or "JavaScript via Node.js" or "I simply don't like npm".

I'm not claiming there aren't any problems; however, "problems" are situational and one person's "problem" is another person's meh.

I just think it's irresponsible to not provide detail when making such claims.

> But if you have a large standard library and want to break the API, you can.

We have a policy of no breakage for stable libraries post 1.0 (as does Python, and Go, etc.). So no, we can't.

The size of the standard library has nothing to do with it.

I'm curious: which language features or tooling do other languages have that make them better at dealing with dependencies than Haskell?
> We have one string type defined in std

The standard library also includes Path/PathBuf and OsStr/OsString. And third-party libraries also use [u8] for bytestrings.

It'd be nice to improve handling for user-supplied text where you can't assume UTF-8. For instance, git2-rs provides the contents of diffs as [u8], because it can't assume the diffed files use UTF-8. That led to this commit today: https://github.com/ogham/rust-ansi-term/pull/19/commits/a0da...

That felt like a lot of boilerplate to abstract between str and [u8]. Is there a better way to solve that problem?

(As much as I'd love to just say "use UTF-8", that would break on many git repositories, including git.git and linux.git.)

> The standard library also includes Path/PathBuf and OsStr/OsString.

Right. And you want people to explicitly convert between them.

Having to convert between string types isn't a problem. String encoding is hard, and you're going to have to pay that cost somehow.
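For instance, std makes going from `Path`/`OsStr` to `str` an explicit step that is either fallible or lossy. Below is a small sketch. `file_name_text` is a hypothetical helper:

```rust
use std::borrow::Cow;
use std::ffi::OsStr;
use std::path::Path;

/// Hypothetical helper: returns a path's file name as text, making the
/// encoding decision explicit instead of hiding it.
fn file_name_text(path: &Path) -> Option<Cow<'_, str>> {
    // `file_name` yields `&OsStr`, which need not be valid UTF-8.
    let name: &OsStr = path.file_name()?;
    // `to_string_lossy` borrows when the bytes are valid UTF-8 and
    // allocates with U+FFFD replacements when they are not.
    Some(name.to_string_lossy())
}

fn main() {
    let p = Path::new("/tmp/report.txt");
    // Strict conversion: `None` if the path is not valid Unicode.
    let strict: Option<&str> = p.to_str();
    println!("{:?}", strict);
    println!("{:?}", file_name_text(p));
}
```

The choice between `to_str` (may fail) and `to_string_lossy` (may substitute replacement characters) is the cost being referred to. The API makes you pick one.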

Having to convert isn't a problem. Having to write some algorithms multiple times for different string types is a problem.
Fair. Most of these algorithms could be written generically, I guess.
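Here is one way to write such algorithms once, assuming the logic only needs bytes. `str`, `[u8]`, `String`, and `Vec<u8>` all implement `AsRef<[u8]>`. Below is a minimal sketch with a hypothetical `count_lines` function:

```rust
/// Hypothetical example: counts lines in either UTF-8 text or raw bytes.
/// `str`, `[u8]`, `String`, and `Vec<u8>` all implement `AsRef<[u8]>`,
/// so one generic body serves them all. `?Sized` admits bare `str`/`[u8]`.
fn count_lines<T: AsRef<[u8]> + ?Sized>(input: &T) -> usize {
    let bytes = input.as_ref();
    let newlines = bytes.iter().filter(|&&b| b == b'\n').count();
    // A final line without a trailing newline still counts.
    if bytes.last().map_or(false, |&b| b != b'\n') {
        newlines + 1
    } else {
        newlines
    }
}

fn main() {
    let text: &str = "one\ntwo\nthree";
    let raw: &[u8] = b"caf\xe9\nlatin-1 bytes\n"; // not valid UTF-8
    println!("{}", count_lines(text));
    println!("{}", count_lines(raw));
}
```

This stops working as soon as the algorithm needs `char`s or grapheme boundaries, since `[u8]` carries no encoding guarantee.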
> That felt like a lot of boilerplate to abstract between str and [u8]. Is there a better way to solve that problem?

This doesn't have anything to do with large vs. small standard libraries, because all of these string types are defined in libstd.

libstd defines varying amounts of string manipulation and abstraction for those string types, though.

I'd love to see additional support for handling bytestrings in libstd, to make it easier to write code that handles both &str and &[u8].
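For the ansi-term style case specifically, one option is to treat everything as bytes at the output boundary and write through `std::io::Write`. This is a sketch only. It is not ansi_term's API and not necessarily what the linked commit does, and `write_red` is a hypothetical name:

```rust
use std::io::{self, Write};

/// Hypothetical helper: wraps any text-like input in an ANSI "red" escape
/// sequence and writes it as bytes, so the same code path handles `&str`
/// and non-UTF-8 `&[u8]` alike.
fn write_red<W, T>(out: &mut W, input: &T) -> io::Result<()>
where
    W: Write,
    T: AsRef<[u8]> + ?Sized,
{
    out.write_all(b"\x1b[31m")?; // SGR 31: red foreground
    out.write_all(input.as_ref())?;
    out.write_all(b"\x1b[0m") // SGR 0: reset attributes
}

fn main() -> io::Result<()> {
    let stdout = io::stdout();
    let mut handle = stdout.lock();
    write_red(&mut handle, "valid UTF-8\n")?;
    write_red(&mut handle, &b"diff line with byte \xff\n"[..])?;
    Ok(())
}
```

The trade-off is that the output is bytes, so anything downstream that wants a `String` still has to decide how to handle invalid UTF-8.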

I think Rust needs to slow down in this regard. I have been with Python since 1999 and the stdlib has held it back; I have also used Scala and Haskell and have witnessed the mess that the platform libs of each have caused.

What Rust has right now is pretty amazing. What needs to happen is a way for devs to easily break the dependency cycle and include multiple versions of the same crate, something that has plagued Haskell. I dunno what the answer is: trait-only crates, struct-only crates?

If people want to 'curate' (shop) a set of packages, they can make a meta package that exports its deps.

There is literally no reason to ship libs with the compiler aside from the basic verbs and nouns.

With versioned and properly namespaced imports, one could use different curated libs.

If you can, could you elaborate more on Python's stdlib holding it back? I think the batteries-included experience is one of the reasons why so many people (including myself) use Python.

It's also one of the features I sorely miss when using Rust. Luckily, Rust's stdlib is starting to tend towards being more practical with recent additions like system time.

The 'stdlib is where libraries go to die' idea originated with Python. The libs are shallow, don't break backwards compat, and provide a substandard experience. Things that continue to improve end up as out-of-tree alternatives under a different package name. Resilient Python codebases don't use much of "core": they use arrow for time, requests for HTTP, simplejson, etc. Relying on core is an antipattern that will get you stuck on a version of the language, which is ridiculous.

Linking the language and the libraries together is a mistake.

I disagree.

In the enterprise space it is quite common that we only get to use what is on the computer, and access to anything else is strictly controlled by IT.

So if it isn't in the standard library or some internal library mirror, we don't get to use it, as simple as that.

I think it would be terrible for Rust's design/evolution/policy to be constrained by that kind of enterprise badness that basically bans crates.io, and crates.io is an awesome aspect of the Rust ecosystem.
I can tell you it is lots of "fun" when you can only use a Maven mirror with approved jars.

To get a jar into that mirror, a request needs to be sent to the legal team describing the license and business case use, after approval the IT team will add the said jar to the mirror.

The same applies to version upgrades of already approved jars.

This is a typical scenario I have already had in a couple of projects.

It's not a constraint as much as it is a consideration imo.
So maybe there's value in shipping a "standard bundle" that includes popular libraries or some such. But it's not worth distorting the whole language design to accommodate bad policies.
Might be, however those bad policies are quite standard in big corporations.
I see where you're coming from, but I feel like it would be a mistake to expect the language or std lib to try to solve problems that are effectively organizational/cultural issues.
Plus, Python's stdlib is a mishmash of all sorts; there is no API coherence.
That's a failure at the moment of inclusion. I'm guessing it was done for convenience and to increase adoption (getting decent libraries in the standard library faster).
Just as a data point... I like and heavily use the core libs... and not once have I used arrow, requests, or simplejson, despite knowing them, because I didn't feel the need.
Then you most likely have various security or logic problems in your application unfortunately.
Arrow seems particularly useless as it just wraps stdlib datetime and its awful 10-byte size rather than moving to an 8-byte representation like np.datetime64 uses.
Just because you haven't found a use for it doesn't mean it's useless.

The stdlib datetime class is terrible and desperately needs to be wrapped. Arrow is a good wrapper. I don't know what you're on about with counting bytes.

But isn't requests built off of urllib?

The thing I like about python is it gives tools for library writers to build things without going too low level.

Application writers will always write with better libs, but they don't have to worry about third-party lib compatibility across platforms, because the stdlib serves as a virtual machine (most of the time).

requests is built on urllib3, and it includes its own version of urllib3 (to avoid dependency problems).

If I'm not mistaken, the stdlib contains urllib and urllib2, but not urllib3.

The fact that there are 3 "urllib" packages shows that the Python way is not so good.

Many libraries in the stdlib have much better alternatives, because libraries with their own release cycle can evolve much quicker. But people get stuck on the "standard" version because it's what's in the stdlib. Worse, people write for compatibility with whatever was in stdlib 2.4 because that's what RHEL6 ships.
Rust will already allow you to have multiple versions of transitive dependencies.
This needs to be screamed from the hill tops!
To be honest, this is the kind of thing that would be great to highlight immediately on https://github.com/rust-lang/cargo or http://doc.crates.io/guide.html.

I go to those pages and I am given 0 information on how the thing actually works.

Cargo's docs need a bunch of work, it's true. I have so much to do :(
Which I guess is normal since it does not create any dependency cycle. A new version might as well be thought as a completely different package (of perhaps similar functionality).
One of the things I love above all about Python and Ruby are the kitchen-sink standard libraries. The node ecosystem is deeply frustrating in this respect.
It has been a while since I did anything with Python, but I did like its standard library. It was reasonably comprehensive without feeling bloated, and the documentation was pretty good (mostly).

Having a good standard library also makes deployment easier. (In Go, OTOH, I tend to care less, even though its standard library is quite good, because thanks to static linking, deployment is always easy, no matter how many third-party libraries I use.)

> We have one string type defined in std, and nobody is defining new ones (modulo special cases for legacy encodings which would not be worth polluting the default string type with).

There's also `inlinable_string`, `string_cache`, `tendril`, `intern` if you need inlining for performance.

The bigger problem is with other things like 2D/3D points which can be (f32, f32), [f32; 2] or a custom struct.
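To illustrate the glue this forces, here is the kind of conversion code a library ends up writing when it wants to accept points from several other crates. This is a minimal sketch, and `Point2` and `distance` are hypothetical:

```rust
/// Hypothetical shared point type. Without one agreed-upon type, every
/// pairing of crates needs glue like these `From` impls.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point2 {
    x: f32,
    y: f32,
}

impl From<(f32, f32)> for Point2 {
    fn from((x, y): (f32, f32)) -> Self {
        Point2 { x, y }
    }
}

impl From<[f32; 2]> for Point2 {
    fn from([x, y]: [f32; 2]) -> Self {
        Point2 { x, y }
    }
}

impl From<Point2> for [f32; 2] {
    fn from(p: Point2) -> Self {
        [p.x, p.y]
    }
}

/// Accepts any of the representations above via `Into<Point2>`.
fn distance(a: impl Into<Point2>, b: impl Into<Point2>) -> f32 {
    let (a, b) = (a.into(), b.into());
    let (dx, dy) = (a.x - b.x, a.y - b.y);
    (dx * dx + dy * dy).sqrt()
}

fn main() {
    // A tuple from one library, an array from another.
    let from_lib_a: (f32, f32) = (0.0, 0.0);
    let from_lib_b: [f32; 2] = [3.0, 4.0];
    println!("{}", distance(from_lib_a, from_lib_b));

    let back: [f32; 2] = Point2 { x: 1.0, y: 2.0 }.into();
    println!("{:?}", back);
}
```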

I really really would advise having a word with Snoyberg about this. The Haskell Platform has been a pretty deadly experience. It's also ridiculously beginner-hostile (sounds like it won't be, is in practice).
Hash maps and trees: fine. What about database interfaces (e.g. a JDBC/ODBC/whatever equivalent)? What about HTTP servers - even the minimal declaration for what a synchronous request handler might look like? How about threadpooling - if you have multiple libraries that have parallelizable work, you certainly don't want multiple threadpools each thinking they have X many cores to work with, and you don't want the user to have to partition these things either - that's not a happy problem.

All things you can delegate to third parties, but not without lots of cross-talk and confusion until things settle down to winners and losers, which may be a long time in the future. Indecision can be costly.

Consider standard library profiles, with progressively higher levels of abstraction supported. It's the right decision for creating a good ecosystem. C and C++ took decades to build consensus on the more complicated libraries, and C++ eventually grew a pseudo-standard library in the form of boost to centralize efforts, simply because it is more efficient that way.
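To make the threadpool point concrete: if a shared crate (or std) defined even a tiny executor trait, independent libraries could submit work to one application-owned pool instead of each sizing its own. Below is a minimal sketch. `Executor`, `FixedPool`, `library_a`, `library_b`, and `run_both` are hypothetical names, not an existing or proposed std API, and the pool is deliberately simplistic (no panic recovery, no work stealing):

```rust
use std::sync::mpsc::{self, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

type Job = Box<dyn FnOnce() + Send + 'static>;

/// Hypothetical glue trait that unrelated libraries could all accept.
pub trait Executor {
    fn execute(&self, job: Job);
}

/// One fixed-size pool, owned by the application.
pub struct FixedPool {
    sender: Option<Sender<Job>>,
    workers: Vec<JoinHandle<()>>,
}

impl FixedPool {
    pub fn new(size: usize) -> Self {
        let (sender, receiver) = mpsc::channel::<Job>();
        let receiver: Arc<Mutex<Receiver<Job>>> = Arc::new(Mutex::new(receiver));
        let workers = (0..size.max(1))
            .map(|_| {
                let receiver = Arc::clone(&receiver);
                thread::spawn(move || loop {
                    // The guard is a temporary dropped at the end of this
                    // statement, so the lock is not held while a job runs.
                    let next = receiver.lock().unwrap().recv();
                    match next {
                        Ok(job) => job(),
                        Err(_) => break, // channel closed: shut down
                    }
                })
            })
            .collect();
        FixedPool { sender: Some(sender), workers }
    }
}

impl Executor for FixedPool {
    fn execute(&self, job: Job) {
        self.sender.as_ref().expect("pool is running").send(job).expect("workers alive");
    }
}

impl Drop for FixedPool {
    fn drop(&mut self) {
        drop(self.sender.take()); // close the channel so workers exit their loops
        for worker in self.workers.drain(..) {
            let _ = worker.join();
        }
    }
}

/// Two hypothetical, independent libraries that borrow the caller's pool.
fn library_a(exec: &dyn Executor, results: Sender<u64>) {
    for i in 1..=3u64 {
        let results = results.clone();
        exec.execute(Box::new(move || results.send(i).unwrap()));
    }
}

fn library_b(exec: &dyn Executor, results: Sender<u64>) {
    for i in 1..=3u64 {
        let results = results.clone();
        exec.execute(Box::new(move || results.send(i * 10).unwrap()));
    }
}

fn run_both(threads: usize) -> u64 {
    let pool = FixedPool::new(threads);
    let (tx, rx) = mpsc::channel();
    library_a(&pool, tx.clone());
    library_b(&pool, tx);
    // Iteration ends once every job has run and dropped its sender clone.
    rx.iter().sum()
}

fn main() {
    let cores = thread::available_parallelism().map(|n| n.get()).unwrap_or(2);
    println!("{}", run_both(cores));
}
```

The point is the trait, not the pool: if something like `Executor` lived in a shared place, each library could accept `&dyn Executor` and the application would decide the thread count once.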