by ComputerGuru 3026 days ago
That's not the problem. There absolutely should be two different types (and I don't even care that the names are so poor). But half the rust APIs take one type and the other take another (even when the string isn't manipulated or stored in any way, shape, or form). Some interfaces are only implemented for string and others only for &str.

Deciding between two distinct types &str and &string (not mut &string) for your function's interface is nonsense. It makes no sense to have to _decide_ between which two views of a string that you can read-but-not-manipulate you want to use, and it makes zero sense that they can't unify the types with some simple compiler magic. A constant reference to a string should automatically decompose into a view of that string and that should be that. [edit: as in that view shouldn't be a separate type]

Additionally, that dereferencing a string returns a pointer... that makes no sense. That's the kind of nonsense we ran away from in the C++ world.

strings are the reason I regret not adopting rust back when as a user of a pre-1.0 language I could have joined in efforts to lobby against this insanity.

---

As a sidenote, string_view is so late in coming to the c++ world that it's not even funny. Having a separate std::string with an "implementation-defined" in-memory representation in a world of c strings (char *) is inane beyond belief. (Yes, nulls in strings would still be a problem. But why do your strings have nulls in the first place? That data should probably be a vector of strings or a [vector|array] of uint8_t (even if just typedef'd to unsigned char) and C++ strings should have been mandated utf8, contiguous, and null-terminated.) You should be able to compose a zero-copy, read-only, non-owned string from a character array and decompose automatically to it. And don't get me started on the fact that C++ doesn't have sprintf because of the obsession with sticking to the overly verbose and way too complicated streaming operators. Developers end up using c strings with sprintf to format text and then copy it back to a std::string just to work around that stupidity.

4 comments

Anything implemented for &str is automatically implemented for String, because String implements Deref<Target=str>.

Most useful "String" methods are actually &str methods that you get access to through that deref trait.

Dereferencing a String doesn't return a raw pointer, I'm not sure where you got that idea.
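
For anyone following along, here is a minimal sketch of the point about str methods being reachable from a String. The helper name `summarize` is made up for illustration; the three methods it calls are all defined on str, not on String.

```rust
// Hypothetical helper for illustration: takes an owned String and only
// calls methods that are defined on str.
fn summarize(owned: String) -> (usize, bool, String) {
    // split_whitespace, starts_with and trim all live on str. They are
    // reachable here because String implements Deref<Target = str>.
    let words = owned.split_whitespace().count();
    let greeting = owned.starts_with("hello");
    let trimmed = owned.trim().to_owned();
    (words, greeting, trimmed)
}

fn main() {
    let result = summarize(String::from("hello world "));
    assert_eq!(result, (2, true, String::from("hello world")));
}
```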

Yes, anything implemented for &str is automatically implemented for String... except some API are stupidly implemented for &string instead. And you can't pattern match strings properly (think some `for in`) without first explicitly converting to &str.

Dereferencing a string does not return a raw pointer, that was exaggeration on my behalf. But a string is a container, so *string returns.... &str? But string.deref() returns str?

Don't get me wrong, I'm fully invested in the language [0], [1], [2], but it's got a lot of warts that could have been avoided by thinking bigger picture. So many APIs are restricted by thinking easy instead of big pre-1.0. Like str being hardcoded into APIs that should have been generic (FromStr vs From<&str>, .parse() vs .into()), shipping 1.0 without async/await, and the whole mess with strings.

0: https://github.com/rust-lang/rust/issues?utf8=%E2%9C%93&q=is...

1: https://github.com/rust-lang/rfcs/issues?utf8=%E2%9C%93&q=is...

2: https://crates.io/search?q=neosmart

> But a string is a container

In the same way unique_ptr is a container.

> so string.deref() returns.... &str?

Yes? &str::deref() also returns &str, Vec::deref() returns &[], Box<T>::deref() returns &T.

That's literally how Deref is defined, Deref<Target=T>::deref() returns &T.

*String returns str.

> (FromStr vs From<&str>, .parse() vs .into())

These are not equivalent. From/Into are non-failing conversions, FromStr can fail.

What you're looking for is TryFrom/TryInto which are still not done 2 years into the RFC: https://github.com/sfackler/rfcs/blob/try-from/text/0000-try...
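
A small sketch of the fallible vs. non-failing distinction, using only stable std APIs. The function names `to_owned_text` and `to_number` are hypothetical.

```rust
use std::num::ParseIntError;

// Non-failing: every &str can become a String, so From/Into fits.
fn to_owned_text(s: &str) -> String {
    String::from(s)
}

// Fallible: not every &str is a number, so parse() (built on FromStr)
// hands back a Result instead of a bare value.
fn to_number(s: &str) -> Result<i32, ParseIntError> {
    s.parse::<i32>()
}

fn main() {
    assert_eq!(to_owned_text("42"), "42");
    assert_eq!(to_number("42"), Ok(42));
    assert!(to_number("forty-two").is_err());
}
```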

> * String returns str.

Typed that out too fast, yes, that's my problem. * String is one thing but String.deref() is another. But * is the dereference operator. Operator overloading ftw ;)

> What you're looking for is TryFrom/TryInto which are still not done 2 years into the RFC: https://github.com/sfackler/rfcs/blob/try-from/text/0000-try....

Sorry, yes, I actually opened an issue with my suggestions regarding that one with particular focus on the fallible vs infallible nature: https://github.com/rust-lang/rfcs/issues/2143

> Typed that out too fast, yes, that's my problem. * String is one thing but String.deref() is another. But * is the dereference operator.

They're the same thing, Deref::deref() is just the operation which underlies the dereferencing operator.

Either way I don't see what's problematic about a string buffer deref'ing to a string.
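
To make the "same thing" claim concrete, a sketch that reaches the str both ways and checks that the two views point at the same bytes. `same_view` is a made-up name.

```rust
use std::ops::Deref;

// Hypothetical helper: true if `&*owned` and an explicit Deref::deref call
// produce the same view (same address, same length) of the string data.
fn same_view(text: &str) -> bool {
    let owned: String = text.to_owned();
    // The operator form: *owned is a str, so we immediately re-borrow it.
    let via_operator: &str = &*owned;
    // The method form: the call that underlies the operator.
    let via_method: &str = <String as Deref>::deref(&owned);
    std::ptr::eq(via_operator, via_method) && via_operator == via_method
}

fn main() {
    assert!(same_view("foo"));
}
```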

> except some API are stupidly implemented for &string instead.

Which ones are those? I can't think of any off the top of my head, though brains are fallible!

Hi Steve! Sorry, I didn't mean to imply in the core API. I don't think (though I too could be wrong) that &string is used anywhere that AsRef<OsStr> isn't also available.

Fundamentally, String and &str communicate very different things. String has ownership, &str does not. This is not a reconcilable difference.

The closest you could get is automatically allocating &strs into Strings, but then you're introducing silent allocation, which has a host of its own problems.

> A constant reference to a string should automatically decompose into a view of that string and that should be that.

This is exactly what happens, through Deref coercions.
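
A minimal sketch of that coercion at a call site, assuming a hypothetical read-only function `char_count` that asks for &str:

```rust
// Hypothetical read-only function: it asks for the view, &str.
fn char_count(s: &str) -> usize {
    s.chars().count()
}

fn main() {
    let owned = String::from("hello");
    // &owned has type &String; it is coerced to &str at the call site.
    assert_eq!(char_count(&owned), 5);
    // A string literal is already a &str.
    assert_eq!(char_count("hello"), 5);
    // A sub-slice of the String is a &str too.
    assert_eq!(char_count(&owned[..2]), 2);
}
```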

I mean that &string should not be a type distinct from &str.

I absolutely don't think &str should silently allocate for converting to String. But I think &str should zero-allocation convert to &string (and bypass dealloc, too).

Sorry for misunderstanding you!

That would introduce special cases into the type system, adding complexity. String is a library type, so you can take a reference to it like any other type. We'd have to move String into the language for that to happen, and currently, the language itself knows nothing about allocation, so then we'd have to put allocation into the language, which would then change our story on embedded significantly... and then it'd be one of the only types that you couldn't take a reference to for some reason, which would affect generic APIs, etc...

Well, that's actually the root problem which I avoided discussing until now: why does String have a str special case but no other data type does? Same with Path and PathBuf. Why does there need to be a distinct data type for a non-owned view into an object? Why is that not just part of the language in the first place? C++ needs string vs string_view because it has no borrow checker, but rust could (theoretically) implement this without the need for two different types.

> why does String have a str special case but no other data type does?

Vec's "special case" is [].

And it's not that String "has" a special case, it's that str is a special fundamental case of the language in the same way i8 or () is and String exists to make it easier to work with (otherwise you'd have to deal with Box<str> and efficiently working with that would require going back and forth to Vec, except you'd have removed Vec since it'd be a special case of [] and where do you store capacities at this point?)

> C++ needs string vs string_view because it has no borrow checker, but rust could (theoretically) implement this without the need for two different types.

C++ has char* which conceptually underlies both string and string_view. Rust shoves char* in the unsafe corner but needs something to replace it, something which not only means "a bag of bytes" ([u8] does that just fine) but text as in "actual proper utf8-encoded unicode text". That's str.
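
As a rough illustration of the Box<str> point above: an owned str with no spare capacity has to round-trip through a growable buffer to be extended. The helper `grow` is hypothetical; `into_string` and `into_boxed_str` are the std conversions.

```rust
// Hypothetical helper: appends to an owned, fixed-size Box<str>.
fn grow(boxed: Box<str>, extra: &str) -> Box<str> {
    // Box<str> stores only a pointer and a length, so there is nowhere to
    // keep spare capacity. Convert to String to get a growable buffer.
    let mut buffer: String = boxed.into_string();
    buffer.push_str(extra);
    // Converting back drops any excess capacity again.
    buffer.into_boxed_str()
}

fn main() {
    let start: Box<str> = "foo".into();
    let grown = grow(start, "bar");
    assert_eq!(&*grown, "foobar");
}
```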

So, we almost made str a library type, but there were downsides and not a lot of upsides. https://github.com/rust-lang/rust/pull/19612

> no other data type does?

You could argue that slices are a primitive type to arrays (also a primitive type) and vectors (a library type).

> Why does there need to be a distinct data type for a non-owned view into an object?

The difference between owned and non-owned types is fundamental; how would you propose distinguishing them if not as part of the type? Both concepts are part of the language, but like any language, you can use its fundamental bits to build better abstractions.

> That's not the problem. There absolutely should be two different types (and I don't even care that the names are so poor). But half the rust APIs take one type and the other take another (even when the string isn't manipulated or stored in any way, shape, or form). Some interfaces are only implemented for string and others only for &str.

Things which are implemented for Strings are those which actually require it. String derefs to &str so you can call any &str method on String, any trait implemented by &str is basically implemented by String, and if you need to pass &str and have a String you just &v.

> Deciding between two distinct types &str and &string (not mut &string) for your function's interface is nonsense.

There are very very few reasons to ever ask for an &String but then what, should the language somehow forbid regular references to a perfectly standard type?

> It makes no sense to have to _decide_ between which two views of a string that you can read-but-not-manipulate you want to use

&String is not a view of anything, it's a regular reference to a string living somewhere in memory.

> A constant reference to a string should automatically decompose into a view of that string and that should be that.

It does that if a function asks for an &str and you &s where s:String. All Rust doesn't do is remove &String from the language, in the same way string is a valid C++ construct.

> Additionally, that dereferencing a string returns a pointer... that makes no sense.

Dereferencing a String doesn't return a pointer. String is a pointer, so you can deref' it to get the str behind it (which is not a pointer, it's the actual unsized string data).

> Deciding between two distinct types &str and &string (not mut &string) for your function's interface is nonsense.

There's really no decision to be made. If you don't want to mutate the argument, use &str, you can still call the function with an &String. If you need mutation, take ownership with String or a mutable ref with &mut String.
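
A sketch of those three signature choices side by side. All three function names are invented for the example.

```rust
// Read only: borrow a view. Callers can pass &String or &str.
fn ends_loudly(s: &str) -> bool {
    s.ends_with('!')
}

// Mutate in place: borrow the buffer mutably.
fn add_bang(s: &mut String) {
    s.push('!');
}

// Take ownership: the caller gives the buffer away and we reuse it.
fn into_greeting(name: String) -> String {
    let mut out = name;
    out.insert_str(0, "Hello, ");
    out
}

fn main() {
    let mut name = String::from("Ferris");
    assert!(!ends_loudly(&name));
    add_bang(&mut name);
    assert!(ends_loudly(&name));
    assert_eq!(into_greeting(name), "Hello, Ferris!");
}
```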

> Additionally, that dereferencing a string returns a pointer...

You can't deref a String, you can only deref a reference (&String), not an object. &String derefs to &str. String.str_method() where str_method takes &str works because it'll auto-ref String -> &String, and then deref to &str.

IMO you're making a mountain out of a molehill, it makes a lot of sense once you use it for any time at all.

> You can't deref a String

You can absolutely deref a String. Not necessarily usefully (or to the satisfaction of the compiler) as it yields an unsized `str` (exactly the same as deref'ing an &str) but you can certainly do it.

    let a = "foo".to_owned();
    let b = &*a;
works perfectly fine.

Incidentally you can also deref' a Vec, that yields a slice (actual sequence, not the commonly seen &[])
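
The Vec version of the same thing, as a short sketch (`sum_all` is a made-up helper):

```rust
// Hypothetical helper: derefs a Vec to reach the slice behind it.
fn sum_all(v: Vec<i32>) -> i32 {
    // *v has the unsized type [i32], so it is re-borrowed right away.
    let slice: &[i32] = &*v;
    slice.iter().sum()
}

fn main() {
    assert_eq!(sum_all(vec![1, 2, 3]), 6);
}
```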