by tyingq 1325 days ago
>If you have some function that accepts it, blindly casts it to UTF-8

Unfortunately, if you interact with services you didn't write, you're usually back to getting "strings" of unknown encoding, and typically to requirements that force some blind or semi-blind guessing.


Blind guessing is not related to the type system. Nobody has claimed type systems can solve that. What they can do is force you to guess, and make it clear where that is occurring.

This, again, goes back to a very broken understanding of type systems that I often see, and once held myself. The claim of type systems is not that they magically go out and fix the external world to be well-typed; the claim is that they force your code to deal with the conversion of the external world into a clean internal representation, and, presumably, to have a clean error pathway when that conversion fails. Dynamically typed code will let you float along much more easily. Statically typed code can still be written that way, but at least then it's poor statically typed code. In some circles that sort of broken dynamic code is essentially idiomatic (though that is fading as more programmers learn every year how bad an idea it is).
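A minimal Rust sketch of that boundary conversion, assuming the external service hands over raw bytes that are supposed to be UTF-8. `CustomerName`, `NotUtf8` and `parse_customer_name` are hypothetical names for illustration; `String::from_utf8` is the standard library's validating conversion, and because it returns a `Result`, the failure path has to be written down somewhere.

```rust
use std::fmt;

/// Hypothetical internal type: text that has already been validated at the boundary.
#[derive(Debug, PartialEq)]
struct CustomerName(String);

/// Hypothetical error type for the boundary conversion.
#[derive(Debug, PartialEq)]
struct NotUtf8 {
    /// Number of leading bytes that were valid UTF-8.
    valid_up_to: usize,
}

impl fmt::Display for NotUtf8 {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "input is not UTF-8 after byte {}", self.valid_up_to)
    }
}

/// Convert raw bytes from an external service into the internal type.
/// The `Result` return type means every caller has to handle the failure case.
fn parse_customer_name(raw: Vec<u8>) -> Result<CustomerName, NotUtf8> {
    String::from_utf8(raw)
        .map(CustomerName)
        .map_err(|e| NotUtf8 { valid_up_to: e.utf8_error().valid_up_to() })
}

fn main() {
    // "Zoë" encoded as UTF-8, then "Zo" followed by a stray 0xFF byte.
    let inputs = [b"Zo\xc3\xab".to_vec(), vec![b'Z', b'o', 0xff]];
    for raw in inputs {
        match parse_customer_name(raw) {
            Ok(name) => println!("accepted: {:?}", name),
            Err(err) => println!("rejected: {}", err),
        }
    }
}
```

The validation itself is trivial; the useful part is that there is no implicit path from `Vec<u8>` to `String`, so the conversion and its error handling show up explicitly in the code.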

I agree with that if you qualify it with "sometimes". Strong types can force you to guess, sometimes. Other times, the data fits the type but isn't the type.

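To illustrate "the data fits the type but isn't the type", here is a sketch assuming the sender might be using either UTF-8 or ISO-8859-1 (Latin-1); the thread doesn't name a specific pair of encodings. The two bytes `0xC3 0xA9` decode successfully under both, to different text, so a type check passes either way. `decode_latin1` is a hypothetical helper that relies on Latin-1 mapping each byte to the Unicode code point with the same value.

```rust
/// Hypothetical helper: decode bytes as ISO-8859-1 (Latin-1).
/// Every byte maps to the code point of the same value, so this never fails.
fn decode_latin1(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| char::from(b)).collect()
}

fn main() {
    // 0xC3 0xA9 is "é" in UTF-8, and also a valid Latin-1 sequence for "Ã©".
    let bytes = [0xC3u8, 0xA9];

    let as_utf8 = std::str::from_utf8(&bytes).expect("these bytes are valid UTF-8");
    let as_latin1 = decode_latin1(&bytes);

    println!("as UTF-8:   {as_utf8}");
    println!("as Latin-1: {as_latin1}");

    // Both decodings succeed and produce well-typed strings; nothing in the
    // types says which one the sender actually meant.
    assert_ne!(as_utf8, as_latin1);
}
```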
If the language explicitly says how strings are defined, libraries that go "Eh, I'll just shove nonsense bytes in this data structure and claim that's a string" are broken by definition.

That's just as true in Java as in Rust. The problem is languages like C++ or D, which just don't care and have a "string" type that might be arbitrary bytes.

I don't mean libraries; I mean external services. Ambiguous strings are everywhere.

The libraries in question are the ones consuming the output of those external services. If an external service sends data that does not map to the programming language's string type, then the string type will fail to be created from that invalid input, and the library was wrong to have tried.

External services are not always explicit about, or compliant with, the content of those strings. Follow this comment chain up and you'll see mention of blindly casting to UTF-8. The point I was making is that you don't always know what the encoding is.

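A sketch of what "force you to guess, and make it clear where that is occurring" might look like for a service that doesn't declare its encoding. `UnknownEncoding`, `Guess` and `guess_text` are hypothetical names, and the UTF-8-then-Latin-1 fallback is an assumed policy chosen for illustration, not a recommendation: the guess can still be wrong, but it lives in one function and its outcome is part of the return value.

```rust
/// Which decoding was used; callers can log it or branch on it.
#[derive(Debug, PartialEq)]
enum Guess {
    Utf8,
    /// Assumed fallback for this sketch only.
    Latin1Fallback,
}

/// Hypothetical wrapper for bytes from a service that does not declare an encoding.
/// It is deliberately not a `String`.
struct UnknownEncoding(Vec<u8>);

impl UnknownEncoding {
    /// The guess happens here and only here, and the result records which guess was made.
    fn guess_text(self) -> (String, Guess) {
        match String::from_utf8(self.0) {
            Ok(s) => (s, Guess::Utf8),
            Err(e) => {
                // Not valid UTF-8: take the original bytes back out of the error
                // and decode them as Latin-1 (byte value == code point).
                let bytes = e.into_bytes();
                let text = bytes.iter().map(|&b| char::from(b)).collect();
                (text, Guess::Latin1Fallback)
            }
        }
    }
}

fn main() {
    // "café" with é as the single Latin-1 byte 0xE9, which is not valid UTF-8 here.
    let (text, guess) = UnknownEncoding(b"caf\xe9".to_vec()).guess_text();
    println!("{text:?} decoded via {guess:?}");
}
```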
If the service doesn't send you back a string, it doesn't send you back a string. End of story. You keep trying to complicate the issue by insisting that they're sending strings and that the client needs to guess their encoding. They are not, and it does not.

Encoding was one example. But what counts as a "string" depends on the language, even aside from encoding. Strong typing doesn't solve the inherent issue.
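One concrete reading of "depends on the language", assuming it refers to programming languages: a Java `String` is a sequence of UTF-16 code units and may contain an unpaired surrogate such as `\uD800`, while a Rust `String` must be valid UTF-8 and cannot. The sketch below (made-up input) shows Rust's strict conversion rejecting such a sequence and the lossy conversion replacing the surrogate with U+FFFD.

```rust
fn main() {
    // UTF-16 code units: 'a', a lone high surrogate (0xD800), 'b'.
    // A Java String can hold exactly this sequence.
    let units: [u16; 3] = [0x0061, 0xD800, 0x0062];

    // Strict conversion: a Rust String must be valid Unicode, so this fails.
    match String::from_utf16(&units) {
        Ok(s) => println!("accepted: {s:?}"),
        Err(e) => println!("rejected: {e}"),
    }

    // Lossy conversion is a separate, explicit call: the surrogate becomes U+FFFD.
    let lossy = String::from_utf16_lossy(&units);
    println!("lossy: {lossy:?}");
}
```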