|
|
|
|
|
by BuckRogers
3402 days ago
|
|
I understand your position. This is a longstanding debate within the PL community, as you know. I've considered that stance before but I have to say thanks for stating it so well because it's given me pause. I can't say I agree but you're not "wrong". I agree with the fail-fast concept but disagree that's the place to do it. A strict fail-fast diehard probably shouldn't even be using Python, IMO. I don't have anything else useful to add because this is simply a engineering design philosophy difference. We both think our own conclusions are better of course but I appreciate you detailing yours. Good chat, upvoted. |
|
And yes, you're right that Python is probably not a good example of ideologically pure fast-fail in general, simply because of its dynamic typing - so that part of argument is not really strong in that particular context.
(Side note: don't you find it amusing that between the two languages that we're discussing here, the one that is statically typed - Go - chose to be more relaxed about its strings in the type system, while the one that is dynamically typed - Python - chose to be more strict?)
The key takeaway, I think, is that there is no general consensus on this. As you say, "this is a longstanding debate within the PL community" (and not just the topics we touched upon, but the more broad design of strings - there's also the Ruby take on it, for example, where string is bytes + encoding).
My broader take, encoding aside, is that strings are actually not sufficiently high-level in most modern languages (including Python). I really want to get away from the notion of string as a container of anything, even Unicode code points, and instead treat it as an opaque representation of text that has various views exposing things like code points, glyphs, bytes in various encodings etc. But none of those things should be promoted as the primary view of what a string is - and, consequently, you shouldn't be able to write things like `s[i]` or `for ch in s`.
The reason is that I find that the ability to index either bytes or codepoints, while useful, has this unfortunate trait that it often gives right results on limited inputs (e.g. ASCII only, or UCS2 only, or no combining characters) while being wrong in general. When it's accessible via shorthand "default" syntax that doesn't make it explicit, people use it because that's what they notice first, and without thinking about what their mode of access implies. Then they throw inputs that are common for them (e.g. ASCII for Americans, Latin-1 for Western Europeans), observe that it works correctly, and conclude that it's correct (even though it's not).
If they are, instead, forced to explicitly spell out access mode - like `s.get_code_point(i)` and `for ch in s.code_points()` - they have to at least stop and think what a "code point" even is, how it's different from various other explicit options that they have there (like `s.get_glyph(i)`), and which one is more suitable to the task at hand.
And if we assume that all strings are sequences of bytes in UTF-8, the same point would also apply to those bytes - i.e. I'd still expect `s[i]` to not work, and having to write something like `s.get_byte(i)` or `for b in s.bytes()` - for all the same reasons.