by yarg 3262 days ago
I don't think it's really fair to say that he's being disingenuous here. Regardless of the underlying byte form, the strings look indistinguishable, and users (devs) will often expect them to function as such.

I think it would be less confusing to define .length as the number of characters and have an additional .size method returning the number of bytes (I'm assuming that's what .length returns, if not it's even more confusing).

Of course, that isn't how it was done - meh.

1 comments

> have an additional .size method returning the number of bytes (I'm assuming that's what .length returns, if not it's even more confusing).

It's actually not the number of bytes, it's the number of... 'codepoint pieces' is what it could be called I guess (UTF-16 code units, strictly speaking). JavaScript's language-level string implementation is something like UCS-2 with surrogate pairs allowed, but with each half of a pair counted as a separate 'character' for things like length and index access. It's some twisted middle ground between UCS-2 and UTF-16.
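To make that concrete, here is a rough sketch of the three different "lengths" for a string with one character outside the BMP. The variable names are mine and purely illustrative; it assumes a runtime where TextEncoder is a global (modern browsers, recent Node).

```javascript
// "a" followed by U+1F600, which UTF-16 stores as the surrogate pair D83D DE00.
const s = "a\u{1F600}";

// .length counts UTF-16 code units: 1 for "a" plus 2 for the surrogate pair.
const codeUnits = s.length;

// The string iterator walks code points, so spreading gives one entry per code point.
const codePoints = [...s].length;

// TextEncoder always produces UTF-8: 1 byte for "a" plus 4 bytes for U+1F600.
const utf8Bytes = new TextEncoder().encode(s).length;

// Index access is also per code unit, so position 1 holds a lone high surrogate.
const highSurrogate = s.charCodeAt(1).toString(16);

console.log(codeUnits, codePoints, utf8Bytes, highSurrogate); // 3 2 5 d83d
```

So .length is neither the byte count nor the code point count, which is pretty much the confusion being discussed.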

That seems deranged to me. Like a true length calculation, it still requires a complex (albeit cacheable) calculation to resolve, but it fails to return the length of the string in terms of the number of characters as they would be naturally presented.

I understand a need in some contexts to distinguish between a character and its subsequent modifiers - but I do not see such a context here.
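For what it's worth, counting characters "as presented" is a separate operation again. A minimal sketch, assuming a runtime that ships Intl.Segmenter (not every engine does); the variable names are just for illustration:

```javascript
// "e" followed by U+0301 (combining acute accent): renders as one character.
const accented = "e\u0301";

// Both code units and code points count the base letter and the modifier separately.
const units = accented.length;
const points = [...accented].length;

// Grapheme segmentation groups the base letter with its combining mark.
const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
const graphemes = [...segmenter.segment(accented)].length;

console.log(units, points, graphemes); // 2 2 1
```

Only the grapheme count matches what a user would see on screen.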

Design by committee?