by dminuoso 1668 days ago
The primary problem is that language and library designers (and their users) believe there must be one true canonical meaning of the word „length“, like you just did, and that „length“ is the best name for the given interface.

In database code or, more subtly, in various filesystem code, the notion of bytes or code points might be more relevant.
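To make that concrete, here is a rough Python sketch of how the same string can have several "lengths" depending on what you count. The helper name `lengths` and the sample strings are my own choices for illustration, not anything from a particular library.

```python
import unicodedata

def lengths(s: str) -> dict:
    """Return several competing notions of 'length' for one string."""
    return {
        # Number of Unicode code points (what Python's len() counts).
        "code_points": len(s),
        # Number of bytes when stored as UTF-8 (what many databases
        # and filesystems care about).
        "utf8_bytes": len(s.encode("utf-8")),
        # Number of UTF-16 code units (what e.g. JavaScript's .length counts).
        "utf16_code_units": len(s.encode("utf-16-le")) // 2,
        # Code points after NFC normalization (composed form).
        "nfc_code_points": len(unicodedata.normalize("NFC", s)),
    }

decomposed_e_acute = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT
grinning_face = "\U0001F600"     # a single emoji outside the BMP

print(lengths(decomposed_e_acute))
print(lengths(grinning_face))
```

None of these numbers is the "right" one; which you want depends on whether you are sizing a column, a buffer, or a text box.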

By the way, what about ASCII control characters? Does carriage return have some intrinsic or clearly well defined notion of „length“ to you?

What about digraphs like ij in Dutch? Are they a singular grapheme cluster? Is this locale dependent? Do you have all scripts and cultures in mind?

1 comments

A CR is a space-type character. A string containing it has a length of 1.
Whitespace is the term.

And some clients expect that whitespace is not included in string length. "I asked to put 50 letters in this box, why can I only put 42?" would not be an unexpected complaint when working with clients. Even if you manage to convey that spaces are something funny called "characters", they might not understand that newlines are characters as well. Or emojis.
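A small Python sketch of the mismatch, assuming the client's idea of "letters" means "everything except whitespace" (that definition, and the function name `visible_length`, are my guesses at what such a client has in mind):

```python
def visible_length(s: str) -> int:
    """Count characters that are not whitespace (spaces, tabs, newlines, CR)."""
    return sum(1 for ch in s if not ch.isspace())

text = "hello world\nfoo"

print(len(text))             # what the form field limit counts
print(visible_length(text))  # what the client thinks they typed
```

The gap between those two numbers is exactly the "why can I only put 42?" complaint.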

Credit card numbers come to mind. When printed, they are often grouped into blocks of four digits separated by whitespace, e.g. "5432 7890 6543 4365". Now try to copy-paste this into a form field of "length" 16.
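One way a form could cope, sketched in Python: strip the separators first and only then check the length. The function names are made up for illustration, treating hyphens as separators is my assumption, and the 16-digit check mirrors the example above rather than any real card validation rule.

```python
def normalize_card_number(raw: str) -> str:
    """Drop whitespace and hyphens that people copy along with the digits."""
    return "".join(ch for ch in raw if not ch.isspace() and ch != "-")

def looks_like_16_digits(raw: str) -> bool:
    """True if, after removing separators, exactly 16 ASCII digits remain."""
    digits = normalize_card_number(raw)
    return len(digits) == 16 and digits.isascii() and digits.isdigit()

pasted = "5432 7890 6543 4365"

print(len(pasted))                    # raw length, separators included
print(normalize_card_number(pasted))  # digits only
print(looks_like_16_digits(pasted))
```

The point is just that the length limit should apply to the normalized value, not to whatever the user pasted.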

Ok, that's more of a usability issue and many front end developers seem to be rather disconnected from the real world. Phone number entry is an even worse case, but I digress ...

The UK Government (at least the teams based in GDS) has noted this (https://design-system.service.gov.uk/patterns/payment-card-d...), but some services are definitely not good at it. Also, hyphens (or dashes) aren't popular in the US but are (somewhat) popular in the UK!