


	by atoav 560 days ago
	Sure, go ahead. Write the PR and make sure to test against all other things used in production. Let's talk again in 30 years when you're done.


jerf 560 days ago

Oh, it's been closer to 20 years for the rest of the world to catch up to Unicode than 30. We aren't at "perfect" now, but we're certainly down to the trickier corner cases where it is difficult to even see how you would solve the problems at all, let alone code the solutions, and that's just reality's ugly nose sticking into our pristine world of numbers.

But there really isn't any other solution. Yes, there will be an uncomfortable transition. Yes, it blows. But there isn't any other solution that is going to work other than deal with it and take the hits as they come. The software needs to be updated. The presumption that usernames are from some 7-bit ASCII subset is simply unreasonable. We'll be chasing bugs with these features for years. But that's not some sort of optional aspect that we can somehow work around. It's just what is coming down the pike. Better to grasp the nettle firmly [1] than shy away from it.

At least this transition can learn a lot from previous transitions, e.g., I would mandate something like NFKC normalization applied at the operating system level on the way in for API calls (see https://en.wikipedia.org/wiki/Unicode_equivalence). Unicode case folding decisions can also be made at that point. The point here is not these specific suggestions per se, but that previous efforts have already created a world where I can reference these problems and solutions with specific existing terminology and standards, rather than being the bleeding-edge code that is figuring this all out for the first time.
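To make the NFKC-plus-case-folding idea concrete, here is a minimal sketch using Python's standard `unicodedata` module. The function name `canonical_username` is hypothetical, and this only illustrates the two standard operations named above; it is not a claim about how any operating system does or should implement them.

```python
import unicodedata


def canonical_username(name: str) -> str:
    """Hypothetical helper: NFKC-normalize, case-fold, then normalize again.

    The second normalization pass is a precaution, because case folding
    can in rare cases produce text that is no longer in NFKC form.
    """
    folded = unicodedata.normalize("NFKC", name).casefold()
    return unicodedata.normalize("NFKC", folded)


# "é" as one code point vs. "e" followed by a combining acute accent.
composed = "caf\u00e9"
decomposed = "cafe\u0301"
assert composed != decomposed
assert canonical_username(composed) == canonical_username(decomposed)

# Fullwidth Latin letters are compatibility variants that NFKC maps to ASCII.
fullwidth = "\uff21\uff44\uff4d\uff49\uff4e"
assert canonical_username(fullwidth) == "admin"
```

Applying this once "on the way in" means every later comparison can be a plain byte-for-byte equality check.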

[1]: https://www.phrases.org.uk/meanings/grasp-the-nettle.html


atoav 560 days ago

Don't get me wrong, I think using UTF-8 everywhere is how things should be.

But this is not a "let's just" or "why don't we" type of endeavor. This is a major undertaking, and as such people are needed who (A) think it is worth the effort and (B) are willing to follow through with all the consequences.

Open Source software lives on contributions, and if you're not willing to do it, why should others spend years of their lives on it?

In the end this is a question of: are the benefits worth the effort? What do we win? Where do things get simpler? Where more complicated? How do you pull it off if half the distributions use UTF-8 and the other half use the legacy way? How would tooling deal with this split? etc.


atoav 559 days ago

To add a little bit of context:

You know what I think would be way worse than today's reduced-character-set usernames with some special rules, or "just" using UTF-8 for them?

Both. Imagine a world where some usernames are UTF-8 and some are not, and it is hard to figure out which is which. That would be worse than just leaving things as they are.
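A small sketch of why "which is which" is hard to answer from the bytes alone. The helper `decodes_as_utf8` is hypothetical, and Latin-1 is only an assumed stand-in for "the legacy way"; the point is that a validity check can rule UTF-8 out, but can never prove that a name was meant as UTF-8.

```python
def decodes_as_utf8(raw: bytes) -> bool:
    """Hypothetical check: True if the bytes form a valid UTF-8 sequence."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


name = "j\u00f6rg"  # "jörg"
utf8_bytes = name.encode("utf-8")      # b'j\xc3\xb6rg'
latin1_bytes = name.encode("latin-1")  # b'j\xf6rg'

# The legacy bytes happen to be invalid UTF-8, so here the check helps.
assert decodes_as_utf8(utf8_bytes)
assert not decodes_as_utf8(latin1_bytes)

# But every byte string is valid Latin-1, so the UTF-8 bytes also decode
# "successfully" as a different, garbled legacy name.
assert utf8_bytes.decode("latin-1") == "j\u00c3\u00b6rg"  # "jÃ¶rg"
```

Without out-of-band information about the encoding, tooling is left guessing, which is the mixed world described above.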

Avoiding that situation makes pulling the whole thing off even harder, since it requires a high degree of coordination between many projects, distros, etc.


gray_-_wolf 560 days ago

> Unicode case folding decisions can also be made at that point

Ok I will bite. How do you intend to do case folding without knowing the language the string is in? Will every filename or whatever also have its language as part of the string? I am not sure what the plan is there.
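The usual illustration of this problem is the Turkish dotted and dotless "i". The sketch below contrasts Python's language-independent `str.casefold()` with a hypothetical `turkic_casefold` helper that applies the two Turkic-specific mappings (marked "T" in Unicode's CaseFolding.txt) first. It only demonstrates that the two foldings disagree; it does not suggest which one a system should pick.

```python
def turkic_casefold(text: str) -> str:
    """Hypothetical helper: apply the Turkic 'T' case-folding mappings
    (U+0130 -> 'i', 'I' -> U+0131), then the default folding."""
    return text.replace("\u0130", "i").replace("I", "\u0131").casefold()


# Default folding: "I" folds to "i" regardless of language.
assert "I".casefold() == "i"
# Turkic folding: "I" folds to dotless "ı" (U+0131).
assert turkic_casefold("I") == "\u0131"

# Default folding of "İ" (U+0130) yields "i" plus a combining dot above.
assert "\u0130".casefold() == "i\u0307"
# Turkic folding of "İ" yields a plain "i".
assert turkic_casefold("\u0130") == "i"

# The same uppercase string therefore has two different folded forms.
assert "TITLE".casefold() == "title"
assert turkic_casefold("TITLE") == "t\u0131tle"
```

So two names that compare equal under one folding can be distinct under the other, and nothing in the string itself says which rule applies.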
