| HN Mirror



	by pegasuscollins 3156 days ago
	Have you seen the relevant passage in RFC 3629, page 2? It's explicitly listed there as a feature: "The byte-value lexicographic sorting order of UTF-8 strings is the same as if ordered by character numbers." I agree, though, that specifying the keys to be ordered >by Unicode code points< instead of >lexicographically< would be less ambiguous.
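A quick way to check the RFC's claim is to sort the same keys three ways. This is only a sketch: the sample keys are made up for illustration, and the UTF-16 comparison is included as a contrast (it is not something the thread or the spec proposes) to show why "lexicographically" alone is ambiguous.

```python
# Hypothetical sample keys: ASCII, a 2-byte UTF-8 character, a BMP character
# near the top of the range, and a supplementary-plane character.
keys = ["b", "\U0001F600", "\uFF5E", "a", "\u00E9"]

# Python compares str values code point by code point.
by_code_point = sorted(keys)

# Compare the raw UTF-8 bytes instead.
by_utf8_bytes = sorted(keys, key=lambda s: s.encode("utf-8"))

# Compare big-endian UTF-16 bytes, which orders by UTF-16 code unit.
by_utf16_units = sorted(keys, key=lambda s: s.encode("utf-16-be"))

# UTF-8 byte order matches code point order, as RFC 3629 states.
assert by_code_point == by_utf8_bytes

# UTF-16 code unit order does not: the surrogate pair for U+1F600
# starts with 0xD83D, which sorts before U+FF5E.
assert by_code_point != by_utf16_units
```

An implementation that compares strings by UTF-16 code unit would therefore disagree with one that compares UTF-8 bytes as soon as a key contains a character outside the Basic Multilingual Plane.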


seagreen 3156 days ago

I definitely meant ordering by Unicode code points. Someone very helpfully opened an issue and we're trying to figure out the right wording there: https://github.com/seagreen/Son/issues/13


Someone 3156 days ago

Looks good. I still foresee interoperability problems between implementations, though. It is just too easy to mix up the ‘sort by key’ and ‘escape various control characters’ steps (a raw control character such as CR sorts before printable ASCII characters, but an escaped one such as “\n” starts with a backslash and sorts after many of them).

Even if the spec requires it, I fear implementations will also canonicalize strings differently, breaking sort order.
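The sort-versus-escape mix-up can be reproduced in a few lines. This is a minimal sketch using Python's standard `json.dumps` as a stand-in for a serializer's escaping step. The two keys are made up for illustration, and nothing here reflects how any actual Son implementation works.

```python
import json

# Hypothetical keys: an uppercase letter and a raw carriage return (U+000D).
keys = ["A", "\r"]

# Order 1: sort the raw keys, then escape each one for output.
sort_then_escape = [json.dumps(k) for k in sorted(keys)]

# Order 2: escape each key first, then sort the escaped text.
escape_then_sort = sorted(json.dumps(k) for k in keys)

# Raw CR (0x0D) sorts before "A" (0x41), but the escaped form begins
# with a backslash (0x5C), which sorts after "A".
assert sort_then_escape != escape_then_sort
```

Two implementations that perform these steps in different orders would emit the same keys in different sequences, so their outputs would no longer be byte-identical.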
