	by pegasuscollins 3156 days ago
	What do you mean by ordering? This is easy to miss in the linked "spec" (I did too at first) but there is a sentence: "Object members must be sorted by ascending lexicographic order of their keys."


Someone 3156 days ago

They need to tighten that a bit, though. “Lexicographic” isn’t clear enough even ignoring locale dependencies.

Looking at the reference implementation, it contains

  encode = encodeUtf8 . generate

So, the sorting is done before conversion to UTF-8.

The relevant line, I think, is

  genObject hm = "{" <> foldl' go mempty (sortOn fst (HM.toList hm)) <> "}"

So, it uses Data.String’s sort order, which seems to be lexicographic by Unicode code point (https://stackoverflow.com/a/3126287)

⇒ implementations cannot sort the UTF-8 byte sequences lexicographically. I think that’s a bad choice (if it was a choice and not an oversight).

I’ve never written a single line of Haskell, so corrections welcome.
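
As a rough illustration of the pipeline described above (sort the object members by key on the decoded strings, then encode the whole document as UTF-8), here is a minimal Python sketch. It is not the Son reference implementation and does not enforce Son's other rules. `encode_sorted` is a hypothetical helper name, and the sketch relies on `json.dumps(..., sort_keys=True)`, which orders keys using Python's string comparison (by code point).

```python
import json


def encode_sorted(value):
    """Serialize with object keys sorted as strings, then encode to UTF-8.

    Hypothetical helper for illustration only; not the Son reference encoder.
    """
    # Sorting happens on Python str keys (code point order) ...
    text = json.dumps(value, sort_keys=True, ensure_ascii=False,
                      separators=(",", ":"))
    # ... and only afterwards is the result converted to UTF-8 bytes.
    return text.encode("utf-8")


if __name__ == "__main__":
    print(encode_sorted({"b": 1, "a": {"z": 2, "é": 3}}))
```

The order of the two steps matches the `encodeUtf8 . generate` composition quoted above: the text is generated first, with sorted keys, and encoded afterwards.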


pegasuscollins 3156 days ago

Have you seen the relevant reference in RFC 3629, page 2? It's explicitly listed there as a feature: "The byte-value lexicographic sorting order of UTF-8 strings is the same as if ordered by character numbers"

Agree that specifying the keys to be ordered >by Unicode code points< instead of >lexicographically< would be less ambiguous though.
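
The RFC 3629 property quoted above can be checked with a small Python sketch. The key list is made up for illustration, and the three helper names are hypothetical. The sketch sorts the same keys by code point, by UTF-8 bytes, and by UTF-16 code units (the unit that Java and JavaScript string comparison uses), to show where "lexicographic" can diverge.

```python
def by_code_point(s):
    # Compare strings as sequences of Unicode code points.
    return [ord(ch) for ch in s]


def by_utf8_bytes(s):
    # Compare strings as their UTF-8 byte sequences.
    return s.encode("utf-8")


def by_utf16_units(s):
    # Compare strings as sequences of 16-bit UTF-16 code units.
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# Illustrative keys: ASCII, Latin-1, a high BMP character, and an emoji
# outside the BMP (encoded as a surrogate pair in UTF-16).
keys = ["b", "a", "é", "z", "\uffe6", "\U0001f600", "ab", ""]

cp = sorted(keys, key=by_code_point)
u8 = sorted(keys, key=by_utf8_bytes)
u16 = sorted(keys, key=by_utf16_units)

print("code point order == UTF-8 byte order: ", cp == u8)
print("code point order == UTF-16 unit order:", cp == u16)
```

For these keys the UTF-8 byte order agrees with the code point order, while the UTF-16 code unit order does not: the surrogate pair for U+1F600 starts with 0xD83D, which sorts before U+FFE6.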


seagreen 3156 days ago

I definitely meant ordering by Unicode code points. Someone very helpfully opened an issue and we're trying to figure out the right wording there: https://github.com/seagreen/Son/issues/13


Someone 3156 days ago

Looks good. I still foresee interoperability problems between implementations, though. It is just too easy to mix up the ‘sort by key’ and ‘escape various control characters’ steps (a raw CR sorts before the printable ASCII characters, but its escaped form “\r” starts with a backslash and so sorts after many of them).

Even if the spec requires it, I fear implementations will also canonicalize strings differently, breaking sort order.
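
Both pitfalls can be reproduced in a few lines of Python. This is a sketch with made-up keys, using `json.dumps` as a stand-in for an implementation's escaping step and `unicodedata.normalize` as a stand-in for a canonicalization step. Neither is taken from the Son spec or its reference implementation.

```python
import json
import unicodedata

# Pitfall 1: sorting after escaping instead of before.
keys = ["A", "\r"]
raw_order = sorted(keys)                      # raw CR (0x0D) sorts before "A" (0x41)
escaped_order = sorted(keys, key=json.dumps)  # '"\\r"' vs '"A"': backslash (0x5C) sorts after "A"

print([json.dumps(k) for k in raw_order])
print([json.dumps(k) for k in escaped_order])

# Pitfall 2: normalizing keys differently before sorting.
accented = ["\u00e9", "f"]                    # U+00E9 is precomposed "é" (NFC form)
nfc_order = sorted(accented)                  # "f" (0x66) sorts before U+00E9
nfd_order = sorted(accented,
                   key=lambda k: unicodedata.normalize("NFD", k))
                                              # NFD "é" starts with "e" (0x65), before "f"

print(nfc_order == nfd_order)
```

In both cases two implementations that each sort "lexicographically" emit the members in a different order, because they sort a different representation of the same key.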
