by pegasuscollins 3109 days ago
What do you mean by ordering? This is easy to miss in the linked "spec" (I did too at first) but there is a sentence:

"Object members must be sorted by ascending lexicographic order of their keys."


They need to tighten that a bit, though. “Lexicographic” isn’t clear enough even ignoring locale dependencies.
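
To make the ambiguity concrete, here is a minimal sketch (base only) of one way two "lexicographic" orders can disagree: comparing by Unicode code point versus comparing by UTF-16 code unit. The two keys are made up for illustration and nothing here comes from the spec or its reference implementation; UTF-16 is just my guess at what another implementation might sort on.

```haskell
import Data.Char (ord)
import Data.List (sort, sortBy)
import Data.Ord (comparing)

-- Encode one character as UTF-16 code units
-- (a surrogate pair for anything above U+FFFF).
utf16Units :: Char -> [Int]
utf16Units c
  | n < 0x10000 = [n]
  | otherwise   = [0xD800 + (m `div` 0x400), 0xDC00 + (m `mod` 0x400)]
  where
    n = ord c
    m = n - 0x10000

-- Hypothetical keys: U+1F600 (an emoji) and U+FF5E (fullwidth tilde).
keys :: [String]
keys = ["\x1F600", "\xFF5E"]

-- String's Ord instance compares Char by Char, i.e. by code point.
byCodePoint :: [String]
byCodePoint = sort keys

-- Same keys, ordered by their UTF-16 code units instead.
byUtf16 :: [String]
byUtf16 = sortBy (comparing (concatMap utf16Units)) keys

main :: IO ()
main = do
  print (map (map ord) byCodePoint)  -- [[65374],[128512]]
  print (map (map ord) byUtf16)      -- [[128512],[65374]]
```

The emoji encodes as the surrogate pair D83D DE00, and 0xD83D is smaller than 0xFF5E, so it moves to the front under UTF-16 ordering even though its code point is larger.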

Looking at the reference implementation, it contains

  encode = encodeUtf8 . generate
So, the sorting is done before conversion to UTF-8.

The relevant line, I think, is

  genObject hm = "{" <> foldl' go mempty (sortOn fst (HM.toList hm)) <> "}"
So, it uses Data.String’s sort order, which seems to be lexicographic by Unicode code point (https://stackoverflow.com/a/3126287)

⇒ implementations cannot sort the UTF-8 byte sequences lexicographically. I think that’s a bad choice (if it was a choice and not an oversight).

I’ve never written a single line of Haskell, so corrections welcome.

Have you seen the relevant reference in RFC 3629, page 2? It's explicitly listed there as a feature: "The byte-value lexicographic sorting order of UTF-8 strings is the same as if ordered by character numbers"
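
A quick way to check the RFC's claim, sketched with base only: encode each key to UTF-8 by hand, sort on the byte lists, and compare with the plain code point sort. The encoder and the sample keys are mine, not from the reference implementation, and the encoder skips validation (it does not reject surrogates, for instance).

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Char (ord)
import Data.List (sort, sortBy)
import Data.Ord (comparing)
import Data.Word (Word8)

-- Hand-rolled UTF-8 encoder for a single character (no validation).
utf8Char :: Char -> [Word8]
utf8Char c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [lead 0xC0 6, cont 0]
  | n < 0x10000 = [lead 0xE0 12, cont 6, cont 0]
  | otherwise   = [lead 0xF0 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- Leading byte: length tag plus the top bits of the code point.
    lead tag s = fromIntegral (tag .|. (n `shiftR` s))
    -- Continuation byte: 10xxxxxx carrying six bits.
    cont s     = fromIntegral (0x80 .|. ((n `shiftR` s) .&. 0x3F))

utf8 :: String -> [Word8]
utf8 = concatMap utf8Char

-- Hypothetical keys mixing 1-, 2-, 3- and 4-byte characters.
keys :: [String]
keys = ["b", "a", "\xE9", "Z", "\x1F600", "\xFF5E", "ab"]

-- Sort on the encoded bytes; lists of Word8 compare lexicographically.
byBytes :: [String]
byBytes = sortBy (comparing utf8) keys

main :: IO ()
main = print (byBytes == sort keys)  -- True
```

So an implementation that only has the encoded bytes at hand should get the same order as one that sorts decoded code points, at least as far as this sketch goes.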

Agreed that specifying the keys to be ordered >by Unicode code points< instead of >lexicographically< would be less ambiguous, though.

I definitely meant ordering by Unicode code points. Someone very helpfully opened an issue and we're trying to figure out the right wording there: https://github.com/seagreen/Son/issues/13

Looks good. I still foresee interoperability problems between implementations, though. It is just too easy to mix up the ‘sort by key’ and ‘escape various control characters’ steps (a raw CR sorts before every printable ASCII character, but its escaped form “\r” starts with a backslash, which sorts after digits and uppercase letters).
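
Here is a small sketch of that mix-up, assuming a simplified escaper that only handles CR, LF, quote and backslash (which characters the spec actually wants escaped is not something I am asserting here). The only difference between the two results is whether the sort runs on raw or on escaped keys.

```haskell
import Data.List (sort)

-- Simplified, hypothetical escaper: CR, LF, double quote, backslash only.
escape :: String -> String
escape = concatMap esc
  where
    esc '\r' = "\\r"
    esc '\n' = "\\n"
    esc '"'  = "\\\""
    esc '\\' = "\\\\"
    esc c    = [c]

-- Two made-up keys: a lone carriage return and the letter A.
keys :: [String]
keys = ["A", "\r"]

-- Sort the raw keys, then escape them for output.
sortThenEscape :: [String]
sortThenEscape = map escape (sort keys)

-- Escape first, then sort the escaped text.
escapeThenSort :: [String]
escapeThenSort = sort (map escape keys)

main :: IO ()
main = do
  print sortThenEscape  -- ["\\r","A"]
  print escapeThenSort  -- ["A","\\r"]
```

Raw CR is code point 13 and sorts before 'A' (65), while the escaped form starts with a backslash (92) and sorts after it. With a lowercase key such as "a" (97) the two orders would agree, which is exactly the kind of bug that slips through tests.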

Even if the spec requires it, I fear implementations will also canonicalize strings differently, breaking sort order.
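
One way this could happen, assuming "canonicalize" here means Unicode normalization (my reading, not something the spec says): the same key in composed and decomposed form lands on different sides of a neighbour. No normalization library is used in this sketch; both spellings are hard-coded.

```haskell
import Data.List (sort)

-- The same two keys written by hand in two normalization forms.
keysNFC, keysNFD :: [String]
keysNFC = ["\xE9", "f"]     -- precomposed U+00E9
keysNFD = ["e\x301", "f"]   -- 'e' followed by combining acute U+0301

main :: IO ()
main = do
  -- Composed: U+00E9 (233) sorts after 'f' (102).
  print (sort keysNFC == ["f", "\xE9"])    -- True
  -- Decomposed: the key now starts with 'e' (101) and sorts before 'f'.
  print (sort keysNFD == ["e\x301", "f"])  -- True
```

Two implementations that each sort correctly by code point would still emit these members in opposite orders if one of them normalizes keys first.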