by sramsay64 454 days ago
In fairness there are also several ambiguities with JSON. How do you handle multiple copies of the same key? Does the order of keys have semantic meaning?

jq supports several pseudo-JSON formats that are quite useful like record separator separated JSON, newline separated JSON. These are obviously out of spec, but useful enough that I've used them and sometimes piped them into a .json file for storage.

Also, encoding things like IEEE NaN/Infinity, and raw byte arrays has to be in proprietary ways.
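To make the NaN/Infinity and byte-array point concrete, here is a minimal sketch. The string tagging and the base64 field are ad hoc conventions chosen for illustration (assumptions, not anything a spec defines), and `tagNonFinite` is a hypothetical helper name.

```javascript
// By default JSON.stringify silently turns non-finite numbers into null.
const lossy = JSON.stringify({ x: NaN, y: -Infinity });
console.log(lossy); // {"x":null,"y":null}

// Ad hoc convention (assumption): encode non-finite numbers as strings.
const tagNonFinite = (key, value) =>
  typeof value === 'number' && !Number.isFinite(value) ? String(value) : value;
const tagged = JSON.stringify({ x: NaN, y: -Infinity }, tagNonFinite);
console.log(tagged); // {"x":"NaN","y":"-Infinity"}

// Ad hoc convention (assumption): encode raw bytes as a base64 string.
const bytes = Uint8Array.from([0xde, 0xad, 0xbe, 0xef]);
const base64 = btoa(String.fromCharCode(...bytes));
console.log(JSON.stringify({ data: base64 })); // {"data":"3q2+7w=="}
```

The consumer has to know about both conventions out of band, which is exactly the problem.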

5 comments

JSON Lines is not JSON; it is built on top of it. The .jsonl extension can be used to make that clear: https://jsonlines.org/
Back in my day it was called NDJSON.

The industry is so chaotic now we keep giving the same patterns different names, adding to the chaos.
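Whatever you call it, reading the format is simple. A minimal sketch, assuming one complete JSON value per line; `parseJsonLines` is a hypothetical helper name, not part of any library.

```javascript
// Hypothetical helper: parse newline-delimited JSON, one value per line.
function parseJsonLines(text) {
  return text
    .split(/\r?\n/)                       // one JSON value per line
    .filter(line => line.trim() !== '')   // skip blank lines, e.g. after a trailing newline
    .map(line => JSON.parse(line));       // each line is ordinary JSON
}

const records = parseJsonLines('{"id":1}\n{"id":2}\n');
console.log(records.length); // 2
```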

> How do you handle multiple copies of the same key

That’s unambiguously allowed by the JSON spec, because it’s just a grammar. The semantics are up to the implementation.

Interestingly, other people are answering the opposite in this thread.
They're wrong.

From ECMA-404[1] in section 6:

> The JSON syntax does not impose any restrictions on the strings used as names, does not require that name strings be unique, and does not assign any significance to the ordering of name/value pairs.

That IS unambiguous.

And for more justification:

> Meaningful data interchange requires agreement between a producer and consumer on the semantics attached to a particular use of the JSON syntax. What JSON does provide is the syntactic framework to which such semantics can be attached

> JSON is agnostic about the semantics of numbers. In any programming language, there can be a variety of number types of various capacities and complements, fixed or floating, binary or decimal.

> It is expected that other standards will refer to this one, strictly adhering to the JSON syntax, while imposing semantics interpretation and restrictions on various encoding details. Such standards may require specific behaviours. JSON itself specifies no behaviour.

It all makes sense when you understand JSON is just a specification for a grammar, not for behaviours.

[1]: https://ecma-international.org/wp-content/uploads/ECMA-404_2...
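As one example of implementation-defined semantics, JavaScript's `JSON.parse` accepts duplicate names and keeps the last value. Other parsers may keep the first, reject the document, or expose every pair.

```javascript
// Duplicate names are valid per the JSON grammar; JSON.parse keeps the last one.
const parsed = JSON.parse('{"a": 1, "a": 2}');
console.log(parsed.a);            // 2
console.log(Object.keys(parsed)); // [ 'a' ]
```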

> and does not assign any significance to the ordering of name/value pairs.

I think this is outdated? I believe that the order is preserved when parsing into a JavaScript Object. (Yes, Objects have a well-defined key order. Please don't actually rely on this...)

In the JS spec, you'd be looking for 25.5.1

If I'm not mistaken, this is the primary point:

> Valid JSON text is a subset of the ECMAScript PrimaryExpression syntax. Step 2 verifies that jsonString conforms to that subset, and step 10 asserts that that parsing and evaluation returns a value of an appropriate type.

And in the algorithm:

    c. Else,
      i. Let keys be ? EnumerableOwnProperties(val, KEY).
      ii. For each String P of keys, do
        1. Let newElement be ? InternalizeJSONProperty(val, P, reviver).
        2. If newElement is undefined, then
          a. Perform ? val.[[Delete]](P).
        3. Else,
          a. Perform ? CreateDataProperty(val, P, newElement).

If you theoretically (not practically) parse a JSON file into a normal JS AST then loop over it this way, because JS preserves key order, it seems like this would also wind up preserving key order. And because it would add those keys to the final JS object in that same order, the order would be preserved in the output.
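A quick way to see this in practice (a sketch, relying on the property ordering that current JS engines implement): string keys come back in source order, but integer-like keys are moved to the front in ascending numeric order, which is one more reason not to rely on it.

```javascript
const obj = JSON.parse('{"b": 1, "a": 2, "10": 3, "2": 4}');

// Integer-like keys first (ascending), then string keys in insertion order.
console.log(Object.keys(obj));    // [ '2', '10', 'b', 'a' ]
console.log(JSON.stringify(obj)); // {"2":4,"10":3,"b":1,"a":2}
```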

> (Yes, Objects have a well-defined key order. Please don't actually rely on this...)

JS standardized this in 2015 (ES2015) because browsers already did it and loads of code depended on it (accidentally or not).

There is theoretically a performance hit to using ordered hashtables. That doesn't seem like such a big deal with hidden classes except that `{a:1, b:2}` is a different inline cache entry than `{b:2, a:1}` which makes it easier to accidentally make your function polymorphic.

In any case, you are paying for it, so you might as well use it if (IMO) it makes things easier. For example, `let copy = {...obj, updatedKey: 123}` is relying on the insertion order of `obj` to keep the same hidden class.

In JS maybe (I don't know tbh), but that's irrelevant to the JSON spec. Other implementations could make a different decision.
Ah, I thought the quote was from the JS spec. I didn't realize that ECMA published their own copy of the JSON spec.
Internet JSON (RFC 7493) forbids objects to have members with duplicate names.
As it says:

> I-JSON (short for "Internet JSON") is a restricted profile of JSON designed to maximize interoperability and increase confidence that software can process it successfully with predictable results.

So it's not JSON, but a restricted version of it.

I wonder if use of these restrictions is popular. I had never heard of I-JSON.

I think it's rare for them to be explicitly stated, but common for them to be present in practice. I-JSON is just an explicit list of these common implicit limits. For any given tool/service that describes itself as accepting JSON I would expect I-JSON documents to be more likely to work as expected than non-I-JSON.
> How do you handle multiple copies of the same key? Does the order of keys have semantic meaning?

This is also an issue because of the way key order works in JavaScript.

> record separator separated JSON, newline separated JSON.

There is also JSON with no separators, although that will not work very well if any of the top-level values are numbers.

> Also, encoding things like IEEE NaN/Infinity, and raw byte arrays has to be in proprietary ways.

Yes, as well as non-Unicode text (including (but not limited to) file names on some systems), and (depending on the implementation) 64-bit integers and big integers. Possibly also date/time.

I think DER avoids these problems. You can specify whether or not the order matters, and you can store Unicode and non-Unicode text, NaN and Infinity, raw byte arrays, big integers, and date/time. (It avoids some other problems as well, including canonicalization, since DER is already in canonical form. I also have a variant of DER that avoids some of the excessive date/time types and adds a few additional types, but this does not affect the framing, which can still be parsed in the same way.)

A variant called "Multi-DER" could be made up, which is simply concatenating any number of DER files together. Converting Multi-DER to BER is easy just by adding a constant prefix and suffix. Converting Multi-DER to DER is almost as easy; you will need the length (in bytes) of the Multi-DER file and then add a prefix to specify the length. (In none of these cases does it require parsing or inspecting or modifying the data at all. However, converting the JSON variants into ordinary JSON does require inspecting the data in order to figure out where to add the commas.)
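The two conversions described above can be sketched as follows. This assumes the wrapper is a universal SEQUENCE (tag 0x30), which the comment does not actually specify; `derLength`, `concatBytes`, `multiDerToDer` and `multiDerToBer` are hypothetical helper names.

```javascript
// DER definite length: short form below 128, otherwise 0x80|n followed by
// the length in n big-endian bytes (minimal encoding).
function derLength(n) {
  if (n < 0x80) return Uint8Array.of(n);
  const digits = [];
  for (let v = n; v > 0; v = Math.floor(v / 256)) digits.unshift(v % 256);
  return Uint8Array.of(0x80 | digits.length, ...digits);
}

// Concatenate byte arrays without touching their contents.
function concatBytes(...parts) {
  const out = new Uint8Array(parts.reduce((n, p) => n + p.length, 0));
  let offset = 0;
  for (const p of parts) { out.set(p, offset); offset += p.length; }
  return out;
}

// Assumption: wrap in a SEQUENCE (0x30). Only the total byte length is needed.
const multiDerToDer = multiDer =>
  concatBytes(Uint8Array.of(0x30), derLength(multiDer.length), multiDer);

// BER indefinite length: constant prefix 30 80, end-of-contents suffix 00 00.
const multiDerToBer = multiDer =>
  concatBytes(Uint8Array.of(0x30, 0x80), multiDer, Uint8Array.of(0x00, 0x00));

// Two concatenated DER INTEGERs (values 1 and 2).
const multi = Uint8Array.of(0x02, 0x01, 0x01, 0x02, 0x01, 0x02);
console.log(multiDerToDer(multi)); // bytes 30 06 02 01 01 02 01 02
console.log(multiDerToBer(multi)); // bytes 30 80 02 01 01 02 01 02 00 00
```

Neither function looks inside the payload, which is the point being made about framing.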

Plus the 64-bit integer problem, really 53-bit integers, due to JS not having integers.
JSON itself is not limited to either 53-bit or 64-bit integers.

    integer = -? (digit | onenine digit+)
    
https://json.org/
That’s a JavaScript problem, not JSON.
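A small demonstration of the JavaScript side of this: the text is valid JSON, but the default `JSON.parse` rounds the value to the nearest double.

```javascript
const text = '{"id": 9007199254740993}';   // 2^53 + 1, perfectly valid JSON
const { id } = JSON.parse(text);

console.log(id);                       // 9007199254740992
console.log(Number.isSafeInteger(id)); // false
```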
Most good parsers have an option to parse to integers or arbitrary precision decimals.
Agreed. Which means that JavaScript does not have a good parser.
`JSON.parse` actually does give you that option via the `reviver` parameter, which gives you access to the original string of digits (to pass to `BigInt` or the number type of your choosing) – so per this conversation it fits the "good parser" criteria.
To be specific (if anyone was curious), you can force BigInt with something like this:

    // Number.MAX_SAFE_INTEGER is 9007199254740991, which is 16 digits, so any
    // integer literal with 16 or more digits is converted to BigInt here.
    // If absolute precision is desired, you can instead compare 16-digit
    // literals against MAX_SAFE_INTEGER one digit at a time.
    // Note: context.source is a newer addition and may be missing in some engines.
    const bigIntReviver = (key, value, context) =>
      typeof value === 'number' && /^-?\d{16,}$/.test(context?.source ?? '')
        ? BigInt(context.source)
        : value

    const jsonWithBigInt = x => JSON.parse(x, bigIntReviver)
Generally, I'd rather throw if a number is unexpectedly too big; otherwise you will mess up the types throughout the system (the field may not be monomorphic) and will outright fail if you try to use math functions not available to BigInts.
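A minimal sketch of the throwing approach, which has the side benefit of not needing the newer `context` argument. `strictIntReviver` and `parseStrict` are hypothetical names; note that this check also rejects very large whole-valued floats such as `1e300`.

```javascript
// Throw instead of silently rounding or switching the field to BigInt.
const strictIntReviver = (key, value) => {
  if (typeof value === 'number' && Number.isInteger(value) && !Number.isSafeInteger(value)) {
    throw new RangeError(`Integer at key "${key}" exceeds Number.MAX_SAFE_INTEGER`);
  }
  return value;
};

const parseStrict = text => JSON.parse(text, strictIntReviver);

console.log(parseStrict('{"n": 42}').n); // 42

let rejected = false;
try {
  parseStrict('{"n": 9007199254740993}');
} catch (e) {
  rejected = e instanceof RangeError;
}
console.log(rejected); // true
```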
Sadly the reviver parameter is a new invention only recently available in FF and Node, not at all in Safari.

Naturally not that hard to write a custom JSON parser but the need itself is a bad thing.

No, it's been there for ages. It was finalized as part of ECMAScript 5.

What you are probably thinking of is the context parameter of the reviver callback. That is relatively recent and mostly a QoL improvement.
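If you need to know whether the engine you are running on passes that context argument, a feature check along these lines should work (a sketch; `supportsSource` is a hypothetical variable name).

```javascript
// The reviver itself is always called; only the third argument is new.
let supportsSource = false;
JSON.parse('1', (key, value, context) => {
  supportsSource = typeof context?.source === 'string';
  return value;
});

console.log(supportsSource); // true on engines that pass context.source, false otherwise
```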

bigint exists