Hacker News new | ask | show | jobs
by chii 957 days ago
> If so, how do you deal with repeated keys in "hash tables"?

depending on the parser, behaviour might differ. But looking at https://stackoverflow.com/questions/21832701/does-json-synta... , it seems like the "best" option is to have 'last key wins' as the resolution.

This works fine under a SAX like interface in a streaming JSON parser - your 'event handler' code will execute for a given key, and a 2nd time for the duplicate.

2 comments

> This works fine

This is a very strange way of using the word "fine"... What if the value that lives in the key triggers some functionality in the application that should never happen due to the semantics you just botched by executing it?

Example:

    {
      "commands": {
        "bumblebee": "rm -rf /usr",
        "bumblebee": "echo 'I have done nothing wrong!'"
      }
    }
With the obvious way to interpret this...

So, you are saying that it's "fine" for an application to execute the first followed by second, even though the semantics of the above are that only the second one is the one that should have an effect?

Sorry, I have to disagree with your "works fine" assessment.

you're layering the application semantics into the transport format.

It's fine, in the sense that a JSON with duplicate keys is already invalid - but the parser might handle it, and i suggested a way (just from reading the stackoverflow answer).

It's the same "fine" that you get from undefined C compiler behaviour.

Why do you keep inventing stuff... No, JSON with duplicate keys is not invalid. The whole point of streaming is to be able to process data before it completely arrived. What "layering semantics" are you talking about?

This has no similarity with undefined behavior. This is documented and defined.

A JSON object with duplicate keys is explicitly defined by the spec as undefined behavior, and is left up to the individual implementation to decide what to do. It's neither valid nor invalid.
last key wins is terrible advice and has serious security implications.

see https://bishopfox.com/blog/json-interoperability-vulnerabili... or https://www.cvedetails.com/cve/CVE-2017-12635/ for concrete examples where this treatment causes security issues.

the https://datatracker.ietf.org/doc/html/rfc7493 defines a more strict format where duplicate keys are not allowed.

Last key wins is the most common behavior among widely-used implementations. It should be assumed as the default.