by zeveb 2928 days ago
I'll just note, once again, how verbose, unattractive & difficult to parse JSON is compared to S-expressions. Here are several of the examples in the spec in both formats, in order (I've rearranged the fields so that required fields come first & optional fields come later).

2.1:

    {
      "mid": "ef5a7369-f0b9-4143-a49d-2b9c7ee51117",
      "rmid": "66c61afc-037b-4229-ace4-5ec4d788903e",
      "to": "uid:123",
      "from": "uid:56",
      "type": "dm",
      "version": "UMF/1.4.3",
      "priority": "10",
      "timestamp": "2013-09-29T10:40Z",
      "body": {
        "message": "How is it going?"
      }
    }

    (message
     ef5a7369-f0b9-4143-a49d-2b9c7ee51117
     (to uid:123)
     (from uid:56)
     (version SMF/1.4.3)
     (timestamp 2013-09-29T10:40Z)
     (rmid 66c61afc-037b-4229-ace4-5ec4d788903e)
     (type dm)
     (priority 10)
     (body
      (message "How is it going?")))

2.2.11:

    {
      "mid": "ef5a7369-f0b9-4143-a49d-2b9c7ee51117",
      "to": "uid:56",
      "from": "game:store",
      "version": "UMF/1.3",
      "timestamp": "2013-09-29T10:40Z",
      "body": {
        "type": "store:purchase",
        "itemID": "5x:winnings:multiplier",
        "expiration": "2014-02-10T10:40Z"
      }
    }

    (message
     ef5a7369-f0b9-4143-a49d-2b9c7ee51117
     (to uid:56)
     (from game:store)
     (version UMF/1.3)
     (timestamp 2013-09-29T10:40Z)
     (body (type store:purchase)
           (itemID "5x:winnings:multiplier")
           (expiration 2014-02-10T10:40Z)))

2.2.11.2:

Note how JSON has to rely on metadata to indicate that a value is a Base64 sequence, whereas Base64 is natively supported by canonical S-expressions. Note also how the S-expression format natively supports types ('display hints') for its values.

    {
      "mid": "ef5a7369-f0b9-4143-a49d-2b9c7ee51117",
      "to": "uid:134",
      "from": "uid:56",
      "version": "UMF/1.3",
      "timestamp": "2013-09-29T10:40Z",
      "body": {
        "type": "private:message",
        "contentType": "text/plain",
        "base64": "SSBzZWUgeW91IHRvb2sgdGhlIHRyb3VibGUgdG8gZGVjb2RlIHRoaXMgbWVzc2FnZS4="
      }
    }

    (message
     ef5a7369-f0b9-4143-a49d-2b9c7ee51117
     (to uid:134)
     (from uid:56)
     (version SMF/1.3)
     (timestamp 2013-09-29T10:40Z)
     (body
      (type private:message)
      [text/plain]|SSBzZWUgeW91IHRvb2sgdGhlIHRyb3VibGUgdG8gZGVjb2RlIHRoaXMgbWVzc2FnZS4=|))

2.2.11.3:

One might expect that S-expressions would shine when it comes to sending multiple items, and of course one would be correct.

Also note how the parallel structure of the message & message/body/message objects raises the question of whether the message/body/message schema should also be UMF.

    {
      "mid": "ef5a7369-f0b9-4143-a49d-2b9c7ee51117",
      "to": "uid:134",
      "from": "chat:room:14",
      "version": "UMF/1.3",
      "timestamp": "2013-09-29T10:40Z",
      "body": {
        "type": "chat:messages",
        "messages": [
          {
            "from": "moderator",
            "text": "Susan welcome to chat Nation NYC",
            "ts": "2013-09-29T10:34Z"
          },
          {
            "from": "uid:16",
            "text": "Rex, you are one lucky SOB!",
            "ts": "2013-09-29T10:30Z"
          },
          {
            "from": "uid:133",
            "text": "Rex you're going down this next round",
            "ts": "2013-09-29T10:31Z"
          }
        ]
      }
    }

    (message
     ef5a7369-f0b9-4143-a49d-2b9c7ee51117
     (to uid:134)
     (from chat:room:14)
     (version SMF/1.3)
     (timestamp 2013-09-29T10:40Z)
     (body
      (type chat:messages)
      (messages 
       (message
        (from moderator)
        (text "Susan welcome to chat Nation NYC")
        (ts 2013-09-29T10:34Z))
       (message
        (from uid:16)
        (text "Rex, you are one lucky SOB!")
        (ts 2013-09-29T10:30Z))
       (message
        (from uid:133)
        (text "Rex you're going down this next round")
        (ts 2013-09-29T10:31Z)))))

2.2.17:

Note that there is a complex canonicalisation procedure for the JSON object, and that the sender must mutate the signed object; by contrast, the S-expression format is properly layered and doesn't mutate signed objects (which implies that it's possible to chain signatures cleanly).
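
A hedged Python sketch of the two layouts, embedded versus wrapped. HMAC-SHA256 with a shared key and sorted-key JSON are stand-ins chosen for brevity, not the canonicalisation or signature scheme the spec defines, and every name here (`KEY`, `mac`, `sign_embedded`, `sign_wrapped` and the verifiers) is made up for illustration.

```python
import hashlib
import hmac
import json

KEY = b"demo-key"  # hypothetical shared secret, for illustration only


def mac(data):
    """Hex HMAC-SHA256 of some bytes (a stand-in for a real signature)."""
    return hmac.new(KEY, data, hashlib.sha256).hexdigest()


def sign_embedded(msg):
    """Embedded style: strip any signature, canonicalise, sign, re-insert."""
    unsigned = {k: v for k, v in msg.items() if k != "signature"}
    # Assumed canonical form: sorted keys, no insignificant whitespace.
    canonical = json.dumps(unsigned, sort_keys=True, separators=(",", ":"))
    return {**unsigned, "signature": mac(canonical.encode("utf-8"))}


def verify_embedded(msg):
    """Recompute the signature over the message minus its signature field."""
    expected = sign_embedded(msg)["signature"]
    return hmac.compare_digest(expected, msg.get("signature", ""))


def sign_wrapped(payload):
    """Wrapped style: the payload bytes are signed as-is and left untouched."""
    return (payload, mac(payload))


def verify_wrapped(envelope):
    payload, signature = envelope
    return hmac.compare_digest(mac(payload), signature)


signed = sign_embedded({"mid": "ef5a7369", "to": "uid:123", "from": "uid:56", "body": {}})

inner = sign_wrapped(b"(message ef5a7369 (to uid:123) (from uid:56) (body))")
# A second signer can wrap the first envelope without modifying it.
outer = sign_wrapped(inner[0] + b"|" + inner[1].encode("ascii"))
```

In the embedded version the verifier has to undo the sender's edit (drop the `signature` field and re-canonicalise) before checking; in the wrapped version it checks the bytes it received, which is what makes stacking a second signature straightforward.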

    {
      "mid": "ef5a7369-f0b9-4143-a49d-2b9c7ee51117",
      "to": "uid:123",
      "from": "uid:56",
      "version": "UMF/1.4.6",
      "signature": "c0fa1bc00531bd78ef38c628449c5102aeabd49b5dc3a2a516ea6ea959d6658e",
      "body": {}
    }

    (signature
     (message
      ef5a7369-f0b9-4143-a49d-2b9c7ee51117
      (to uid:123)
      (from uid:56)
      (version SMF/1.4.6)
      (body))
     |c0fa1bc00531bd78ef38c628449c5102aeabd49b5dc3a2a516ea6ea959d6658e|)

It's not too late to switch away from JSON, it really isn't.

Looking at the examples and thinking about how a C-like program would process them, the S-expressions look way more complex.

With JSON you know immediately what kind of datatype you are dealing with. You see a { you allocate an associative array, or if you see a [ you know you're about to get an ordered list. With S-Expressions it seems like you need to parse the entire thing and then figure out what kind of data structure you have.

In fact, there doesn't appear to be any indicator at all. Looking at 2.2.11.3, we see in the JSON that "messages" is an ordered list while the content of each message is an associative array, but in the S-expression they look identical.

So in C-like land you would end up with a big nested mess of arrays that are slow to parse and even harder to figure out the address of any object. There's a ton of friction that you don't have with JSON data.
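
For what it's worth, here is a minimal sketch of the "parse the whole thing into nested arrays" approach described above, in Python rather than C. `parse_sexp` is a made-up name; the tokenizer handles only parentheses, bare atoms and double-quoted strings (no csexp `|...|` or `[hint]` syntax), and escapes inside strings are left undecoded.

```python
import re

# One token: a parenthesis, a double-quoted string, or a bare atom.
TOKEN = re.compile(r'\s*(?:([()])|"((?:[^"\\]|\\.)*)"|([^\s()"]+))')


def parse_sexp(text):
    """Parse a single S-expression into nested Python lists of strings."""
    text = text.strip()
    stack = [[]]  # stack[-1] is the list currently being filled
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input at offset {pos}")
        pos = match.end()
        paren, quoted, atom = match.groups()
        if paren == "(":
            stack.append([])
        elif paren == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            finished = stack.pop()
            stack[-1].append(finished)
        elif quoted is not None:
            stack[-1].append(quoted)
        else:
            stack[-1].append(atom)
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    if len(stack[0]) != 1:
        raise ValueError("expected exactly one expression")
    return stack[0][0]


tree = parse_sexp('''
(messages
 (message (from moderator) (ts 2013-09-29T10:34Z))
 (message (from uid:16) (ts 2013-09-29T10:30Z)))
''')
```

Both the `messages` collection and each `message` record come back as plain lists headed by a symbol, which is the ambiguity raised above: telling a sequence from a record takes knowledge of the schema, not just the syntax.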

When I need to parse or validate S-expressions, I just write the functions (here message, to, from, timestamp, etc.) so that eval()ing the S-expressions either validates it or returns whatever data structure I need.

So the easiest way would be to use or code a small lisp interpreter in C and eval the S-expression. For example, one could use Chicken Scheme to do so.

Or we could...not...evaluate random code potentially coming from hostile environments. That would also be cool and good.

And, yes, it's possible to have vulnerabilities in a JSON parser--but it is orders of magnitude easier to have them in an arbitrary language parser.

If you evaluate it in an environment where only the functions you choose are defined, the security risk is nil.

Validating a document is a complex, domain-dependent problem. It is far easier to create a secure Domain-Specific Language to handle this than to end up with an accidentally Turing complete abomination like XSLT: http://www.unidex.com/turing/utm.htm
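
A rough sketch of what "only the functions you choose are defined" could look like, in Python and without handing anything to the host language's own eval: list heads are looked up in a whitelist and everything else is treated as data. The names `evaluate`, `field` and `HANDLERS` are invented for illustration, and the input is assumed to be already parsed into nested lists. Whether this brings the risk down to nil is a separate question; it does not, for example, bound nesting depth or input size.

```python
def evaluate(node, handlers):
    """Evaluate a parsed S-expression (nested lists) against a whitelist."""
    if isinstance(node, str):
        return node  # atoms are plain data, never looked up as code
    if not node or not isinstance(node[0], str):
        raise ValueError("an expression must start with a symbol")
    head, *args = node
    if head not in handlers:
        raise ValueError(f"unknown form: {head!r}")
    # A handler called with the wrong number of arguments raises TypeError.
    return handlers[head](*[evaluate(arg, handlers) for arg in args])


def field(name):
    """Build a handler for a one-argument form such as (to uid:56)."""
    def handler(value):
        return (name, value)
    return handler


HANDLERS = {name: field(name) for name in ("to", "from", "version", "type")}
HANDLERS["body"] = lambda *pairs: ("body", dict(pairs))
HANDLERS["message"] = lambda mid, *pairs: {"mid": mid, **dict(pairs)}

document = [
    "message", "ef5a7369",
    ["to", "uid:56"],
    ["from", "game:store"],
    ["body", ["type", "store:purchase"]],
]
result = evaluate(document, HANDLERS)
```

Anything whose head is not in `HANDLERS`, for example `["system", "rm -rf /"]`, is rejected with a `ValueError` instead of being executed.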

>If you evaluate it in an environment where only the functions you choose are defined, the security risk is nil.

Oh. So all you have to do is write perfectly secure code and run it in a perfectly secure environment, and nothing bad can possibly happen.

Well shit, why didn't anyone else ever think of that?

> When I need to parse or validate S-expressions, I just write the functions (here message, to, from, timestamp, etc.) so that eval()ing the S-expressions either validates it or returns whatever data structure I need.

facepalm

As soon as you've decided to call an eval() function on potentially untrusted data, you've lost to an attacker.

I want to be a fan of csexps. I'm a big fan of SPKI/SDSI conceptually. Unfortunately I lack your enthusiasm for trying to evangelize them, and think JSON is probably here to stay.

That said, regarding JSON and the inclusion of self-describing encoding information for e.g. Base64, I created a microformat for that:

https://www.tjson.org/

This reminds me of Meteor's ejson but much less verbose. Very nice. Have you thought about adding a way to specify the type of object? Maybe something like `"field:<O(Post)>": ...`.

> It's not too late to switch away from JSON, it really isn't.

Yes, it is. People are already used to the fact that in dynamic languages (JavaScript, Python, Ruby) you can work with unknown structures in a performant way, and that they will be mapped properly to the underlying data model.

They are not going to switch to something where you need to have a schema just to parse it properly.

> People are already used to the fact that in dynamic languages (JavaScript, Python, Ruby) you can work with unknown structures in a performant way, and that they will be mapped properly to the underlying data model.

That's actually one of my concerns with JSON: it doesn't really convey the underlying data model. Sure, it can handle numbers — but it can't handle constraints like 'age must be positive.' Sure, it can handle strings — but there's no way in JSON to differentiate between Base64-encoded bytes & a normal string.

JSON lets one play with data, but one never knows if it's actually valid. It's dynamic typing, applied to data itself.
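
To make the Base64 point concrete, a small Python sketch using the body from example 2.2.11.2: the parser hands back an ordinary `str` for both fields, and only the key name (a convention the reader must already know) says that one of them should be decoded. The variable names are arbitrary.

```python
import base64
import json

raw = (
    '{"type": "private:message", "contentType": "text/plain", '
    '"base64": "SSBzZWUgeW91IHRvb2sgdGhlIHRyb3VibGUgdG8gZGVjb2RlIHRoaXMgbWVzc2FnZS4="}'
)
body = json.loads(raw)

# Both values arrive as the same type; nothing in the JSON marks one as bytes.
same_type = type(body["base64"]) is type(body["contentType"])

# Decoding happens only because this code chooses to treat the key specially.
payload = base64.b64decode(body["base64"])
text = payload.decode("utf-8")
```

The same applies to a constraint like "age must be positive": `json.loads` accepts `{"age": -3}` without complaint, so any such check has to live in a schema or in application code.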

Exactly, it conveys the underlying data model of dynamic languages, or to be specific, of a common subset of their data types.

As for data validity, this is a completely separate question. I don't believe that validation should be a part of the language or data format -- my language lets me write 'age = "yellow"', and so should my data format.

How does it differentiate booleans from strings? Does every major language have an implementation ready to use?

I've always liked S-expressions; unfortunately, they haven't caught on in the circles I travel in.