Hacker News
by tobltobs 3347 days ago
Dear JS hipsters, even if you all suffer from NIHS (Not Invented Here Syndrome), could you please take a look at XML before you invent another format? I am sure you will get used to those angle brackets.

Hi, TJSON creator here.

I have certainly studied XML and think XML Schema did fantastic work specifying datatypes:

https://www.w3.org/TR/xmlschema11-2/#built-in-datatypes

I briefly considered adopting this work wholesale:

https://github.com/tjson/tjson-spec/issues/37

If you'd like to see that happen, please make a note of it in the issue. Thanks!

Also note: I'm not a JS hipster, I'm part of the Rust Evangelism Strike Force.

I'm curious what drove decisions like no top-level arrays and strict conditions on set members. It's not mentioned explicitly in the spec, but if the object syntax is the same as JSON's, duplicate field names would be allowed in that case.

TJSON requires that arrays be homogeneous, and this is presently accomplished by specifying the types of arrays up front in object member names and rejecting the array if any of its contents don't adhere to the type signature.
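A minimal Python sketch of that decode-side check, assuming array tags are written as `A<inner>` in the member name (e.g. `"numbers:A<i>"`) and handling only the `s` and `i` scalar tags. `check_member` and `check_value` are hypothetical names for illustration, not part of any TJSON library:

```python
import re

# Assumed wire format: TJSON integers are decimal strings, e.g. "-42".
SIGNED_INT_RE = re.compile(r"-?[0-9]+")


def check_value(tag, value):
    """Raise ValueError unless value matches tag; handles A<...>, s, and i only."""
    if tag.startswith("A<") and tag.endswith(">"):
        if not isinstance(value, list):
            raise ValueError(f"tag {tag!r} expects a JSON array")
        inner = tag[2:-1]
        for item in value:
            # Every element is checked against the same declared inner tag,
            # so a mixed array is rejected instead of being interpreted.
            check_value(inner, item)
    elif tag == "s":
        if not isinstance(value, str):
            raise ValueError("tag 's' expects a JSON string")
    elif tag == "i":
        if not isinstance(value, str) or SIGNED_INT_RE.fullmatch(value) is None:
            raise ValueError("tag 'i' expects a decimal integer string")
    else:
        raise ValueError(f"tag {tag!r} is not handled in this sketch")


def check_member(member, value):
    """Split "name:tag", check value against tag, and return the bare name."""
    name, sep, tag = member.rpartition(":")
    if not sep or not tag:
        raise ValueError(f"member name {member!r} has no type tag")
    check_value(tag, value)
    return name
```

Because the element type comes from the member name, the decoder never has to look at the data to decide what it is supposed to be; it only checks that every element matches the declared type.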

With top-level arrays, in the absence of this type information being explicitly specified in an object, implementations would have to rely on detecting homogeneity at decode time.

This is certainly possible, and in fact the serialization logic does it. But it seems like a sharp edge to include in the deserialization logic of a security-oriented format. The format aims to keep deserialization free of any sort of "guesswork".
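For contrast with the decode-side rule, the kind of homogeneity detection a serializer can do looks roughly like this. It is a hedged sketch with hypothetical names (`infer_tag`, `encode_member`), covering only Python `str`, `int`, and `list`, and assuming TJSON integers travel as decimal strings under an `i` tag:

```python
def infer_tag(value):
    """Infer a TJSON-style tag from a Python value, refusing to guess on mixed lists."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; keep it out of the "i" branch.
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(value, str):
        return "s"
    if isinstance(value, int):
        return "i"
    if isinstance(value, list):
        if not value:
            raise ValueError("cannot infer the element type of an empty list")
        tags = {infer_tag(item) for item in value}
        if len(tags) != 1:
            raise ValueError(f"heterogeneous array with element tags {sorted(tags)}")
        return f"A<{tags.pop()}>"
    raise TypeError(f"unsupported type {type(value).__name__}")


def encode_value(value):
    """Convert ints to decimal strings (assumed wire format), recursing into lists."""
    if isinstance(value, list):
        return [encode_value(item) for item in value]
    if isinstance(value, int):
        return str(value)
    return value


def encode_member(name, value):
    """Return the tagged member name and its JSON-ready value."""
    return f"{name}:{infer_tag(value)}", encode_value(value)
```

An empty list shows the sharp edge: there is nothing to infer from, so this sketch refuses rather than guessing. Doing the inference on the encoding side is safer because the encoder controls its own input, while a decoder would be guessing about someone else's.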

Ah, of course: you need the names to specify types. That makes sense. And by the same no-ambiguities token, presumably repeated names in an object would be rejected by the parser.

> And by the same no-ambiguities token, presumably repeated names in an object would be rejected by the parser.

Correct:

https://github.com/tjson/tjson-spec/blob/master/draft-tjson-...
https://www.tjson.org/spec/#rfc.section.3.8
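In Python, the standard `json` module's `object_pairs_hook` is one way to get that behavior, since the hook receives every member of an object in order before a dict is built (by default, `json.loads` silently keeps the last duplicate). A small sketch; `loads_strict` and `reject_duplicates` are hypothetical helpers, not TJSON library functions:

```python
import json


def reject_duplicates(pairs):
    """object_pairs_hook that raises instead of silently keeping the last duplicate."""
    obj = {}
    for key, value in pairs:
        if key in obj:
            raise ValueError(f"duplicate member name {key!r}")
        obj[key] = value
    return obj


def loads_strict(text):
    """Parse JSON text, rejecting any object (at any depth) with a repeated member name."""
    return json.loads(text, object_pairs_hook=reject_duplicates)
```

This compares raw member names, so `"a:s"` and `"a:i"` would both pass; whether TJSON treats those as the same name is a question for the spec, not something this sketch decides.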

First: XML is a markup language for encoding documents; JSON is a data-interchange language. Each can be twisted to do the job of the other, but they don't naturally do the same job.

Second, XML is extraordinarily complicated. Flipping through the XML 1.0 spec (https://www.w3.org/TR/xml/) doesn't really convince me that all of it is there for a reason. I'd love to be proved wrong, though!

In contrast, RFC 7159 is incredibly short and readable: https://tools.ietf.org/html/rfc7159. The TJSON spec isn't bad either: https://www.tjson.org/spec/. Even combined, the two are still far shorter and clearer than the XML spec.

First, either XML or JSON is suitable for encoding documents or for data interchange. Second, XML is also very _sophisticated_ and has an array of useful features that JSON developers are suddenly realizing can be pretty valuable at times. XSD is verbose, but it's rock solid. XPath and XInclude are also pretty awesome.

If you want something lightweight and readable, JSON is fine. If you need a solution that covers 99% of all possible requirements, there is XML. Anything in between will converge on XML's feature set over time, if it lives long enough.

Even better, research things like ASN.1 and canonical S-expressions. Regarding the latter, here are some examples:

    {"hello-world:s": "Hello, world!"} → (hello-world "Hello, world!")
    {"hello-base-sixteen:d16": "48656c6c6f2c20776f726c6421"} → (hello-base-sixteen #48656c6c6f2c20776f726c6421#)
    {"base-sixty-four-is-default:d": "SGVsbG8sIHdvcmxkIQ"} → (base-sixty-four-is-default |SGVsbG8sIHdvcmxkIQ|)
    {"hello-signed-int:i": "42"} → (hello-signed-int 42)
    Ø → (some-big-int [bigint]|GY0+kwq94p4QRs2j4rHisQLgEN3zsFSZNJrgK+ZFcV0s1ShyMkMFOHip0oRuG7v+TAC7qmDaYSojFbZjNV5dSA==|)
    {"hello-timestamp:t": "2016-10-02T07:31:51Z"} → (hello-timestamp [timestamp]2016-10-02T07:31:51Z)
Seriously, this is IMHO so clearly good I'm surprised more folks don't agree.
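To make the correspondence in those examples concrete, here is a hypothetical Python converter that reproduces the same notation for the `s`, `d16`, `d`/`d64`, `i`/`u`, and `t` tags. It assumes the TJSON side carries binary data as unpadded base64url, re-encodes it as standard base64 for the `|...|` form, keeps the member name as the S-expression head, and uses simplified string quoting:

```python
import base64


def to_sexp(member, value):
    """Render one tagged TJSON member ("name:tag", value) in the sexp notation above."""
    name, sep, tag = member.rpartition(":")
    if not sep:
        raise ValueError(f"member name {member!r} has no type tag")
    if tag == "s":
        # Simplified quoting: escape backslashes and double quotes only.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return f'({name} "{escaped}")'
    if tag == "d16":
        raw = bytes.fromhex(value)
        return f"({name} #{raw.hex()}#)"
    if tag in ("d", "d64"):
        # Assumed: base64url without padding on the TJSON side.
        padded = value + "=" * (-len(value) % 4)
        raw = base64.b64decode(padded, altchars=b"-_", validate=True)
        encoded = base64.b64encode(raw).decode("ascii").rstrip("=")
        return f"({name} |{encoded}|)"
    if tag in ("i", "u"):
        return f"({name} {int(value)})"
    if tag == "t":
        return f"({name} [timestamp]{value})"
    raise ValueError(f"tag {tag!r} is not handled in this sketch")
```

Both binary examples decode to the same bytes (`Hello, world!`); the two notations differ only in how they spell the tag and the encoding, which is what makes a mechanical mapping between them possible.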

TJSON is ultimately being written in service of a credential format I'm working on (though that format will use a compact binary encoding isomorphic to TJSON).

The main inspiration for this format is SPKI/SDSI, which was based on S-expressions. However much more beautiful you find the S-expression version than the (T)JSON one, I personally consider the use of S-expressions one of the many reasons SPKI/SDSI failed to gain wider traction, and I think something like TJSON is a lot more likely to gain traction than the second coming of S-expressions. This is, of course, a debatable point, but you won't find me working on Sexp-based formats any time soon.

ASN.1, of course, has a sordid history in the credential space as well: security experts often revile it as a source of frequent vulnerabilities, particularly with problematic encodings like BER. I will admit OER is nice, but nobody uses OER, and the IETF prefers that things be standardized in terms of DER.

"Research things": yes, been there, done that.

This thing is designed for representing common data structures (array, set, date) in JSON. XML doesn't have that either. It's not a validator like XML Schema (JSON has JSON Schema, which is quite popular); it doesn't check anything.

If it's not a validator, then what's the value? Why would I use this?

I assume the TJSON libraries throw errors if invalid types or formats are provided, which is good, but that makes this a validator. Developers have been representing non-standard formats in JSON for years.
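Whatever one calls it, a decoder that turns tagged strings into native values does reject malformed input as a side effect of parsing. A hedged Python sketch, assuming TJSON-style tags (`s`, `i`, `u`, `d16`, `d`/`d64`, `t`) with lowercase hex, unpadded base64url, 64-bit integer ranges, and UTC timestamps ending in `Z`; `decode_scalar` is a hypothetical name, and the exact rules should be checked against the spec:

```python
import base64
import re
from datetime import datetime, timezone

# Assumed wire formats for illustration only.
SIGNED_RE = re.compile(r"-?[0-9]+")
UNSIGNED_RE = re.compile(r"[0-9]+")
HEX_RE = re.compile(r"(?:[0-9a-f]{2})*")


def decode_scalar(tag, value):
    """Decode one tagged scalar into a Python value, raising ValueError on bad input."""
    if not isinstance(value, str):
        raise ValueError(f"tag {tag!r} expects a JSON string, got {type(value).__name__}")
    if tag == "s":
        return value
    if tag in ("i", "u"):
        pattern = SIGNED_RE if tag == "i" else UNSIGNED_RE
        # Checked before int(), which would also accept "4_2" or " 42".
        if pattern.fullmatch(value) is None:
            raise ValueError(f"{value!r} is not a valid {tag!r} integer")
        n = int(value)
        low, high = (-(2**63), 2**63 - 1) if tag == "i" else (0, 2**64 - 1)
        if not low <= n <= high:
            raise ValueError(f"{n} is out of range for tag {tag!r}")
        return n
    if tag == "d16":
        if HEX_RE.fullmatch(value) is None:
            raise ValueError("expected lowercase hex with an even number of digits")
        return bytes.fromhex(value)
    if tag in ("d", "d64"):
        padded = value + "=" * (-len(value) % 4)
        # binascii.Error, raised on malformed input, is a subclass of ValueError.
        return base64.b64decode(padded, altchars=b"-_", validate=True)
    if tag == "t":
        # The literal "Z" in the format rejects timestamps without a UTC suffix.
        parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
        return parsed.replace(tzinfo=timezone.utc)
    raise ValueError(f"unknown tag {tag!r}")
```

Note that `strptime` is somewhat more lenient than RFC 3339 (it accepts single-digit months, for instance), so a production decoder would want a stricter timestamp pattern.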

Google's response to JSON's limitations was Protocol Buffers [1], and as I understand it, they're used fairly extensively internally, but there hasn't been much adoption outside of Google. JSON is just the right mix of simple and robust for the majority of use cases.

[1] https://developers.google.com/protocol-buffers/

XML provides an extensible way to mark up your data, though. TJSON is just a hack on top of JSON, and it's full of potential problems that XML solved years ago.