by laurent123456 3346 days ago
I'm curious what the use case for this would be. JSON is a human-readable/writable format, but this kind of syntax no longer is: {"nested-array:A<A<s>>": [["Nested"], ["Array!"]]}

So it feels more like a machine format, but in that case why not use a more efficient one, like a binary format?


The point of this format is to push data over the wire in a form that is both semantically richer and authenticatable using techniques like object-hash. This gets us to one true, unambiguous representation of the data, which you need for redactable signatures and rich credentials.
Hello, I created TJSON. The answer to your question can be found in the second sentence on the page:

> TJSON documents are amenable to "content-aware hashing" where different encodings of the same data (including both TJSON and binary formats like Protocol Buffers, MessagePack, BSON, etc) can share the same content hash and therefore the same cryptographic signature.

TJSON is designed to facilitate documents that retain the same content hash when transcoded to/from binary formats.
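To make "content-aware hashing" concrete, here is a minimal Python sketch loosely modeled on the objecthash approach mentioned above. It hashes decoded values by type and content rather than hashing serialized bytes, so two differently formatted JSON texts for the same data get the same hash. The function names and one-letter type prefixes are hypothetical; this is not TJSON's actual algorithm.

```python
import hashlib
import json


def _h(prefix: bytes, payload: bytes) -> bytes:
    # Domain-separate each type with a prefix so e.g. the int 1 and the string "1" differ.
    return hashlib.sha256(prefix + payload).digest()


def content_hash(value) -> bytes:
    """Hash a decoded value by type and content, not by its encoded bytes.

    Hypothetical sketch: the prefixes b"b", b"i", ... are made up for illustration.
    """
    if value is None:
        return _h(b"n", b"")
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return _h(b"b", b"1" if value else b"0")
    if isinstance(value, int):
        return _h(b"i", str(value).encode("ascii"))
    if isinstance(value, str):
        return _h(b"s", value.encode("utf-8"))
    if isinstance(value, list):
        # Order matters for lists: concatenate the fixed-size element hashes in order.
        return _h(b"l", b"".join(content_hash(v) for v in value))
    if isinstance(value, dict):
        # Key order must not matter: hash each key/value pair, then sort the pair hashes.
        pairs = sorted(content_hash(k) + content_hash(v) for k, v in value.items())
        return _h(b"d", b"".join(pairs))
    # Floats, timestamps, binary data, etc. need their own precise rules; not handled here.
    raise TypeError(f"unsupported type: {type(value).__name__}")


# Same data, different key order and whitespace: the hashes match.
a = json.loads('{"name": "TJSON", "tags": ["typed", "json"]}')
b = json.loads('{"tags":["typed","json"],"name":"TJSON"}')
assert content_hash(a) == content_hash(b)
```

Because the hash is computed from the decoded values, any decoder (JSON, MessagePack, etc.) that yields the same typed values would yield the same hash. Real schemes must also pin down floats, timestamps, and binary data, which this sketch simply rejects.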

Could you help clarify this? My guess is that you're saying that you have some data type with (e.g.) strings and timestamps. When encoded to binary, these are encoded differently, resulting in hash A. But if you round-trip the data through JSON first, both come back as strings, which when encoded to binary give hash B. Am I on the right track?
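A small Python illustration of the round-trip problem described here, assuming a record with a single timestamp field (the field name and date are made up). Plain JSON has no timestamp type, so the value has to be written out as a string, and it decodes back as a `str` rather than a `datetime`. Any hash that takes the value's type into account would then differ before and after the round trip.

```python
import json
from datetime import datetime, timezone

# Hypothetical record with a typed timestamp value.
record = {"created": datetime(2016, 9, 1, 12, 0, tzinfo=timezone.utc)}

# Plain JSON has no timestamp type, so the datetime is stringified on the way out...
wire = json.dumps(record, default=lambda v: v.isoformat())
back = json.loads(wire)

# ...and it comes back as a plain string: the type information is gone.
assert isinstance(record["created"], datetime)
assert isinstance(back["created"], str)
```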
If hashing is the main concern, wouldn't a "strict" spec for JSON do the job? E.g. "all keys must be sorted", "all dates must be ISO-xxx", etc.?
You're describing canonicalization, which incorporates elements of the encoding format into the final hash, and therefore does not facilitate retaining the same content hash when transcoding to different formats.
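For comparison, here is roughly what the "strict JSON" idea looks like as a minimal Python sketch: sort the keys, drop insignificant whitespace, and hash the resulting bytes. `canonical_hash` is a hypothetical name, and this is not any of the published JSON canonicalization schemes.

```python
import hashlib
import json


def canonical_hash(value) -> str:
    # A naive canonical form: sorted keys, no insignificant whitespace, UTF-8 output.
    text = json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    # The hash covers JSON-specific bytes (braces, quotes, commas), not just the data.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


a = json.loads('{"b": 1, "a": [1, 2]}')
b = json.loads('{"a":[1,2],"b":1}')
assert canonical_hash(a) == canonical_hash(b)
```

Because the hash is taken over JSON syntax, the same data encoded as MessagePack or Protocol Buffers would hash differently unless it is converted back to this exact JSON text first, which is the limitation described in the reply. It also leaves questions such as number formatting unspecified, one of the places where real schemes disagree.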

Also, canonicalization is a bit of a mess. There are several incompatible canonicalization schemes for JSON, and even within a single one of those people have a difficult time implementing them correctly. See e.g. https://github.com/theupdateframework/tuf/issues/362

Have you seen my Son project? (https://github.com/seagreen/Son) I think I may have gotten JSON canonicalization right.

Also, I'm collecting a list of all subsets of JSON here if anyone knows of more: https://housejeffries.com/page/7

EDIT: Wow there's a lot of criticism in this thread. For the record I think TJSON is great.

So you are just creating another non-interoperable canonicalization format that strips out of JSON everything that makes it great: terseness.

https://xkcd.com/927/

You seem to have missed the point: it's not a canonicalization format. Content-aware hashing is a more powerful alternative to canonicalization which avoids many of the problems involved in designing a canonicalization scheme.
I'm not sure what you mean by "terseness". Do you mean the size of the spec? Because as serialization formats go, it is on the short side. However, typical JSON data is anything but terse; it is almost the most verbose serialization format in use, beaten out only by XML. I suppose that could be where you're getting the idea that it is terse, but in that case it is "terse" only in the sense that it is the second worst of the couple dozen common formats.

It's kind of like how I've said a couple of times on HN that new compiled language designers should be grateful to C++ for setting the bar for compilation speed so low; it makes it very easy to put "compiles more quickly than C++!" into the initial elevator pitch. However, you are not "fast" merely by beating the slowest, nor are you "terse" merely by being slightly more efficient than the worst.
Can we at least agree that TJSON is more verbose than JSON?
It's very readable if you've used a typed language before. The <> brackets are like generics.
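To illustrate the generics analogy, here is a minimal Python sketch that splits a TJSON-style key such as `nested-array:A<A<s>>` into its name and a nested type tag. It assumes only what the example in this thread shows: a name, a colon, then a tag where `A<...>` (presumably "array of") wraps one inner type and `s` (presumably "string") is a leaf. `split_key`, `parse_tag`, and `Tag` are hypothetical names, and the full TJSON tag grammar is not covered.

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    name: str                                   # e.g. "A" or "s" in the thread's example
    params: list = field(default_factory=list)  # inner tags, like generic type arguments


def _parse(s: str, i: int):
    # Read an alphanumeric type name starting at position i.
    start = i
    while i < len(s) and s[i].isalnum():
        i += 1
    if i == start:
        raise ValueError(f"expected a type name at position {i}")
    tag = Tag(s[start:i])
    # Optional single type parameter in angle brackets, parsed recursively.
    if i < len(s) and s[i] == "<":
        inner, i = _parse(s, i + 1)
        tag.params.append(inner)
        if i >= len(s) or s[i] != ">":
            raise ValueError(f"expected '>' at position {i}")
        i += 1
    return tag, i


def parse_tag(s: str) -> Tag:
    tag, end = _parse(s, 0)
    if end != len(s):
        raise ValueError(f"trailing characters in tag: {s[end:]!r}")
    return tag


def split_key(key: str):
    # The tag follows the last colon; everything before it is the field name.
    name, _, tag = key.rpartition(":")
    return name, parse_tag(tag)


name, tag = split_key("nested-array:A<A<s>>")
# name == "nested-array"; tag reads like the generic type A<A<s>>
```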
JSON is a subset of JavaScript. <> brackets are not in JavaScript.
That's not technically true (that JSON is a subset of JavaScript), due to some extra characters that JSON allows: http://timelessrepo.com/json-isnt-a-javascript-subset.
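The characters in question are U+2028 (LINE SEPARATOR) and U+2029 (PARAGRAPH SEPARATOR): JSON allows them unescaped inside strings, but JavaScript treated them as line terminators inside string literals until ES2019. A minimal Python sketch, assuming you want to embed JSON text as a JavaScript literal (`js_safe` is a hypothetical helper name). Note that Python's default `ensure_ascii=True` already escapes them; the problem only shows up with `ensure_ascii=False`.

```python
import json

payload = {"note": "line\u2028separator"}

# With ensure_ascii=False, U+2028 is emitted as a raw character: valid JSON...
raw = json.dumps(payload, ensure_ascii=False)
assert "\u2028" in raw


# ...but not a valid string literal in pre-ES2019 JavaScript, so escape both separators.
def js_safe(text: str) -> str:
    return text.replace("\u2028", "\\u2028").replace("\u2029", "\\u2029")


safe = js_safe(raw)
# The escaped text is still valid JSON and decodes to the same data.
assert "\u2028" not in safe and json.loads(safe) == payload
```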