| HN Mirror


by tripple6 597 days ago

Sorry, but I would never use this format for either a manual or a programmatic approach.

* I tried to read the data this format describes without reading its documentation, and I just failed: the format is amazingly counter-intuitive. I never had readability or understanding issues with XML/HTML, JSON or even YAML (which I think is overly complicated) when I saw them for the first time.

* Terse does not mean cryptic. The basic notation is just weird: why would it need an unbalanced less-than symbol to open an array? Why `<&>` for delimiting elements? Why `<<$>` but not `<$>>`, at least to be more readable by humans and look balanced? The syntax gets weirder for arrays containing objects: indents (okay to some extent), `<>` and `<&>` (`{` and `}`?).

* Auto-removing whitespaces may hurt. If the format offers this, would it also offer a heredoc-style text like `cat <<EOF` in Bash so that the formatting could be preserved as is? `xml:space` and JSON string literals were designed exactly for this. (upd: I just saw new symbol: `|`... Well, okay, but another special character now.)

* Native support for arrays. I mentioned a few above. `<<Faults$$>` and `<<$$>` -- guess what these two mean if you see them for the first time? You would never guess. It's an empty array and an empty element; you've just failed.

* Graphs... Another weird syntax comes into the room: `#id;` but `@id` (no semicolon?). Okay, these seem to be first-class ids and refs, not necessarily designed for graphs (I'm not sure if the `#ID;` and `@` would play perfectly with any non-empty names.) But what makes graphs first-class citizens here, and why? Graphs can be expressed, I believe, in any data/markup format/language and then processed by a particular application if graphs are needed. By the way, arrays and objects are not necessarily trees from the semantic point of view. More graph processing issues were mentioned in other comments on this topic. What about first-class support for sets? I'm kidding

* Comments. Another symbol here to come: `%`. To be honest, I can't recall any instance I could see the percent sign elsewhere for this purpose. What if the comments would start with a well-known `#` at least with a space right after it so that it wouldn't be considered a "graph id" (or, don't get me wrong, with another `<`/`$` sequence)

* Just got to the Escaping section and now I see how the characters are escaped. Perhaps this is okay.

* Scalars. Crazy number formatting and locale issues are waiting. The never-on-keyboard infinity symbol would be great for APL, but why not just Inf(inity)? Whatever the scalar value is, there is no need to cover all existing primitive scalars -- just let them be processed by an application, since all scalars are text semantically. More crazy things: what makes UUIDs so special for this format? Why is Base64 so special that it has native support (would it support Base16 for human-readable message digests, or Base58 to remove visually lookalike Base64 characters)?

* CR/LF? I can understand its semantic purpose, but why not LF to make it even more "blazingly" fast? Say good-bye to UNIX users.

* The cognitive load for the markup syntax absolutely does not make it efficient in typing. Believe me, it does not.

What I would do is probably enhance the widely used formats: say, make JSON, which I find almost perfect from the syntax point of view, not require quotes for object property names if the names do not contain special characters like `:`, just as in JavaScript. And perhaps make XML "v2" move away from SGML, loosening its syntax to replace the closing tags with a shorter notation, add first-class array support, and fix syntax issues, especially for CDATA and comments that can't support `--`. You may blame me, but I love XML the most: it has the richest set of standardized, amazingly well-designed extensions for working with XML, regardless of the heavy XML syntax.

P.S. How does it look if the document it marks up is minified (e.g., no whitespaces)?

2 comments

GeneThomas 597 days ago

> `<<$>` but not `<$>>`

Having both begin and terminate arrays start with << is more consistent.

> `<>` and `<&>` (`{` and `}`?).

Using `{` and `}` would lead to more special characters.

> Auto-removing whitespaces may hurt.

It does not.

> Graphs... But what makes graphs first-class citizens here, and why?

It is simpler to support graphs in the markup. The fact is that the data being serialized may be structured in a graph.

> CR/LF

It supports LF only ᴜɴɪx line ends as well as CR/LF internet line endings.

> Comments [...] To be honest, I can't recall any instance I could see the percent sign elsewhere for this purpose

LaTeX and PostScript both use % for comments. # matches the usage in ᴄꜱꜱ and ʜᴛᴍʟ, relating to an id/page location.

> What if the comments would start with a well-known `#` at least with a space right after it so that it wouldn't be considered a "graph id"

Having a space after the # differentiate between an id and a comment would be a mistake.

> Scalars. Crazy [...] UUIDs

The Formats section is to facilitate interoperability between implementations, e.g. if you are encoding a ɢᴜɪᴅ [easy to say] then format it this way.

> not make it efficient in typing.

It is more terse than ᴊꜱᴏɴ.

> XML "v2" ... first-class array support

Xᴇɴᴏɴ has first-class array support; the xᴍʟ-like syntax leads to the <empty-array$$> notation.

> P.S. How does it look if the document it marks up is minified (e.g., no whitespaces)?

Good.


tripple6 596 days ago

I love this.

> Having both begin and terminate arrays start with << is more consistent.

It hides context for humans. I am a human and I love to see what opens and what closes the context. Why would `<` open an array if `[` is astonishingly wide-spread practice? Why would `<<` close it just because you think it is more consistent? What if open/close balance is also consistency, especially for nested arrays?

Also just think how many key strokes you'd save if you'd use `]` instead of [Shift]+`,` [Shift]+`,` [Shift]+`4` [Shift]+`.` if you declare it as readable text.

> Using `{` and `}` would lead to more special characters.

Agree. Too many now.

> It is simpler to support graphs in the markup. The fact is that the data being serialized may be structured in a graph.

I can't understand why you call it native graph support. The only thing it does is declare an identified element and references to that element. I can't see how that differs from XML or JSON, which semantically "have graph support" just because they can also declare something considered ids and references to the identified element.
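To make the "ids and refs handled by the application" point concrete, here is a minimal Python sketch that turns plain JSON into a cyclic object graph. The `$id` key is the one mentioned elsewhere in this thread; the `$ref` key, the sample document, and the `resolve_refs` helper are assumptions made up for illustration, not part of JSON or of any library.

```python
import json

# Hypothetical document: "$id" declares an identity, "$ref" points at one.
DOC = """
{"nodes": [
  {"$id": "a", "name": "Alice", "friend": {"$ref": "b"}},
  {"$id": "b", "name": "Bob", "friend": {"$ref": "a"}}
]}
"""

def is_ref(value):
    """True for a placeholder of the form {"$ref": <id>}."""
    return isinstance(value, dict) and set(value) == {"$ref"}

def resolve_refs(data):
    """Replace every {"$ref": id} placeholder with the object declaring that id."""
    by_id = {}

    def collect(node):
        # First pass: remember every object that carries a "$id".
        if isinstance(node, dict):
            if "$id" in node:
                by_id[node["$id"]] = node
            for value in node.values():
                collect(value)
        elif isinstance(node, list):
            for item in node:
                collect(item)

    def link(node):
        # Second pass: swap placeholders for the real objects.
        # Linked targets are not descended into, so cycles cannot loop forever.
        if isinstance(node, dict):
            for key, value in node.items():
                if is_ref(value):
                    node[key] = by_id[value["$ref"]]
                else:
                    link(value)
        elif isinstance(node, list):
            for index, item in enumerate(node):
                if is_ref(item):
                    node[index] = by_id[item["$ref"]]
                else:
                    link(item)

    collect(data)
    link(data)
    return data

graph = resolve_refs(json.loads(DOC))
alice, bob = graph["nodes"]
print(alice["friend"]["name"], bob["friend"]["name"])
```

After resolution `alice["friend"]` is the very same object as `bob`, and vice versa, so the cycle lives in memory even though the markup only carried ids and references.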

> LaTeX and PostScript both use % for comments.

Yes, just learnt that from your comment and https://news.ycombinator.com/item?id=42047634 by zzo38computer. Thank you.

> # matches the usage in ᴄꜱꜱ and ʜᴛᴍʟ, relating to an id/page location.

No. The # symbol is overloaded: it may be a comment start, especially for line-oriented and human-readable text formats or scripts; CSS uses it for IDs; HTML has nothing to do with it, since browsers only use # as a part of a URL to reference a particular identified element for navigation purposes (it's called an anchor in URL syntax; formerly web browsers used <a name="anchor"> to navigate to a part of the page; as of now, in the HTML5 world, any `id` attribute is considered an anchor, which I find a design flaw: ids are meant to identify, yet any id in the document is exposed for navigation purposes, whereas <a name="anchor"> is semantically something for navigation).

> Having a space after the # differentiate between an id and a comment would be a mistake.

Of course it would in its current perspective if the id declaration is `#`. Don't know what `#<NON_WHITESPACE_CHAR>` would do if it's legal.

> The Formats section is to facilitate interoperability between implementations, e.g. if you are encoding a ɢᴜɪᴅ [easy to say] then format it this way.

I agree that it may look better for consistency purposes, but what interoperability is all that about? Why would formatting even affect it? From the consumer application point of view, it must be handled from its context defined by its purpose and semantic type. If my element/attribute is formally declared as a GUID, then why would I care that much if it's conventionally formatted? Would it be still a GUID if I encode it using Base64? The dashes in GUIDs are for humans only and they are optional, and the application knows it's a GUID to process it even leniently if it can. The same goes for ISBN/ISSN for books and magazines, card numbers, phone numbers, etc -- none of them require dashes or spaces or parentheses to be processed.

This is why "Real numbers *should be stored* with commas for readability." is just hilarious. Why should? May I use underscores or dots or spaces to group digits (seriously, why comma)? Can I group digits after the period? If I need integers, why are they also limited to 32 bits and 64 bits? How would I present an arbitrary precision integer or non-integer number (say, I want the Pi number 197 digits after the 3)? If ∞ is allowed, but no mention on +Inf and -Inf, can be 4.2957×10^24 used instead of 4.2957e24? May I just have simple `D+(\.D+)?` for everything I need for true interoperability?

I agree consistent formatting is really beautiful, but it must never be the key to process data.
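As a rough illustration of lenient, application-side parsing, the Python sketch below accepts a GUID with or without dashes (or as Base64 of its 16 bytes) and a number with several grouping styles. The helper names `parse_guid` and `parse_number` are hypothetical, and treating `,`, `_` and space as pure digit-group separators is an assumption that would be wrong for locales using a decimal comma.

```python
import base64
import uuid
from decimal import Decimal

def parse_guid(text: str) -> uuid.UUID:
    """Accept dashed hex, undashed hex, or Base64 of the 16 raw bytes."""
    compact = text.strip()
    try:
        # uuid.UUID ignores dashes, braces and the urn:uuid: prefix.
        return uuid.UUID(compact)
    except ValueError:
        # Fall back to Base64; raises ValueError if that fails too.
        raw = base64.b64decode(compact, validate=True)
        return uuid.UUID(bytes=raw)

def parse_number(text: str) -> Decimal:
    """Drop grouping separators (assumed: comma, underscore, space) and parse."""
    cleaned = text.strip()
    for separator in (",", "_", " "):
        cleaned = cleaned.replace(separator, "")
    return Decimal(cleaned)

dashed = parse_guid("aa512e8e-cf97-445e-ac10-cb5a5ea3ef63")
plain = parse_guid("aa512e8ecf97445eac10cb5a5ea3ef63")
encoded = parse_guid(base64.b64encode(dashed.bytes).decode("ascii"))
print(dashed == plain == encoded)
print(parse_number("1,234,567.25"), parse_number("1_234 567"))
```

The formatting of the input never decides whether the value is a GUID or a number here; the caller already knows the semantic type and only asks for a tolerant decoder.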

> It is more terse than ᴊꜱᴏɴ.

Sorry, it's not.

> Good

Could you please provide an example of minified (a single line, no new lines) array of timestamps from your page?

UPD: I've just seen https://news.ycombinator.com/item?id=42038508 by Oras . Well, you know.

----

In short, too many whys, weird syntax and design decisions, so I cannot see anything that makes it a "better alternative" to XML, JSON, or YAML.


GeneThomas 596 days ago

I don’t love this.

> I can't see how that differs from XML or JSON, which semantically "have graph support" just because they can also declare something considered ids and references to the identified element.

When serialising data with ᴊꜱᴏɴ one has to use special field names such as $id; hoping the programming language does not. It DOES have native graph support that xᴍʟ and ᴊꜱᴏɴ do not.

> # [..] it may be a comment start

No.

> but what interoperability is all that about?

Interoperability between implementations. If you were using Xᴇɴᴏɴ to communicate between two different languages, say a C# and a Python implementation, agreeing on what an integer IS is helpful. Both Xᴇɴᴏɴ libraries can provide support for encoding say ɢᴜɪᴅs. You have missed the point. A user is always free to encode data as arbitrary strings.

> commas [...] readability." is just hilarious. Why should?

Commas make numbers faster to interpret. Something `ls` is missing. As I stated on another branch, English is the global lingua franca, so commas every three digits is the standard.

> ∞ is allowed, but no mention on +Inf and -Inf, can be 4.2957×10^24

∞ is +Inf. 4.2957×10^24 is not the xᴇɴᴏɴ standard.

>> It is more terse than ᴊꜱᴏɴ.

>Sorry, it's not.

See https://news.ycombinator.com/item?id=42049033

<<Timestamps>2026-09-24T16\:45\:22.5383742<&>2026-10-04T18\:25\:12Z<&>2026-04-02<$>>

Better than ᴊꜱᴏɴ which does not do timestamps.


tripple6 595 days ago

> When serialising data with ᴊꜱᴏɴ one has to use special field names such as $id; hoping the programming language does not.

Unless a serialization/deserialization tool supports property name overriding which is trivial.

> It DOES have native graph support that xᴍʟ and ᴊꜱᴏɴ do not.

Again, how is this different from `xml:id` that is referenced from other XML document nodes and what makes it "native graph support"?

> Both Xᴇɴᴏɴ libraries can provide support for encoding say ɢᴜɪᴅs.

> Better than ᴊꜱᴏɴ which does not do timestamps.

Better?

There is just no need. For what? These two can be controlled by optional schemas that may be extensible like types to validate in XML Schema or Relax NG. Schemas do not dictate format and you don't need your format to be a schema. I still can't get what makes timestamps (and GUIDs) so special so that they have special sections in your document.

I tend to think JSON also has a design flaw in providing first-class support for booleans and numbers as literals, which it took from JavaScript, because the latter needs more complex syntax as a programming language. Ridiculously, XML seems to be perfect in this case, unifying scalar values: whatever scalar it encodes, a text representation can encode it in any efficient format, regardless of whether it is a boolean, number (integer, "real", complex, whatever special), a "human-text" string, timestamp or whatever else; HTML attribute values, unlike XML, don't even need to be quoted in some trivial cases and may even be omitted for boolean attributes. The application simply parses/decodes its data and manages how the data is deserialized. That's all it needs.

I would probably be happy if, say, there were a format as simple/minimalistic as possible, not even requiring delimiters like or quoted strings unless they are ambiguous. Say, `[foo 'bar baz' foo\ bar Zm9vYmFyCg== 2.415e10 ∞ +Inf -∞ -Infinity \[qux\] +1\ 123\ 456789 978012345678 {k1 v1 k2 v2} aa512e8ecf97445eac10cb5a5ea3ef63 c8a0ebbd 2026-09-24T16:45:22.5383742 P3Y6M4DT12H30M5S]` or similar, maybe with node metadata and comment support. The above dumb format covers arrays/lists/sets, the string `foo`, the space-containing strings `bar baz` and `foo bar`, a Base64-encoded string, the `2.415e10` number from your document and all four infinity notations, a single string `[qux]` and not a nested array with a single element, a phone number (with space-delimited country code, region code and local number), an ISBN, a simple map/object made of two pairs, a GUID, a CRC32 checksum, an ISO-8601 zoned date/time, and an ISO-8601 duration. What more scalar types it can be extended with? Since there is no type for scalars, this "format" does not dictate types or preferred scalar formats, letting the application make decision how to interpret these on its own.
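For what it's worth, a reader for that sample line fits in a few dozen lines. The Python sketch below is only a guess at the rules implied by the example (whitespace separates items, single quotes and backslashes protect special characters, `[...]` is a list, `{...}` is a flat key/value map); the names `tokenize` and `parse` are hypothetical, and malformed input is not handled gracefully.

```python
SAMPLE = (
    r"[foo 'bar baz' foo\ bar Zm9vYmFyCg== 2.415e10 ∞ +Inf -∞ -Infinity "
    r"\[qux\] +1\ 123\ 456789 978012345678 {k1 v1 k2 v2} "
    r"aa512e8ecf97445eac10cb5a5ea3ef63 c8a0ebbd "
    r"2026-09-24T16:45:22.5383742 P3Y6M4DT12H30M5S]"
)

def tokenize(text):
    """Yield ('open', ch), ('close', ch) or ('scalar', str) tokens."""
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch.isspace():
            i += 1
        elif ch in "[{":
            yield ("open", ch)
            i += 1
        elif ch in "]}":
            yield ("close", ch)
            i += 1
        elif ch == "'":
            end = text.index("'", i + 1)  # quoted scalar, no escapes inside
            yield ("scalar", text[i + 1:end])
            i = end + 1
        else:
            buf = []
            while i < n and not text[i].isspace() and text[i] not in "[]{}":
                if text[i] == "\\" and i + 1 < n:
                    i += 1  # a backslash keeps the next character literally
                buf.append(text[i])
                i += 1
            yield ("scalar", "".join(buf))

def parse(text):
    """Parse one value; every scalar stays a plain string for the application."""
    tokens = list(tokenize(text))
    pos = 0

    def value():
        nonlocal pos
        kind, tok = tokens[pos]
        pos += 1
        if kind == "scalar":
            return tok
        if kind == "open" and tok == "[":
            items = []
            while tokens[pos] != ("close", "]"):
                items.append(value())
            pos += 1  # skip the closing bracket
            return items
        if kind == "open" and tok == "{":
            pairs = {}
            while tokens[pos] != ("close", "}"):
                key = value()
                pairs[key] = value()
            pos += 1  # skip the closing brace
            return pairs
        raise ValueError(f"unexpected token {tok!r}")

    return value()

doc = parse(SAMPLE)
print(len(doc), doc[1], doc[9], doc[12])
```

Nothing in the reader knows about GUIDs, timestamps or infinities; deciding that `doc[15]` is a date/time is left to whoever consumes the list.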

> Commas make numbers faster to interpret. Something `ls` is missing. As I stated on another branch, English is the global lingua franca, so commas every three digits is the standard.

For whom? Humans? Why would data encoding obey region number|date/time notation standards at all? English, but US, UK, Canada, or any other English-speaking country? You've been told that in that thread too, especially if spaces or underscores are even more readable for monospace fonts. You don't need it.

> See https://news.ycombinator.com/item?id=42049033

Funny enough -- your format saves on key/value pair syntax, appealing to 4 vs 6 overhead (okay, cool), but your array element delimiter `<&>`, amazingly bad for keyboard typing ergonomics, loses to the simple and regular JSON `,` syntax (3 vs 1 overhead). Isn't it blind or crazy?
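The separator arithmetic can be checked with a short Python sketch. The Xᴇɴᴏɴ line is rebuilt only by imitating the one-line timestamp example posted earlier in this thread (colons escaped as `\:`, elements joined with `<&>`); that is an assumption drawn from the example, not from the specification, and `separator_overhead` is a hypothetical helper.

```python
import json

timestamps = ["2026-09-24T16:45:22.5383742", "2026-10-04T18:25:12Z", "2026-04-02"]

# Minified JSON: no spaces after "," or ":".
json_text = json.dumps({"Timestamps": timestamps}, separators=(",", ":"))

# Imitation of the thread's example line; not derived from the spec.
xenon_text = (
    "<<Timestamps>"
    + "<&>".join(t.replace(":", "\\:") for t in timestamps)
    + "<$>>"
)

def separator_overhead(separator: str, count: int) -> int:
    """Characters spent on separators between `count` array elements."""
    return len(separator) * max(count - 1, 0)

print(json_text)
print(xenon_text)
print(separator_overhead(",", len(timestamps)),
      separator_overhead("<&>", len(timestamps)))
```

This counts element separators only. JSON additionally quotes each string and the example line escapes each colon, so comparing `len(json_text)` with `len(xenon_text)` gives the fuller picture for any particular payload.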


GeneThomas 595 days ago

> Again, how is this different from `xml:id`

It is a tidier solution.

> I still can't get what makes timestamps (and GUIDs) so special so that they have special sections in your document.

They are common in data.

> [...] boolean attributes

Separating attributes and sub-elements is a mistake. One should be able to guess an ᴀᴘɪ.

> What more scalar types it can be extended with?

> letting the application make decision how to interpret these on its own.

That is laborious! A Xᴇɴᴏɴ library provides AsGuid, AsDateTime, etc., and serialization directly to/from those types.

>For whom? Humans?

Yes. Humans have to read markup.

> Why would data encoding obey region number|date/time notation standards at all? English, but US, UK, Canada, or any other English-speaking country?

I repeat! READABILITY.

> Isn't it blind or crazy?

No, quite the opposite.


tripple6 594 days ago

> It is a tidier solution.

Based on special syntax. You're about to introduce node attributes.

> They are common in data.

I use tables every day. May I have "first-class graph support" but for tabular data, which is very common as well? Three or four times I expected you to eventually explain what makes it graph support and how it differs from declaring ids and refs in other formats you think are worse than yours. No answer.

> Separating attributes and sub-elements is a mistake. One should be able to guess an ᴀᴘɪ.

For the first, I kind of agree that attributes and subnodes should be unified in favor of subnodes (which was sacrificed in markups like HTML for the sake of sane brevity). However, attributes, which your ids are, may be metadata for nodes of any kind. For the second, an API for what? A document generating/parsing API? A validation API? A serialization/deserialization API? An enveloped application API? I guess the latter, for whatever reason dictated in your "standard". In any case, documentation, schemas, data validators and autocompletion are my best friends; no need to "guess".

> That is laborious! A Xᴇɴᴏɴ library provides AsGuid, AsDateTime, etc., and serialization directly to/from those types.

What you're mentioning is called serialization and deserialization, and these two can easily be implemented once for "basic" types and extended at the application level for any kind of data, because an application decides what to do with data on its own, not the format the data is enveloped in. Serialization and deserialization don't exist from the format's perspective, which only defines the syntax by which data is marked up in a document. So why would it care about the formatting at all?

> Yes. Humans have to read markup.

Format should not care too much.

> I repeat! READABILITY.

No yelling please. Regional formats are defined by countries, not by languages as you said elsewhere, just by definition, even if English is the lingua franca. Separate digits with underscores or spaces.

I'm very happy your "standard" neither recommends color highlighting for, say, numbers, nor, even worse, has special syntax for readability highlighting. Highlighting increases readability greatly as well, you know.

> No, quite the opposite.

6:4 but 1:3 is a great syntax win. Okay.

The lack of any solid counterarguments from your side, and your blindness to the obvious design flaws of your so-called format "standard", only shows how you mixed all the concepts into a mess: crazy markup syntax, plus formatting rules for scalars that should be handled only by applications during serialization and deserialization, regardless of what the markup format "standard" recommends.

Good luck with your "standard" rightly criticized and rejected by others, but better just bury it not spending your life for nothing. Sincerely.


zzo38computer 597 days ago

> Comments. Another symbol here to come: `%`. To be honest, I can't recall any instance I could see the percent sign elsewhere for this purpose.

PostScript is one programming language that uses a percentage sign for comments. TeX and METAFONT also use a percentage sign for comments. There are others, too.
