| HN Mirror


by tripple6 597 days ago

> When serialising data with ᴊꜱᴏɴ one has to use special field names such as $id; hoping the programming language does not.

Unless the serialization/deserialization tool supports property name overriding, which is trivial.
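
For illustration, a minimal Python sketch of property name overriding. The `Node` class and the `json_name` metadata key are made up for this example and are not taken from any particular library:

```python
import json
from dataclasses import dataclass, field, fields


@dataclass
class Node:
    # "json_name" is a hypothetical metadata key used only in this sketch.
    node_id: int = field(metadata={"json_name": "$id"})
    label: str = ""


def to_json(obj) -> str:
    """Serialize a dataclass, honouring per-field name overrides."""
    out = {}
    for f in fields(obj):
        key = f.metadata.get("json_name", f.name)
        out[key] = getattr(obj, f.name)
    return json.dumps(out)


def from_json(cls, text: str):
    """Rebuild a dataclass, mapping overridden names back to field names."""
    raw = json.loads(text)
    kwargs = {f.name: raw[f.metadata.get("json_name", f.name)] for f in fields(cls)}
    return cls(**kwargs)
```

Here the language-level name stays `node_id` while the document uses `$id`, so the field name in the data never has to be a valid identifier.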

> It DOES have native graph support that xᴍʟ and ᴊꜱᴏɴ do not.

Again, how is this different from `xml:id` that is referenced from other XML document nodes and what makes it "native graph support"?
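
As a rough sketch of what id-based references already allow in plain XML, the following Python snippet resolves `xml:id` references into a cyclic graph. The `people`/`person` elements and the `friend` attribute are invented for the example:

```python
import xml.etree.ElementTree as ET

# The "xml:" prefix is bound to this namespace by the XML specification.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

DOC = """
<people>
  <person xml:id="p1" name="Ann" friend="p2"/>
  <person xml:id="p2" name="Bob" friend="p1"/>
</people>
"""


def load_graph(text: str) -> dict:
    """Map each person's name to the name of the person they reference."""
    root = ET.fromstring(text)
    # Index every element that carries an xml:id.
    by_id = {el.get(XML_ID): el for el in root.iter() if el.get(XML_ID)}
    # "friend" is a hypothetical reference attribute holding another xml:id.
    return {el.get("name"): by_id[el.get("friend")].get("name") for el in by_id.values()}
```

The two nodes reference each other, so the result is a graph rather than a tree, with no special syntax beyond an id and a reference.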

> Both Xᴇɴᴏɴ libraries can provide support for encoding say ɢᴜɪᴅs.

> Better than ᴊꜱᴏɴ which does not do timestamps.

Better?

There is just no need. For what? These two can be handled by optional schemas, which may be extensible with validated types as in XML Schema or RELAX NG. Schemas do not dictate the format, and you don't need your format to be a schema. I still can't get what makes timestamps (and GUIDs) so special so that they have special sections in your document.

I tend to think JSON also has a design flaw: it gives booleans and numbers first-class literals, which it took from JavaScript, although JavaScript needs that more complex syntax only because it is a programming language. Ironically, XML seems to be perfect in this case by unifying scalar values: whatever the scalar is, its text representation can encode it in any suitable format, whether it is a boolean, a number (integer, "real", complex, or anything more special), a "human-text" string, a timestamp, or anything else. HTML attribute values, unlike XML ones, don't even need to be quoted in some trivial cases and may even be omitted for boolean attributes. The application simply parses/decodes its data and manages how the data is deserialized. That's all it needs.
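
The idea that the format carries only text and the application decides how to decode it can be sketched as follows. This is a minimal Python illustration; the field names and the `SCHEMA` mapping are assumptions made up for the example:

```python
from datetime import datetime
from uuid import UUID

# Hypothetical application-level "schema": field name -> decoder.
# The document itself stores every scalar as plain text.
SCHEMA = {
    "id": UUID,
    "created": datetime.fromisoformat,
    "height": float,
    "active": lambda s: s == "true",
}


def decode(record: dict) -> dict:
    """Decode text scalars; fields without a schema entry stay strings."""
    return {key: SCHEMA.get(key, str)(value) for key, value in record.items()}


raw = {
    "id": "aa512e8ecf97445eac10cb5a5ea3ef63",
    "created": "2026-09-24T16:45:22",
    "height": "1.67",
    "active": "true",
    "note": "hi",
}
decoded = decode(raw)
```

Adding a new scalar type means adding one decoder to the mapping; the markup does not change.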

I would probably be happy if, say, there were a format as simple/minimalistic as possible, not even requiring delimiters or quoted strings unless they are ambiguous. Say, `[foo 'bar baz' foo\ bar Zm9vYmFyCg== 2.415e10 ∞ +Inf -∞ -Infinity \[qux\] +1\ 123\ 456789 978012345678 {k1 v1 k2 v2} aa512e8ecf97445eac10cb5a5ea3ef63 c8a0ebbd 2026-09-24T16:45:22.5383742 P3Y6M4DT12H30M5S]` or similar, maybe with node metadata and comment support. The above dumb format covers arrays/lists/sets, the plain string `foo`, the space-containing strings `bar baz` and `foo bar`, a string in Base64 encoding, the `2.415e10` number from your document and all four infinity notations, a single string `[qux]` (not a nested array with a single element), a phone number (with space-delimited country code, region code and local number), an ISBN, a simple map/object made of two pairs, a GUID, a CRC32 checksum, an ISO-8601 date/time, and an ISO-8601 duration. What more scalar types it can be extended with? Since there are no types for scalars, this "format" does not dictate types or preferred scalar formats, letting the application make decision how to interpret these on its own.
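
A rough Python sketch of a parser for such a format follows. The rules are only inferred from the sample above and are assumptions: whitespace separates tokens, `'...'` quotes a string, a backslash takes the next character literally, and every scalar stays a string.

```python
def tokenize(text):
    """Yield ('open' | 'close', bracket) or ('scalar', text) tokens."""
    i, n = 0, len(text)
    while i < n:
        c = text[i]
        if c.isspace():
            i += 1
        elif c in "[{":
            yield ("open", c)
            i += 1
        elif c in "]}":
            yield ("close", c)
            i += 1
        elif c == "'":
            end = text.index("'", i + 1)  # no escapes inside quotes in this sketch
            yield ("scalar", text[i + 1:end])
            i = end + 1
        else:
            buf = []
            while i < n and not text[i].isspace() and text[i] not in "[]{}":
                if text[i] == "\\" and i + 1 < n:
                    i += 1  # backslash: take the next character literally
                buf.append(text[i])
                i += 1
            yield ("scalar", "".join(buf))


def parse(text):
    """Parse into nested lists, dicts and strings; scalars are never typed."""
    tokens = list(tokenize(text))
    pos = 0

    def value():
        nonlocal pos
        kind, tok = tokens[pos]
        pos += 1
        if kind == "scalar":
            return tok
        if kind == "close":
            raise ValueError(f"unexpected {tok!r}")
        closer = "]" if tok == "[" else "}"
        items = []
        while tokens[pos] != ("close", closer):
            items.append(value())
        pos += 1  # consume the closing bracket
        if tok == "[":
            return items
        return dict(zip(items[0::2], items[1::2]))  # k1 v1 k2 v2 -> pairs

    result = value()
    if pos != len(tokens):
        raise ValueError("trailing tokens")
    return result


SAMPLE = r"[foo 'bar baz' foo\ bar Zm9vYmFyCg== 2.415e10 ∞ +Inf -∞ -Infinity \[qux\] +1\ 123\ 456789 978012345678 {k1 v1 k2 v2} aa512e8ecf97445eac10cb5a5ea3ef63 c8a0ebbd 2026-09-24T16:45:22.5383742 P3Y6M4DT12H30M5S]"
doc = parse(SAMPLE)
```

The parser never decides whether `2.415e10` is a number or `c8a0ebbd` a checksum; that stays with the application.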

> Commas makes numbers faster to interpret. Something `ls` is missing. As I stated on another branch English is the global lingua franca so commas every three digits is the standard.

For whom? Humans? Why would data encoding obey region number|date/time notation standards at all? English, but US, UK, Canada, or any other English-speaking country? You were told this in that thread too, especially since spaces or underscores are even more readable in monospace fonts. You don't need it.
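
As a small illustration that region-neutral digit grouping is already in practical use, Python accepts underscores in numeric text and can emit them. The helper names below are made up for the example:

```python
def group_digits(n: int, sep: str = "_") -> str:
    """Format an integer with a separator every three digits."""
    return f"{n:_}".replace("_", sep)


def parse_grouped(text: str) -> int:
    """Parse an integer grouped with underscores or spaces."""
    return int(text.replace(" ", "").replace("_", ""))
```

For instance, `group_digits(1234567)` gives `1_234_567`, and `int("1_234_567")` is accepted by Python directly.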

> See https://news.ycombinator.com/item?id=42049033

Funnily enough, your format saves on key/value pair syntax, appealing to an overhead of 4 vs 6 (okay, cool), but your array element delimiter `<&>`, which is also amazingly bad for keyboard typing ergonomics, loses to the simple and regular JSON `,` syntax (3 vs 1 overhead). Isn't it blind or crazy?


GeneThomas 596 days ago

> Again, how is this different from `xml:id`

It is a tidier solution.

> I still can't get what makes timestamps (and GUIDs) so special so that they have special sections in your document.

They are common in data.

> [...] boolean attributes

Separate attributes and sub elements is a mistake. One should be able to guess an ᴀᴘɪ.

> What more scalar types it can be extended with?

> letting the application make decision how to interpret these on its own.

That is laborious! A Xᴇɴᴏɴ library provides AsGuid, AsDateTime, etc., and serialization directly to/from those types.

>For whom? Humans?

Yes. Humans have to read markup.

> Why would data encoding obey region number|date/time notation standards at all? English, but US, UK, Canada, or any other English-speaking country?

I repeat! READABILITY.

> Isn't it blind or crazy?

No, quite the opposite.


tripple6 596 days ago

> It is a tidier solution.

Based on special syntax. You're about to introduce node attributes.

> They are common in data.

I use tables every day. May I have "first-class graph support" but for tabular data that is very common as well? Three or four times now I have expected you to eventually explain what makes the graph support and how it differs from declaring ids and refs in other formats you think are worse than yours. No answer.

> Separate attributes and sub elements is a mistake. One should be able to guess an ᴀᴘɪ.

For the first, I kind of agree that attributes and subnodes should be unified in favor of subnodes (a unification that markups like HTML sacrificed for the sake of sane brevity). However, attributes, which is what your ids are, may be metadata for nodes of any kind. For the second, API for what? Document generating/parsing API? Validation API? Serialization/deserialization API? Enveloped application API? I guess the latter, for whatever reason dictated in your "standard". In any case, documentation, schemas, data validators and autocompletion are my best friends; no need to "guess".

> That is laborious! A Xᴇɴᴏɴ library provides AsGuid, AsDateTime, etc., and serialization directly to/from those types.

What you're mentioning is called serialization and deserialization, and these two can easily be implemented once for "basic" types and extended at the application level for any kind of data, because an application decides on its own what to do with data, not the format the data is enveloped in. Serialization and deserialization don't exist from the format's perspective, which only defines the syntax by which data is marked up in a document. So why would it care the formatting at all?
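
A minimal Python sketch of "implemented once for basic types, extended at the application level", using the standard `functools.singledispatch`. The `Money` type and the chosen text forms are assumptions for illustration only:

```python
from dataclasses import dataclass
from datetime import date
from functools import singledispatch


@singledispatch
def encode(value) -> str:
    """Fallback for basic types: use the plain text form."""
    return str(value)


@encode.register
def _(value: bool) -> str:
    return "true" if value else "false"


@encode.register
def _(value: date) -> str:
    return value.isoformat()


# Application-level extension; the markup format knows nothing about it.
@dataclass
class Money:
    amount: str
    currency: str


@encode.register
def _(value: Money) -> str:
    return f"{value.amount} {value.currency}"
```

Each encoder only produces a text scalar, so the same registry could sit in front of any markup syntax.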

> Yes. Humans have to read markup.

Format should not care too much.

> I repeat! READABILITY.

No yelling, please. Regional formats are defined by countries, not by languages as you claimed elsewhere, simply by definition, even if English is the lingua franca. Separate digits with underscores or spaces.

I'm very happy your "standard" neither recommends color highlighting for, say, numbers nor, even worse, has special syntax for readability highlighting. Highlighting increases readability greatly as well, you know.

> No, quite the opposite.

6:4 but 1:3 is a great syntax win. Okay.

The absence of any solid counterarguments from your side, and your blindness to the obvious design flaws of your so-called format "standard", only shows how you have mixed up all the concepts into a mess: crazy markup syntax on one hand, and formatting of scalars on the other, which must be handled only by applications during serialization and deserialization, regardless of what the markup format "standard" recommends.

Good luck with your "standard", rightly criticized and rejected by others, but it would be better to just bury it rather than spend your life on nothing. Sincerely.


GeneThomas 595 days ago

> You're about to introduce node attributes.

Yes, but limited to #id and :type.

> tabular data that is very common as well?

Xᴇɴᴏɴ has first class arrays also so tabular data could be stored as such.

> explain what makes the graph support and how it differs from declaring ids and refs in other formats you think are worse than yours. No answer.

It is built in!

> So why would it care the formatting at all?

FOR INTEROPERABILITY! That is, different implementations of xᴇɴᴏɴ agree on what a ɢᴜɪᴅ or date looks like! Fʏɪ, with a good implementation of xᴇɴᴏɴ you just point the library at your data, sometimes augmented with some attributes, and you get cleanly formatted markup.

>>One should be able to guess an ᴀᴘɪ.

> For the second, API for what?

Say you are using an ᴀᴘɪ for information about a person and there is information about their height: in xᴇɴᴏɴ one knows there shall be a scalar called “Height”, whereas in xᴍʟ it may be an attribute or a sub element.

>> Yes. Humans have to read markup.

> Format should not care too much.

We are using text formats because they are READable to humans.

> Separate digits with underscores or spaces.

That is not standard anywhere.

> [...] color highlighting

Only the application knows if a scalar is a number or a string.

There are no obvious design flaws. Take xᴍʟ, add an array type and xᴇɴᴏɴ results.

We must be talking at cross purposes re formatting. [phew...] An application has an object called Person with a field called Height of type double. In C♯, `Person fred = new Person { Height = 1.67 }; string xenon = XenonStart.Serialize("person", fred);` results in the string `"<person><Height=1.67><$>"`. A xᴇɴᴏɴ implementation in another language, say JavaScript, can take that xᴇɴᴏɴ string and decode it into an object with a field called Height whose value can be decoded via .AsNumber into 1.67, because there is a standard for encoding an ɪᴇᴇᴇ 64-bit number/.NET double/JavaScript number.
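
A rough Python sketch of that round trip follows. It is inferred only from the single example string above, handles flat objects only, and ignores whatever escaping rules the real format defines; `serialize` and `scalars` are made-up helper names, not a real Xᴇɴᴏɴ API:

```python
import re
from dataclasses import asdict, dataclass


@dataclass
class Person:
    Height: float


def serialize(name: str, obj) -> str:
    """Emit <name><Field=value>...<$> for a flat dataclass (sketch only)."""
    body = "".join(f"<{key}={value}>" for key, value in asdict(obj).items())
    return f"<{name}>{body}<$>"


def scalars(text: str) -> dict:
    """Collect <Field=value> pairs as raw strings; typing is left to the caller."""
    return dict(re.findall(r"<(\w+)=([^<>]*)>", text))


xenon = serialize("person", Person(Height=1.67))
height = float(scalars(xenon)["Height"])
```

The receiving side still chooses `float` here, which is the step a typed accessor such as .AsNumber would wrap.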

Xᴇɴᴏɴ has more benefits.


GeneThomas 592 days ago

> * Native support for arrays. I mentioned a few above. `<<Faults$$>>` and `<<$$>>` -- guess what these two mean if you see this first time? You would never guess. It's an empty array and an empty element, you've just failed.

`<<` means it relates to starting an array, `$>>` means it is the end, and `$$` means something else: an empty array!

The xᴍʟ alternative is a bodge:

    public class PurchaseOrder
    {
        public Item[] ItemsOrders;
    }

    public class Item
    {
        public string ItemID;
        public decimal ItemPrice;
    }

serializes to:

    <PurchaseOrder>
        <ItemsOrders>
            <Item>
                <ItemID>aaa111</ItemID>
                <ItemPrice>34.22</ItemPrice>
            </Item>
            <Item>
                <ItemID>bbb222</ItemID>
                <ItemPrice>2.89</ItemPrice>
            </Item> 
        </ItemsOrders>
    </PurchaseOrder>

Here the array is marked up as two sub elements, both called `<Item>`.
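
For reference, reading that wrapper-element layout back into a list can be sketched in Python as below. `read_items` is a made-up helper, and the compact documents are assumptions that mirror the example above:

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

ORDER = (
    "<PurchaseOrder><ItemsOrders>"
    "<Item><ItemID>aaa111</ItemID><ItemPrice>34.22</ItemPrice></Item>"
    "<Item><ItemID>bbb222</ItemID><ItemPrice>2.89</ItemPrice></Item>"
    "</ItemsOrders></PurchaseOrder>"
)
EMPTY_ORDER = "<PurchaseOrder><ItemsOrders/></PurchaseOrder>"


def read_items(text: str) -> list:
    """Return (ItemID, ItemPrice) pairs from the ItemsOrders wrapper element."""
    wrapper = ET.fromstring(text).find("ItemsOrders")
    return [
        (item.findtext("ItemID"), Decimal(item.findtext("ItemPrice")))
        for item in wrapper.findall("Item")
    ]
```

With this layout an empty wrapper element reads back as an empty list.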

Xᴇɴᴏɴ has first class support for arrays:

  <PurchaseOrder>
      <<ItemsOrders>
          <ItemID=aaa111>
          <ItemPrice=34.22>
      <&>
          <ItemID=bbb222>
          <ItemPrice=2.89>
      <$>>
  <$>

The elements may be scalars so

  <PurchaseOrder>
      <<ItemsOrders>
      <$>>
  <$>

has an array with one item, the empty string. So a separate syntax for empty arrays is required!

  <PurchaseOrder>
      <<ItemsOrders$$>>
  <$>
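
The distinction between the last two examples can be sketched in Python for arrays of scalar strings. This is only inferred from the snippets above, with whitespace removed and no escaping; `emit_array` and `parse_array` are hypothetical helpers, not a real Xᴇɴᴏɴ library:

```python
import re


def emit_array(name: str, items: list) -> str:
    """Emit an array of scalar strings; the empty array needs its own form."""
    if not items:
        return f"<<{name}$$>>"
    return f"<<{name}>" + "<&>".join(items) + "<$>>"


def parse_array(text: str):
    """Return (name, items) for the two array forms handled by this sketch."""
    match = re.fullmatch(r"<<(\w+)\$\$>>", text)
    if match:
        return match.group(1), []
    match = re.fullmatch(r"<<(\w+)>(.*)<\$>>", text, re.S)
    if match is None:
        raise ValueError("not an array")
    return match.group(1), match.group(2).split("<&>")
```

Because an empty body already means "one empty string", `<<ItemsOrders><$>>` and `<<ItemsOrders$$>>` round-trip to different values.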
