by Finnucane 1168 days ago
What a pointless debate. I've worked with XML manuscript archives and I can be certain that if I'd had to do it in JSON I'd have killed myself.

This is about JSON being created or discovered and Doug struggling to convince people it was relevant when everyone was so bought in on XML.

Are you saying you think JSON shouldn't exist and everyone should use XML for everything?

Tooling around XML was certainly more established, but man there was a lot of complexity built up around it.

No. JSON is great as JavaScript's serialization format, but it's not as readable and robust as XML, period.

I use both extensively, and for bigger objects and definitions, XML is a very clear winner.

I'm a big believer in a horses-for-courses approach, and my personal gripe is the push to replace one thing with another. These formats can coexist, and can be used where they shine. XML can be read and written stupidly fast, so it's way better as an on-disk file format if people are going to touch that file.

YAML and JSON are not the best fit for configuration files. JSON is good as an on-disk serialization format if humans aren't going to touch it. XML is the best format for carrying complex and big data around. TOML is the best format for human-readable, human-editable config files.

My only quibble is that both are basically unreadable in most use cases. Most programs worth anything that use these formats strip out all the extra whitespace and formatting, so you usually have to take an extra step to reformat a file just so you can read it. And anyone who has had an unclosed paren or a missing caret can tell you how painful manually parsing one of these with 400+ fields is. Saying one is better than the other ignores the use cases for both: one is good at slugging data into JavaScript/Python, the other is good at light typing, annotation, and transforms.

I've never seen a tool that stores its XML config in minified/uglified form with the whitespace removed. The two biggest tools I play with that use XML are Keycloak and Eclipse, and neither of them does this.

Every parser I've used, and every editor I've edited XML in, has shown the exact place where a caret is missing or the XML is otherwise broken, so I have never had to hunt anything down inside a big XML file.
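
To illustrate the point about parsers reporting where the XML is broken, here is a minimal sketch using Python's standard library parser (not the tooling the commenter used). The broken document and the `locate_error` helper are made up for the example; `ParseError.position` is the (line, column) pair reported by the underlying expat parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical broken config: <item> is opened on line 2 and never closed.
BROKEN = "<config>\n  <item>\n</config>\n"


def locate_error(text):
    """Return (line, column) of the first well-formedness error, or None."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        # position is a (line, column) tuple supplied by expat
        return err.position
    return None


line, column = locate_error(BROKEN)
print(f"XML breaks at line {line}, column {column}")
```

The parser stops at the mismatched closing tag rather than at the unclosed opening tag, so the reported line is where the damage is detected, which is usually close enough to find the problem.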

However, this doesn't invalidate your experience about unreadable XML files, which are most definitely present in the wild.

I do agree that none of them are good config file formats, but for storing data I'll take XML all day, every day (except when I really need a binary file format, e.g. for compressing data).

What specifically about XML means it can be read/written "stupidly fast"?

It's still a text bound serialization format, you still have to parse a tree for it.

Is it just particularly mature libraries?

It is primarily mature libraries, but XML is also more straightforward to parse, because there are not many data types and tags make it very deterministic.

By "stupidly fast", I mean I can read a 120K XML file, parse it, and create the objects defined in that file in under 2ms. The library I use (RapidXML [0]) can parse the file at almost the same time cost as running strlen() on it. That's insane.

[0]: https://rapidxml.sourceforge.net/

Being a maintainer of the fastest XML library for Rust, I strongly disagree that XML is inherently fast to parse, and I question any such claim which comes with no evidence. Especially when it has remained unchanged on their page since (at least) 2008 [0]. Have you actually tested that claim or are you taking it at face value?

IME the XML spec is so complex that you either end up with a slow but compliant parser or a fast one that doesn't implement the spec completely.

JSON, unlike XML, is minimal enough that writing an entire compliant parser with SIMD intrinsics [1] is actually practically feasible. That library claims 3 GB/s parsing speed, which could theoretically process your 120 KB of data in 1/25000th of a second instead of 2/1000ths of a second.

I would wager that JSON is faster to parse, on balance.

[0] https://web.archive.org/web/20080209172554/https://rapidxml....

[1] https://github.com/simdjson/simdjson
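
The back-of-the-envelope comparison above can be checked with a few lines of Python. This is only arithmetic on the numbers quoted in the thread (120 KB taken as 120,000 bytes, 3 GB/s as 3e9 bytes per second), not a benchmark, and the 2 ms figure reportedly included building objects as well as parsing.

```python
FILE_SIZE_BYTES = 120_000         # "120K" file, assumed to mean decimal kilobytes
CLAIMED_THROUGHPUT = 3e9          # 3 GB/s, the figure quoted for simdjson
REPORTED_XML_TIME_S = 0.002       # "under 2ms" reported for the XML pipeline

# Time to process the file at the claimed JSON throughput.
simdjson_time = FILE_SIZE_BYTES / CLAIMED_THROUGHPUT

# Throughput implied by the reported XML timing.
implied_throughput = FILE_SIZE_BYTES / REPORTED_XML_TIME_S

print(f"time at claimed throughput: {simdjson_time * 1e6:.0f} microseconds")
print(f"that is 1/{round(1 / simdjson_time)} of a second")
print(f"throughput implied by 2 ms: {implied_throughput / 1e6:.0f} MB/s")
print(f"ratio: {REPORTED_XML_TIME_S / simdjson_time:.0f}x")
```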

YAML is excellent as a post natal abortion mechanism. Anyone working on its parser will question why live when YAML exists. Source: I'm developing a YAML parser.

What broke me were: plain string and empty node handling.

Here is a fun quiz: which of these two documents is valid? Both, or neither? With explanation, ofc.

Yaml#1

     :
Yaml#2

    :

XML is great except at being a configuration format, a messaging format, a serialization format, or any other purpose really. It's not insane like YAML, I'll give it that. I'll take XML over that garbage any day.

The complexity of XML reminds me of something from Adam Bosworth's ISCOC04 talk [0]. To me, the big takeaway is that HTML succeeded because of its limitations, not despite them. JSON seems very simple compared to XML. XML seems very powerful, but also very complex: if all you need to do is pick your kids up from soccer practice, you don't need the power (and complexity) of the Space Shuttle in your vehicle.

  In 1996 I was at some of the initial XML meetings. 
  The participants' anger at HTML for "corrupting" 
  content with layout was intense. Some of the initial 
  backers of XML were frustrated SGML folks who wanted 
  a better cleaner world in which data was pristinely 
  separated from presentation. In short, they disliked 
  one of the great success stories of software history, 
  one that succeeded because of its limitations, not 
  despite them. I very much doubt that an HTML that had 
  initially shipped as a clean layered set of content 
  (XML, Layout rules – XSLT, and Formatting- CSS) would 
  have had anything like the explosive uptake.

[0]: https://adambosworth.net/2004/11/18/iscoc04-talk/

But you don't have to use any more of the XML-related standards than you want to. You can ignore schemas and add-on technologies like XPath and XSLT, and just use XML as a hierarchical tag-value format, just like JSON.

At this level they are both about equal in complexity: JSON has data types that XML doesn't, and XML has attributes and CDATA that JSON doesn't. JSON syntax is more succinct, but XML syntax is more regular.
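
A small sketch of that trade-off, using only Python's standard library. The `book` record and its field names are invented for illustration; the point is just that the JSON parser hands back typed values, while the XML parser hands back strings but supports attributes and CDATA.

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical record in both formats.
JSON_TEXT = '{"book": {"id": 42, "title": "A < B", "in_print": true}}'
XML_TEXT = (
    '<book id="42">'
    "<title><![CDATA[A < B]]></title>"
    "<in_print>true</in_print>"
    "</book>"
)

book_json = json.loads(JSON_TEXT)["book"]
book_xml = ET.fromstring(XML_TEXT)

# JSON carries data types: the number and boolean arrive already typed.
print(type(book_json["id"]).__name__, type(book_json["in_print"]).__name__)

# XML carries attributes and CDATA, but every value arrives as a string
# and has to be converted by the application (or by a schema layer).
print(repr(book_xml.get("id")))             # attribute value
print(repr(book_xml.findtext("title")))     # CDATA content, unescaped
print(repr(book_xml.findtext("in_print")))  # still the string "true"
```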

XML is good for documents that don't have a regular markup (XHTML, DocBook, JATS, MathML, etc.) where you can mix content elements -- e.g. italic annotations.
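
As a rough illustration of mixed content, here is a sketch with Python's standard library. The sentence and the JSON encoding of it are both made up; the JSON convention shown (a list of strings and single-key objects) is just one ad hoc possibility, not any standard.

```python
import json
import xml.etree.ElementTree as ET

# Mixed content: text and an inline element interleaved in one paragraph.
PARA_XML = "<p>JSON is <i>not</i> a markup language.</p>"

p = ET.fromstring(PARA_XML)
italic = p[0]

print(repr(p.text))       # text before the <i> element
print(repr(italic.text))  # text inside the <i> element
print(repr(italic.tail))  # text after </i>, still part of the paragraph

# The plain text falls out without any custom convention.
plain_from_xml = "".join(p.itertext())

# In JSON you have to invent a convention for the same thing, for example
# a list of runs where a string is plain text and {"i": ...} is italic.
PARA_JSON = '["JSON is ", {"i": "not"}, " a markup language."]'
runs = json.loads(PARA_JSON)
plain_from_json = "".join(r if isinstance(r, str) else r["i"] for r in runs)

print(plain_from_xml == plain_from_json)
```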

JSON is good for structured data/records such as serialized data structures found in RPC protocols.

They both have their own pros and cons that make them suited to different use cases. Choose the one that best suits your data model and use cases.

Debate? Did you even read the article? It was about the history of how JSON came around. I didn't read a debate (despite what the title implies).