by raverbashing 4446 days ago
This is another reason not to use XML, plain and simple

It's too much hidden power in the hands of those who don't know what they're doing (loading external entities pointed in an XML automatically? what kind of joke is that?)


Sure, but didn't YAML in Rails do mostly the same type of thing? It's not just XML that is dumb like this.
YAML and XML seem too powerful and too complex for their own common use cases (data storage). Markdown too - how many Markdown parsers allow for strict parsing against an HTML whitelist, and don't allow native HTML at all by default?
I've never even thought of that. Wow. Obvious now of course.
> loading external entities pointed in an XML automatically? what kind of joke is that?

Your browser does much the same when parsing (X)HTML. LaTeX naturally includes ‘external’ resources when building an output file. There are tons of examples like that; loading external entities per se is not wrong, it’s mostly just wrong under these specific circumstances.

I think the important difference here is that with browsers, the behavior is well-known and well-understood, there are a very small number of them, and you're unlikely to run one in a production environment -- barring, say, something like PhantomJS, which still has all the foregoing in its favor.

Compare that with XML parsers: there are often several per language, and each may be implemented to a wildly different level of sophistication with respect to security.

My point was that it is not an unreasonable thing to have some sort of #include directive in a data format, and certainly not in a markup language.

The problem here was the same as in the rest of the software industry: programmers are far from ‘engineers’ in their desire to understand their tools, use the right tools and build bug-free code. Instead, most people hack for fun with tools they hardly understand and then somehow manage to complain if they shoot off their feet while doing so.

Hacking for fun and shooting off extremities is of course perfectly fine, but the blame for the latter lies with the programmer (and possibly their education), not the tools.

XML made it far more manageable to create machine-to-machine APIs. I can say we surely would not want to go back to the 80's and 90's, when doing that stuff was a nightmare.
Yes, it was a drunken, stumbling step forward. Let's take another one, and move to something simpler, which solves the problem better.

To quote Phil Wadler's paper about XML, where he established some of the principles that influenced XQuery: "So the essence of XML is this: the problem it solves is not hard, and it does not solve the problem well."[1]

I suggest reading the entire paper; it shows a number of shortcomings, but it's also rather enlightening about how XML actually is structured, and how its semantics are defined. (i.e., in spite of that quote, it's not just XML bashing)

[1]http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.109...

Hm. In his introduction, he says, "XML is touted as an external format for representing data." To me that mostly misses the value of XML. I think of it as an interchange format, not a closely-mirror-my-datastructures format. I've used it before when I want a long-lived data format that is mostly annotated text, and I'd happily do it again.

That said, I'm very skeptical of the XML-for-everything school, and nearly murdered a group of engineers who were using XML to transfer data from one spot in an app to another, even though it all ran in the same JVM. So maybe I'm more defending a small subset of XML rather than the XML-industrial complex.

How about protocol buffers?
I'll grant you that it's better than CORBA

What I don't agree with is that it allows a "load this", where "this" can be a local file, a URL in some cases, basically anything
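
To make the "load this" concrete, here is a rough sketch that lists what a document's external entity declarations point at, without fetching anything. It uses Python's standard `xml.parsers.expat` module; the function name, the sample document and its URLs are made up for illustration.

```python
import xml.parsers.expat as expat


def external_entity_targets(data: bytes) -> dict:
    """Map each declared external entity name to the system ID it points at.

    Nothing is fetched here: the handler only records the declarations.
    """
    targets = {}
    parser = expat.ParserCreate()

    def on_entity_decl(name, is_parameter, value, base,
                       system_id, public_id, notation):
        # Internal entities carry a literal value and no system ID.
        if system_id is not None:
            targets[name] = system_id

    parser.EntityDeclHandler = on_entity_decl
    parser.Parse(data, True)
    return targets


# Hypothetical sample: one internal entity, one local file, one remote URL.
SAMPLE = (
    b'<!DOCTYPE note ['
    b'<!ENTITY company "ACME">'
    b'<!ENTITY local SYSTEM "file:///etc/hostname">'
    b'<!ENTITY remote SYSTEM "http://example.invalid/payload.txt">'
    b']><note>hello</note>'
)

if __name__ == "__main__":
    for name, target in external_entity_targets(SAMPLE).items():
        print(name, "->", target)
```

The document itself chooses the targets, which is the part being objected to: a parser that resolves them will read whatever the sender names.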

That's an overly narrow view. We shouldn't avoid powerful features merely because power can cause problems.

Where would we be if web browsers couldn't use external resources?

General-purpose parsers/renderers need to have tightly locked-down, sensible defaults, or even security-oriented feature subsets, but that doesn't mean we should remove one of their most useful features altogether, or avoid them because they're powerful and dangerous.
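
As one possible illustration of a locked-down default, the sketch below refuses any untrusted document that carries a DOCTYPE at all, so no entity (internal or external) can ever be declared. It is only a sketch built on Python's standard `xml.parsers.expat`; `parse_untrusted`, `DoctypeForbidden` and the sample payloads are names invented for this example, and a stricter or looser policy may suit other cases.

```python
import xml.parsers.expat as expat


class DoctypeForbidden(ValueError):
    """Raised when untrusted XML carries a DOCTYPE declaration."""


def parse_untrusted(data: bytes) -> list:
    """Return element names in document order; refuse any DOCTYPE."""
    names = []
    parser = expat.ParserCreate()

    def forbid_doctype(name, system_id, public_id, has_internal_subset):
        # Raising inside the handler aborts the parse with this exception.
        raise DoctypeForbidden(f"DOCTYPE {name!r} rejected")

    def start_element(name, attrs):
        names.append(name)

    parser.StartDoctypeDeclHandler = forbid_doctype
    parser.StartElementHandler = start_element
    parser.Parse(data, True)
    return names


# Hypothetical inputs: a plain document and one that declares an external entity.
PLAIN = b"<order><item>book</item></order>"
HOSTILE = (
    b'<?xml version="1.0"?>'
    b'<!DOCTYPE order [<!ENTITY leak SYSTEM "file:///etc/passwd">]>'
    b"<order><item>&leak;</item></order>"
)

if __name__ == "__main__":
    print(parse_untrusted(PLAIN))
    try:
        parse_untrusted(HOSTILE)
    except DoctypeForbidden as exc:
        print("refused:", exc)
```

Callers that genuinely need entities would have to opt in explicitly, which keeps the powerful feature available without making it the default for untrusted input.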

There is a big difference between a web browser on your local machine and a server processing all the untrusted data that is thrown at it
scnr

    XML - It seemed like a good idea at the time
Not to everyone. Some of us greybeards tried to warn against it:

"XML is simply lisp done wrong." — Alan Cox

but the gee-whizzery won.

"XML combines the efficiency of text files with the readability of binary files" — unknown

"XML is a classic political compromise: it balances the needs of man and machine by being equally unreadable to both." — Matthew Might

Anyone remember XHTML?