by jarym 2017 days ago
Glad this got found. I remember when XML was being widely adopted that there'd be frequent vulnerabilities found in Java-based parsers.

A large part of this stems from how complicated XML can get - if it were only elements and attributes it might have been fine. Namespaces made it a bit more complicated. Processing Instructions made it hideous.


XML and many other things related to it (including Java, SOAP, CORBA, etc.) are an example of what could be called "Enterprise mindset" taken to an extreme. Insanely high levels of abstraction and indirection, absurd amounts of needless flexibility and generality, and essentially zero thought given to efficiency or simplicity. It's as if the people responsible for these spent all their time thinking "what's the most complicated way to do something?"
It's really nuts trying to implement something like SAML in XML. The standard is a security minefield.
What makes it worse is that XMLDSIG is exponentially more complicated. Most of the ecosystem literally shells out to libxmlsec1 and assumes it does the right thing. DSIG is a batshit standard that attempts to support arbitrary combinations of signed and unsigned parts in a single document, tied together with a DOM-like scheme, passed through a canonicalizing transformation that has itself broken SAML before. It's a fractal of bad security design.
Not to mention that libxmlsec1 has some insane insecure defaults that are effectively undocumented.

(I'd go into more detail, but I literally just sent a security report yesterday to a SAML library for using it wrong, so I guess I shouldn't post publicly about it until they fix it.)

You probably shouldn't have posted this either.
Also it's my experience that nobody follows the standard - producing valid SAML isn't enough, you need to produce the exact SAML your consumer expects or receivers will reject it. (The context here was passing users off from healthcare.gov to issuers)
What makes it worse is that there are practical reasons to implement that way; I've done so for clients, because of bugs found in other SAML parsers that we couldn't leave people susceptible to. One of the material things you can do to lock down a SAML implementation is to accept only the pattern of XML tokens you expect from mainstream IdPs, and then wait for people to complain.
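To make "accept only the pattern of XML tokens you expect" concrete, here is a minimal sketch in Python. The element names and the `EXPECTED_SHAPE` table are simplified placeholders I made up for illustration, not the real SAML schema, and the standard-library parser is not a hardened one. The idea is only that anything outside the known-good shape is rejected before any signature or business logic runs.

```python
import xml.etree.ElementTree as ET

# Hypothetical allowlist: the exact sequence of child elements expected under
# each element. Real SAML responses carry more elements and attributes.
EXPECTED_SHAPE = {
    "Response": ["Issuer", "Status", "Assertion"],
    "Issuer": [],
    "Status": ["StatusCode"],
    "StatusCode": [],
    "Assertion": ["Issuer", "Subject"],
    "Subject": ["NameID"],
    "NameID": [],
}


def local_name(tag):
    # ElementTree spells namespaced tags as "{uri}local"; keep only "local".
    # A real check should also pin the namespace URI instead of dropping it.
    return tag.rsplit("}", 1)[-1]


def matches_expected_shape(element):
    """Return True only if the subtree has exactly the allowlisted structure."""
    name = local_name(element.tag)
    if name not in EXPECTED_SHAPE:
        return False
    children = list(element)
    if [local_name(child.tag) for child in children] != EXPECTED_SHAPE[name]:
        return False
    return all(matches_expected_shape(child) for child in children)
```

A document with, say, a second `Assertion` element smuggled in fails the check, because its child sequence no longer matches the table. The cost is the one described above: a legitimate IdP that emits a slightly different shape gets rejected until someone complains.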
I'm so happy I no longer need to work with it. I wrote a manifesto on how it (doesn't) work for the person who replaced me on that project, and it was long, detailed, and angry.
I've been cavorting around that minefield recently. I still have some of my legs and a tiny bit of my sanity.

The most recent "fun" I had was that on a Citrix NetScaler, if you enable a certain n-Factor workflow, it sends a SAML request to the IdP that only Microsoft products reject as "invalid XML".

From what I can gather the XML being sent is perfectly valid. The issue must be something hideously subtle, like the white space or UTF-8 encoding being subtly different that is upsetting the Microsoft SAML implementations, but not any others.

Have a look at some SAML XML examples online: https://www.samltool.com/generic_sso_res.php

They're hideous not because they're XML, but because they're bad XML! The SAML standard defines its own "namespace attributes" separately but on top of the XML namespaces!

Similarly, instead of the straightforward way to encode the data:

    <tag prop="attr">value</tag>

They abstract one level up unnecessarily:

    <element name="tag">
        <attribute name="prop">attr</attribute>
        <content>value</content>
    </element>

This is the same mistake people make in database schema design, where they'll have a table with columns called "Key", "ColumnName", and "ColumnValue".
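The cost of the extra level shows up as soon as you read the data back. A rough sketch in Python, using the two snippets above verbatim (the function names are mine, for illustration only):

```python
import xml.etree.ElementTree as ET

DIRECT = '<tag prop="attr">value</tag>'
GENERIC = (
    '<element name="tag">'
    '<attribute name="prop">attr</attribute>'
    '<content>value</content>'
    '</element>'
)


def read_direct(xml_text):
    # The parser already hands us the tag name, the attribute, and the text.
    root = ET.fromstring(xml_text)
    return root.tag, root.get("prop"), root.text


def read_generic(xml_text):
    # The real names are hidden inside attribute values, so we have to
    # search for them by hand instead of letting the parser do it.
    root = ET.fromstring(xml_text)
    prop = next(
        attribute.text
        for attribute in root.findall("attribute")
        if attribute.get("name") == "prop"
    )
    return root.get("name"), prop, root.findtext("content")
```

Both functions recover the same three strings, but the generic form pushes the lookup into application code, and a schema can no longer say anything useful about which names are allowed.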
> The issue must be something hideously subtle, like the white space or UTF-8 encoding being subtly different that is upsetting the Microsoft SAML implementations, but not any others.

That is almost certainly the case, as another comment here indirectly references: https://news.ycombinator.com/item?id=25422734
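For anyone wondering why whitespace could matter at all: signatures are computed over bytes, so the XML gets canonicalized first, and canonicalization only normalizes some differences. A small sketch with Python's standard library follows. Note that `ET.canonicalize` implements C14N 2.0, which is not necessarily the algorithm a given SAML stack uses, and the sample documents are made up; this says nothing about what the NetScaler actually sends.

```python
import hashlib
import xml.etree.ElementTree as ET


def digest(text):
    # Hash the exact bytes, the way a signature check ultimately does.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Same information, different bytes: attribute order and spacing inside the tag.
A = '<a b="1" c="2"/>'
B = '<a  c="2"   b="1"></a>'

# Whitespace between elements is character data, so canonicalization keeps it.
C = '<a><b/></a>'
D = '<a>\n  <b/>\n</a>'

raw_differs = digest(A) != digest(B)
canonical_matches = digest(ET.canonicalize(A)) == digest(ET.canonicalize(B))
whitespace_still_differs = digest(ET.canonicalize(C)) != digest(ET.canonicalize(D))
```

All three flags come out true: attribute order and in-tag spacing are normalized away, but a pretty-printed document still hashes differently from a compact one. So any hop that re-indents or re-serializes a signed message can plausibly break things in a way that looks fine to the eye.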

Oh wow, that's disgusting. Why would someone design something like this?
It's the most egregious example of design-by-committee that I have ever seen.

Everything about SAML is about 10x more complex than it technically needs to be.

On top of that, it has so many optional features that interoperability problems are likely even between 100% standards compliant implementations.