by vidarh 4446 days ago
Also horrible defaults in XML parsers. That any XML parser allows retrieval of DTDs without explicit options specifying allowed sources etc. is beyond me. It's not just local file access, which becomes a security hole when you let users pass you XML files, though that is one of the worst ones.

But the number of times I've seen production apps that turn out to regularly request DTDs or schemas from remote servers behind the scenes has made that one of the first things I check when I am tasked to maintain or look into anything that parses XML. Often these apps stop working or slow down for seemingly no reason because the DTD or schema becomes unavailable, and nobody understands why.
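As a rough illustration of opting out explicitly rather than trusting defaults, here is a minimal sketch using Python's standard-library `xml.sax`. The names `parse_untrusted` and `TextCollector` are made up for this example; the feature constants come from `xml.sax.handler`. Recent Python versions already disable external general entities by default, so this mostly pins that behaviour down. Other parsers and languages have their own switches, which this sketch does not cover.

```python
import io
import xml.sax
from xml.sax.handler import (
    ContentHandler,
    feature_external_ges,
    feature_external_pes,
)


class TextCollector(ContentHandler):
    """Hypothetical handler that collects character data for inspection."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        self.chunks.append(content)


def parse_untrusted(xml_bytes):
    """Parse XML without loading external entities; return its text content."""
    parser = xml.sax.make_parser()
    # Refuse external general and parameter entities explicitly instead of
    # relying on whatever the installed Python version defaults to.
    parser.setFeature(feature_external_ges, False)
    parser.setFeature(feature_external_pes, False)
    collector = TextCollector()
    parser.setContentHandler(collector)
    parser.parse(io.BytesIO(xml_bytes))
    return "".join(collector.chunks)
```

With these features off, a reference to an external entity such as `<!ENTITY x SYSTEM "file:///some/path">` should simply be skipped instead of being read from disk or fetched over the network.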

2 comments

The crazy part about this is that I remember having these conversations over a decade ago. It was very clearly recognized as a major security, reliability and performance problem, but the greater XML community basically just shrugged it off.

One really interesting aspect of this is that many applications suddenly broke when the Republicans shut down the government last year because a number of XML schemas are managed by government agencies who were suddenly legally unable to provide their normal web services:

http://gis.stackexchange.com/a/73777
http://forums.arcgis.com/threads/94294-Expected-DTD-markup-w...
http://www.catalogingrules.com/?p=77

Makes me wonder whether it's time to start contributing patches to disable bad ideas like this by default — some places are clearly paying a significant amount to serve content nobody should need: http://www.w3.org/blog/systeam/2008/02/08/w3c_s_excessive_dt...

It's bad practice to fetch an external DTD from a server you don't control: first for security reasons, second because your application then depends on something that can go away at any time, and third because it's rude to the third party.

twic is right that one should always use entity resolvers that point to local resources and that parsers should run in a sandbox without external access.
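For what a local-only entity resolver might look like, here is a minimal sketch with Python's `xml.sax`. Everything named here is an assumption for illustration: the `LOCAL_CATALOG` mapping, its public ID, the DTD bytes, `UnknownEntityError`, `LocalOnlyResolver` and `make_local_only_parser` are all hypothetical. How a given parser consults the resolver for external DTD subsets is worth verifying in your own environment.

```python
import io
from xml.sax import make_parser
from xml.sax.handler import EntityResolver, feature_external_ges
from xml.sax.xmlreader import InputSource

# Hypothetical catalog: public ID -> DTD bytes shipped with the application.
LOCAL_CATALOG = {
    "-//EXAMPLE//DTD Note 1.0//EN": b'<!ENTITY greeting "hello">',
}


class UnknownEntityError(LookupError):
    """Raised when a document asks for an entity that is not in the catalog."""


class LocalOnlyResolver(EntityResolver):
    """Serves known entities from local bytes and refuses everything else."""

    def resolveEntity(self, publicId, systemId):
        try:
            data = LOCAL_CATALOG[publicId]
        except KeyError:
            # Fail closed: never fall back to fetching systemId remotely.
            raise UnknownEntityError(
                f"refusing to load external entity {systemId!r}"
            ) from None
        source = InputSource(systemId)
        source.setByteStream(io.BytesIO(data))
        return source


def make_local_only_parser():
    """Build a SAX parser whose external entities go through the catalog."""
    parser = make_parser()
    parser.setEntityResolver(LocalOnlyResolver())
    # External entities must be enabled for the resolver to be consulted.
    parser.setFeature(feature_external_ges, True)
    return parser
```

Raising on unknown entities makes a missing catalog entry show up as an immediate, explainable error instead of a silent network request.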

He's also right to say that by default parsers shouldn't go fetch external resources. I think the reason they do is historical: entity resolvers appeared later than the parsers themselves.

It is bad practice, but you know that it is uncannily common?

Just remember that the W3C had to impose download restrictions on the (X)HTML DTDs (http://www.w3.org/Help/Webmaster#block)