Hacker News
by twic 4446 days ago
I would say the flaw is that XML parsers try to resolve external entities on their own, by opening file paths, fetching URLs, or whatever. They shouldn't do this by default: they should instead take a programmer-supplied entity resolver and call into that.

They could also provide a canned resolver that hits the local filesystem and/or the web, which programmers could supply if they wanted, but it should not be the default. The programmer should have to request that access explicitly.
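A minimal C sketch of that design, assuming a toy parser where only the hook for external entities is modelled. All names here (`struct parser`, `parser_set_resolver`, `parser_resolve_external`, `empty_resolver`) are hypothetical, not any real library's API. With no resolver installed, the parser refuses every external entity; access happens only through a callback the programmer chose.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical resolver callback: given the SYSTEM identifier of an
 * external entity, return its replacement text, or NULL to refuse. */
typedef const char *(*entity_resolver)(const char *system_id, void *user_data);

/* Hypothetical parser state; only entity resolution is modelled. */
struct parser {
    entity_resolver resolver; /* NULL by default: no external access */
    void *resolver_data;      /* passed through to the resolver */
};

void parser_init(struct parser *p) {
    assert(p != NULL);
    p->resolver = NULL;
    p->resolver_data = NULL;
}

/* The programmer opts in by supplying a resolver explicitly. */
void parser_set_resolver(struct parser *p, entity_resolver r, void *data) {
    assert(p != NULL);
    p->resolver = r;
    p->resolver_data = data;
}

/* Called when the document references an external entity. The parser
 * never opens files or URLs itself: it asks the resolver or refuses. */
const char *parser_resolve_external(struct parser *p, const char *system_id) {
    assert(p != NULL);
    if (p->resolver == NULL)
        return NULL; /* default: refuse */
    return p->resolver(system_id, p->resolver_data);
}

/* Example resolver: expand every external entity to empty text. */
const char *empty_resolver(const char *system_id, void *user_data) {
    (void)system_id;
    (void)user_data;
    return "";
}
```

Installing `empty_resolver` makes every external entity expand to nothing, which is one safe choice; a program that genuinely needs external files would pass its own callback instead.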

I've had related problems where XML parsers would try to go off and fetch DTDs from the web, then fail, because they were running on firewalled machines that couldn't see the servers hosting the DTDs. That took us by surprise. We installed an entity resolver that looked in a local cache of DTDs instead, which was fairly easy. But I would prefer not to have been surprised.
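The local-cache fix is roughly a lookup table consulted by the resolver. A C sketch, assuming the resolver only needs to map a SYSTEM identifier to the path of a local copy; the table layout, `dtd_cache_lookup`, and the cache paths are hypothetical. Unknown identifiers are refused rather than fetched, so nothing ever goes to the network.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One entry in a hypothetical local DTD cache: the SYSTEM identifier as
 * it appears in documents, and the path of a copy on local disk. */
struct dtd_cache_entry {
    const char *system_id;
    const char *local_path;
};

/* Resolver body: map a SYSTEM identifier to a cached local copy.
 * Unknown identifiers return NULL (refuse) instead of falling back
 * to the network. */
const char *dtd_cache_lookup(const struct dtd_cache_entry *cache, size_t n,
                             const char *system_id) {
    assert(cache != NULL || n == 0);
    if (system_id == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(cache[i].system_id, system_id) == 0)
            return cache[i].local_path;
    }
    return NULL;
}
```

A resolver callback would call `dtd_cache_lookup` with the SYSTEM identifier from the DOCTYPE and open the returned path itself.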

Also, all this stuff should be running in a jail where it can't even see any interesting files, of course.


> They shouldn't do this by default: they should instead take a programmer-supplied entity resolver and call into that.

Then programmers would most probably write their own resolvers, with even more bugs. You would have 10,000 broken implementations of that code, half of them copied from a Stack Overflow example with security left as an exercise for the reader.

You could have a default implementation that callers have to set explicitly, e.g.:

    xmlSetFileResolver (xml, xmlDefaultFileResolver);
Callers could provide their own, but most would either set none or use the supplied default.
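Here is one way the proposed API could look, as a C sketch. `xmlSetFileResolver` and `xmlDefaultFileResolver` are the hypothetical names from the snippet above, not libxml2 functions (libxml2's closest real hook is the global `xmlSetExternalEntityLoader`); the `Xml` handle and `xmlResolveExternal` are further assumptions added to make it self-contained. The canned resolver reads the local filesystem, but only after a caller installs it.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical resolver type: returns a malloc'd, NUL-terminated buffer
 * with the entity's contents, or NULL to refuse. Caller frees it. */
typedef char *(*XmlFileResolver)(const char *path);

/* Hypothetical parser handle; only the resolver slot is modelled. */
typedef struct {
    XmlFileResolver file_resolver; /* NULL until a caller opts in */
} Xml;

void xmlSetFileResolver(Xml *xml, XmlFileResolver resolver) {
    assert(xml != NULL);
    xml->file_resolver = resolver;
}

/* Canned default that reads the local filesystem. It is used only if a
 * caller passes it to xmlSetFileResolver explicitly. */
char *xmlDefaultFileResolver(const char *path) {
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;
    size_t cap = 256, len = 0, n;
    char *buf = malloc(cap);
    /* Invariant: len < cap, so there is always room for the final '\0'. */
    while (buf != NULL && (n = fread(buf + len, 1, cap - len, f)) > 0) {
        len += n;
        if (len == cap) {
            char *grown = realloc(buf, cap * 2);
            if (grown == NULL) {
                free(buf);
                buf = NULL;
            } else {
                buf = grown;
                cap *= 2;
            }
        }
    }
    if (buf != NULL && ferror(f)) {
        free(buf);
        buf = NULL;
    }
    fclose(f);
    if (buf != NULL)
        buf[len] = '\0';
    return buf;
}

/* What the parser does on meeting an external entity: with no resolver
 * set, there is no file access at all. */
char *xmlResolveExternal(Xml *xml, const char *path) {
    assert(xml != NULL);
    if (xml->file_resolver == NULL)
        return NULL;
    return xml->file_resolver(path);
}
```

`xmlSetFileResolver(&xml, xmlDefaultFileResolver);` is the one line that opts in; leave it out and `xmlResolveExternal` never touches the disk.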

Of course, nothing helps people who code by copying and pasting rather than by understanding what the API or library does.