Or, the designers/developers can make sensible decisions about what to include. The existing tools are fine, and they provide plenty of techniques for optimizing for different viewing platforms and environments. The problem is people getting carried away with the whiz-bang stuff that's possible while ignoring graceful degradation. I can't imagine how XML/XSLT would be any better.
The issue isn't what format the content is in, but that it's retrieved through client-side logic rather than as part of the query response.
Also, what problem with HTML/CSS does XML/XSLT actually solve? I'm genuinely curious why somebody would put their content into a format with no default, well-understood display semantics.
Maybe I misunderstood how the XML/XSLT paradigm works, but my idea was that you have a server-side API serving up raw data, and the display is controlled by some code on the client side that can be easily customised. So the server could serve up something like:
<shoe>
  <images>
    <image>foo.com/x.jpg</image>
    <image>foo.com/y.jpg</image>
  </images>
  <description>This shoe is very advanced bla bla bla</description>
  <fancyintroanimation>foo.com/intro1.swf</fancyintroanimation>
</shoe>
<shoe>
  ....
and the client could use the default XSLT if they were happy with that, or a custom one otherwise.
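A minimal sketch of that idea, assuming XSLT 1.0 and the processor bundled with the JDK (javax.xml.transform). One raw document gets rendered two ways, depending only on which stylesheet is applied. The <shoes> root element, the class and method names, and both stylesheets are hypothetical and only for illustration. A browser can do the same thing client-side when the XML carries an <?xml-stylesheet type="text/xsl" href="..."?> processing instruction that points at the default stylesheet.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ShoeRenderer {
    // Raw data as the server might send it. The <shoes> root is a hypothetical
    // addition, because a well-formed XML document needs exactly one root element.
    static final String SHOES_XML =
        "<shoes><shoe>"
      + "<images><image>foo.com/x.jpg</image><image>foo.com/y.jpg</image></images>"
      + "<description>This shoe is very advanced bla bla bla</description>"
      + "<fancyintroanimation>foo.com/intro1.swf</fancyintroanimation>"
      + "</shoe></shoes>";

    // Hypothetical "default" stylesheet: HTML list with description and images,
    // deliberately ignoring the intro animation.
    static final String DEFAULT_XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"html\"/>"
      + "<xsl:template match=\"/shoes\"><ul><xsl:apply-templates select=\"shoe\"/></ul></xsl:template>"
      + "<xsl:template match=\"shoe\">"
      + "<li><p><xsl:value-of select=\"description\"/></p>"
      + "<xsl:for-each select=\"images/image\"><img src=\"{.}\"/></xsl:for-each></li>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Hypothetical user-supplied stylesheet: plain text, one description per line.
    static final String TEXT_ONLY_XSLT =
        "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"/shoes\">"
      + "<xsl:for-each select=\"shoe\"><xsl:value-of select=\"description\"/><xsl:text>&#10;</xsl:text></xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Applies an XSLT stylesheet to an XML document with the JDK's built-in processor.
    static String render(String xml, String xslt) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Same raw data, two presentations: the client decides which stylesheet to use.
        System.out.println(render(SHOES_XML, DEFAULT_XSLT));
        System.out.println(render(SHOES_XML, TEXT_ONLY_XSLT));
    }
}
```

The server never changes here; swapping DEFAULT_XSLT for TEXT_ONLY_XSLT is the whole "custom otherwise" step.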