Sure, it would be simple enough to copy what was done with server-side image maps and define some query parameters that get sent along with the image request. Each size then gets its own URL, so proxies and caches are happy. Servers that don't support it just serve a normal image. A tiny bit of JavaScript can polyfill it in now. Today.
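To make the "each size gets its own URL" idea concrete, here is a rough sketch of how such URLs might be built. The parameter names `w` and `h` and the helper `sized_image_url` are made up for illustration; nothing in this thread settles on a naming scheme.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names; no real scheme is being quoted here.
SIZE_PARAMS = ("w", "h")


def sized_image_url(url: str, width: int, height: int) -> str:
    """Return `url` with the requested size appended as query parameters.

    Any existing size parameters are replaced, other parameters are kept.
    A server that ignores unknown parameters would just serve the normal image.
    """
    parts = urlsplit(url)
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in SIZE_PARAMS
    ]
    query += [("w", str(width)), ("h", str(height))]
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Because the size is part of the URL, every variant is a distinct resource as far as a proxy or cache is concerned, which is the property the post is relying on.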
That is a very appealing design, but sadly it doesn't address all the use cases. In particular one thing that people want to be able to do is to display a different image depending on the viewport dimensions, for example a closer crop on a small screen compared to that shown on a larger screen.
We switched topics slightly midstream and are now talking about an "entirely server side solution", if you scan back a few posts. It's fairly trivial to do image processing on a server and cache the results based on the URL.
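A minimal sketch of caching processed images by URL might look like the following. It assumes hypothetical `w` and `h` query parameters (with 0 meaning "original size") and takes the actual image processing as a caller-supplied `render` function, since no particular image library is implied here.

```python
from urllib.parse import parse_qs, urlsplit


class ResizeCache:
    """In-memory cache of processed images, keyed by path and requested size."""

    def __init__(self, render):
        # `render(path, width, height) -> bytes` is supplied by the caller;
        # it stands in for whatever image processing the server really does.
        self._render = render
        self._cache = {}

    def get(self, url: str) -> bytes:
        parts = urlsplit(url)
        params = parse_qs(parts.query)
        # Hypothetical parameter names; a missing value means "original size".
        width = int(params.get("w", ["0"])[0])
        height = int(params.get("h", ["0"])[0])
        # Normalising the key means ?w=1&h=2 and ?h=2&w=1 share one entry.
        key = (parts.path, width, height)
        if key not in self._cache:
            self._cache[key] = self._render(parts.path, width, height)
        return self._cache[key]
```

A real server would also want eviction and limits on the sizes it accepts, otherwise anyone can fill the cache by requesting arbitrary dimensions.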
i wonder if you could do it with separate frames of a GIF? there's no reason they have to be played as an animation. (they don't even have to be the same size as each other or the logical screen, afaict.)
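One way to check what a given file actually contains is to walk the GIF block structure and read each frame's dimensions from its image descriptor. The sketch below does only that; it does not decode pixels, and the function name `gif_frame_sizes` is made up for this example. It also reports the logical screen size so the two can be compared.

```python
import struct


def _skip_sub_blocks(data: bytes, pos: int) -> int:
    """Skip a chain of length-prefixed sub-blocks, including the 0 terminator."""
    while data[pos] != 0:
        pos += data[pos] + 1
    return pos + 1


def gif_frame_sizes(data: bytes):
    """Return ((screen_w, screen_h), [(frame_w, frame_h), ...]) for a GIF."""
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        raise ValueError("not a GIF file")
    # Logical screen descriptor: width, height, packed flags (little-endian).
    screen_w, screen_h, packed = struct.unpack_from("<HHB", data, 6)
    pos = 13  # 6-byte header + 7-byte logical screen descriptor
    if packed & 0x80:  # global color table present
        pos += 3 * (2 << (packed & 0x07))
    frames = []
    while pos < len(data):
        block = data[pos]
        pos += 1
        if block == 0x3B:  # trailer
            break
        if block == 0x21:  # extension: label byte, then sub-blocks
            pos = _skip_sub_blocks(data, pos + 1)
        elif block == 0x2C:  # image descriptor
            _left, _top, width, height, flags = struct.unpack_from("<HHHHB", data, pos)
            pos += 9
            if flags & 0x80:  # local color table present
                pos += 3 * (2 << (flags & 0x07))
            pos = _skip_sub_blocks(data, pos + 1)  # +1 skips LZW min code size
            frames.append((width, height))
        else:
            raise ValueError(f"unexpected block type 0x{block:02x}")
    return (screen_w, screen_h), frames
```

Calling it on the bytes of a file returns the screen size plus one `(width, height)` pair per frame, in file order.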