There's nothing wrong with being resource efficient, but relatively SQL queries used to take two minutes, and the result printed on paper, and the whole machine cost two million.
Yeah and sending a message to someone across the sea used to take months and probably wouldn’t even get there half the time, that doesn’t meant it’s not ridiculous that I can’t run two instances of slack without my computer melting.
Without really digging into this, if it's a neural model like a sibling comment mentions perhaps you can do model compression that shrinks many multiples, if that is the majority of the image size.