Yes, something prevents LLMs from being RLed to do this: you can't see through something opaque to tell whether what's out of sight is high calorie or low calorie.
The problem itself is unsolvable given the data provided.
You could conceivably make a model better at guessing, but these will inherently always be guesses, and some of them will be wildly off.
An extreme example perhaps, but no, you can't just turn pixels into calories. Right now I'd be impressed if we could reliably estimate volume to within 30% from a photo, and even with the volume correct, the calorie content of the food can easily be way off with no visible sign.
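To put rough numbers on that, here is a minimal sketch of how the two uncertainties stack. Everything in it is an assumption for illustration: the `calorie_range` helper, the 400 ml portion, and the 0.5 to 2.0 kcal/ml density range are made up, not measured values. Only the 30% volume error comes from the point above.

```python
def calorie_range(volume_ml, volume_rel_error, density_low, density_high):
    """Return (low, high) kcal bounds for a portion.

    volume_ml        -- volume estimated from the photo (hypothetical input)
    volume_rel_error -- relative error on that estimate, e.g. 0.30 for 30%
    density_low/high -- plausible kcal-per-ml range for what might be hidden
                        in the food (hypothetical values, not measured data)
    """
    v_low = volume_ml * (1 - volume_rel_error)
    v_high = volume_ml * (1 + volume_rel_error)
    # Worst cases combine: smallest volume with the leanest contents,
    # largest volume with the richest contents.
    return v_low * density_low, v_high * density_high


# Hypothetical portion: 400 ml estimated from the photo, 30% volume error,
# contents anywhere between 0.5 and 2.0 kcal/ml.
low, high = calorie_range(400, 0.30, 0.5, 2.0)
print(f"{low:.0f} to {high:.0f} kcal")
```

With those made-up inputs the upper bound is more than seven times the lower bound, and most of that spread comes from the density term, which is the part a photo can't show.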