The model is literally "inferring" something about its inputs: e.g., these pixels denote a hot dog, those don't.