multimodal models with environmental grounding may eventually have an analog [of affect]. text-only agents can’t.