The session configuration accepts a modalities parameter that can restrict responses to text only. See https://platform.openai.com/docs/api-reference/realtime-clie....
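As a rough sketch of what that might look like: the snippet below builds a session.update client event that sets modalities to text only. The event shape is my reading of the Realtime API reference and should be checked against the current docs; the function name build_text_only_session_update is made up for illustration, and actually sending the payload over the WebSocket connection is left out.

```python
import json


def build_text_only_session_update() -> str:
    """Return a JSON string for a session.update event limited to text output.

    Assumption: the Realtime API accepts a client event of type
    "session.update" whose "session" object carries a "modalities" list.
    """
    event = {
        "type": "session.update",
        "session": {
            # Only "text" is listed, so no audio output is requested.
            "modalities": ["text"],
        },
    }
    return json.dumps(event)


# Serialized payload, ready to be sent over an already-open connection.
payload = build_text_only_session_update()
```

The payload would be sent once after the connection opens, before requesting any responses.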