Hacker News
by throwaway48540 669 days ago
It did the same thing ChatGPT does when it picks up your writing style and exact words or sentences after a few messages. Literally: the audio is encoded as tokens and fed to the LLM, so from the model's point of view there is no distinction between text and audio.
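
To make the "no distinction" point concrete, here is a rough toy sketch, not how any real model does it. The byte-level text tokenizer, the 8-bit audio quantizer, and the names `encode_text`, `encode_audio`, and `build_context` are all assumptions made up for illustration; real systems use learned audio codecs. The only point is that both inputs end up as integers in one sequence.

```python
# Toy illustration only: real speech models use learned audio codecs,
# not the simple 8-bit quantizer assumed here.
TEXT_VOCAB = 256            # assumption: one token id per UTF-8 byte
AUDIO_LEVELS = 256          # hypothetical number of audio codes
AUDIO_OFFSET = TEXT_VOCAB   # audio ids are placed after the text ids


def encode_text(text: str) -> list[int]:
    """Map text to token ids in [0, TEXT_VOCAB)."""
    return list(text.encode("utf-8"))


def encode_audio(samples: list[float]) -> list[int]:
    """Quantize samples in [-1.0, 1.0] to ids starting at AUDIO_OFFSET."""
    ids = []
    for s in samples:
        clipped = max(-1.0, min(1.0, s))
        # Round to the nearest of AUDIO_LEVELS evenly spaced levels.
        level = int((clipped + 1.0) / 2.0 * (AUDIO_LEVELS - 1) + 0.5)
        ids.append(AUDIO_OFFSET + level)
    return ids


def build_context(text: str, samples: list[float]) -> list[int]:
    """Concatenate both modalities into one flat id sequence."""
    return encode_text(text) + encode_audio(samples)


context = build_context("hi", [-1.0, 0.0, 1.0])
print(context)  # [104, 105, 256, 384, 511]
```

Once the ids are in the list, nothing marks which ones came from text and which from audio apart from their numeric range, so whatever the model learns about continuing a sequence applies to both.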