by antx
306 days ago
Also, with the rapid advances in vision language models, I would be surprised if we don't see an image-to-text-to-voice system that works with real-time video in the not-so-distant future! Like a reverse "Genie": instead of providing a prompt and having it generate a world, you provide a streaming video and it spouts relevant information when changes happen, or on demand, for instance...