by ilaksh
174 days ago
I wonder if some day there will be a video codec that is essentially a standard distribution of a very precise and extremely fast text-to-video model (like SmartTurboDiffusion-2027 or something). Surely there are limits to text, but even the example you gave does not seem to me to be beyond the reach of a text description, given a certain level of precision and capability in the model. And we now have faster-than-real-time text-to-video.
|
To the extent that that could work, I would imagine that I, personally, would be happy reading the textual description instead of watching the video, and for me, we'd now be even closer to "text wins 100% of the time."
In other words, it's not that you _can't_ give excellent descriptions that would obviate the need for video, it's just that people _don't_, even, or perhaps even especially, when they think they do.
If someone writes text that creates a video that shows exactly how to get something apart, then _presumably_ they also watch the video to make sure it works.
So the video becomes a debugging tool for their instructions. Perhaps not as good as watching 100 people do it, but maybe even better in some ways.
So the video codec you describe could be a useful tool to help create more programmers.
https://www.commitstrip.com/en/2016/08/25/a-very-comprehensi...