by marmadukester39 1254 days ago
Is it? Videos are just sequences of frames
1 comment

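Each frame of the video would have to be divided into many patches, and each patch becomes a token in the sequence. At least that's how transformer-based image models work. Then you have to account for the audio data in the same way. It just blows up the compute required, since standard self-attention cost grows roughly with the square of the sequence length.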
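
To put rough numbers on that, here is a back-of-the-envelope sketch. The 224x224 frame size, 16x16 patch size, 30 fps and 10-second clip length are my own assumptions for illustration (they are not from any particular model), the helper names are made up, and audio tokens are left out entirely, so treat it as an estimate rather than a real cost model.

```python
# Rough token-count arithmetic for a ViT-style patch tokenizer.
# All sizes below are illustrative assumptions, not taken from a specific model.

def patches_per_frame(height: int, width: int, patch: int) -> int:
    """Number of non-overlapping patch tokens in a single frame."""
    return (height // patch) * (width // patch)

def video_tokens(height: int, width: int, patch: int, frames: int) -> int:
    """Total tokens if every frame is tokenized independently (no audio)."""
    return patches_per_frame(height, width, patch) * frames

def attention_pairs(tokens: int) -> int:
    """Token pairs scored by full (dense) self-attention in one layer."""
    return tokens * tokens

# Assumed: 224x224 frames, 16x16 patches, 30 fps, 10-second clip.
per_frame = patches_per_frame(224, 224, 16)
clip = video_tokens(224, 224, 16, frames=30 * 10)
ratio = attention_pairs(clip) // attention_pairs(per_frame)

print(f"tokens per frame: {per_frame}")
print(f"tokens per clip:  {clip}")
print(f"attention pairs vs. a single image: {ratio}x")
```

Under those assumptions a single image is 196 tokens, the clip is 58,800 tokens, and dense attention over the whole clip scores 90,000 times as many token pairs as it would for one image. Real video models usually avoid full attention over every frame for exactly this reason.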