Hacker News
by optimalsolver 673 days ago
Rather than doing self-supervised learning on the actual video frames, why not do it on the byte sequence that represents the video file?
1 comment

You might find this paper interesting: [JPEG-LM: LLMs as Image Generators with Canonical Codec Representations](https://arxiv.org/abs/2408.08459)
Thanks. This is exactly the kind of thing I was looking for.
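
For anyone curious what "self-supervised learning on the byte sequence" might look like in practice, here is a minimal sketch, assuming each byte of the file is treated as a token and the objective is next-byte prediction. The function name `next_byte_pairs` and the `context_len` parameter are hypothetical, chosen for illustration; this is not the method from the linked paper, just one plausible way to build training pairs from raw bytes.

```python
from typing import Iterator, List, Tuple


def next_byte_pairs(data: bytes, context_len: int = 8) -> Iterator[Tuple[List[int], int]]:
    """Yield (context, target) pairs for next-byte prediction.

    Hypothetical helper: each byte is treated as a token id in 0..255,
    and the model would be asked to predict the byte that follows
    a fixed-length window of preceding bytes.
    """
    tokens = list(data)  # bytes -> list of ints in 0..255
    for i in range(context_len, len(tokens)):
        yield tokens[i - context_len:i], tokens[i]


# Stand-in for the contents of a video file; in practice you would
# read the file with open(path, "rb").read() instead.
sample = bytes(range(12))
pairs = list(next_byte_pairs(sample, context_len=4))
```

One caveat with this approach: compressed files are close to high-entropy data, so a model trained this way has to implicitly learn the codec as well as the content.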