	by optimalsolver 673 days ago
	Rather than doing self-supervised learning on the actual video frames, why not do it on the byte sequence that represents the video file?
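To make the idea concrete, here is a minimal sketch of what "self-supervised learning on the byte sequence" could look like at the data-preparation stage. It is not taken from the post or from any particular paper. The function names (`bytes_to_tokens`, `next_byte_examples`) and the fixed context length are assumptions made for illustration. Each byte of the file becomes one token from a 256-symbol vocabulary, and the training target is simply the next byte.

```python
from typing import Iterator, List, Tuple

VOCAB_SIZE = 256  # one token per possible byte value (assumed vocabulary)


def bytes_to_tokens(data: bytes) -> List[int]:
    """Map raw file bytes to integer tokens in [0, 255]."""
    return list(data)


def next_byte_examples(
    tokens: List[int], context_len: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield non-overlapping (input, target) windows for next-byte prediction.

    The target is the input shifted left by one position, which is the usual
    self-supervised setup for an autoregressive sequence model.
    """
    for start in range(0, len(tokens) - context_len, context_len):
        window = tokens[start : start + context_len + 1]
        yield window[:-1], window[1:]


if __name__ == "__main__":
    # Stand-in for the contents of a video file; in practice this would be
    # something like open(path, "rb").read() on a real file.
    raw = bytes([0, 1, 2, 3, 4])
    for inputs, targets in next_byte_examples(bytes_to_tokens(raw), context_len=2):
        print(inputs, targets)
```

One caveat with this sketch: compressed video bytes are entropy-coded, so neighbouring bytes carry little obvious structure, and a model trained this way has to learn the codec implicitly.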

1 comment

You might find this paper interesting: [JPEG-LM: LLMs as Image Generators with Canonical Codec Representations](https://arxiv.org/abs/2408.08459)

Thanks. This is exactly the kind of thing I was looking for.