Hacker News
by
optimalsolver
673 days ago
Rather than doing self-supervised learning on the actual video frames, why not do it on the byte sequence that represents the video file?
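A minimal sketch of what learning on the byte sequence could look like, assuming a plain next-byte prediction objective (one possible self-supervised setup; the question does not specify one). The function name `byte_windows` and the sample bytes are hypothetical and stand in for a real video file.

```python
from typing import Iterator, List, Tuple


def byte_windows(data: bytes, context: int) -> Iterator[Tuple[List[int], int]]:
    """Yield (context_bytes, next_byte) pairs for next-byte prediction."""
    for i in range(len(data) - context):
        # Each byte is an integer in 0..255, so the vocabulary size is 256.
        yield list(data[i:i + context]), data[i + context]


# Hypothetical stand-in for the raw bytes of a video file; a real run
# would read them with open(path, "rb").read() instead.
sample = bytes([0, 0, 0, 24, 102, 116, 121, 112])

pairs = list(byte_windows(sample, context=4))
for ctx, target in pairs:
    print(ctx, "->", target)
```

Each pair is an input window and the byte a model would be trained to predict, with no labels beyond the file itself.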
1 comment
mkaic
673 days ago
You might find this paper interesting: [JPEG-LM: LLMs as Image Generators with Canonical Codec Representations](https://arxiv.org/abs/2408.08459)
optimalsolver
672 days ago
Thanks. This is exactly the kind of thing I was looking for.