Hacker News
by paraschopra 108 days ago
Do you have more info on the video encoding process?

You write:

>We created a model without this tradeoff by training our video encoder on a masked compression objective

I understand why this would give you more detail per token, but how are you reducing the total number of tokens?