by paraschopra
108 days ago
Do you have more info on the video encoding process? You write:

> We created a model without this tradeoff by training our video encoder on a masked compression objective

I understand why this would give you more detail per token, but how are you reducing the total number of tokens?