Hacker News
by Havoc 1023 days ago
>The model has 70k unused embeddings for multimodal extensions,

Could someone briefly explain what this means? Multimodal as in pictures, but if the embeddings are unused then presumably that part is untrained, so it wouldn't know what to do with a picture?

Yes, it wouldn't know what to do with the picture unless you fine-tune the model (which is why they are permissively releasing it).

The embeddings form the vocabulary of the model. The vocabulary "namespace" has 70k empty slots, so you could introduce your own tokens and train on top of that, where a token = some patch of multimodal data.
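
To make the "empty slots" idea concrete, here's a rough toy sketch in plain Python. Everything in it is an assumption for illustration: the sizes are tiny stand-ins (not the real 70k), and `reserved_id` and `embed` are hypothetical helper names, not anything from the released model's code.

```python
import random

# Toy sizes, for illustration only (the real model reserves about 70k slots).
TEXT_VOCAB = 10   # slots already used by text tokens
RESERVED = 4      # unused slots kept free for new token types
DIM = 3           # embedding width

random.seed(0)

# One row per vocabulary slot. The text rows stand in for trained vectors;
# the reserved rows exist in the table, but training never touched them.
embedding_table = [
    [random.gauss(0.0, 1.0) for _ in range(DIM)]
    for _ in range(TEXT_VOCAB + RESERVED)
]

def reserved_id(k):
    """Map the k-th new (hypothetical) multimodal token to its reserved slot."""
    if not 0 <= k < RESERVED:
        raise IndexError("no reserved slot left for this token")
    return TEXT_VOCAB + k

def embed(token_ids):
    """Look up one embedding row per token id."""
    return [embedding_table[t] for t in token_ids]

# A mixed sequence: two text tokens, then two hypothetical image-patch tokens.
sequence = [3, 7, reserved_id(0), reserved_id(1)]
vectors = embed(sequence)
```

The point is that the new ids land inside the existing table, so the text token ids stay the same and the table doesn't need resizing. The rows behind the reserved ids only become meaningful once fine-tuning updates them.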

Gotcha. Thanks for explaining.