by lukebuehler
432 days ago
I recently started looking into multi-modal embedding models, and I was surprised by how few options there are. For example, Google's model only supports 30 text tokens [1]! This is definitely a welcome addition. Any pointers to similarly powerful embedding models? I'm looking specifically for text and images. I wish there were also one that could do audio and video, but I don't think that exists.

[1] https://cloud.google.com/vertex-ai/generative-ai/docs/embedd...