Most other word embeddings have hundreds of dimensions, not thousands. Are you able to hint at what causes this difference? Do you see better downstream task performance?
We decided to err on the side of making the embeddings too big, because it's very easy to reduce the number of dimensions on the user's end, and impossible to increase it.
If you're doing clustering or instance retrieval, you probably want to use PCA to reduce the number of dimensions to 200 or so. (In fact, we do this in the tutorial at https://www.basilica.ai/tutorials/how-to-train-an-image-mode... .)
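For anyone who wants to try the reduction themselves, here is a minimal sketch of PCA done with plain NumPy. It is not the code from the tutorial: the `pca_reduce` helper is hypothetical, the 2048-dimensional input size is an assumption, and random data stands in for real embeddings.

```python
import numpy as np

def pca_reduce(embeddings, n_components=200):
    """Project row-vector embeddings onto their top principal components."""
    X = np.asarray(embeddings, dtype=np.float64)
    mean = X.mean(axis=0)
    centered = X - mean
    # Thin SVD: the rows of vt are the principal directions,
    # sorted by decreasing explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    return centered @ components.T, mean, components

# Synthetic stand-in for real embeddings (assumed size: 500 vectors x 2048 dims).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(500, 2048))

reduced, mean, components = pca_reduce(embeddings, n_components=200)
print(reduced.shape)  # (500, 200)
```

To project a new embedding into the same space later, reuse the fitted pieces: `(new_embedding - mean) @ components.T`. Note that you need at least as many vectors as components to get a meaningful 200-dimensional projection.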
If you're training a big regression, you'll probably get better results with the larger embedding.