It depends on the task.

If you're doing clustering or instance retrieval, you probably want to use PCA to reduce the number of dimensions to 200 or so. (In fact, we do this in the tutorial at https://www.basilica.ai/tutorials/how-to-train-an-image-mode... .)
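
For anyone who wants to try the reduction by hand, here is a minimal sketch using plain NumPy. The function name `pca_reduce`, the random stand-in data, and the 2048-dimensional input size are all assumptions for illustration; this is not necessarily how the tutorial does it.

```python
import numpy as np

def pca_reduce(embeddings, n_components=200):
    """Project the rows of `embeddings` onto their top principal components.

    Hypothetical helper for illustration. Returns the reduced data plus the
    mean and components needed to project new embeddings the same way.
    """
    X = np.asarray(embeddings, dtype=np.float64)
    mean = X.mean(axis=0)
    Xc = X - mean  # PCA requires centered data
    # SVD of the centered data: rows of Vt are the principal directions,
    # ordered from largest to smallest explained variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    return Xc @ components.T, mean, components

# Stand-in data: 500 random vectors. The 2048 dimensions are an assumption,
# not the real embedding size.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(500, 2048))

reduced, mean, components = pca_reduce(embeddings, n_components=200)
print(reduced.shape)
```

To reduce embeddings that arrive later (for example, retrieval queries), reuse the fitted values instead of refitting: `(new_embeddings - mean) @ components.T`.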

If you're training a big regression, you'll probably get better results with the larger embedding.

We decided to err on the side of making the embeddings too big, because it's very easy to reduce the number of dimensions on the user's end, and impossible to increase it.