To optimize for fast nearest neighbors, I chose 256 dims. Notably, this actually hurt some of the pre-training classification losses pretty severely compared to 2k dims, so it definitely has a quality cost.
The site uses cosine distance. The code itself implements Euclidean distance, but I decided to normalize the vectors last minute out of FUD that some unusually small vectors would appear as neighbors for an abnormal number of examples.
The site uses cosine distance. The code itself implements Euclidean distance, but I decided to normalize the vectors last minute out of FUD that some unusually small vectors would appear as neighbors for an abnormal number of examples.