by fredophile 455 days ago
Out of curiosity, what's the size of vectors you're using (# of dimensions) and what distance metric are you using? Euclidean?

To optimize for fast nearest neighbors, I chose 256 dims. Notably, this actually hurt some of the pre-training classification losses pretty severely compared to 2k dims, so it definitely has a quality cost.

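The site effectively uses cosine distance. The code itself implements Euclidean distance, but I decided at the last minute to normalize the vectors, out of fear that some unusually small vectors would show up as neighbors for an abnormal number of examples. Once every vector has unit length, ranking neighbors by Euclidean distance gives the same order as ranking by cosine distance.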
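
To illustrate why normalizing makes the two metrics interchangeable for ranking, here is a rough sketch in plain Python. It is not the site's code: the random data, the brute-force search, and all helper names (normalize, euclidean, cosine_distance, nearest) are made up for the example, and only the 256 dimensions come from the comment above. For unit vectors, squared Euclidean distance equals 2 * cosine distance, so the nearest neighbor comes out the same either way.

```python
import math
import random

def normalize(v):
    """Scale v to unit length (assumes v is not the zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def euclidean(a, b):
    """Plain Euclidean (L2) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 - cosine similarity; does not depend on vector length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def nearest(query, vectors, dist):
    """Index of the vector closest to query under dist (brute force)."""
    return min(range(len(vectors)), key=lambda i: dist(query, vectors[i]))

rng = random.Random(0)
DIMS = 256  # dimension mentioned in the comment; the data itself is synthetic
vectors = [[rng.gauss(0, 1) for _ in range(DIMS)] for _ in range(200)]
query = [rng.gauss(0, 1) for _ in range(DIMS)]

# Normalize once up front, then search with Euclidean distance.
unit_vectors = [normalize(v) for v in vectors]
unit_query = normalize(query)

by_euclidean = nearest(unit_query, unit_vectors, euclidean)
by_cosine = nearest(query, vectors, cosine_distance)

# For unit vectors: ||a - b||^2 == 2 * (1 - cos(a, b))
gap = abs(euclidean(unit_query, unit_vectors[0]) ** 2
          - 2 * cosine_distance(query, vectors[0]))
```

Normalizing up front means an existing Euclidean nearest-neighbor index can be reused unchanged, and vector length no longer influences which items come back as neighbors.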