by unixpickle
453 days ago
To optimize for fast nearest-neighbor search, I chose 256 dims. Notably, this hurt some of the pre-training classification losses pretty severely compared to 2k dims, so it definitely has a quality cost. The site uses cosine distance. The code itself implements Euclidean distance, but I decided to normalize the vectors at the last minute out of FUD that some unusually small vectors would appear as neighbors for an abnormal number of examples. On unit-length vectors, Euclidean distance ranks neighbors the same way cosine distance does.
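For anyone wondering why normalizing turns a Euclidean search into a cosine one: for unit vectors, ||a - b||^2 = 2 - 2 cos(a, b), so both distances produce the same neighbor ordering. Below is a minimal sketch of that, not the site's actual code. The helper names, the brute-force search, and the random test vectors (including the deliberately tiny ones) are all assumptions made for illustration.

```python
import math
import random


def normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_euclidean(a, b):
    """Squared Euclidean distance between a and b."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_distance(a, b):
    """1 - cosine similarity, computed on the raw (unnormalized) vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest(query, vectors, dist):
    """Brute-force index of the vector closest to query under dist."""
    return min(range(len(vectors)), key=lambda i: dist(query, vectors[i]))


random.seed(0)
DIMS = 256  # matches the dimensionality mentioned above

# Hypothetical data: vectors of very different lengths, some unusually small.
scales = [0.01, 0.5, 1.0, 5.0] * 25
vectors = [[random.gauss(0, 1) * s for _ in range(DIMS)] for s in scales]
query = [random.gauss(0, 1) for _ in range(DIMS)]

unit_vectors = [normalize(v) for v in vectors]
unit_query = normalize(query)

# Cosine search on raw vectors vs. Euclidean search on normalized vectors.
by_cosine = nearest(query, vectors, cosine_distance)
by_euclid_unit = nearest(unit_query, unit_vectors, squared_euclidean)

# Euclidean search on raw vectors; this can pick a different neighbor,
# since vector length then affects the distance.
by_euclid_raw = nearest(query, vectors, squared_euclidean)

print("cosine:", by_cosine)
print("euclidean on unit vectors:", by_euclid_unit)
print("euclidean on raw vectors:", by_euclid_raw)
```

The first two searches agree, which is why an index built for Euclidean distance can serve cosine queries once everything is normalized. The raw Euclidean search is included only to show where vector length can change the answer.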