|
|
|
|
|
by yorwba
298 days ago
|
|
I don't think Mixture of Logits from the paper you link circumvents the theoretical limitations pointed out here, since their dataset size mostly stays well below the limit. In the end they still rely on Maximum Inner Product Search, just with several lookups for smaller partitions of the full embedding, and the largest dataset is Books, where this paper suggests you'd need more than 512 embedding dimensions, and MoL with 256-dimensional embeddings split into 8 parts of 32 each has an abysmal hit rate. So that's hardly a demonstration that arbitrary high-rank distributions can be approximated well. MoL seems to approximate it better than other approaches, but all of them are clearly hampered by the small embedding size. |
|