Hacker News new | ask | show | jobs
by minimaxir 1172 days ago
A larger vocab takes longer to train but has no (practical) impact at inference time as an Embeddings index is just a key-value store, which is very helpful as GPT starts hitting scaling laws.