by f311a
323 days ago
You can probably fit all the words in under 10-15MB of memory, but memory optimisations aren't even needed for 250k words. Trie data structures are memory-efficient for storing such dictionaries (2-4x better than hashmaps), although they are not as fast as hashmaps for retrieving items. You can hash the top 1k most common words and check the rest using a trie. The most CPU-intensive task here is tokenizing the text, but there are a ton of optimized tokenizers developed by orgs that work on LLMs.
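
A rough sketch of the "hash the common words, trie for the rest" idea, in case it helps. The names here (`Trie`, `HybridDictionary`, `hot_size`) are made up for illustration, and it assumes you already have the word list sorted by frequency, most common first. Note that a dict-of-dicts trie in Python only shows the lookup logic; it will not give you the memory savings mentioned above, for that you would want a compact (array-based or succinct) trie implementation.

```python
class Trie:
    """Minimal dict-of-dicts trie (illustrative only, not memory-optimised)."""

    _END = ""  # sentinel key marking "a word ends here"; never clashes with a 1-char key

    def __init__(self):
        self.root = {}

    def add(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node[self._END] = True

    def __contains__(self, word):
        node = self.root
        for ch in word:
            node = node.get(ch)
            if node is None:
                return False
        return self._END in node


class HybridDictionary:
    """Hash set for the most frequent words, trie for the long tail."""

    def __init__(self, words_by_frequency, hot_size=1000):
        # Assumption: words_by_frequency is sorted, most common first.
        self.hot = set(words_by_frequency[:hot_size])
        self.cold = Trie()
        for word in words_by_frequency[hot_size:]:
            self.cold.add(word)

    def __contains__(self, word):
        # Most lookups should hit the fast hash set and never touch the trie.
        return word in self.hot or word in self.cold
```

Usage would be something like `d = HybridDictionary(words, hot_size=1000)` followed by `token in d` for each token. Since natural-language text is dominated by a small set of very common words, most checks should be answered by the hash set.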