Hacker News new | ask | show | jobs
by greenkey 2025 days ago
That’s awesome!

In comparison, here a quote from the OP’s blog entry:

“Fast forward to today. A program to load /usr/share/dict/words into a hash table is 3-5 lines of Perl or Python, depending on how terse you mind being. Looking up a word in this hash table dictionary is a trivial expression, one built into the language. And that's it. Sure, you could come up with some ways to decrease the load time or reduce the memory footprint, but that's icing and likely won't be needed. The basic implementation is so mindlessly trivial that it could be an exercise for the reader in an early chapter of any Python tutorial.

That's progress.”

But is a simpler, less efficient method progress? Sure it allows more words to be added/removed with ease, and I don’t want to advocate over-optimization, but the solution you made for the Spectrum seems better because words don’t change much. Why don’t we use a similar specialized hash and compressed dictionary format to increase spellchecking speed and allow more words in less space? We could still produce that format using /usr/share/dict/words and similar.

2 comments

> Why don’t we

Because we don't need to and we have much more interesting problems to take up our time.

But GP already solved the problem (at least for English and other Latin script languages). Why throw away those findings?
Problems tend to have more than one solution. GP's solution should be documented, yes, but the alternate solution that won out was computers being capable of storing a million words or so in plaintext very easily, and doing the same using their compression scheme just isn't really worth the space saved nowadays.
Also, compressing could actually be slower for modern computers. Remember when compressing your hard disk made your PC faster, up until disks became faster, then it actually made it slower?

Today's CPUs are very fast, so the trend could have flipped again, that would be an interesting benchmark.

Implementation takes time. Keep it simple and move on.
I always thought that we still use a trie or (to save memory) ternary search trees for that..
How many operations and objects? The method he’s talking about would seem more efficient for the purpose vs all of the strings still being created even if never used in the plain hash version.