by jasonwatkinspdx 1871 days ago
I did the same thing as you and had the exact same experience. S2 made the mapping trivial, and I spent nearly all my time on the word list.

I was really surprised to find there's not much out there in the way of cross-language lists of the most commonly used words. I assume such lists exist somewhere in the computational linguistics community, but I couldn't find them. I ended up using a list of the most common English words, filtered via pairwise Levenshtein distance, and then I did a manual scan to drop any words that seemed problematic.
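
For anyone wanting to try the filtering step, here is a rough sketch of what "filtered via pairwise Levenshtein distance" could look like. This is not the original code (which was lost). The greedy keep-first strategy, the `min_distance` threshold, the function names, and the sample words are all assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string on the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete from a
                current[j - 1] + 1,      # insert into a
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def filter_by_distance(words, min_distance=2):
    """Greedy filter (an assumed strategy): walk the list in order and keep a
    word only if it is at least min_distance edits from every word kept so far."""
    kept = []
    for word in words:
        if all(levenshtein(word, k) >= min_distance for k in kept):
            kept.append(word)
    return kept


# Hypothetical sample input, ordered from most to least common.
candidates = ["house", "mouse", "horse", "water", "later", "table"]
print(filter_by_distance(candidates, min_distance=2))
# → ['house', 'water', 'table']
```

Because the filter is greedy, the input order matters: feeding it a frequency-sorted list means the more common word of a confusable pair survives. It is also quadratic in the number of kept words, which is fine for a few thousand words but slow for much larger lists. A manual pass is still needed afterwards, since edit distance says nothing about homophones or offensive words.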

It really would be nice if someone would solve this, but I'm not being flippant about just how much effort would be involved.

1 comments

Is there somewhere I can see the wordlist you came up with? My wordlist experiments are mostly here: https://github.com/kybernetikos/wherewords/tree/main/lib/wor...
Sorry, no. I never got around to putting the code up on GitHub, and that laptop died. My list wasn't great anyhow. This is an annoyingly tough problem.