by jasonwatkinspdx
1871 days ago
I did the same thing as you and had the exact same experience. S2 made the mapping trivial, and I spent nearly all my time on the word list. I was really surprised to find there's not much out there in the way of cross-language lists of the most commonly used words. I assume such lists exist somewhere in the computational linguistics community, but I couldn't find them. I ended up using a list of the most common English words, filtered by pairwise Levenshtein distance, and then I did a manual scan to drop any words that seemed problematic. It really would be nice if someone solved this, though I don't want to be flippant about just how much effort would be involved.
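
For anyone curious what the pairwise Levenshtein filtering step might look like, here is a minimal sketch. It is not the code from the project described above: the function names, the greedy keep-the-earlier-word strategy, and the minimum distance of 2 are all assumptions chosen for illustration. It assumes the input is already ordered from most to least common, so the more common word wins when two are too similar.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via the standard dynamic program, keeping two rows."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def filter_by_distance(words, min_distance=2):
    """Greedily keep words at least min_distance edits from every kept word.

    min_distance=2 is an illustrative threshold, not a recommendation.
    """
    kept = []
    for word in words:
        if all(levenshtein(word, k) >= min_distance for k in kept):
            kept.append(word)
    return kept


if __name__ == "__main__":
    common = ["there", "these", "which", "where", "about"]
    print(filter_by_distance(common))
```

This is quadratic in the size of the list, which is probably fine for a few thousand words. It only catches spelling similarity, so homophones and otherwise awkward words would still need the manual pass.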