by dmurray
317 days ago
For the 800 names that were missing declension data in the database, it seems like the most straightforward thing to do would be to assign their declensions by hand. It shouldn't take a native speaker more than a couple of hours (if some name they haven't seen before is ambiguous, then whatever they guess at least won't sound obviously wrong to other native speakers). Alternatively, it would be very cheap to ask an LLM to do it. Encoding them into a trie like this would still be a good way to distribute the result, but you wouldn't have to rely on the trie also being a good way to guess the declensions.
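To make the "trie for distribution, not for guessing" idea concrete, here is a minimal sketch. It assumes the declensions have already been assigned by hand and are stored as full forms keyed on the reversed name. The `SuffixTrie` class, its method names, and the sample entry are hypothetical illustrations, not the project's actual code or data from DIM.

```python
# Hypothetical sketch: a trie used purely as a lookup/distribution format
# for hand-assigned declensions. Nothing here is the project's real API.

_END = ""  # sentinel key; cannot collide with a single-character edge


class SuffixTrie:
    def __init__(self):
        self.root = {}

    def insert(self, name, forms):
        """Store the curated forms for `name`, keyed on its reversed letters."""
        node = self.root
        for ch in reversed(name.lower()):
            node = node.setdefault(ch, {})
        node[_END] = forms

    def lookup_exact(self, name):
        """Return the curated forms, or None if the name was never inserted.

        Deliberately no fallback to a shared suffix: an unknown name
        yields None instead of a guess.
        """
        node = self.root
        for ch in reversed(name.lower()):
            node = node.get(ch)
            if node is None:
                return None
        return node.get(_END)


trie = SuffixTrie()
# Illustrative entry only; real entries should be checked against DIM.
trie.insert("Jón", ("Jón", "Jón", "Jóni", "Jóns"))

print(trie.lookup_exact("Jón"))
print(trie.lookup_exact("Unknownname"))
```

With this split, whether suffix-based guessing is accurate becomes a separate question from how the curated data is shipped.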
I would not be confident enough to add the data myself, since I'd probably be wrong a lot of the time. When reviewing the results for the top 100 unknown names, I frequently got results that I thought _might_ be wrong, but I wasn't sure. For those, I looked up similar names in DIM to verify, and often thought "huh, I would not have declined those names like this". For that reason, I rely on the DIM data as the source of truth, since it's maintained by experts on the language.