by philsnow
2758 days ago
If you're running tesseract locally (i.e. not paying per invocation), run it once with the English model (-l eng) and count occurrences of the/this/a/any etc., then run it again with the German model (-l deu) and count occurrences of der/die/das/um/ab/wie, and go from there? Edit: Hell, even average word length is probably going to be a good indicator, since German builds such long compound words. Collect a few features like this and I think you'll be able to build a pretty good classifier.
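Roughly what I have in mind, as an untested-in-production Python sketch. It assumes you already have the text from the two tesseract runs as plain strings; the function names, the tie-break rule and the 5.5 cutoff are all made up for illustration, and the word lists are just the ones above, so you'd want to extend and tune all of it on real pages.

```python
import re

# Word lists taken straight from the comment; extend them for real use.
EN_STOPWORDS = {"the", "this", "a", "any"}
DE_STOPWORDS = {"der", "die", "das", "um", "ab", "wie"}

# Hypothetical cutoff for the word-length tie-break; not a measured value.
AVG_LEN_CUTOFF = 5.5


def tokenize(text):
    """Lowercase the text and return runs of letters (umlauts included)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def stopword_ratio(text, stopwords):
    """Fraction of words in `text` that appear in `stopwords`."""
    words = tokenize(text)
    if not words:
        return 0.0
    return sum(w in stopwords for w in words) / len(words)


def avg_word_length(text):
    """Mean word length in characters, 0.0 for empty input."""
    words = tokenize(text)
    if not words:
        return 0.0
    return sum(len(w) for w in words) / len(words)


def guess_language(eng_output, deu_output):
    """Guess 'en' or 'de' from the outputs of the two OCR passes.

    eng_output: text from the run with the English model
    deu_output: text from the run with the German model
    """
    en_score = stopword_ratio(eng_output, EN_STOPWORDS)
    de_score = stopword_ratio(deu_output, DE_STOPWORDS)
    if en_score != de_score:
        return "en" if en_score > de_score else "de"
    # Tie: fall back on average word length (assumed heuristic).
    return "de" if avg_word_length(deu_output) > AVG_LEN_CUTOFF else "en"
```

You'd call it as guess_language(text_from_eng_run, text_from_deu_run). If you collect more features, the same three numbers (two ratios plus average word length) could be fed into a proper classifier instead of this hard-coded comparison.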