by ocrcustomserver
2764 days ago
In tesseract, if you want to recognize both English and German, you can use the option -l deu+eng. If you want to perform language detection, you can do the following:

a. Invoke tesseract with "-l eng".
b. Pass the output text to langdetect [1]. It is a Python port of Google's language-detection library and gives you the probabilities of the languages for a given text.
c. Invoke tesseract again with "-l <detected language>".

Note that langdetect generates 2-character codes (ISO 639-1), whereas tesseract expects 3-character codes (ISO 639-2), so the detected code has to be converted before step c.

[1]: https://github.com/Mimino666/langdetect
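A rough Python sketch of steps a to c, in case it helps. It assumes the tesseract binary is on your PATH, that langdetect is installed, and that the traineddata files for the target languages are available. The mapping table and the function names (to_tesseract_lang, run_tesseract, ocr_with_language_detection) are made up for illustration, and the table is deliberately incomplete.

```python
import subprocess

# Partial, hand-written map from langdetect's two-letter codes to tesseract
# language names. This is an assumption for illustration; extend it for the
# languages you actually expect.
ISO1_TO_TESSERACT = {
    "en": "eng",
    "de": "deu",
    "fr": "fra",
    "es": "spa",
    "it": "ita",
}


def to_tesseract_lang(iso1_code, default="eng"):
    """Translate a two-letter code to a tesseract code, or fall back to default."""
    return ISO1_TO_TESSERACT.get(iso1_code, default)


def run_tesseract(image_path, lang):
    """Run the tesseract CLI and return the recognized text from stdout."""
    result = subprocess.run(
        ["tesseract", image_path, "stdout", "-l", lang],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def ocr_with_language_detection(image_path):
    # Imported lazily so the helpers above work without langdetect installed.
    from langdetect import detect_langs

    first_pass = run_tesseract(image_path, "eng")   # step a
    candidates = detect_langs(first_pass)           # step b, most probable first
    lang = to_tesseract_lang(candidates[0].lang)    # convert 2 -> 3 letter code
    if lang == "eng":
        return first_pass  # the first pass already used the English model
    return run_tesseract(image_path, lang)          # step c
```

Usage would be something like ocr_with_language_detection("scan.png"), where scan.png is a placeholder file name. Languages missing from the table fall back to English here; whether that is the right default depends on your documents.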