by langcss
636 days ago
You need a lot more power. I found GPT-4o struggles with basic OCR of printed text and hallucinates a lot, while the Tesseract engine (old school) gets it perfect. You need the model to be powerful enough to do everything. By the way, you can work around this by sending the output through a checking stage: picture -> GPT-4o -> out1, picture -> Tesseract -> out2, then out1 + out2 -> LLM. Might work for sound too.
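A rough sketch of that checking stage, in case it helps. All names here (`cross_check`, `disagreements`, `vision_llm`, `ocr_engine`, `checker`) are made up for illustration, and the three models are passed in as plain callables rather than real API calls, so this only shows the wiring. In practice `ocr_engine` might be a wrapper around something like pytesseract and the other two around whatever LLM client you use; that part is an assumption, not something tested here.

```python
import difflib
from typing import Callable, List, Tuple

# Hypothetical interfaces: each stage is just a function you supply.
Transcriber = Callable[[bytes], str]  # image bytes -> transcript
Checker = Callable[[str], str]        # prompt -> reconciled transcript


def disagreements(out1: str, out2: str) -> List[Tuple[str, str]]:
    """Return the word-level spans where the two transcripts differ."""
    a, b = out1.split(), out2.split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


def cross_check(
    image: bytes,
    vision_llm: Transcriber,
    ocr_engine: Transcriber,
    checker: Checker,
) -> str:
    """picture -> out1, picture -> out2, then out1 + out2 -> checker."""
    out1 = vision_llm(image)
    out2 = ocr_engine(image)

    diffs = disagreements(out1, out2)
    if not diffs:
        # Both transcripts agree word for word, so skip the checking call.
        return out2

    # Point the checker at the conflicting spans instead of asking it
    # to re-read everything from scratch.
    conflict_lines = "\n".join(
        f"- transcript 1: {x!r} | transcript 2: {y!r}" for x, y in diffs
    )
    prompt = (
        "Two transcripts of the same image disagree in these places:\n"
        f"{conflict_lines}\n\n"
        f"Transcript 1:\n{out1}\n\n"
        f"Transcript 2:\n{out2}\n\n"
        "Return the single most plausible transcript."
    )
    return checker(prompt)
```

The diff step is optional, you could just hand both transcripts to the checker. Listing the conflicts explicitly is a design choice that should give the checking model less room to invent new text. The same shape would apply to sound by swapping the two transcribers for speech-to-text engines.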