by ses425500000
447 days ago
Great question. I'm using traditional OCR engines for the initial text extraction (e.g., MathPix, Google Vision), and then I apply generative AI models in a second stage to refine the output. This includes removing noisy or irrelevant elements, normalizing format inconsistencies, and improving alignment across multi-modal inputs. For figures and diagrams, I use Gemini Pro Vision not just to extract the content but to generate context-aware, structured descriptions that are better suited as ML training input than a dump of raw image text. In short, generative AI serves here as a smart post-processing layer that makes the OCR output more usable and semantically clearer.
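
A minimal sketch of that two-stage flow, for illustration only. The names `run_pipeline`, `ocr_extract`, `refine_text`, and `describe_figure` are hypothetical placeholders, not real MathPix, Google Vision, or Gemini calls. The stand-in functions below let it run offline, and the rule-based cleaner is only a stub for where a generative model would sit:

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PageResult:
    text: str                          # refined body text
    figure_description: Optional[str]  # structured description, if requested


def run_pipeline(
    page_image: bytes,
    ocr_extract: Callable[[bytes], str],
    refine_text: Callable[[str], str],
    describe_figure: Optional[Callable[[bytes], str]] = None,
) -> PageResult:
    """Stage 1: OCR extraction. Stage 2: post-processing of the raw text."""
    raw = ocr_extract(page_image)    # e.g. a wrapper around an OCR service
    cleaned = refine_text(raw)       # e.g. a prompt to a generative model
    description = describe_figure(page_image) if describe_figure else None
    return PageResult(cleaned, description)


# --- Stand-ins (assumptions) so the sketch runs without any API keys ---

def fake_ocr(image: bytes) -> str:
    """Pretend OCR output containing typical noise."""
    return "Page 3\nE = mc^2   \n\n\nwww.example.com\nEnergy  equals mass."


def rule_based_refine(raw: str) -> str:
    """Stub for the generative step: drop page headers and URLs,
    collapse repeated whitespace, and skip empty lines."""
    kept = []
    for line in raw.splitlines():
        line = re.sub(r"\s+", " ", line).strip()
        if not line or re.fullmatch(r"Page \d+", line) or line.startswith("www."):
            continue
        kept.append(line)
    return "\n".join(kept)


result = run_pipeline(b"", fake_ocr, rule_based_refine)
print(result.text)
```

Passing the stages in as callables keeps the OCR engine and the model swappable, so either one can be replaced without touching the pipeline itself.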