Hacker News
by simonw 980 days ago
It's much more sophisticated than just OCR. The model was trained on images and text at the same time - it isn't processing images in a separate step.

The GPT-4 paper has a bunch more about this.