Hacker News
by simonw, 980 days ago
It's much more sophisticated than just OCR. The model was trained on images and text at the same time - it isn't processing images in a separate step.
The GPT-4 paper has a bunch more about this.