Hacker News
by
Kiro
591 days ago
Why all those steps? Why not just file + prompt to JSON directly?
1 comment
tlofreso
591 days ago
Having the text (for now) is still pretty important for quality output. The vision models are quite good, but not a replacement for a quality OCR step. A combination of Text + Vision is compelling too.