by manishsharan 980 days ago
The author mentions that GPT-4 is very good at Optical Character Recognition (OCR).

My experience has been the opposite: I was trying to get it to read an image of a data table with a header row and the usual Excel table color palette. It could not read most of the data. Then I tried a similar experiment with enterprise architecture diagrams saved as PNG files, with the same result: it missed most of the data.

I am not disputing the author; I am trying to figure out what I am doing wrong.

3 comments

Surprising. I tried OCR only once so far - I took a photo of a hand-drawn poster at my kid's kindergarten, about mental health, dense with handwritten-style text mixed up with various drawings. You know, the kind of hand-made infographic. And the text was 100% in Polish. I figured it's as good a test as any - I fed that photo to ChatGPT and asked it to summarize the poster. To my astonishment, it reproduced 100% of the content correctly, and even in the right order (i.e. how I'd read it myself, vs. strict left-right top-down).

I don't know which blows my mind more - the above feat done on the first try, or that the "voice chat mode" has an unprecedented ability to correctly pick up on and transcribe what I'm saying. The error rate on this (tested in both English and Polish) is less than 5% - and that's with me walking outside, near a busy road, and the mistakes it made were on words I know I pronounced somewhat unclearly. Compare that to voice assistants like Google's, which has an error rate near 50%, making it entirely useless for me. I don't know how OpenAI is doing it, but I'd happily pay the API rates for a GPT-4 voice-powered phone assistant, because that would actually work.

A GPT-4 powered assistant for Android would be a game changer.
Hi! I'm the author. :) I agree, I had problems with tables as well. I tried crosswords and sudoku. My assumption is that it does not work well when it needs to position the text in the spatial context of a table or grid. I found Bard to work a lot better with those examples.

I found it to work really well with weirdly positioned text, like a serial number on a tire.

How are you prompting it to extract the data?
The PNG was a picture of a rate card. My prompt asked it to list the column headers, which were in a shaded row (a typical Excel table header), and then to create a CSV table based on the table data.
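
One way to narrow down where this kind of extraction goes wrong is to ask for the headers and the CSV as separate steps, then sanity-check whatever comes back. Below is a minimal Python sketch, assuming the model's reply is already available as plain text. The prompt wording, the `check_csv_reply` helper, and the sample reply are all hypothetical illustrations, not the prompt or data from the comment above.

```python
import csv
import io

# Hypothetical two-step prompt (an assumption, not the original wording).
PROMPT = (
    "Step 1: List the column headers from the shaded header row of the table.\n"
    "Step 2: Output the table as CSV, one line per row, using exactly those headers."
)


def check_csv_reply(reply_text, expected_headers=None):
    """Parse a CSV reply and flag rows whose cell count differs from the header."""
    # Skip blank lines, which csv.reader yields as empty lists.
    rows = [row for row in csv.reader(io.StringIO(reply_text.strip())) if row]
    if not rows:
        return {"headers": [], "row_count": 0, "bad_rows": [], "headers_match": False}

    headers = [cell.strip() for cell in rows[0]]
    # 1-based index of each data row that has too few or too many cells.
    bad_rows = [
        index
        for index, row in enumerate(rows[1:], start=1)
        if len(row) != len(headers)
    ]
    headers_match = expected_headers is None or headers == list(expected_headers)
    return {
        "headers": headers,
        "row_count": len(rows) - 1,
        "bad_rows": bad_rows,
        "headers_match": headers_match,
    }


if __name__ == "__main__":
    # Made-up reply with one truncated row, for illustration only.
    sample_reply = "Role,Rate\nAnalyst,100\nArchitect\n"
    print(check_csv_reply(sample_reply, expected_headers=["Role", "Rate"]))
```

A check like this does not prove the values are right, but it separates "the headers were misread" from "cells were dropped in some rows", which helps when comparing prompts or tools.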
Are you asking in the context of this blog post?