HN Mirror


by manishsharan 980 days ago

The author mentions that GPT-4 is very good at optical character recognition (OCR).

My experience has been the opposite. I was trying to get it to read an image of a data table with a header row and the usual Excel table color palette, and it could not read most of the data. Then I tried a similar experiment with enterprise architecture diagrams saved as PNG files: same issue, it missed most of the data.

I am not disputing the author; I am trying to figure out what I am doing wrong.

3 comments

TeMPOraL 980 days ago

Surprising. I've tried OCR only once so far: I took a photo of a hand-drawn poster about mental health at my kid's kindergarten, dense with handwritten-style text mixed in with various drawings. You know, the kind of hand-made infographic. And the text was 100% in Polish. I figured it was as good a test as any, so I fed the photo to ChatGPT and asked it to summarize the poster. To my astonishment, it reproduced 100% of the content correctly, and even in the right order (i.e., the order I'd read it in myself, rather than strictly left-to-right, top-down).

I don't know which blows my mind more: the above feat done on the first try, or the unprecedented ability of the "voice chat mode" to correctly pick up and transcribe what I'm saying. The error rate (tested in both English and Polish) is less than 5%, and that's with me walking outside near a busy road; the mistakes it made were on words I know I pronounced somewhat unclearly. Compare that to voice assistants like Google's, which has an error rate near 50% for me, making it entirely useless. I don't know how OpenAI is doing it, but I'd happily pay the API rates for a phone assistant powered by GPT-4 voice, because that would actually work.


bytefactory 980 days ago

A GPT-4-powered assistant for Android would be a game changer.


SkalskiP 980 days ago

Hi! I'm the author. :) I agree; I had problems with tables as well. I also tried crosswords and sudoku. My assumption is that it does not work well when it needs to position text within the spatial context of a table or grid. I found Bard to work a lot better on those examples.

On the other hand, I found it works really well with oddly positioned text, like a serial number on a tire.


M4v3R 980 days ago

How are you prompting it to extract the data?


manishsharan 980 days ago

The PNG was a picture of a rate card. My prompt asked it to list the column headers (a shaded row, the typical Excel table header) and then create a CSV table from the table data.


SkalskiP 980 days ago

Are you asking in the context of this blog post?
