by zerop 641 days ago
I had been using GPT-4o for extracting insights from scanned docs, and it was doing fine. But very recently (since they launched the new model, o1), it's not working. GPT-4o refuses to extract text from images and says it can't do it, though it was doing the same thing with the same prompts until last week. I am not sure if this is an intentional downgrade tied to the new model launch, but it's really frustrating for me. I cancelled my GPT-4 premium and moved to Claude. It works well.
5 comments

This. Inconsistency is a big problem for large tasks, you are better off making your own models to do this.

I have seen this odd kind of inconsistency in generating the same results, sometimes in the same chat itself after starting off fine.

I was once trying to extract handwritten dates and times from a very specific part of each page of a large PDF document, in batches of 10 pages at a time. With some documents it started by refusing, but not in other chat windows that I tried with the same document. Sometimes it would say there was an error, and then it would work in a new chat window. I am not sure why, but just starting a new chat works for these kinds of situations.

Sometimes it will start off fine with OCR, then as the task progresses, it will start hallucinating. Even though the text to be extracted follows a pattern like dates, for the life of me I could not get it to come out right.
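
For what it's worth, here is a rough sketch of how the batching and "just start a new chat" workaround could be automated. Everything in it is an assumption for illustration: `extract` is a hypothetical callable that sends one batch to whichever model you use in a fresh conversation, and the `DD/MM/YYYY HH:MM` format is made up, since the actual format is not stated above. The check only confirms that each value parses as a date and time; it cannot tell whether the value matches what is written on the page.

```python
from datetime import datetime
from typing import Callable, Iterator, List, Sequence

# Assumed format for illustration only; adjust to the real documents.
DATE_TIME_FORMAT = "%d/%m/%Y %H:%M"


def batches(pages: Sequence, size: int = 10) -> Iterator[Sequence]:
    """Yield consecutive slices of at most `size` pages."""
    for start in range(0, len(pages), size):
        yield pages[start:start + size]


def looks_like_date_time(value: str) -> bool:
    """True if the value parses with the assumed format (shape check only)."""
    try:
        datetime.strptime(value.strip(), DATE_TIME_FORMAT)
        return True
    except ValueError:
        return False


def extract_with_retry(
    batch: Sequence,
    extract: Callable[[Sequence], List[str]],
    max_attempts: int = 3,
) -> List[str]:
    """Call `extract` until every returned value parses, or give up.

    `extract` is hypothetical: it should open a new conversation on every
    call, mirroring the "start a new chat" workaround.
    """
    for _ in range(max_attempts):
        values = extract(batch)
        if values and all(looks_like_date_time(v) for v in values):
            return values
    raise ValueError(f"no valid extraction after {max_attempts} attempts")
```

A refusal or an error message fails the parse, so it triggers another attempt instead of silently ending up in the results.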

> "...you are better off making your own models to do this"

I'm doubtful you meant what you wrote here. Using a readymade UI or API to perform an effectively magical task (for most of us) is an entirely different paradigm to "just train your own model."

In reality, for us non-ML model training mortals, we're actually probably better off hiring a human to do basic data entry.

Have you tried few-shot prompting? Something along the lines of:

User: Extract x from the given scanned document. <sample_img_1>

Assistant: <sample_img_1_output>

User: Extract x from the given scanned document. <sample_img_2>

Assistant: <sample_img_2_output>

User: Extract x from the given scanned document. <query_image>

In my experience, this seems to make the model significantly more consistent.
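
A minimal sketch of how that conversation could be assembled in code, in case it helps. It assumes the OpenAI-style chat layout, where a user message carries a list of `text` and `image_url` parts and the image is passed as a base64 data URL; other providers (Claude included) use a different image block shape, so treat the dictionaries as something to adapt. The names `image_part` and `few_shot_messages` are made up for this example, and no API call is made here.

```python
import base64
from typing import Dict, List, Tuple


def image_part(image_bytes: bytes, mime: str = "image/png") -> Dict:
    """Wrap raw image bytes as a base64 data-URL content part."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:{mime};base64,{encoded}"},
    }


def user_turn(instruction: str, image_bytes: bytes) -> Dict:
    """One user message: the instruction text plus one scanned page."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": instruction},
            image_part(image_bytes),
        ],
    }


def few_shot_messages(
    instruction: str,
    examples: List[Tuple[bytes, str]],
    query_image: bytes,
) -> List[Dict]:
    """Build user/assistant example pairs followed by the real query.

    `examples` holds (sample image bytes, hand-checked expected output).
    """
    messages: List[Dict] = []
    for sample_image, expected_output in examples:
        messages.append(user_turn(instruction, sample_image))
        messages.append({"role": "assistant", "content": expected_output})
    messages.append(user_turn(instruction, query_image))
    return messages
```

The returned list is what you would hand to the chat endpoint of your choice as the message history. Keeping the instruction identical in every turn matches the pattern above, where only the image changes.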

For highly consistent responses, manually transcribing the most challenging page of the document (or engaging in multiple rounds of dialogue with Claude) and incorporating it as a few-shot example can dramatically improve overall consistency.

I ought to test this with Sonnet too and compare the results. I feel it might perform better on OCR tasks. While I went with Azure OpenAI due to fewer rate restrictions, you've got a point - Sonnet could really shine here.

I have observed a lot of similar contradictions, where the large lout insists it can't do something that it did many times 'last week'.

Super frustrating when really trying to accomplish something!

Why not just switch back to GPT-4? It's still there.

So is 4o. The problem isn't the absence of a model, it's inconsistency.