| HN Mirror



	by rob 923 days ago
	This is amazing, what kind of OCR are you using for the receipts? Or are you doing something else?

1 comments

infecto 923 days ago

Not OP, but GPT is multimodal and does an okay enough job for a workflow like this. Alternatively, all the big cloud players have OCR/ML products for reading receipts.

You could stitch all of this together for this product in a day or two.


rob 923 days ago

Thanks! Yeah, I attached an image to ChatGPT and it worked great.

I see a lot of examples that use receipts themselves like this, but one idea I had that's kind of similar would be to look at just the "ingredients" list on product labels and parse those, like these examples (under the nutrition labels):

Example 1: https://i.imgur.com/MqpL6yh.png Example 2: https://i.imgur.com/3FSK0CD.png

However, using things like pytesseract and Google's Cloud Vision API returns mixed results, sometimes missing things, transposing lines, etc.

Any ideas on what I could do to improve being able to extract ingredients lists from food labels? Would I have to start looking into something like Vertex AI and training custom models?

Then again, as I'm thinking out loud, I realize that if these tools can extract all the text pretty reliably, the order and placement don't really matter if you create some extractor that's able to just pluck out which words are actual "ingredients" based on some master list or something.
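A rough sketch of that master-list idea, in Python. Everything here is an assumption for illustration: the `MASTER_INGREDIENTS` set is a tiny made-up list and `pluck_ingredients` is a hypothetical helper, not part of any OCR library. It normalizes the OCR text and keeps whichever known ingredients appear as whole words or phrases, ignoring where they sit on the label.

```python
import re

# Hypothetical master list; a real one would be far larger.
MASTER_INGREDIENTS = {
    "sugar", "salt", "wheat flour", "palm oil", "cocoa butter", "soy lecithin",
}

def pluck_ingredients(ocr_text, master=MASTER_INGREDIENTS):
    """Return the known ingredients found anywhere in the OCR text."""
    # Lowercase, turn every run of non-letters (punctuation, digits,
    # line breaks) into a single space, and collapse whitespace.
    normalized = " ".join(re.sub(r"[^a-z]+", " ", ocr_text.lower()).split())
    # Pad with spaces so we only match whole words or phrases
    # ("salt" should not match inside "unsalted").
    padded = f" {normalized} "
    found = [item for item in master if f" {item} " in padded]
    return sorted(found)

sample = "INGREDIENTS: Sugar, WHEAT\nFlour, Palm Oil\nCalories 120 Salt"
print(pluck_ingredients(sample))
```

This only works if multi-word ingredients stay adjacent in the OCR output. If the OCR scrambles lines badly enough to separate "wheat" from "flour", you'd need fuzzier matching than this.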


infecto 923 days ago

Yes, I think using a flavor of the cloud providers' document tooling is probably optimal. GPT vision is great for general recognition, but I found it to be hit or miss once you start throwing too much text at it in an image.

If you can get the image working via vision, that's great. On the cloud OCR side, I know tooling like Textract is good enough to generally provide output as if you were reading left to right. So in theory the text should not be that transposed or fragmented, and nutrition labels are standard enough that you can probably pull the portion you want. On top of that, like you allude to, LLMs are pretty good at figuring things out.
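For the "pull the portion you want" part, something like the following might be enough as a first pass. It is only a sketch under assumptions: the OCR text is already in reading order, the list starts with the word "Ingredients" followed by a colon, and it ends at the first sentence-ending period. `extract_ingredients_section` is a hypothetical helper name, not a Textract or Cloud Vision call.

```python
import re

def extract_ingredients_section(ocr_text):
    """Return the comma-separated items that follow an 'Ingredients:' label."""
    match = re.search(
        r"ingredients\s*[:;]\s*(.+?)(?:\.(?:\s|$)|$)",
        ocr_text,
        flags=re.IGNORECASE | re.DOTALL,  # DOTALL lets the list span lines
    )
    if not match:
        return []
    # Rejoin items that the OCR wrapped across lines.
    body = " ".join(match.group(1).split())
    return [part.strip() for part in body.split(",") if part.strip()]

label = (
    "Nutrition Facts\nCalories 120\n"
    "INGREDIENTS: Sugar, Wheat\nFlour, Salt. CONTAINS: Wheat."
)
print(extract_ingredients_section(label))
```

Splitting on commas will break up parenthesized sub-ingredient lists, so labels with nested ingredients would need a smarter splitter (or an LLM pass over the extracted span).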


elchead 922 days ago

I was also playing with the Cloud Vision API, and it was good, but I found the formatting not always easily usable. The GPT vision API is great for general recognition.
