by infecto
878 days ago
Not OP, but GPT is multimodal and does an okay enough job for something like this workflow. Alternatively, all the big cloud players have OCR/ML products for reading receipts. You could stitch all of this together for this product in a day or two.
I see a lot of examples that use receipts themselves like this, but one idea I had that's kind of similar would be to look at just the "ingredients" list on product labels and parse those, like these examples (under the nutrition labels):
Example 1: https://i.imgur.com/MqpL6yh.png Example 2: https://i.imgur.com/3FSK0CD.png
However, using things like pytesseract and Google's Cloud Vision API returns mixed results, sometimes missing things, transposing lines, etc.
Any ideas on what I could do to improve being able to extract ingredients lists from food labels? Would I have to start looking into something like Vertex AI and training custom models?
Then again, thinking out loud: if these tools can extract all the text pretty reliably, the order and position don't really matter, as long as you build some extractor that can pluck out which words are actual "ingredients" by checking them against a master list or something.
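To make that last idea concrete, here's a rough sketch of a master-list extractor using only Python's standard library. Everything in it is an assumption for illustration: `MASTER_INGREDIENTS` is a tiny made-up list, `extract_ingredients` is a hypothetical helper, and the 0.8 similarity cutoff is a guess you'd have to tune. It takes whatever text the OCR step returned, slides a window of one to a few words over it, and fuzzy-matches each window against the list with `difflib`, so small OCR slips like "F1our" can still match.

```python
import difflib
import re

# Hypothetical master list for illustration; a real one would be far larger.
MASTER_INGREDIENTS = [
    "sugar",
    "salt",
    "enriched wheat flour",
    "palm oil",
    "soy lecithin",
    "citric acid",
    "natural flavor",
]


def extract_ingredients(ocr_text, master=MASTER_INGREDIENTS, cutoff=0.8):
    """Return the sorted master-list entries that fuzzily appear in ocr_text."""
    # Lowercase and turn punctuation/newlines into spaces, so line breaks
    # and transposed lines from the OCR step matter less.
    words = re.sub(r"[^a-z0-9]+", " ", ocr_text.lower()).split()
    longest = max(len(name.split()) for name in master)

    found = set()
    # Try every window of 1..longest consecutive words against the list.
    for size in range(1, longest + 1):
        for start in range(len(words) - size + 1):
            candidate = " ".join(words[start:start + size])
            match = difflib.get_close_matches(candidate, master, n=1, cutoff=cutoff)
            if match:
                found.add(match[0])
    return sorted(found)


if __name__ == "__main__":
    # Made-up OCR output with typical character confusions (1 for l, l for i).
    sample = "INGREDIENTS: Enriched Wheat F1our, Sugar,\nPa1m Oil, Salt, Soy Lecithln."
    print(extract_ingredients(sample))
```

The obvious trade-off is the cutoff: set it too low and short words start matching the wrong entries, set it too high and very short names with a single misread character (say "Sa1t") get dropped. It also only finds ingredients that are already in the list, so it wouldn't replace a trained model for labels with unusual items.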