| HN Mirror



	by infecto 882 days ago
	Yes, I think using a flavor of the cloud providers' document tooling is probably optimal. GPT vision is great for general recognition, but I found it to be hit or miss once you start throwing too much text at it in a single image. If you can get the image working via vision, that's great. On the cloud OCR side, I know tooling like Textract is good enough to generally provide output as if you were reading left to right. So in theory the text should not be that transposed or fragmented, and nutrition labels are standard enough that you can probably pull out the portion you want. On top of that, like you allude to, LLMs are pretty good at figuring things out.
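
	To make the "pull the portion you want" step concrete, here is a rough sketch in Python. It assumes you already have a Textract-style response dict (a `Blocks` list where `LINE` blocks carry `Text` and a `Geometry.BoundingBox` with `Top` and `Left`). The `FIELDS` tuple, the function names, and the regex are my own illustrative assumptions, not anything Textract provides, and real labels will need more forgiving matching.

```python
import re

# Hypothetical field list for illustration; extend it for the label you parse.
FIELDS = ("Calories", "Total Fat", "Sodium", "Protein")


def lines_in_reading_order(response):
    """Return the text of LINE blocks sorted top-to-bottom, then left-to-right.

    Assumes a Textract-style dict; blocks of other types (PAGE, WORD) are skipped.
    """
    lines = [b for b in response.get("Blocks", []) if b.get("BlockType") == "LINE"]
    # Round Top so lines on roughly the same row are ordered by Left.
    lines.sort(
        key=lambda b: (
            round(b["Geometry"]["BoundingBox"]["Top"], 2),
            b["Geometry"]["BoundingBox"]["Left"],
        )
    )
    return [b["Text"] for b in lines]


def extract_nutrition(lines):
    """Map each known field to (amount, unit); unit is None when absent."""
    found = {}
    for text in lines:
        for field in FIELDS:
            pattern = rf"{re.escape(field)}\s+(\d+(?:\.\d+)?)\s*(mg|g)?"
            match = re.match(pattern, text, re.IGNORECASE)
            if match and field not in found:
                unit = match.group(2).lower() if match.group(2) else None
                found[field] = (float(match.group(1)), unit)
    return found
```

	Anything the regex misses (odd spacing, split lines, "Total Carb." abbreviations) is where handing the ordered lines to an LLM would probably do better than hand-written patterns.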