by defrost 543 days ago
Human or LLM, the trick with messy inputs from scanned sources is having robust sanity checks that comb for obvious fubars, plus a means by which end users of the data can compare the asserted values against the original raw image sources (and flag them for review or alteration).

At least, that has been my experience with volumes of transcribed data for applications that are picky about accuracy.
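
A rough sketch of the idea in Python, for what it's worth. Everything named here is made up for illustration: the `Record` structure, the `dob` and `amount` fields, and the plausibility limits are assumptions, not taken from any real pipeline. The point is only that each transcribed record keeps a pointer to its source scan, and the checks attach flags rather than silently fixing anything.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Record:
    source_image: str  # path to the original scan, kept so a reviewer can compare
    values: dict       # transcribed field values, as strings
    flags: list = field(default_factory=list)


def check_record(rec, today=None):
    """Attach a flag for each obviously implausible value (hypothetical rules)."""
    today = today or date.today()

    # Date of birth must parse as ISO YYYY-MM-DD and must not be in the future.
    try:
        dob = date.fromisoformat(str(rec.values.get("dob", "")))
        if dob > today:
            rec.flags.append("dob in future")
    except ValueError:
        rec.flags.append("dob unparseable")

    # Amount must be numeric and inside an assumed plausible range.
    try:
        amount = float(str(rec.values.get("amount", "")))
        if not 0 <= amount <= 1_000_000:
            rec.flags.append("amount out of range")
    except ValueError:
        rec.flags.append("amount not numeric")

    return rec


def review_queue(records):
    """Pair each flagged record's source image with its flags for human review."""
    return [(r.source_image, r.flags) for r in records if r.flags]
```

Feed it a record where the transcription read a letter `O` for a zero in the amount and the float parse fails, so the record lands in the review queue alongside its scan. Whether a flagged value gets corrected is left to whoever looks at the image.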