HN Mirror


	by causality0 1427 days ago
	So they're actually reading the texts and correcting the mistakes?

acabal 1427 days ago

Yes - that's one of the main points of the project!

jxramos 1420 days ago

I'm curious what tooling folks use to accelerate this process. Has anyone written custom GUI tools, like a Tesseract box editor?

baobabKoodaa 1426 days ago

Hmm, I'm fairly confident a large chunk of this work (correcting OCR errors) could be automated. I would be happy to take a shot at this problem as a volunteer, if you're open to the idea?

hombre_fatal 1426 days ago

It’s not, because primary scans have arbitrary quality. Better OCR tech will spare you corrections, but it won’t spare you from comparing the text against the scan, which is the big fixed cost whether you’re correcting 1000 errors or 10.
