Hacker News
by duskwuff 897 days ago
I don't think that's quite what the parent had in mind.

The most natural application of a language model in proofreading is to compute perplexity across the text; if all goes well, errors should be detectable as points of unusually high perplexity. (In principle, this should even be able to spot otherwise undetectable errors like missing words.)
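A minimal sketch of that idea, with a toy bigram model standing in for an LLM (an assumption made only so the example runs without model weights). Each token gets a surprisal score, -log2 P(token | previous token), and tokens whose score exceeds a threshold are flagged. The perplexity of the whole text is 2 raised to the mean surprisal. The function names, the tiny corpus, and the 3-bit threshold are all hypothetical, illustrative choices, not part of any existing tool.

```python
import math
from collections import Counter

# Toy stand-in for an LLM: a bigram model with add-one smoothing.
# All names here are hypothetical and for illustration only.


def train_bigram(tokens):
    """Count bigrams and their left contexts in a reference corpus."""
    contexts = Counter(tokens[:-1])          # times each token is followed by another
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab_size = len(set(tokens)) + 1        # +1 reserves a slot for unseen words
    return contexts, bigrams, vocab_size


def surprisals(tokens, contexts, bigrams, vocab_size):
    """Per-token surprisal in bits, -log2 P(w_i | w_{i-1}), add-one smoothed.

    The first token has no left context, so it receives no score.
    """
    scores = []
    for prev, cur in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
        scores.append(-math.log2(p))
    return scores


def perplexity(scores):
    """Perplexity is 2 raised to the mean per-token surprisal (in bits)."""
    return 2 ** (sum(scores) / len(scores))


def flag_errors(tokens, scores, threshold):
    """Return (position, token) for tokens whose surprisal exceeds threshold."""
    return [(i + 1, tokens[i + 1]) for i, s in enumerate(scores) if s > threshold]


corpus = "the cat sat on the mat . the dog sat on the rug . the cat lay on the mat .".split()
model = train_bigram(corpus)

draft = "the cat sat the mat .".split()  # "on" has been dropped
scores = surprisals(draft, *model)
print(flag_errors(draft, scores, threshold=3.0))  # [(3, 'the')]
```

The flag lands on "the", the token right after the gap: a dropped word shows up as an unlikely transition into whatever follows it, which is why this approach can catch missing words at all. With a real LLM, the per-token log-probabilities would come from the model instead of bigram counts, and a threshold relative to the document's typical surprisal would probably work better than a fixed number.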

I could see how that would be helpful, but at least for my use case I'm more interested in seeing how LLMs integrated with computer vision can speed up transcription. Since a thorough proofread by a human is already baked into the SE production process (and is indeed one of its major selling points), more automated tools to aid proofreading would be nice but, from my point of view, wouldn't do anything fundamentally different. If LLMs could be leveraged for transcription, on the other hand, SE producers would no longer need to depend on external projects like Project Gutenberg or Wikisource to produce texts (which can take months), or to transcribe texts from OCR output by hand (very tedious and error-prone; believe me, I'm speaking from experience!). That would drastically widen the range of books someone could reasonably produce for SE in a timely fashion.