Hacker News
by spdustin 577 days ago
My answer to this in my own pet project is to mask the terms found by the NER pipeline so they don't get "corrected", replacing each one with its entity type as a special token (e.g. [male person] or [commercial entity]). That alone dramatically improved grammar/spelling correction, especially because the grammatical "gist" of those masked words is preserved in the text presented to the LLM for "correction".
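A rough sketch of the masking idea, not the actual project code. It assumes the NER step hands back character spans as `(start, end, entity_type)` tuples; the names `mask_entities` and `unmask_entities` are made up for illustration, and the LLM call itself is left out. Restoring by order of appearance is also an assumption, and it would break if the model reorders or drops tokens.

```python
import re
from typing import List, Tuple

# (start, end, entity_type): hypothetical shape of the NER output
Span = Tuple[int, int, str]


def mask_entities(text: str, spans: List[Span]) -> Tuple[str, List[Tuple[str, str]]]:
    """Replace each entity span with a token such as "[male person]".

    Returns the masked text plus the (entity_type, original_text) pairs
    in order of appearance, so they can be put back later.
    """
    pieces: List[str] = []
    originals: List[Tuple[str, str]] = []
    cursor = 0
    for start, end, label in sorted(spans):
        pieces.append(text[cursor:start])   # untouched text before the entity
        pieces.append(f"[{label}]")         # entity type stands in for the entity
        originals.append((label, text[start:end]))
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces), originals


def unmask_entities(corrected: str, originals: List[Tuple[str, str]]) -> str:
    """Put the original entities back into the corrected text.

    Each token is matched to the earliest unused original of the same type.
    Bracketed text that matches no stored type is left as-is.
    """
    remaining = list(originals)

    def restore(match: "re.Match[str]") -> str:
        label = match.group(1)
        for i, (stored_label, surface) in enumerate(remaining):
            if stored_label == label:
                del remaining[i]
                return surface
        return match.group(0)

    return re.sub(r"\[([^\[\]]+)\]", restore, corrected)
```

Usage would look something like this, with a hand-written string standing in for whatever the LLM returns:

```python
text = "Jhon work at Acme Corp sinse 2019."
spans = [(0, 4, "male person"), (13, 22, "commercial entity")]
masked, originals = mask_entities(text, spans)
# masked is what gets sent for correction; the names never reach the model
corrected = "[male person] has worked at [commercial entity] since 2019."
restored = unmask_entities(corrected, originals)
```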