by ematvey 3446 days ago

Nice work! I've been playing with exactly this idea for some time. Potentially it could be way bigger than simple grammatical corrections.

My list of things to try, in addition to what you've already done:

- replacing named entities with metadata-annotated tokens;

- dropping random words, not just articles;

- replacing random words with rarer synonyms;

- annotating with POS tags from some external parser;

- running a syntax corrector before feeding sentences into the grammatical model.
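
Two of the items above (dropping random words and swapping in rarer synonyms) could be sketched roughly like this. It's only an illustration: the function names, the whitespace tokenization, and the tiny synonym table are all made up here, and a real setup would pull synonyms from a proper thesaurus.

```python
import random

# Hypothetical toy table of rarer synonyms; purely illustrative.
RARER_SYNONYMS = {
    "big": ["colossal", "voluminous"],
    "use": ["utilize", "employ"],
    "show": ["evince", "manifest"],
}


def drop_random_words(tokens, p, rng):
    """Drop each token independently with probability p, keeping at least one."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    # Never return an empty sentence: fall back to one randomly chosen token.
    return kept if kept else [rng.choice(tokens)]


def swap_in_rarer_synonyms(tokens, p, rng):
    """Replace tokens found in the table with a rarer synonym, with probability p."""
    out = []
    for t in tokens:
        options = RARER_SYNONYMS.get(t.lower())
        if options and rng.random() < p:
            out.append(rng.choice(options))
        else:
            out.append(t)
    return out


def make_pair(sentence, rng, drop_p=0.1, syn_p=0.5):
    """Return (deformed, clean): the deformed text is the model input, clean is the target."""
    clean = sentence.split()
    noisy = swap_in_rarer_synonyms(drop_random_words(clean, drop_p, rng), syn_p, rng)
    return " ".join(noisy), " ".join(clean)


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(3):
        print(make_pair("we use a big model to show the idea", rng))
```

Passing in an explicit `random.Random` keeps the generated pairs reproducible, which helps when comparing different deformation mixes.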

I think this problem is easier than it appears on the surface. Generated deformation does not have to be a perfect replica of typical human errors. It just has to be sufficiently diverse.

Also, I think the seq2seq module is getting deprecated, as it doesn't do dynamic rollouts.