by fredley 2733 days ago
Very interesting tool, although storing the data as typos does seem a bit visible and prone to mistaken 'correction'. Other approaches to consider might be:

* Changing punctuation for visually identical, but different characters. This would not work for printed documents however.

* Encoding only 'believable' typos, e.g. it's/its. You could encode a binary stream across all instances of it(')s, or other similar substitutions.

* Encoding the stream in whitespace, e.g. one or two spaces after a full stop. This would be lossy for printed documents though, as full stops at line endings would be ambiguous. Error detection/correction codes could help with that.
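
To make the whitespace idea concrete, here is a minimal Python sketch, not taken from the tool itself. It assumes one space after a full stop means 0 and two spaces mean 1, and it skips full stops at the end of a line, since those are the ambiguous ones. The names `embed_spaces` and `extract_spaces` are made up for illustration.

```python
import re

# A "slot" is a full stop followed by one or two spaces and then a
# non-space character. Full stops at line or text endings never match.
SLOT = re.compile(r"\. {1,2}(?=\S)")

def embed_spaces(cover: str, bits: str) -> str:
    """Write each bit as one space (0) or two spaces (1) after a full stop."""
    remaining = iter(bits)

    def rewrite(match: re.Match) -> str:
        bit = next(remaining, None)
        if bit is None:
            return match.group(0)  # payload exhausted: leave the gap as it was
        return ".  " if bit == "1" else ". "

    stego = SLOT.sub(rewrite, cover)
    if next(remaining, None) is not None:
        raise ValueError("cover text has too few sentence gaps for the payload")
    return stego

def extract_spaces(stego: str, n_bits: int) -> str:
    """Read the first n_bits slots back as a bit string."""
    gaps = SLOT.findall(stego)
    return "".join("1" if gap == ".  " else "0" for gap in gaps[:n_bits])

if __name__ == "__main__":
    hidden = embed_spaces("One. Two. Three. Four.", "101")
    print(repr(hidden))
    print(extract_spaces(hidden, 3))
```

The receiver has to know the payload length (or it has to be encoded up front), and anything that normalises spacing destroys the message, which is where an error-correcting code would come in.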

3 comments

Typical OCR errors would be interesting too: confusing the letter "n" with the letters "ri", for example.

It would be visually challenging to detect (and also, maybe, difficult for an OCR engine).

Snow is interesting and uses whitespace instead. http://www.darkside.com.au/snow/

Yeah, I need to work on making the displacements and replacements a bit more context-aware (& probably linguistically aware). There are cases where it can "replace" a character with the same character, for example.

I do like your idea about visually similar but distinct character replacement. That would be a really fun one to implement.
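
For what it's worth, a rough Python sketch of how the lookalike-character version might work. This is only an illustration, not how the tool does it: it assumes a small hand-picked table of Latin letters with Cyrillic lookalikes, treats every such Latin letter in the cover text as one bit of capacity, and assumes the cover text contains no Cyrillic to begin with. The function names are hypothetical.

```python
# Latin letter -> visually similar Cyrillic letter (illustrative subset).
HOMOGLYPHS = {
    "a": "\u0430",  # CYRILLIC SMALL LETTER A
    "c": "\u0441",  # CYRILLIC SMALL LETTER ES
    "e": "\u0435",  # CYRILLIC SMALL LETTER IE
    "o": "\u043e",  # CYRILLIC SMALL LETTER O
    "p": "\u0440",  # CYRILLIC SMALL LETTER ER
}
REVERSE = {lookalike: latin for latin, lookalike in HOMOGLYPHS.items()}

def embed_homoglyphs(cover: str, bits: str) -> str:
    """Swap a carrier letter for its lookalike to store 1, keep it to store 0."""
    remaining = iter(bits)
    out = []
    for ch in cover:
        if ch in HOMOGLYPHS:
            bit = next(remaining, None)  # None once the payload runs out
            out.append(HOMOGLYPHS[ch] if bit == "1" else ch)
        else:
            out.append(ch)
    if next(remaining, None) is not None:
        raise ValueError("cover text has too few carrier letters for the payload")
    return "".join(out)

def extract_homoglyphs(stego: str, n_bits: int) -> str:
    """Read the first n_bits carrier letters back as a bit string."""
    bits = []
    for ch in stego:
        if ch in HOMOGLYPHS:
            bits.append("0")
        elif ch in REVERSE:
            bits.append("1")
    return "".join(bits[:n_bits])

if __name__ == "__main__":
    hidden = embed_homoglyphs("a peace of cake", "1010011")
    print(hidden)
    print(extract_homoglyphs(hidden, 7))
```

Because every substitution swaps in a genuinely different code point, this also avoids the "replace a character with the same character" case. As noted above, it would not survive printing, and a spell checker or a search for non-ASCII characters would spot it quickly.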