

by fredley 2733 days ago

Very interesting tool, although storing the data as typos does seem a bit visible, and prone to being 'corrected' by mistake. Other approaches to consider might be:

* Swapping punctuation for visually identical but distinct characters. This would not work for printed documents, however.

* Encoding only 'believable' typos, e.g. it's/its. You could encode a binary stream across all instances of it(')s, or across other such substitutions.

* Encoding the stream in whitespace, e.g. one or two spaces after a full stop. This would be lossy in printed documents, though, as the spacing after a full stop at the end of a line would be ambiguous. Error detection/correction schemes could help with that.
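
To make the it's/its idea concrete, here is a minimal Python sketch. It is not how the tool under discussion works; the function names `embed_its` and `extract_its` and the convention "apostrophe means 1, no apostrophe means 0" are assumptions made up for illustration.

```python
import re

# Matches "its" or "it's" as a whole word; group 2 is the optional apostrophe.
ITS = re.compile(r"\b([Ii])t('?)s\b")

def embed_its(cover: str, bits: str) -> str:
    """Rewrite each its/it's in order: apostrophe for a 1 bit, none for a 0 bit."""
    stream = iter(bits)

    def swap(match: re.Match) -> str:
        bit = next(stream, None)
        if bit is None:
            return match.group(0)  # payload exhausted: leave the word as it was
        return match.group(1) + ("t's" if bit == "1" else "ts")

    return ITS.sub(swap, cover)

def extract_its(stego: str) -> str:
    """Read one bit per its/it's, including any words left untouched after the payload."""
    return "".join("1" if m.group(2) else "0" for m in ITS.finditer(stego))

cover = "It's a dog. Its tail wags when it's happy, and its bark is loud."
stego = embed_its(cover, "0110")
print(stego)
print(extract_its(stego))
```

Capacity is one bit per occurrence, so the receiver would also need to know the payload length (or a terminator) to ignore the words that were never rewritten.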

3 comments

bambax 2733 days ago

Typical OCR errors would be interesting too: confusion between the letter "n" and the letters "ri", for example.

It would be visually challenging to detect (and also, maybe, difficult for an OCR engine).
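
A rough Python sketch of the "n" versus "ri" swap, under one big assumption: the receiver already holds the unmodified cover text and compares against it. The names `embed_ocr` and `extract_ocr` are hypothetical, and a real scheme would want to pick its substitution sites much more carefully.

```python
def embed_ocr(cover: str, bits: str) -> str:
    """For each "n" in the cover, write "ri" for a 1 bit and keep "n" for a 0 bit."""
    stream = iter(bits)
    out = []
    for ch in cover:
        # next() is only consumed at an "n", so one bit is spent per site;
        # once the payload runs out, remaining sites default to 0 (unchanged).
        if ch == "n" and next(stream, "0") == "1":
            out.append("ri")
        else:
            out.append(ch)
    return "".join(out)

def extract_ocr(cover: str, stego: str) -> str:
    """Walk the known cover text and check what the stego text holds at each "n"."""
    bits = []
    pos = 0  # read position in the stego text
    for ch in cover:
        if ch == "n" and stego.startswith("ri", pos):
            bits.append("1")
            pos += 2  # the site was widened from one character to two
        else:
            if ch == "n":
                bits.append("0")
            pos += 1
    return "".join(bits)

cover = "a modern corner tavern"
stego = embed_ocr(cover, "101")
print(stego)
print(extract_ocr(cover, stego))
```

Without the cover text the receiver cannot tell an encoded "ri" from a genuine one, which is why this sketch passes the cover to the extractor.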


nrjames 2733 days ago

Snow is interesting and uses white space instead. http://www.darkside.com.au/snow/
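
For a feel of the general whitespace approach, here is a toy Python sketch that hides one bit per line as a trailing space. This is not Snow's actual encoding, just an assumed minimal scheme; `embed_trailing` and `extract_trailing` are made-up names.

```python
def embed_trailing(cover: str, bits: str) -> str:
    """Hide one bit per line: a trailing space means 1, no trailing space means 0."""
    # Strip existing trailing whitespace so it cannot be mistaken for payload.
    lines = [line.rstrip() for line in cover.split("\n")]
    if len(bits) > len(lines):
        raise ValueError("cover text has too few lines for the payload")
    marked = [line + " " if bit == "1" else line for line, bit in zip(lines, bits)]
    return "\n".join(marked + lines[len(bits):])

def extract_trailing(stego: str, n_bits: int) -> str:
    """Read the first n_bits lines back; the payload length is agreed in advance."""
    lines = stego.split("\n")
    return "".join("1" if line.endswith(" ") else "0" for line in lines[:n_bits])

cover = "alpha\nbeta\ngamma\ndelta"
stego = embed_trailing(cover, "101")
print(repr(stego))
print(extract_trailing(stego, 3))
```

Anything that trims trailing whitespace (many editors and mail clients do) would destroy the payload, so this only survives channels that pass the text through byte for byte.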


jwilk 2733 days ago

Discussed on HN:

https://news.ycombinator.com/item?id=17524693


m4xm4n 2733 days ago

Yeah, I need to work on making the displacements and replacements a bit more context-aware (& probably linguistically aware). There are cases where it can "replace" a character with the same character, for example.

I do like your idea about visually similar but distinct character replacement. That would be a really fun one to implement.
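
A quick Python sketch of what that could look like, assuming a small hand-picked lookalike table and a cover text that does not already contain any of the lookalikes. The names `HOMOGLYPHS`, `embed_glyphs` and `extract_glyphs` are hypothetical, and how close the pairs look depends heavily on the font.

```python
# ASCII punctuation mapped to a similar-looking but distinct code point.
HOMOGLYPHS = {
    ",": "\u201a",  # SINGLE LOW-9 QUOTATION MARK
    ";": "\u037e",  # GREEK QUESTION MARK
    "-": "\u2010",  # HYPHEN
}
REVERSE = {lookalike: plain for plain, lookalike in HOMOGLYPHS.items()}

def embed_glyphs(cover: str, bits: str) -> str:
    """At each candidate mark, a 1 bit swaps in the lookalike and a 0 bit keeps ASCII."""
    stream = iter(bits)
    out = []
    for ch in cover:
        # One bit is consumed per candidate; sites after the payload stay ASCII.
        if ch in HOMOGLYPHS and next(stream, "0") == "1":
            out.append(HOMOGLYPHS[ch])
        else:
            out.append(ch)
    return "".join(out)

def extract_glyphs(stego: str) -> str:
    """ASCII mark reads as 0, lookalike reads as 1, everything else is skipped."""
    bits = []
    for ch in stego:
        if ch in HOMOGLYPHS:
            bits.append("0")
        elif ch in REVERSE:
            bits.append("1")
    return "".join(bits)

cover = "Well, yes; it is a well-known trick, really."
stego = embed_glyphs(cover, "0101")
print(stego)
print(extract_glyphs(stego))
```

One caveat: Unicode normalization folds U+037E back into a plain semicolon, so any pipeline that normalizes the text would erase those bits.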
