
I've thought about that problem a bit.

First of all, forensic linguistics is much less powerful than it's made out to be. In particular, there's no really good way to get an absolute confidence estimate out of a prediction. All you can get is a likelihood ratio among N candidate suspects; you can't really get "definitely a match" vs. "I don't know".
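To make the "likelihood ratio only" point concrete, here's a rough sketch of closed-set attribution. Everything in it is my own assumption for illustration (the character-trigram model, the add-one smoothing, the names `log_likelihood` and `relative_scores`); real stylometry tools use richer features, but the shape of the output is the same: scores that only mean something relative to the other suspects.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Lowercased character n-grams of a text."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def log_likelihood(sample_grams, ref_counts, vocab_size):
    """Log-likelihood of the sample under one suspect's smoothed n-gram model."""
    total = sum(ref_counts.values())
    return sum(
        math.log((ref_counts[g] + 1) / (total + vocab_size))  # add-one smoothing
        for g in sample_grams
    )


def relative_scores(sample, suspects, n=3):
    """Normalize per-suspect likelihoods so they sum to 1 over the suspect list.

    `suspects` maps a (hypothetical) name to known writing by that person.
    """
    sample_grams = char_ngrams(sample, n)
    counts = {name: Counter(char_ngrams(ref, n)) for name, ref in suspects.items()}
    # Shared vocabulary so every suspect's model is scored on the same footing.
    vocab = set(sample_grams)
    for c in counts.values():
        vocab.update(c)
    lls = {name: log_likelihood(sample_grams, c, len(vocab)) for name, c in counts.items()}
    top = max(lls.values())
    weights = {name: math.exp(ll - top) for name, ll in lls.items()}  # avoid underflow
    norm = sum(weights.values())
    return {name: w / norm for name, w in weights.items()}
```

Note that the scores always sum to 1 over whoever is on the list, even when the real author isn't on it at all. There is no built-in "none of the above" outcome.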

Anyway. The best solution would be to adopt some stylistic constraint: e.g. writing only in haiku, writing only in "Up-Goer Five"-style Basic English, or writing only sentences that Google Translate stably round-trips English<->French.
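The restricted-vocabulary variant is the easiest one to enforce mechanically. A minimal sketch, assuming you have some allowed-word list: the tiny `ALLOWED_WORDS` set below is a made-up placeholder, not the real Up-Goer Five or Basic English list, and `disallowed_words` is just a name I picked.

```python
import re

# Placeholder vocabulary for illustration only; swap in a real word list.
ALLOWED_WORDS = {
    "i", "have", "thought", "about", "that", "problem", "a", "bit",
    "the", "is", "not", "good",
}


def disallowed_words(text, allowed=ALLOWED_WORDS):
    """Return the sorted set of words in `text` that are not in the allowed list."""
    words = re.findall(r"[a-z']+", text.lower())
    return sorted({w for w in words if w not in allowed})


print(disallowed_words("The forensic problem is not good"))
```

You'd run a draft through this before posting and rewrite until the list comes back empty. It only enforces vocabulary, so things like sentence length and punctuation habits would still leak through.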

The accuracy of a forensic linguistic algorithm trained on normal text and run on the stylistically constrained text is completely unknown. Hopefully this evidence would be inadmissible even by forensic science standards. Then again, maybe not.