Hacker News
by mrdrozdov 1290 days ago
This might provide some guidance: http://gltr.io/

This and related techniques are trivially fooled by fine-tuning the generating model.
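To make the fine-tuning point concrete, here is a toy simulation (not GLTR's code). Tools like GLTR rank every token under a fixed reference model and bucket the ranks (top 10, top 100, top 1000, rest); machine-generated text tends to sit in the top buckets, human text less so. The sketch assumes a synthetic 5,000-token Zipf-like distribution as the detector's model and crudely models fine-tuning as boosting a band of tokens the detector ranks 500 to 1499. `gltr_bucket` and `bucket_fractions` are hypothetical helpers, and no real language model is involved.

```python
import random
from collections import Counter

VOCAB = 5000
# Detector's reference model: strictly decreasing Zipf-like weights,
# so a token's id is also its 0-based rank under the detector.
detector = [1.0 / (i + 1) for i in range(VOCAB)]

def gltr_bucket(rank):
    """Map a 0-based detector rank to a GLTR-style bucket."""
    if rank < 10:
        return "top10"
    if rank < 100:
        return "top100"
    if rank < 1000:
        return "top1000"
    return "rest"

def bucket_fractions(ranks):
    """Share of sampled tokens falling in each bucket."""
    counts = Counter(gltr_bucket(r) for r in ranks)
    return {b: counts[b] / len(ranks) for b in ("top10", "top100", "top1000", "rest")}

# Hypothetical "fine-tuned" generator: same base weights, but tokens the
# detector ranks 500..1499 (a stand-in for domain vocabulary) are boosted 200x.
fine_tuned = [w * 200 if 500 <= i < 1500 else w for i, w in enumerate(detector)]

rng = random.Random(0)
base_ranks = rng.choices(range(VOCAB), weights=detector, k=2000)
tuned_ranks = rng.choices(range(VOCAB), weights=fine_tuned, k=2000)

base_frac = bucket_fractions(base_ranks)
tuned_frac = bucket_fractions(tuned_ranks)
print("base model :", base_frac)
print("fine-tuned :", tuned_frac)
```

In this toy setup the fine-tuned sampler's tokens mostly fall outside the detector's top 100, the pattern a rank-based detector associates with human writing, even though every token was machine-generated.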

They're also trivially fooled by sampling techniques or settings that push the model to generate rare words frequently.
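A minimal sketch of the sampling point, assuming temperature scaling as the setting in question. The same toy Zipf-like distribution plays both generator and detector, and for simplicity every step reuses one fixed distribution rather than a context-dependent one; `with_temperature` and `top10_fraction` are illustrative helpers, not a library API.

```python
import random

VOCAB = 5000
# Toy next-token distribution, sorted so token id == rank under the detector
# (here the detector and the generator are the same model).
probs = [1.0 / (i + 1) for i in range(VOCAB)]

def with_temperature(weights, temperature):
    """Temperature scaling: p_i ** (1/T). T > 1 flattens, T < 1 sharpens.
    No renormalization needed: random.choices accepts relative weights."""
    return [w ** (1.0 / temperature) for w in weights]

def top10_fraction(ranks):
    """Share of sampled tokens that land in the detector's top 10."""
    return sum(1 for r in ranks if r < 10) / len(ranks)

rng = random.Random(0)
results = {}
for t in (0.7, 1.0, 2.0):
    ranks = rng.choices(range(VOCAB), weights=with_temperature(probs, t), k=2000)
    results[t] = top10_fraction(ranks)
    print(f"T={t}: {results[t]:.1%} of tokens in the detector's top 10")
```

Raising the temperature from 0.7 to 2.0 sharply lowers the share of top-10 tokens, so the output looks less predictable to a rank-based detector, at some cost in coherence.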

They can also be fooled with filter-assisted decoding: https://paperswithcode.com/paper/most-language-models-can-be...
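A generic sketch of filtering the vocabulary before sampling, which is one way to read filter-assisted decoding: tokens that fail a constraint get zero weight and the rest are sampled as usual. The constraint (a minimum token length) and the `token_len` mapping, which assumes frequent tokens are short, are hypothetical stand-ins and are not taken from the linked paper.

```python
import random

VOCAB = 5000
probs = [1.0 / (i + 1) for i in range(VOCAB)]  # token id == detector rank
# Toy assumption: frequent tokens are short (length grows with log of rank).
token_len = [1 + i.bit_length() for i in range(VOCAB)]

def min_length_filter(token_id, min_len=6):
    """Hypothetical constraint: only allow tokens at least min_len chars long."""
    return token_len[token_id] >= min_len

def filtered_sample(rng, weights, allowed, k):
    """Zero out disallowed tokens, then sample from what remains."""
    masked = [w if allowed(i) else 0.0 for i, w in enumerate(weights)]
    return rng.choices(range(len(weights)), weights=masked, k=k)

def top10_fraction(ranks):
    return sum(1 for r in ranks if r < 10) / len(ranks)

rng = random.Random(0)
plain = rng.choices(range(VOCAB), weights=probs, k=2000)
constrained = filtered_sample(rng, probs, min_length_filter, k=2000)
print("unfiltered top-10 share:", top10_fraction(plain))
print("filtered   top-10 share:", top10_fraction(constrained))
```

Because this particular filter bans the detector's 16 highest-ranked tokens, none of the constrained samples land in the top-10 bucket.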