HN Mirror



	by mrdrozdov 1290 days ago
	This might provide some guidance: http://gltr.io/

1 comment

Der_Einzige 1290 days ago

This and related detection techniques are trivially fooled by fine-tuning the model.

They're also trivially fooled by sampling techniques or settings, such as a high temperature, that push the model to generate rare words more often.

Also foolable with filter-assisted decoding: https://paperswithcode.com/paper/most-language-models-can-be...
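
To make the sampling point concrete, here is a rough sketch, not GLTR's actual code. It assumes the detector scores text by how often each token falls among the model's top-ranked predictions, and it uses a made-up Zipf-like toy distribution in place of a real language model. The names `softmax`, `top_k_mass`, and `toy_logits` are hypothetical. It only illustrates that raising the sampling temperature moves probability mass away from the top-ranked tokens, which is the signal a rank-based detector leans on.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_mass(logits, temperature, k=10):
    """Probability that a token sampled at `temperature` is one of the
    k tokens the unscaled model ranks highest."""
    probs = softmax(logits, temperature)
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return sum(probs[i] for i in ranked[:k])


# Assumption: a toy 1000-token vocabulary with Zipf-like logits,
# standing in for a real model's next-token distribution.
toy_logits = [-math.log(i + 1) for i in range(1000)]

low_temp_mass = top_k_mass(toy_logits, temperature=0.7)
high_temp_mass = top_k_mass(toy_logits, temperature=1.5)

print(f"top-10 mass at T=0.7: {low_temp_mass:.2f}")
print(f"top-10 mass at T=1.5: {high_temp_mass:.2f}")
```

In this toy setup the top-10 mass falls as the temperature goes up, so text sampled that way would contain fewer of the high-rank tokens a detector of this kind expects from machine output. A real model's distribution will behave differently in the details.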