Hacker News
by umajho 306 days ago
This makes me wonder: if a model is fine-tuned for misalignment this way using only English text, will it also exhibit similar behaviors in other languages?