Hacker News
by not-my-account 764 days ago
It must. IIRC Anthropic has a 'red team' of sorts. I wonder what they can do with this technique. What are the limits of "evil" for these current models?
1 comment

It's good if you really don't want your LLM to mention specific things, which I can see some groups wanting. Going the other way, having it mention certain things even when they're unrelated could be good for integrated ads in a chatbot, which sounds evil mostly in that it would be really annoying. Picture this: your friend's account gets hacked, a chatbot LLM is fine-tuned on their message history, and it's able to carry on a conversation while slipping in a mention of Joe's Hot Dogs every now and then.

It could probably also help with consistency when trying to do LangChain-type stuff.