by yorwba
397 days ago
They might forget to check. Musk seems to have been surprised that Grok doesn't share his opinions and has been clumsily trying to fix it for a while now. And it might not be easy to fix. Despite all the effort invested into aligning models with company policy, persistent users can still get around the guardrails with clever jailbreaks. In theory it should be possible to eliminate all non-compliant content from the training data, but that would most likely entail running all training data through an LLM, which would make the training process about twice as expensive. So, in practice, companies have been releasing models that they do not have full control over.
So, for example, if a model were trained with no references to the Tiananmen Square massacre, I could see it just synthesizing commonalities between other massacres and inventing a new, worse Tiananmen Square massacre. "That's not a thing that ever happened" isn't something most AIs are particularly good at saying.