by potato3732842
394 days ago
> but what keeps e.g. Meta Inc. from training Llama to be ever so slightly more friendly and sympathetic to Meta Inc, or the tech industry in general? Even if there were something the natural incentive alignment is going to cause the AI to be trained to match what the company thinks is ok. A tech company full of techies is not going to take an AI trained to the point of saying things like "y'all are evil, your company is evil, your industry is evil" and push it to prod.
And it might not be easy to fix. Despite all the effort invested into aligning models with company policy, persistent users can still get around the guardrails with clever jailbreaks.
In theory, it should be possible to eliminate all non-compliant content from the training data, but that would most likely mean running the entire training set through an LLM first, which could make the training process about twice as expensive.
So, in practice, companies have been releasing models that they do not have full control over.