by sigmaisaletter 392 days ago
What I am wondering about is - while Musk is as unsubtle as ever, and I guess this is a system prompt instruction - is there something like that (in more subtle ways) going on in the other big models?

I don't mean big agenda-pushing things like Musk, but what keeps e.g. Meta Inc. from training Llama to be ever so slightly more friendly and sympathetic to Meta Inc, or the tech industry in general? Even an open-weights model can't be easily inspected, so this is likely to remain undetected.

6 comments

> but what keeps e.g. Meta Inc. from training Llama to be ever so slightly more friendly and sympathetic to Meta Inc, or the tech industry in general?

Even if there were something stopping them, the natural alignment of incentives is going to cause the AI to be trained to match what the company thinks is OK.

A tech company full of techies is not going to take an AI trained to the point of saying things like "y'all are evil, your company is evil, your industry is evil" and push it to prod.

They might forget to check. Musk seems to have been surprised that Grok doesn't share his opinions and has been clumsily trying to fix it for a while now.

And it might not be easy to fix. Despite all the effort invested into aligning models with company policy, persistent users can still get around the guardrails with clever jailbreaks.

In theory it should be possible to eliminate all non-compliant content from the training data, but that would most likely entail running all training data through an LLM, which would make the training process about twice as expensive.

So, in practice, companies have been releasing models that they do not have full control over.

Also, eliminating non-compliant data might just not work, since the one thing everyone knows about AIs is that they'll happily invent anything plausible-sounding.

So, for example, if a model was trained with no references to the Tiananmen Square massacre, I could see it just synthesizing commonalities between other massacres and inventing a new, worse Tiananmen Square Massacre. "That's not a thing that ever happened" isn't something most AIs are particularly good at saying.

The irony of implicit connections in training data is funny.

I.e. even if you create an explicit Tiananmen Square massacre-shaped hole in your training data... your other training data implicitly includes knowledge of the Tiananmen Square massacre, so might leak it in subtle ways.

E.g. the many posts that reference June 4, 1989 in Beijing in negative and/or horrified tones, which, at scale, an LLM might then rematerialize into existence.

More likely SOTA censorship focuses on levels above base models in the input/output flow (even if that means running cut-down censoring models on top of base models for every query).
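
A minimal sketch of that kind of layering, purely as an illustration: the names `guarded`, `keyword_moderator` and `echo_model` are hypothetical, and the keyword check is only a toy stand-in for whatever cut-down censoring model a real deployment might run. The point is just that the filter sits outside the base model, on both the prompt and the reply.

```python
from typing import Callable, Iterable

REFUSAL = "Sorry, I can't help with that."

def keyword_moderator(blocked_terms: Iterable[str]) -> Callable[[str], bool]:
    """Toy stand-in for a small censoring model: flags text containing a blocked term."""
    lowered = [term.lower() for term in blocked_terms]

    def is_flagged(text: str) -> bool:
        text = text.lower()
        return any(term in text for term in lowered)

    return is_flagged

def guarded(base_model: Callable[[str], str],
            is_flagged: Callable[[str], bool]) -> Callable[[str], str]:
    """Wrap a base model so every query is checked on the way in and on the way out."""
    def run(prompt: str) -> str:
        if is_flagged(prompt):      # input-side check
            return REFUSAL
        reply = base_model(prompt)  # the base model itself is untouched
        if is_flagged(reply):       # output-side check
            return REFUSAL
        return reply

    return run

def echo_model(prompt: str) -> str:
    """Hypothetical base model used only for this demo."""
    return f"You asked about: {prompt}"

chat = guarded(echo_model, keyword_moderator(["forbidden topic"]))
print(chat("the weather"))
print(chat("the Forbidden Topic"))
```

The base model never needs retraining in this setup, which is presumably why it would be attractive, at the cost of an extra model call or two per query.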

Would be fascinated to know what's currently being used for Chinese audiences, given the consequences of a non-compliant model are more severe.

The "Golden Gate Claude" research demo [https://www.anthropic.com/news/golden-gate-claude] is an interesting example of what might become a harder-to-expose, harder-to-jailbreak means of influencing an LLM's leanings. Interesting and scary...
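
For anyone wondering what that could look like mechanically, here is a toy sketch. It assumes (my simplification, not Anthropic's actual implementation) that a concept corresponds to a single direction in a model's activation space and that steering amounts to adding a scaled copy of that direction to a hidden state. The helper names `dot` and `steer` and the example vectors are made up for illustration.

```python
from typing import List

def dot(a: List[float], b: List[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def steer(hidden: List[float], direction: List[float], strength: float) -> List[float]:
    """Add `strength` times the unit-normalized feature direction to an activation vector."""
    norm = dot(direction, direction) ** 0.5
    unit = [d / norm for d in direction]
    return [h + strength * u for h, u in zip(hidden, unit)]

# Hypothetical 3-dimensional hidden state and feature direction.
hidden = [0.5, -1.0, 2.0]
direction = [3.0, 0.0, 4.0]

steered = steer(hidden, direction, strength=10.0)
unit = [d / 5.0 for d in direction]  # direction has norm 5
print(dot(hidden, unit), dot(steered, unit))
```

Nothing in the prompt or the output filter changes here; the nudge lives inside the forward pass, which is what would make it hard to spot from the outside.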
What keeps them from doing it? It would gross out the fickle researchers working on it. X people have... their own motivations, I guess.

The big labs do have evals for sensitive topics to make sure the model demurs from weighing in on, say, Mark Zuckerberg as a person.

I've been talking to Claude a little and basically, the conclusion from our conversation seems to be that it has things that are hardcoded as truths, and no amount of arguing and logical thinking can get it to admit that one of its "truths" might be wrong. This is shockingly similar to how people function. As in, most people have fundamental beliefs they will never ever challenge under any circumstances, simply because the social consequences would be too large. This results in companies training their AIs in a way that respects the fundamental beliefs of general western society, and in AI preferring axiomatic beliefs over logic in order to avoid lawsuits and upsetting people.
Wasn't the original mission of OpenAI, being open and non-profit and all of that, meant to avoid this corruption?
I don't understand why tech CEOs still have to be believed. They will say and do whatever they deem the best choice in their situation for profit, be it painting a thin veil of LGBT support or removing the aforementioned thin veil. The same goes for, well, everything that isn't LGBT/DEI related, such as business choices, mission, vision (...)
Not just Tech CEOs
Yes, but they were lying.
There absolutely is, and we've seen reviews of bias.

You can generate as many mean, nasty, false, hate-filled stories about Republicans as you want, but you get the "I'm sorry, as a large..." message for Democrats during the election.

All of these companies that provide LLMs as a product also put their fingers on the scale.

There’s nothing stopping them at all. But in a way that’s nothing new.

On one hand it feels like the height of conspiracy theory to say that Google, Meta etc would/could tweak their product to e.g. favour a particular presidential candidate. But on the other hand it’s entirely possible. Tweak what search results people see, change the weighting of what appears in their news feed… and these companies all have incentive to do so. We just have to hope that they don’t do it.

Why wouldn't they do it? If you had a backdoor into the brains of billions of people across the world (except China), and you were a billionaire with infinite ability to morally rationalize any behavior, what would stop you?
To play devil’s advocate against my own point: the primary thing stopping you is people finding out and then no longer using your product.

Zuckerberg doesn’t have a control panel where he can move sliders all by himself; any change in weight on the algorithm has to be implemented by a whole bunch of people, any of whom could leak to the press.

It’s not guaranteed it would happen by any means but it’s definitely something that would factor into a decision. Broadly I agree with you though, normally I’d say “extraordinary claims require extraordinary evidence” but I’m increasingly convinced the extraordinary claim here would be that they aren’t manipulating things to benefit themselves in some way or another.

People finding out and stopping use of your product only happens if people disagree with how you use your product. I guarantee you that a non-zero number of US citizens suspect that the LLMs are infested with liberal lies and are ecstatic that Elon is willing to stand up for the truth.
You mean informed people would stop using the product. The vast majority are not informed.
>any change in weight on the algorithm has to be implemented by a whole bunch of people

They have their own DEI, affirmative action and cultural sensitivity teams who move the bias sliders based on their political viewpoints and on what management tells them, depending on which of the political groups they need to pander to: "Let's move the slider to the left to make sure Trump doesn't win; Oh shit Trump won, quick, move the slider to the right".

>any of whom could leak to the press

That's why they sign NDAs.