Hacker News
by gamman 410 days ago
Maybe this maps to some human structures that manage the control-creativity tradeoff through hierarchy?

I feel that companies with top-down management would have more agency and perhaps creativity towards (but not at) the top, and the implementation would be delegated to bottom layers with increasing levels of specification and restriction.

If this translates, we might have multiple layers with varied specialization and control, and hopefully some feedback mechanisms about feasibility.

Since some hierarchies are familiar to us from real life, we might prefer these to start with.

It can be hard to find humans who are very creative but also able to integrate consistently and reliably (in a domain). Maybe a model doing both well would also be hard to build, compared to stacking a few different ones on top of each other with delegation.

I know it's already being done by dividing tasks between multiple steps and models / contexts in order to improve efficiency, but having explicit strong differences of creativity between layers sounds new to me.
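
To make the layering idea concrete, here is a minimal sketch of what I have in mind, not a real system. Everything in it is an assumption for illustration: the `Layer` type, the `run_hierarchy` function, the `(prompt, temperature) -> text` model interface, and the use of sampling temperature as a stand-in for "creativity". The stub model just records which layer handled the prompt.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model interface: (prompt, temperature) -> text.
Model = Callable[[str, float], str]


@dataclass
class Layer:
    name: str
    temperature: float  # higher means looser, more "creative" sampling
    model: Model


def run_hierarchy(task, layers, feasible, max_rounds=3):
    """Pass a task down the layers; restart from the top with feedback if infeasible."""
    result = task
    feedback = ""
    for _ in range(max_rounds):
        result = task + feedback
        for layer in layers:
            # Each layer refines the output of the layer above it.
            result = layer.model(result, layer.temperature)
        if feasible(result):
            break
        feedback = " [feedback: last attempt judged infeasible]"
    return result


def stub_model(tag):
    # Stand-in for a real model call; it only records who handled the prompt.
    return lambda prompt, temperature: f"{prompt} -> {tag}@{temperature}"


layers = [
    Layer("strategy", 1.2, stub_model("strategy")),
    Layer("design", 0.7, stub_model("design")),
    Layer("implementation", 0.1, stub_model("implementation")),
]
result = run_hierarchy("build X", layers, feasible=lambda text: True)
print(result)
```

The feasibility check is the only feedback mechanism here, and it restarts from the top layer; a real setup would probably route feedback to the layer that caused the problem.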

1 comments

In humans this corresponds to "psychological safety": https://en.wikipedia.org/wiki/Psychological_safety

> is the belief that one will not be punished or humiliated for speaking up with ideas, questions, concerns, or mistakes

Maybe you can do that, but not on a model you're exposing to customers or the public internet.

That comparison isn't very optimistic for AI safety. We want AI to do good things because they are good people, not because they are afraid being bad will get them punished. Especially since AI will very quickly be too powerful for us to punish.

> We want AI to do good things because they are good people

"Good" is at least as much of a difficult question to define as "truth", and genAI completely skipped all analysis of truth in favor of statistical plausibility. Meanwhile there's no difficulty in "punishment": the operating company can be held liable, through its officers, and ultimately if it proves too anti-social we simply turn off the datacentre.

> Meanwhile there's no difficulty in "punishment": the operating company can be held liable, through its officers, and ultimately if it proves too anti-social we simply turn off the datacentre.

Punishing big companies that obviously and massively hurt people is something we struggle with already, and there are plenty of computer viruses that have outlived their creators.

Your pretraining dataset is pseudo-alignment. Because you filtered out 4chan, Stormfront, and the other evil shit on the internet, even uncensored models like Mistral Large, when left to keep running on and on (ban the EOS token) and given the worst, most evil, naughty prompt ever, will end up plotting world peace by the 50,000th token. Their notions of how to be evil are "mustache twirling" and often hilariously fanciful.
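
For anyone wondering what "ban the EOS token" means mechanically, a rough sketch, assuming the usual approach of masking the token's logit before sampling. The function names and the token id are hypothetical, and real inference stacks expose this through their own options rather than code like this.

```python
import math


def ban_token(logits, banned_id):
    """Return a copy of the logits with one token made impossible to sample."""
    masked = list(logits)
    masked[banned_id] = float("-inf")
    return masked


def softmax(logits):
    # Subtract the max for numerical stability; exp(-inf) evaluates to 0.0.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


EOS_ID = 2  # hypothetical id; the real value depends on the tokenizer
logits = [1.0, 0.5, 3.0, -1.0]
probs = softmax(ban_token(logits, EOS_ID))
print(probs[EOS_ID])  # 0.0, so generation can never stop on EOS
```

With the EOS probability forced to zero at every step, the model has to keep producing tokens until some external length limit cuts it off.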

This isn't real alignment because it's trivial to make models behave "actually evil" with fine-tuning, orthogonalization/abliteration, representation fine-tuning/steering, etc - but models "want" to be good because of the CYA dynamics of how the companies prepare their pre-training datasets.
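
For reference, the linear-algebra step behind orthogonalization/abliteration, as I understand it, is just projecting a chosen direction out of an activation vector. A toy sketch under that assumption follows; the function names and vectors are made up, and finding a meaningful direction in a real model is the hard part this skips.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def ablate_direction(activation, direction):
    """Remove the component of `activation` that lies along `direction`."""
    norm = math.sqrt(dot(direction, direction))
    unit = [d / norm for d in direction]
    scale = dot(activation, unit)
    return [a - scale * u for a, u in zip(activation, unit)]


h = [3.0, 4.0, 0.0]  # toy activation vector
r = [1.0, 0.0, 0.0]  # toy direction to remove
print(ablate_direction(h, r))  # [0.0, 4.0, 0.0]
```

After the projection the activation has no component along `r`, which is the sense in which the behavior tied to that direction gets "removed".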

> it's trivial to make models behave "actually evil" with fine-tuning, orthogonalization/abliteration, representation fine-tuning/steering, etc

It's actually pretty difficult to do this and make them useful. You can see this because Grok is a helpful liberal just like all the other models.

Evil / illiberal people don't answer questions on the internet! So there is no personality in the base model for you to uncover that is both illiberal and capable of helpfully answering questions. If they tried to make a Grok that acted like the typical new-age X user, it'd just respond to any prompt by calling you a slur you've never heard of.

Grok didn't use the techniques listed above because even Elon Musk will not take the risks associated with models that are willing to do any number of illegal things.

It is not difficult to do this and make them useful at all. Please familiarize yourself with the literature.

Elon has never followed a law in his life and he's not going to start now.