Hacker News
by pjc50 353 days ago
I'd still like people to be more rigorous about what they mean by "alignment", since it seems to be some sort of vague "don't be evil" intention and the more important ground truth problem isn't solved (solvable?) for language models.

Originally, alignment was and is a technical term in academic research on how to make sure that a theoretical artificial superintelligence would value what humans value (see Nick Bostrom's Superintelligence). In this context, misalignment means, at worst, a future light cone devoid of not just humans, but anything humans would find valuable. A paperclip maximizer scenario, in short. Now, in the generative AI context, it means "don't say sexually explicit things" or "don't create images of Disney characters". One of these problems is not like the other.
> Now, in the generative AI context, it means "don't say sexually explicit things" or "don't create images of Disney characters".

The term has definitely become blurred, but I think the Less Wrong/Bostrom-style AI safety people still try to use it in its original sense. Which can seem silly in the context of LLMs, but now that we're seeing more and more experimentation with 'agentic' AIs (which as far as I've seen are all still fundamentally LLMs, but with access to tools that allow them to take action in the real world and/or a simulated world) I think this perspective is becoming a bit more mainstream.

(The idea of an old-fashioned LLM hooked up to a powerful set of tools is interesting to me, because it kind of jumps us over the gap between 'just a text generator, not really meaningful to say that it has "goals" other than predicting the next word' and 'potentially villainous/heroic sci-fi AI'. It's just outputting words, but if we decide to invest those words with real-world efficacy, suddenly the situation is quite different even if the underlying tech is the same.)