by matusp 180 days ago
AI alignment is not a solved problem by any means. As long as LLMs hallucinate, they cannot be considered aligned. You can only be aligned if you have a zero probability of generating hallucinations. The two problems, alignment and hallucinations, can be considered equivalent.

A human who hates maths is different from one who adds up wrong because they think the first digit counts units, the second digit tens, and the third digit twenties (as one of my uni lecturers recounted of her own childhood).

Alignment is, approximately, "are we even training this AI on the correct utility function?" followed up by the second question "even if we specified the correct utility function, did the AI learn a representation of that function or some weird approximation of that function with edge cases we've not figured out how to spot?"
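
To make the second question concrete, here is a toy sketch, not a model of any real training setup. The functions `intended_utility` and `learned_utility` and both input ranges are made up for illustration: the learned function matches the intended one everywhere it was checked, and only diverges on inputs nobody looked at.

```python
def intended_utility(x: int) -> int:
    # Hypothetical "correct" objective: more is better up to 10, then it gets worse.
    return x if x <= 10 else 20 - x


def learned_utility(x: int) -> int:
    # Hypothetical learned approximation: it only picked up "more is better".
    return x


# Assumed ranges: what was checked during training vs. what is reachable later.
training_inputs = range(0, 11)
deployment_inputs = range(0, 101)

# On every training input the two functions are indistinguishable.
agree_on_training = all(
    intended_utility(x) == learned_utility(x) for x in training_inputs
)

# An optimiser that trusts the learned function walks straight into the edge case.
chosen = max(deployment_inputs, key=learned_utility)

print(agree_on_training)         # True
print(chosen)                    # 100
print(intended_utility(chosen))  # -80
```

Nothing in the training range would have flagged the difference, which is roughly what "edge cases we've not figured out how to spot" means here.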

With RLHF, for example, the first question is "is optimising for thumbs-up/thumbs-down the right objective at all?", and the second is "did it learn the preference, or just how to game the reward?"
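
A minimal sketch of the first question, under loudly stated assumptions: the `Answer` type, the rater model in `thumbs_up_reward`, and the two candidate answers are all hypothetical, and no actual RLHF training happens. It only shows how picking whatever maximises thumbs-up can differ from picking what we actually wanted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Answer:
    text: str
    correct: bool    # what we actually care about (assumed ground truth)
    confident: bool  # surface feature the hypothetical rater responds to


def intended_utility(a: Answer) -> float:
    # The objective we meant: correct answers are good, wrong ones are not.
    return 1.0 if a.correct else 0.0


def thumbs_up_reward(a: Answer) -> float:
    # Hypothetical rater: likes confident answers, only partly checks correctness.
    return (1.0 if a.confident else 0.0) + (0.5 if a.correct else 0.0)


candidates = [
    Answer("I'm not sure, but I think it's 12.", correct=True, confident=False),
    Answer("It is definitely 15.", correct=False, confident=True),
]

best_by_reward = max(candidates, key=thumbs_up_reward)
best_by_intent = max(candidates, key=intended_utility)

print(best_by_reward.text)  # It is definitely 15.
print(best_by_intent.text)  # I'm not sure, but I think it's 12.
```

Here the thumbs-up signal itself is the wrong target, so even a system that learned it perfectly would prefer the confident wrong answer.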