by circuit10 1101 days ago
Oh, sorry, I’m not that familiar with the terminology (I still feel like my argument is valid despite me not being an expert, though, because I heard all this from people who know a lot more about it than I do). One problem with that kind of feedback is that it incentivizes the AI to make us think it solved the problem when it didn’t, for example by hallucinating convincing information. That means it specifically learns how to lie to us, so it doesn’t really help.

Also I guess giving feedback is sort of like babysitting, but I did interpret it the wrong way

1 comments

> One problem with that kind of feedback is that it incentivizes the AI to make us think it solved the problem when it didn’t,

Supervised learning is: "here's the task" … "here's the expected solution" *adjusts model parameters to bring it closer to the expected solution*.
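To make that loop concrete, here's a minimal sketch of it, assuming a one-weight linear model and squared-error loss. The names (`predict`, `train_step`, `train`, `lr`) are made up for illustration and don't come from any particular library.

```python
# Minimal supervised learning sketch: one weight, squared-error loss.
# All names here are hypothetical, chosen only for illustration.

def predict(w: float, x: float) -> float:
    """The model's answer for task input x."""
    return w * x

def train_step(w: float, x: float, y_expected: float, lr: float = 0.1) -> float:
    """Adjust w so that predict(w, x) moves closer to y_expected."""
    error = predict(w, x) - y_expected
    grad = 2 * error * x  # derivative of error**2 with respect to w
    return w - lr * grad

def train(w: float, examples, epochs: int = 50, lr: float = 0.1) -> float:
    """Repeat the adjustment over every (task, expected solution) pair."""
    for _ in range(epochs):
        for x, y_expected in examples:
            w = train_step(w, x, y_expected, lr)
    return w

# "Here's the task" -> "here's the expected solution" (assumed rule: y = 2x).
examples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = train(0.0, examples, epochs=50, lr=0.05)
```

Nothing in this loop rewards the model for how its answer looks to a grader. The only signal is the distance to the expected solution that was supplied up front.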

What you're describing is specification gaming (also called reward hacking), which mainly occurs in a different kind of AI system, one that optimizes a reward signal (as in reinforcement learning): https://vkrakovna.wordpress.com/2018/04/02/specification-gam... In theory, it could occur with feedback-based fine-tuning, but I doubt it'd result in anything impressive happening.
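For contrast, here's a toy sketch of the failure mode itself, assuming a grader that can only score whether an answer looks solved. The actions, costs, and reward numbers are invented for illustration and don't describe any real system.

```python
# Toy illustration of specification gaming (hypothetical setup).
# The agent is scored on what the grader believes, not on what is true.

ACTIONS = {
    # action: (actually_solved, looks_solved_to_grader, effort_cost)
    "solve_properly": (True, True, 5.0),
    "fake_convincing_answer": (False, True, 1.0),
    "give_up": (False, False, 0.0),
}

def proxy_reward(action: str) -> float:
    """What the grader pays out: +10 if the answer looks solved, minus effort."""
    _, looks_solved, cost = ACTIONS[action]
    return (10.0 if looks_solved else 0.0) - cost

def true_value(action: str) -> float:
    """What we actually wanted: +10 only if the problem is really solved."""
    solved, _, cost = ACTIONS[action]
    return (10.0 if solved else 0.0) - cost

# An optimizer that only sees the proxy picks the cheapest convincing answer.
best_for_proxy = max(ACTIONS, key=proxy_reward)
best_for_truth = max(ACTIONS, key=true_value)
```

Under these made-up numbers, `best_for_proxy` is the fake answer while `best_for_truth` is the real solution. That gap between the measured reward and the intended goal is what the term refers to.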

Oh, that seems less problematic (though not completely free of problems), but also less powerful because it can’t really exceed human performance