|
|
|
|
|
by gitaarik
382 days ago
|
|
Yes ok, it can generate new stuff, but it's dependent on human curated reward models to score the output to make it usable. So it still depends on human thinking, it's own "thinking" is not sufficient. And there won't be a point when human curated reward models are not needed anymore. LLM's will make a lot of things easier for humans, because most of the thinking the humans do have been automated into the LLM. But ultimately you run into a limit where the human has to take over. |
|
This is a false premise, because there already exist systems, currently deployed, which are not dependent on human-curated reward models.
Refutations of your point include existing systems which generate a reward model based on some learned AI scoring function, allowing self-bootstrapping toward higher and higher levels.
A different refutation of your point is the existing simulation contexts, for example, by R1, in which coding compilation is used as a reward signal; here the reward model comes from a simulator, not a human.
> So it still depends on human thinking
Since your premise was false your corollary does not follow from it.
> And there won't be a point when human curated reward models are not needed anymore.
This is just a repetition of your previously false statement, not a new one. You're probably becoming increasingly overconfident by restating falsehoods in different words, potentially giving the impression you've made a more substantive argument than you really have.