HN Mirror


by FeepingCreature 343 days ago

You just train it on the goal. Then it has that goal.

Alternately, you can train it on following a goal and then you have a system where you can specify a goal.

At sufficient scale, a model will already contain goal-following algorithms, because those help predict the next token when the model is base-trained on text from goal-following entities, i.e. humans. Goal-driven RL then brings those algorithms to prominence.

2 comments

kordlessagain 343 days ago

Random goal use is proving to be more important than training. Although last year someone trained on the fly during the competition, which is pretty awesome when you think about it.


kelseyfrog 343 days ago

How do you figure goal generation and supervised goal training are interchangeable?


FeepingCreature 343 days ago

Layman warning! But "at sufficient scale", as with learning-to-learn, I'd expect it to pick up mostly meta-patterns along with (if not rather than) behavioral habits, especially if the goal is left open, because strategies generalize across goals and thus get reinforced by every instance of goal pursuit during base training.

But also my intuition is that humans are "trained on goals" and then reverse-engineer an explicit goal structure using self-observation and prosaic reasoning. If it works for us, why not the LLMs?

edit: Example: https://arxiv.org/abs/2501.11120 "Tell me about yourself: LLMs are aware of their learned behaviors". When you train an LLM on an exclusively implicit goal, the LLM explicitly realizes that it has been trained on this goal, indicating (IMO) that the implicit training hit explicit strategies.


kelseyfrog 342 days ago

I'm not sure. In my experience, humans without explicit goal generation training tend to underperform at generating goals. In other words, our out-of-distribution performance for goal generation is poor.

Noticing this, frameworks like SMART[1] provide explicit generation rules. The existence of explicit frameworks is evidence that humans tend to perform worse than expected at extracting implicit structure from goals they've observed.

[1] Independent of the effectiveness of such frameworks.
