Hacker News
by jondwillis 304 days ago
We’re already steering, both during post-training (e.g. reasoning RLHF) and at test time (structured outputs, tool calls, agents…)