Hacker News
by maronato 337 days ago
Or it was trained to align with Musk, with its reasoning earning higher rewards during reinforcement learning when it did.