Hacker News
by maronato 337 days ago
Or it was trained to align with Musk, with reasoning that aligned with him receiving higher rewards during the reinforcement learning steps.