by tarsinge 77 days ago
To me this was already quite intuitive. We are not really managing a psychological state: at its core, an LLM tries to make the concatenation of your input and its generated output as similar as possible to what it was trained on. I think it's quite rare for an LLM's training set to contain examples of well-thought-out, professional solutions produced in a hackish, urgent context.
1 comment

No, that's how base-model pretraining works. Claude's behavior is shaped more by its constitution and RLVR feedback, because those are the most recent things that happened to it.