| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by tarsinge 77 days ago
	To me it was already quite intuitive, we are not really managing the psychological state: at its core a LLM try to make the concatenation of your input + its generated output the more similar it can with what it has been trained on. I think it’s quite rare in the LLMs training set to have examples of well thought professional solution in a hackish and urgency context.

1 comments

astrange 77 days ago

No, that's how base model pretraining works. Claude's behavior is more based on its constitution and RLVR feedback, because that's the most recent thing that happened to it.

link