| HN Mirror



	by Sol- 205 days ago
	No, I think it was apparently used somehow in the reinforcement learning step, to influence the model's final fine-tuning. At least, that's how I understood it. The actual system prompt from Anthropic is shorter and, I believe, also public on their website.
