


	by Palmik 505 days ago
	Anthropic is, by their own account, using RLAIF, which is basically using an LLM as a judge / reward model. So maybe he means that the models they use for RLAIF are not (much?) more expensive than Sonnet 3.5 (e.g. the previous Sonnet or Haiku 3 :)).

1 comment

Do you have a link to Anthropic saying they use RLAIF?