by Palmik 505 days ago
Anthropic is, by their own account, using RLAIF, which is basically using an LLM as a judge / reward model. So maybe he means that the models they use for RLAIF are not (much?) more expensive than Sonnet 3.5 (e.g. the previous Sonnet or Haiku 3 :)).
1 comment

Do you have a link to Anthropic saying they use RLAIF?