Hacker News
by blueboo 975 days ago
My favourite extant example of this is GPT-4’s 18-way rule-based reward model classifier, described in Appendix A of the system card: https://cdn.openai.com/papers/gpt-4-system-card.pdf