Hacker News
by theaussiestew 816 days ago
It's because the LMSYS score is an aggregate Elo across a range of different tasks. In some very important individual areas, Claude Opus may be better than GPT-4 by 50-100 Elo points, which is quite a lot. However, there are specific domains where GPT-4 has the advantage because it's been fine-tuned on a lot of existing usage. So weak points around logic puzzles or following specific instructions don't bring down its Elo, whereas Claude Opus doesn't have this advantage yet. I believe Opus's eventual Elo, after all these little areas of weakness are fine-tuned away, will be something like 1300.
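
To put "50-100 Elo points" in perspective, here's a rough sketch that converts a rating gap into an expected head-to-head win rate. It assumes the standard Elo logistic formula (base 10, 400-point scale); the function name is mine and this is not taken from the LMSYS code.

```python
def expected_win_rate(elo_gap: float, scale: float = 400.0) -> float:
    """Expected win probability for the higher-rated model.

    Assumes the textbook Elo formula: 1 / (1 + 10 ** (-gap / scale)).
    The 400-point scale is the conventional default, used here as an assumption.
    """
    return 1.0 / (1.0 + 10.0 ** (-elo_gap / scale))


if __name__ == "__main__":
    # Gaps mentioned in the comment above, plus 0 as a sanity check.
    for gap in (0, 50, 100):
        print(f"{gap:>3} Elo gap -> {expected_win_rate(gap):.1%} expected win rate")
```

Under those assumptions, a 50-point gap works out to winning roughly 57% of head-to-head votes, and a 100-point gap to roughly 64%.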