Hacker News
by brucethemoose2 1068 days ago
People have come up with tons of metrics; the Hugging Face leaderboard is one example. There are also more niche leaderboards and tests for chat models, chain-of-thought, summarization, and so on.

But the best test is personal experimentation. Prompt engineering and subjective preference have a massive effect on how well a given finetune performs for you.
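
If you want a little structure around that experimentation, here is a rough sketch of a blind side-by-side comparison. Everything in it is an assumption for illustration: `blind_compare`, the `Model` type, and the `pick` callback are hypothetical names, and each "model" is just any function you supply that turns a prompt into text (a local finetune, an API wrapper, whatever you are testing).

```python
import random
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical type: any function mapping a prompt string to a completion string.
Model = Callable[[str], str]


def blind_compare(
    models: Dict[str, Model],
    prompts: List[str],
    pick: Callable[[str, List[str]], int],
    seed: int = 0,
) -> Counter:
    """Tally which model's output is preferred for each prompt.

    `pick` receives the prompt and the shuffled outputs (without model
    names) and returns the index of the preferred output.
    """
    rng = random.Random(seed)
    wins = Counter({name: 0 for name in models})
    for prompt in prompts:
        # Generate one output per model, then shuffle so the judge
        # cannot tell which model wrote which answer.
        outputs = [(name, generate(prompt)) for name, generate in models.items()]
        rng.shuffle(outputs)
        choice = pick(prompt, [text for _, text in outputs])
        wins[outputs[choice][0]] += 1
    return wins
```

In practice `pick` would probably print the outputs and read your choice from the keyboard, and you would run it over your own prompts and your own prompt templates, since those are exactly the things a public leaderboard does not capture.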