by brucethemoose2
1068 days ago
People have come up with tons of metrics; the Hugging Face leaderboard is one example. There are also more niche leaderboards and tests for chat models, chain of thought, summarization and such. But the best test is personal experimentation, because prompt engineering and subjective preference have a massive effect on how well a finetune performs.