Hacker News new | ask | show | jobs
by redman25 531 days ago
What is the metric for LLMs? Shouldn't more than just accuracy be measured? If something has high accuracy but low recall, won't it be overfit and fail to generalize? Your metrics would give you false confidence in how effective your model is. Just wondering because the announcement only seems to mention accuracy.
1 comments

Good point, we should provide more detailed metrics. Since we are very early, we focus on the main metric in our view: higher accuracy of changes to be more practically usable. We will do more testing on overfitting and how the model performance on different types of tasks. On high level we believe in the idea of "a well fine-tuned model should be much better than a large general model". But we need more metrics, I agree.