by sp332 792 days ago
Why is the 3B model worse than the 450M model on MMLU and TruthfulQA?