by dw_arthur 127 days ago
Everyone should have their own private evals for models. If I ask a question and a model flat-out gets it wrong, I will sometimes add it to my bank of test questions.
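For anyone who wants to try this, here is a minimal sketch of one way to keep such a bank. It is only an illustration, not a description of any particular setup: the JSON file layout, the function names (`load_bank`, `add_question`, `run_evals`), and the substring-match grading are all assumptions, and `ask_model` is a hypothetical stand-in for whatever function sends a prompt to the model you are testing.

```python
import json
from pathlib import Path
from typing import Callable


def load_bank(path: Path) -> list[dict]:
    """Return the saved questions, or an empty list if the file does not exist yet."""
    if not path.exists():
        return []
    return json.loads(path.read_text(encoding="utf-8"))


def add_question(path: Path, prompt: str, expected: str) -> None:
    """Save a question a model got wrong, plus a substring a correct answer must contain."""
    bank = load_bank(path)
    bank.append({"prompt": prompt, "expected": expected})
    path.write_text(json.dumps(bank, indent=2), encoding="utf-8")


def run_evals(path: Path, ask_model: Callable[[str], str]) -> float:
    """Ask every saved question and return the fraction answered correctly.

    Grading is a case-insensitive substring check (an assumption; it is crude
    and will need replacing for open-ended questions).
    """
    bank = load_bank(path)
    if not bank:
        return 0.0
    passed = sum(
        1
        for q in bank
        if q["expected"].lower() in ask_model(q["prompt"]).lower()
    )
    return passed / len(bank)
```

Keeping the file local and out of public repos is the point: questions that never get published are less likely to leak into training data, so the score stays meaningful when a new model comes out.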