Hacker News
by dw_arthur 127 days ago
Everyone should have their own private evals for models. If I ask a question and a model flat-out gets it wrong, I will sometimes add it to my bank of test questions.
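
For anyone who wants to try this, here is a minimal sketch of what such a question bank could look like. Everything in it is an assumption for illustration, not something described above: the JSONL file layout, the function names `add_question`, `load_bank`, and `run_evals`, and the crude case-insensitive substring check used for grading. `ask_model` is a hypothetical callable you would wire up to whatever model you are testing.

```python
import json
from pathlib import Path
from typing import Callable


def add_question(bank: Path, prompt: str, expected: str) -> None:
    """Append a question a model got wrong, plus the answer you expected."""
    with bank.open("a", encoding="utf-8") as f:
        f.write(json.dumps({"prompt": prompt, "expected": expected}) + "\n")


def load_bank(bank: Path) -> list[dict]:
    """Read every banked question; a missing file is an empty bank."""
    if not bank.exists():
        return []
    with bank.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def run_evals(bank: Path, ask_model: Callable[[str], str]) -> float:
    """Return the fraction of questions whose expected answer appears in the reply.

    Grading is an assumed, deliberately crude substring match (case-insensitive).
    """
    questions = load_bank(bank)
    if not questions:
        return 0.0
    passed = sum(
        q["expected"].lower() in ask_model(q["prompt"]).lower()
        for q in questions
    )
    return passed / len(questions)
```

Whenever a new model comes out, you would call `run_evals(Path("evals.jsonl"), ask_model)` and compare the score against earlier models. Substring matching will misgrade open-ended answers, so for those you would probably still read the replies by hand.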