Hacker News
by he0001 1158 days ago
Will we ever be able to determine whether LLMs have peaked, or whether they are getting better or worse? Is there a way to tell? I mean, throwing random sentences at a model and trying to judge whether it responded correctly can’t be the way forward, can it? And for which applications can it be trusted, if it can suddenly decide to answer incredibly wrong?