by llm_trw
593 days ago
Yes, they also fail. I've found the original GPT-4 to be the most consistent. One of these days I'll spend the couple of thousand needed to benchmark all the top models and see how they actually perform on a task that can't be gamed.

I found that they are good at logic and math problems but still hallucinate. I didn't try to stress-test them with hard problems, though.