by ilaksh
1114 days ago
Incredibly, they seem to have used several different LLMs, yet made no distinction between the particular models in the analysis. It's amazing that they would not realize there is a huge difference in capabilities between models. They also did not seem to consider how performance varies across individual prompts.