by imtringued
702 days ago
LLMs are reaching saturation on even some of the latest benchmarks, and yet I am still a little disappointed by how they perform in practice. They are by no means bad, but what I care about most now is long-context competency. We need benchmarks that force the LLM to complete multiple tasks simultaneously in one super long session.