by zahlman
139 days ago
> I gave 7 frontier LLMs a simple task: pilot a drone through a 3D voxel world and find 3 creatures.

> Only one could do it.

If I understood the chart correctly, even the successful one only found 1/6 of the creatures across multiple runs.

Without comparison to some null hypothesis (a random policy), this article is hogwash.
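
To make the baseline point concrete, here is a rough sketch of what a random-policy comparison could look like. Everything in it is my own assumption, not the article's setup: a toy cubic grid, six axis-aligned moves, and hypothetical names and parameters (`random_policy_found`, `baseline_rate`, `grid_size`, `max_steps`). The real benchmark's world, action space, and step budget would have to be substituted in.

```python
import random

# Hypothetical action space: one step along each axis direction.
MOVES = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def random_policy_found(grid_size=8, n_creatures=3, max_steps=200, seed=0):
    """Count creatures a uniformly random walker lands on in a toy cubic grid."""
    rng = random.Random(seed)
    start = (0, 0, 0)
    # Creatures are placed uniformly at random, never on the start cell.
    cells = [
        (x, y, z)
        for x in range(grid_size)
        for y in range(grid_size)
        for z in range(grid_size)
        if (x, y, z) != start
    ]
    creatures = set(rng.sample(cells, n_creatures))

    pos = start
    found = set()
    for _ in range(max_steps):
        step = rng.choice(MOVES)
        # Clamp each coordinate so the walker stays inside the grid.
        pos = tuple(min(grid_size - 1, max(0, c + d)) for c, d in zip(pos, step))
        if pos in creatures:
            found.add(pos)
    return len(found)


def baseline_rate(n_runs=100, n_creatures=3, **kwargs):
    """Fraction of all creatures found by the random policy, pooled over runs."""
    total = sum(
        random_policy_found(n_creatures=n_creatures, seed=s, **kwargs)
        for s in range(n_runs)
    )
    return total / (n_runs * n_creatures)


if __name__ == "__main__":
    print(f"random-policy find rate: {baseline_rate():.3f}")
```

If a number like this came out anywhere near 1/6 under the benchmark's actual conditions, the headline result would not distinguish the "successful" model from chance.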