by Veedrac
679 days ago
IMO this is the least convincing part of the benchmark though, since it's uninterpretable without an optimal baseline. You don't know how much of this is because of Spice and how much is because of how the task scales. (This is acknowledged as future work.)