by andai
73 days ago
Is there a benchmark for these long tasks? That kind of seems like the only number worth measuring. (Of course at that point it involves memory and context management and so on, so you're testing the harness as well as the model.)