by spmurrayzzz
968 days ago
AFAIK they haven't released the dataset they fine-tuned on, so we can't be 100% sure there wasn't benchmark contamination. Agreed that we definitely need more than N=1 to challenge the performance claims, but I still think it's valid to call it out given how much benchmark gaming we've seen in this space.