|
|
|
|
|
by bguberfain
374 days ago
|
|
So they used a LLM with knowledge cut in mid 2023 to evaluate 2023? Seems like a classic leakage problem. From paper: "testing set: January 1, 2023, to December 31, 2023" From the Llama 2 doc: "(...) some tuning data is more recent, up to July 2023." |
|