by mlepath
528 days ago
Yeah, people have a really hard time dealing with data leakage, especially on datasets as large as LLMs need. Basically, anything that has appeared online or been transmitted over the wire should no longer be eligible for use in evaluation. D. Sculley gave a great talk at NeurIPS 2024 (the same conference this paper was in) titled "Empirical Rigor at Scale – or, How Not to Fool Yourself". Basically, no one knows how to properly evaluate LLMs.