by refulgentis
1032 days ago
There weren't any serious examples of degradation. Does only GPT-4 have to suffer a penalty for HumanEval leaking into training data/RLHF data? Ignoring those concerns, it fails a reasonableness smell test: we'd have to pretend it's the original GPT-4 release from March 2023 until GPT-5 comes out, and only then can OpenAI's work be compared to LLAMA-2 through LLAMA-N.
1. I'm not saying we have to wait until GPT-5; we just need an apples-to-apples comparison where contamination is taken into account.
2. GPT-4 does not seem to have improved on real-world coding tasks since March, so it's unclear where any purported HumanEval gains could've come from.
3. Anecdotally, I've noticed degradation in the GPT-4 June update compared with the original March release.