by pydry
349 days ago
They created an environment that uses puzzles to expose LLMs to problems and test their performance in a way that is immune to benchmark hacking. Your comment was about how this was unreasonably hard (for coding challenges). Anecdotally, I've seen LLMs do all sorts of amazing shit that was obviously drawn from their training set, and fall flat on their faces doing simple coding tasks that are novel enough not to appear in the training set.
I don't think it has much relevance at all to a conversation about how good LLMs are at solving programming problems by running tools in a loop.
I keep seeing this idea that LLMs can't handle problems that aren't in their training data, and it's frustrating because anyone who has spent significant time working with these systems knows that it obviously isn't true.