by doe_eyes
724 days ago
Yes, this seemed pretty striking to me: the author clearly wanted the LLM to perform well. They started with a problem whose solutions are readily available on the internet, and then gave a fairly favorable take on the model's mistakes. But the bottom line is that it's a task a novice could have solved with a Google search or two, and the LLM fumbled it in ways that'd be difficult for a non-expert to spot and rectify. LLMs are generally good at information retrieval, so it's quite disappointing. The cookie thing... well, they learn statistical patterns. People on the internet often try harder when there is a quid pro quo, so the LLMs copy that, and it slips past RLHF because "performs as well with or without a cookie" is probably not one of the things they optimize for.