by chewxy
983 days ago
I've noticed something very interesting in the AI hype, and I would like someone to help explain it. Whenever there's a news or article noting the limits of current LLM tech (especially the GPT class of models from OpenAI), there's always a comment along the lines of "ah, did you test it on GPT-4?" Or, if it's clear that the limitation is GPT-4's, you get comments along the lines of "what's the prompt?" or "the prompt is poor". Usually it comes from someone who hasn't previously indicated that they understand that prompt engineering is model-specific, and that the papers' point is to make a more general claim rather than a claim about one model. Can anyone explain this? It's as if the mere mention of LLMs being limited in X, Y, Z fashion offends their lifestyle or core beliefs. Or perhaps it's a weird form of astroturfing. If so, I ask: to what end?
Perhaps because whenever there's "a news or article noting the limits of current LLM tech", it's a bit like someone tried to play a modern game on a machine they found in their parents' basement, and the only appropriate response is "have you tried running it on something other than a potato?" This has been happening so often over the past few months that it's the first red flag you check for.
GPT-4 is still qualitatively ahead of all other LLMs, so outside of articles addressing specialized aspects of different model families, the claims are invalid unless they were tested on GPT-4.
(Half the time the problem is that the author used the ChatGPT web app and did not even realize that there are two models and that they've been using the toy one.)