| > Weird thing is it was designed to model language. It’s surprising that it returns sound answers as often as it does. Is this surprising? Can you point to researchers in the field being “surprised” by LLMs returning sound answers? > “surprising”, i.e. we don’t really know what happened. This reads like a sort of popsci conclusion. We know exactly what happened. We programmed it to perform these calculations. It’s actually rather straightforward elementary mathematics. What happens, though, is that so many interdependent calculations grow the complexity of the problem until we are unable to hold it in our minds, and analyzing its decisions computationally requires a level of computation for each decision similar to what was used to compute the weights. As for its effectiveness, familiarity with the field of computational complexity points to high-dimensional polynomial optimization problems being broadly universal solvers. |
It's surprising because that wasn't the intent behind LLMs. LLMs are just predictive models that guess the most likely next word. Having the results make sense was never a priority. The early versions, GPT-1 and GPT-2, returned mostly complete nonsense. It was only with GPT-3, when the model got large enough, that it started returning results that are convincing and might even make sense often enough.
Even more mind-boggling is the fact that randomness is part of the algorithm, i.e. the sampling temperature, and that without it the output is kind of meh.
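For anyone who hasn't seen what temperature actually does, here's a rough sketch of the usual idea: divide the model's raw scores (logits) by the temperature, turn them into probabilities with a softmax, then sample. The function names and the toy logits are made up for illustration, and treating temperature 0 as "always pick the top token" is a common convention I'm assuming here, not something taken from any particular model's code.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_next_token(logits, temperature, rng):
    """Pick a token index. Temperature 0 is treated as greedy (assumed convention)."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical scores for three candidate next tokens.
toy_logits = [2.0, 1.0, 0.1]
rng = random.Random(0)

print(softmax_with_temperature(toy_logits, 0.5))  # peaked: top token dominates
print(softmax_with_temperature(toy_logits, 2.0))  # flatter: more variety
print(sample_next_token(toy_logits, 0, rng))      # greedy pick, no randomness
```

Low temperature pushes almost all the probability onto the top-scoring token, while higher temperature spreads it out, which is where the randomness in the output comes from.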