by godelski 697 days ago
> This is progress.

I want to stress that not only do I agree with this, I explicitly stated so. I even explicitly said it would be naive to dismiss this progress AND explicitly criticized Gary for doing so. I want to make it abundantly clear that I am not claiming LLMs are not progress. I feel I have to do this because of the context here and because it is common to conflate critiques of LLMs with dismissal of LLMs.

> Ten years ago it would have been a fantasy to consider pictures and animations as intermediate representations in an AI system.

I'm hesitant to agree. I'll agree if we are also saying that 10 years ago it would have been considered fantasy to build a lossy compression of human-written knowledge, build a natural language interface into it, and have this all under 200GB. In that case, I think someone could have imagined such a system but thought it was far away, and maybe not even believed the last condition. But this is a reasonably accurate description of LLMs (what's up for debate is the reasoning capacity, not the compression aspect).

And my point is not about technical capabilities. Like every other ML researcher, the release of GPT made me believe AGI was much closer than I had previously thought. But like many ML researchers, I later reevaluated and returned to something close to my earlier position as I moved from seeing examples to having intimate experience with usage: diving deep into the data these models are trained on, into the training processes, and probing these machines.

For some people, understanding the mechanics behind a "magic trick" makes the magic trick unimpressive. But I've always been fascinated by the mechanics, and they often make the tricks far more impressive! What GPT made everyone reconsider is how much we could do with data alone. How powerful and impressive our existing statistical frameworks are when scaled. But there is no evidence here that these systems actually understand what they are processing. There is no evidence that these systems are logically reasoning and there is a fair amount of evidence that they are not[0]. The details here matter and are the critical part of answering these questions. Because, as you mentioned, we've made a lot of progress. And the thing is that when we progress, the amount of complexity needed to further advance also increases. A low-order approximation takes you a long way, but we know the complexity required grows quickly while accuracy improves only slowly.

I guess I would be more willing to believe a path-to-AGI argument with these systems if they were more robust (not to say I am not still impressed). Even if the systems could perform the image generation tasks I described here[1] (see Imgur link), I do not believe that alone would be enough to demonstrate intelligence or reasoning (note: image generation is my specific research area). The types of errors made are not illustrative of a system that understands, but it is also important to remember that proof is not symmetric. A billion positive examples do not constitute a proof, while a single counterexample constitutes a counterproof. My concern is that these discussions often take the form of demonstration as proof. Demonstrations aren't proof and no amount of them will constitute proof. But it's also important to note that a counterproof is not always an _absolute_ rejection; more often it is a bound. What I'm saying is that the counterexamples don't dismiss the utility of LLMs, but they do place strong bounds on where that utility lives. The distinction does matter, and a disregard of this distinction is specifically what I am criticizing Gary for.

[0] https://news.ycombinator.com/item?id=41097025

I also want to mention that the word "reasoning" is not always used consistently and that this unfortunately makes the discussion more convoluted. But I think we need to understand what is in the training data in order to test these abilities accurately. Similarly, the terms "out of distribution" and "zero/low shot" are changing, and often not in great ways. E.g. it is common to train on LAION and "zero-shot" on ImageNet.

[1] https://news.ycombinator.com/item?id=41063312