We've had Markov chain generators for a while. Having enough computing power to let them regurgitate Wikipedia, Reddit, and Stack Overflow content is not "a huge step towards AGI".
It's true that Markov chain generators have existed for years. But historically their output was usually just a cute thing that gave you a chuckle; they were seldom as generally useful as LLMs currently are. I think that the increase you mention in compute power and data is itself a huge step forward.
But transformers have also been hugely important. Transformer-based LLMs are far more capable than previous types of models, and can be trained on far more data, because of how well they scale. The attention mechanism also lets them draw on much more of the input, not just the few preceding tokens.
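To make that last point concrete, here is a minimal sketch of causal attention weights in plain Python. It assumes ordinary scaled dot-product attention; the function name `causal_attention_weights` and the toy vectors are made up for illustration, and the learned projections of a real model are not modeled.

```python
import math

def causal_attention_weights(queries, keys):
    """Return, for each position i, softmax weights over positions 0..i.

    queries and keys are lists of equal-length float vectors (toy stand-ins
    for learned projections, which are not modeled here).
    """
    dim = len(keys[0])
    weights = []
    for i, q in enumerate(queries):
        # Scaled dot-product score against every position up to and including i.
        scores = [sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(dim)
                  for j in range(i + 1)]
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Hypothetical 2-d vectors for a 4-token input.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]
weights = causal_attention_weights(vectors, vectors)
for i, row in enumerate(weights):
    print(i, [round(w, 3) for w in row])
```

Row i has one weight for every token up to position i, so the last token can draw on the whole input. An order-k Markov chain, by comparison, only ever looks at the k preceding tokens.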
If you want something useful, then we're getting closer.
AGI is something specific: as a prerequisite, it must understand what is being asked. What we have now is a puppet show that makes us humans think the machine is thinking, similar to Markov chains.
There is absolutely some utility in this, but it's about as close to AGI as the horse cart is to commercial aircraft.
Some AI hype people are really uncomfortable with that fact. I'm sorry, but that reality will hit you sooner rather than later.
It does not mean what we have is perfect, cannot be improved in the short term, or that it has no practical applications already.
EDIT: downvoting me won't change this. Go study the academic field of AI properly, please.
AGI is something fairly specific, yes, but depending on what you mean by “understand”, I don’t think it necessarily needs to “understand”? To behave (for all practical purposes) as if it “understands” is good enough. For some senses of “understand” this may be the same thing as for it to “understand”, so for those senses of “understand”, yes it needs to “understand”.
It seems clear to me that, if we could programmatically sample from a satisfactory conditional probability distribution, this would be sufficient for it to, for all practical purposes, behave as if it “understands”, and moreover for it to count as AGI. (For it to do so at a fast enough rate would make it both AGI and practically relevant.)
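For concreteness, "programmatically sampling from a conditional probability distribution" can be sketched as below. Everything here is an illustrative assumption: `cond_dist` is a hypothetical callable mapping the context so far to a dict of next-token probabilities, and the toy table only looks at the last token, which makes it a first-order Markov chain. The open question is about what distribution gets plugged in, not about the sampling loop.

```python
import random

def sample_next(cond_dist, context, rng):
    """Draw one token from cond_dist(context), a dict of token -> probability."""
    dist = cond_dist(context)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]

def generate(cond_dist, prompt, max_tokens, rng, stop="<end>"):
    """Extend prompt one sampled token at a time until stop or max_tokens."""
    out = list(prompt)
    for _ in range(max_tokens):
        token = sample_next(cond_dist, out, rng)
        if token == stop:
            break
        out.append(token)
    return out

# Toy stand-in distribution (an assumption, not a real model): it conditions
# only on the last token, i.e. it is a first-order Markov chain.
TABLE = {
    "the": {"cat": 0.5, "dog": 0.5},
    "cat": {"sat": 0.7, "<end>": 0.3},
    "dog": {"sat": 0.6, "<end>": 0.4},
    "sat": {"<end>": 1.0},
}

def toy_cond_dist(context):
    return TABLE[context[-1]]

result = generate(toy_cond_dist, ["the"], 10, random.Random(0))
print(result)
```

An LLM fits the same interface, except that its `cond_dist` would depend on the entire context rather than on `context[-1]`.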
So the question, as I see it, is whether the developments with ANNs, trained as they have been, are progress towards producing something that can sample from a conditional probability distribution in a way that would be satisfactory for AGI.
I don’t see much reason to conclude that they are not?
I suppose your claim is that the conditional probability distributions are not getting closer to being such that they are practically as if they exhibit understanding?
I guess this might be true…
It does seem like some things would be better served by having variables with a fixed identity but a changing value, rather than just producing more variables? I guess that’s kind of like the “pure functional programming vs not-that” distinction, and of course, as pure functional programming shows, one can still compute whatever one wants while only using immutable values, but one still usually uses something that behaves as if a value were changing.
And of course, for transformer models, a task that needs more than O(N^2) computation or so (… maybe O(N^3), if on N tokens each one is processed in ways that depend on each pair of the results of processing previous ones?) can’t be done while producing a single output token, so that’s a limitation there.
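A rough way to check the exponent, as a sketch under stated assumptions: with standard causal self-attention, each token is scored once against itself and each earlier token, so one head in one layer computes N(N+1)/2 scores over N tokens. The function names below are made up, and the count ignores the per-token feed-forward work, which only adds a term linear in N.

```python
def causal_attention_scores(n_tokens):
    """Number of query-key scores one head in one layer computes over n tokens
    when each token attends to itself and all earlier tokens."""
    return sum(i + 1 for i in range(n_tokens))  # equals n * (n + 1) / 2

def forward_pass_scores(n_tokens, n_layers, n_heads):
    """Total scores for one full forward pass (no caching), assuming every
    layer and head uses standard causal self-attention."""
    return n_layers * n_heads * causal_attention_scores(n_tokens)

# Hypothetical model shape: 12 layers, 12 heads.
for n in (10, 100, 1000):
    print(n, forward_pass_scores(n, n_layers=12, n_heads=12))
```

Doubling N roughly quadruples the count, so for a fixed number of layers the work available before emitting one token grows quadratically in N under these assumptions, which fits the O(N^2) guess for standard attention better than the O(N^3) one.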
I suppose that the thing that is supposed to make transformers faster to train, namely that the predictions for each of the tokens in a sequence can be made in parallel, kinda only makes sense if you have a ground truth sequence of tokens… though there is also RLHF (and similar), where the fine-tuning is based on an estimated score for the final output… and I suppose possibly neither is great at getting behavior sufficiently similar to reasoning?
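The dependence on a ground truth sequence can be sketched like this (hypothetical helper names, plain Python lists standing in for token tensors). With a known sequence, every (context, target) pair exists up front; during generation, each step has to wait for the previous one.

```python
def teacher_forcing_pairs(tokens):
    """All (context, target) training pairs from one ground-truth sequence.

    Every pair is known up front, so the predictions can be scored in
    parallel; nothing depends on what the model itself generated.
    """
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

def generate_sequentially(predict_next, prompt, n_new):
    """Generation has no ground truth: each step consumes the previous output."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(predict_next(out))
    return out

sequence = ["the", "cat", "sat", "down"]
pairs = teacher_forcing_pairs(sequence)
for context, target in pairs:
    print(context, "->", target)

# Hypothetical stand-in predictor: it just repeats the last token.
echoed = generate_sequentially(lambda ctx: ctx[-1], ["the"], 3)
print(echoed)
```

Scoring only a final output, as in RLHF-style fine-tuning, falls in the second category: the sequence being scored has to be generated step by step first.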
(Note: when I say “satisfactory probability distribution” I don’t mean to imply that we have a nice specification of a conditional probability distribution for which we merely need to produce a sampling method. But there should exist, in the abstract, non-constructive mathematical sense, probability distributions which would be satisfactory.)
I do not consider "understanding", which cannot be quantified, as a feature of AGI.
In order for something to qualify as AGI, answering in a seemingly intelligent way is not enough. An AGI must be able to do the following things, which a competent human would do: given the task of accomplishing something that nobody has done before, conceive a detailed, step-by-step plan for how to achieve it. Then, after doing the first steps and discovering that they were much more difficult or much easier than expected, adjust the plan based on the accumulated experience, in order to increase the probability of reaching the target successfully.
Or else, one may realize that it is possible to reformulate the goal, replacing it with a related goal that is nearly as useful to reach but that can be reached by a modified plan with much better chances of success. Or else, recognize that at this time it will be impossible to reach the initial goal, but that there is another, simpler-to-reach goal that is still desirable, even if it does not provide the full benefits of the initial goal. Then, establish a new plan of action to reach the modified goal.
For now this kind of activity is completely outside the abilities of any AI. Despite the impressive progress demonstrated by LLMs, nothing done by them has brought a computer any closer to having intelligence in the sense described above.
It is true, however, that there are a lot of human managers who would be as clueless as an LLM about how to perform such activities.