by gcr 821 days ago
with respect, we don’t know if he was spot on. Companies shoehorning language models into their products is a far cry from the transformative societal change he describes will happen. nothing like a singularity has yet happened at the scale he describes, and might not happen without more fundamental shifts/breakthroughs in AI research.

What we're seeing right now with LLMs is like music in the late 30s after the invention of the electric guitar. At that point people still had no idea how to use it, so they treated it like an amplified acoustic guitar. It took almost 40 years for people to come up with the idea of harnessing feedback and distortion to use the guitar to create otherworldly soundscapes, and another 30 beyond that before people even approached the limit of the guitar's range with pedals and such.

LLMs are a game changer that are going to enable a new programming paradigm as models get faster and better at producing structured output. There are entire classes of app that couldn't exist before because there was a non-trivial "fuzzy" language problem in the loop. Furthermore I don't think people have a conception of how good these models are going to get within 5-10 years.

> Furthermore I don't think people have a conception of how good these models are going to get within 5-10 years.

Pretty sure it's quite the opposite of what you're implying: People see these LLMs, which closely resemble actual intelligence on the surface but have some shortcomings. Now they extrapolate this and think it's just a small step to perfection and/or AGI, which is completely wrong.

One problem is that converging to an ideal is obviously non-linear, so getting the first 90% right is relatively easy, and closer to 100% it gets exponentially harder. Another problem is that LLMs are not really designed in a way to contain actual intelligence in the way humans would expect them to, so any apparent reasoning is very superficial as it's just language-based and statistical.

In a similar spirit, science fiction stories playing in the near future often tend to have spectacular technology, like flying personal cars, in-eye displays, beam travel, or mind reading devices. In the 1960s it was predicted for the 80s, in the 80s it was predicted for the 2000s etc.

This book

https://www.amazon.com/Friends-High-Places-W-Livingston/dp/0...

tells (among other things) a harrowing tale of a common mistake in technology development that blindsides people every time: the project that reaches an asymptote instead of completion. It can get you to keep spending resources because you think you have only 5% to go, except the approach you've chosen means you'll never get the last 4%. It's a seductive situation that tends to turn the team away from Cassandras who have a clear view.

Happens a lot in machine learning projects where you don’t have the right features. (Right now I am chewing on the problem of “what kind of shoes is the person in this picture wearing?” and how many image classification models would not at all get that they are supposed to look at a small part of the image and how easy it would be to conclude that “this person is on a basketball court so they are wearing sneakers” or “this is a dude so they aren’t wearing heels” or “this lady has a fancy updo and fancy makeup so she must be wearing fancy shoes”. Trouble is all those biases make the model perform better up to a point but to get past that point you really need to segment out the person’s feet.)

You are looking at things like the failure of full self driving due to massive long tail complexity, and extrapolating that to LLMs. The difference is that full self driving isn't viable unless it's near perfect, whereas LLMs and text to image models are very useful even when imperfect. In any field there is a sigmoidal progress curve where things seem to move slowly at first when getting set up, accelerate quickly once a framework is in place, then start to run out of low hanging fruit and have to start working hard for incremental progress, until the field is basically mined out. Given the rate that we're seeing new stuff come out related to LLMs and image/video models, I think it's safe to say we're still in the low hanging fruit stage. We might not achieve better than human performance or AGI across a variety of fields right away, but we'll build a lot of very powerful tools that will accelerate our technological progress in the near term, and those goals are closer than many would like to admit.
AGI (human level intelligence) is not really an end goal but a point that will be surpassed. So looking at it as something asymptotically approaching an ideal 100% is fundamentally wrong. That 100% mark is going to be in the rear view mirror at some point. And it's a bit of an arbitrary mark as well.

Of course it doesn't help that people are a bit hand wavy about what that mark exactly is to begin with. We're very good at moving the goal posts. So that 100% mark has the problem that it's poorly defined and in any case just a brief moment in time given exponential improvements in capabilities. In the eyes of most we're not quite there yet for whatever there is. I would agree with that.

At some point we'll be debating whether we are actually there, and then things move on from there. A lot of that debate is going to be a bit emotional and irrational of course. People are very sensitive about these things and they get a bit defensive when you portray them as clearly inferior to something else. Arguably, most people I deal with don't actually know a lot, their reasoning is primitive/irrational, and if you'd benchmark them against an LLM it wouldn't be that great. Or that fair.

The singularity is kind of the point where most of the improvements to AI are going to come from ideas and suggestions generated by AI rather than by humans. Whether that's this decade or the next is a bit hard to predict obviously.

Human brains are quite complicated but there's only a finite number of neurons in there; a bit under 100 billion. We can waffle a bit about the complexity of their connections. But at some point it becomes a simple matter of throwing more hardware at the problem. With LLMs pushing tens to hundreds of billions of parameters already, you could legitimately ask what a few more doublings here would enable.

I think you're falling for the exact same fallacy that I was describing. Also note that the human level of intelligence is not arbitrary at all: Most LLMs are trained on human-generated data, and since they are statistical models, they won't suddenly come up with truly novel reasoning. They're generally just faster at generating stuff than humans, because they're computers.
>But at some point it becomes a simple matter of throwing more hardware at the problem.

Insofar as it's simple to throw like six orders of magnitude more hardware at something that has already had a lot of hardware thrown at it.

In 5 to 10 years we will have likely moved on to the next big model architecture just like it was all about convolutional networks 5 to 10 years ago despite the pivotal paper being published in 2017.
Singularity doesn't necessarily rely on LLMs by any means. It's just that communication is improving and the number of people doing research is increasing. Weak AI is icing on top, let alone LLMs, which are being shoe-horned into everything now. VV clearly adds these two other paths:

            o Computer/human interfaces may become so intimate that users
              may reasonably be considered superhumanly intelligent.
            o Biological science may find ways to improve upon the natural
              human intellect.
https://edoras.sdsu.edu/~vinge/misc/singularity.html
Yeah this is the angle I look at the most, the Humans+Internet combo.

I don't believe LLMs will really get us much of anywhere, Singularity-wise. They're just ridiculously inefficient in terms of compute (and thus power) needs to even do the basic pattern-prediction they do today. They're neat tools for human augmentation in some cases, but that's about all they contribute.

I think, even prior to the recent explosion of LLM stuff, that the aggregate of Humans and the depth of their interconnections on the Internet is already starting to form at least the beginnings of a sort of Singularity, without any AI-related topics needing to be introduced. The way memes (real memes, not silly jokes) spread around the Internet and shape thoughts across all the users, the way the users bounce ideas off each other and refine them, the way viral advocacy and information sharing works, etc. Basically the Singularity is just going to be the emergent group consciousness and capabilities of the collective Internet-connected set of Humans.

> Within thirty years, we will have the technological means to create superhuman intelligence.

Blackwell.

> o Develop human/computer symbiosis in art: Combine the graphic generation capability of modern machines and the esthetic sensibility of humans. Of course, there has been an enormous amount of research in designing computer aids for artists, as labor saving tools. I'm suggesting that we explicitly aim for a greater merging of competence, that we explicitly recognize the cooperative approach that is possible. Karl Sims [22] has done wonderful work in this direction.

Stable Diffusion.

> o Develop interfaces that allow computer and network access without requiring the human to be tied to one spot, sitting in front of a computer. (This is an aspect of IA that fits so well with known economic advantages that lots of effort is already being spent on it.)

iPhone and Android.

> o Develop more symmetrical decision support systems. A popular research/product area in recent years has been decision support systems. This is a form of IA, but may be too focussed on systems that are oracular. As much as the program giving the user information, there must be the idea of the user giving the program guidance.

Cicero.

> Another symptom of progress toward the Singularity: ideas themselves should spread ever faster, and even the most radical will quickly become commonplace.

Trump.

> o Use local area nets to make human teams that really work (ie, are more effective than their component members). This is generally the area of "groupware", already a very popular commercial pursuit. The change in viewpoint here would be to regard the group activity as a combination organism. In one sense, this suggestion might be regarded as the goal of inventing a "Rules of Order" for such combination operations. For instance, group focus might be more easily maintained than in classical meetings. Expertise of individual human members could be isolated from ego issues such that the contribution of different members is focussed on the team project. And of course shared data bases could be used much more conveniently than in conventional committee operations. (Note that this suggestion is aimed at team operations rather than political meetings. In a political setting, the automation described above would simply enforce the power of the persons making the rules!)

Ingress.

> o Exploit the worldwide Internet as a combination human/machine tool. Of all the items on the list, progress in this is proceeding the fastest and may run us into the Singularity before anything else. The power and influence of even the present-day Internet is vastly underestimated. (For instance, I think our contemporary computer systems would break under the weight of their own complexity if it weren't for the edge that the USENET "group mind" gives the system administration and support people!) The very anarchy of the worldwide net development is evidence of its potential. As connectivity and bandwidth and archive size and computer speed all increase, we are seeing something like Lynn Margulis' [14] vision of the biosphere as data processor recapitulated, but at a million times greater speed and with millions of humanly intelligent agents (ourselves).

Twitter.

> o Limb prosthetics is a topic of direct commercial applicability. Nerve to silicon transducers can be made [13]. This is an exciting, near-term step toward direct communication.

Atom Limbs.

> o Similar direct links into brains may be feasible, if the bit rate is low: given human learning flexibility, the actual brain neuron targets might not have to be precisely selected. Even 100 bits per second would be of great use to stroke victims who would otherwise be confined to menu-driven interfaces.

Neuralink.

---

https://justine.lol/dox/singularity.txt

>> > Within thirty years, we will have the technological means to create superhuman intelligence.

> Blackwell.

I'm fucking sorry but there is no LLM or "AI" platform that is even real intelligence, today, easily demonstrated by the fact that an LLM cannot be used to create a better LLM. Go on, ask ChatGPT to output a novel model that performs better than any other model. Oh, it doesn't work? That's because IT'S NOT INTELLIGENT. And it's DEFINITELY not "superhuman intelligence." Not even close.

Sometimes accurately regurgitating facts is NOT intelligence. God it's so depressing to see commenters on this hell-site listing current-day tech as ANYTHING approaching AGI.

> Oh, it doesn't work? That's because IT'S NOT INTELLIGENT.

Ok, let's run this test of "real intelligence" on you. We eagerly await to see your model. Should be a piece of cake.

> an LLM cannot be used to create a better LLM

By that logic most humans are also not intelligent.

You didn't read him correctly; he's not saying Blackwell is AGI. I believe that he's saying that perhaps Blackwell could be computationally sufficient for AGI if "used correctly."

I don't know where that "computationally sufficient" line is. It'll always be fuzzy (because you could have a very slow, but smart entity). And before we have a working AGI, thinking about how much computation we need always comes down to back of the envelope estimations with radically different assumptions of how much computational work brains do.

But I can't rule out the idea that current architectures have enough processing to do it.

I don't use the A word, because it's one of those words that popular culture has poisoned with fear, anger, and magical thinking. I can at least respect Kurzweil though and he says the human brain has 10 petaflops. Blackwell has 20 petaflops. That would seem to make it capable of superhuman intelligence to me. Especially if we consider that it can focus purely on thinking and doesn't have to regulate a body. Imagine having your own video card that does ChatGPT but 40x smarter.
I think there's a big focus on petaflops and that it may have been a good measure to think about initially, but now we're missing the mark.

If a human brain does its magic with 10 petaflops, and you have 1 petaflop, you should be able to make an equivalent to the human brain that runs at 1/10th of the speed but never sleeps. In other words, once you've reached the same order of magnitude it doesn't matter.

On the other hand, Kurzweil's math really comes down to an argument that the brain is using about 10 petaflops for inference, but it also is changing weights and doing a lot more math and optimization for training (which we don't completely understand). It may (or may not) take considerably more than 10 petaflops to train at the rate humans learn. And remember, humans take years to do anything useful.

Further, 10 petaflops may be enough math, but it doesn't mean you can store enough information or flow enough state between the different parts "of the model."

These are the big questions. If we knew the answers, IMO, we would already have really slow AGI.
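
The speed-for-throughput trade described in this comment is plain arithmetic. A minimal sketch in Python, taking the 10 petaflops and 1 petaflop figures from the comment as given (they are the commenter's working numbers, not measurements) and assuming raw FLOPS is the only constraint; the function names are made up for illustration:

```python
PETAFLOP = 1e15  # floating-point operations per second


def slowdown_factor(brain_flops: float, machine_flops: float) -> float:
    """How many times slower than real time the machine would run,
    assuming raw FLOPS is the only limit (a big assumption)."""
    if brain_flops <= 0 or machine_flops <= 0:
        raise ValueError("FLOPS figures must be positive")
    return brain_flops / machine_flops


def brain_hours_per_day(brain_flops: float, machine_flops: float) -> float:
    """Brain-equivalent hours of work produced per 24-hour day
    by a machine that never sleeps."""
    return 24.0 / slowdown_factor(brain_flops, machine_flops)


if __name__ == "__main__":
    brain = 10 * PETAFLOP   # figure quoted in the comment
    machine = 1 * PETAFLOP  # figure quoted in the comment
    print(f"slowdown: {slowdown_factor(brain, machine):.1f}x")
    print(f"brain-hours per day: {brain_hours_per_day(brain, machine):.1f}")
```

This covers only the inference-side arithmetic; it says nothing about the training, storage, or state-flow questions raised above.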

Yes I agree there's a lot of interesting problems to solve and things to learn when it comes to modeling intelligence. Vernor Vinge was smart in choosing the wording that we'd have the means to create superhuman intelligence by now, since no one's ever going to agree if we've actually achieved it.
Probably just a question of time constant / zoom on your time axis. When zoomed in up close, an exponential looks a lot like a bunch of piecewise linear components, where big breakthroughs are just discontinuous changes in slope...
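
A rough way to see the "zoom" point numerically: compare exp(t) with its tangent line over windows of different widths. This is only a sketch; the window sizes and the function name are arbitrary choices for illustration.

```python
import math


def max_relative_error(half_width: float, samples: int = 1001) -> float:
    """Largest relative gap between exp(t) and its tangent line at t = 0,
    sampled over the window [-half_width, +half_width]."""
    worst = 0.0
    for i in range(samples):
        t = -half_width + 2.0 * half_width * i / (samples - 1)
        exact = math.exp(t)
        linear = 1.0 + t  # tangent line to exp(t) at t = 0
        worst = max(worst, abs(exact - linear) / exact)
    return worst


if __name__ == "__main__":
    # Narrow windows look almost linear; wide windows clearly do not.
    for half_width in (0.01, 0.1, 0.5, 2.0):
        err = max_relative_error(half_width)
        print(f"window +/-{half_width}: max relative error {err:.4f}")
```

The error shrinks quickly as the window narrows, which is why a short stretch of an exponential is hard to tell apart from a straight line.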
Still has 6 years to be proven correct.
Imagine the first LLM to suggest an improvement to itself that no human has considered. Then imagine what happens next.
OK. I'm imagining a correlation engine that looks through code as a series of prompts that are used to generate more code from the corpus that is statistically likely to follow.

And now I'm transforming that through the concept of taking a photograph and applying the clone tool via a light airbrush.

Repeat enough times, and you get uncompilable mud.

LLMs are not going to generate improvements.

Saying they definitely won't or they definitely will are equally over-broad and premature.

I currently expect we'll need another architectural breakthrough; but also, back in 2009 I expected no-steering-wheel-included self driving cars no later than 2018, and that the LLM output we actually saw in 2023 would be the final problem to be solved in the path to AGI.

Prediction is hard, especially about the future.

GPT4 does inference at 560 teraflops. Human brain goes 10,000 teraflops. NVIDIA just unveiled their latest Blackwell chip yesterday which goes 20,000 teraflops. If you buy an NVL72 rack of the things, it goes 1,400,000 teraflops. That's what Jensen Huang's GPT runs on I bet.
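
For anyone who wants to check the ratios, here is a small Python sketch using the figures exactly as quoted in this comment. None of them are verified here; the one outside number is the 72 GPUs in an NVL72 rack, which gives 1,440,000 teraflops, in line with the rounded 1,400,000 above.

```python
# Teraflops figures as quoted in the comment (unverified).
QUOTED_TFLOPS = {
    "gpt4_inference": 560,
    "human_brain": 10_000,
    "blackwell_chip": 20_000,
}
GPUS_PER_NVL72_RACK = 72  # assumption added here: GPUs in one NVL72 rack


def ratio(numerator_tflops: float, denominator_tflops: float) -> float:
    """Simple throughput ratio between two quoted figures."""
    return numerator_tflops / denominator_tflops


def rack_teraflops(per_chip_tflops: int, chips: int = GPUS_PER_NVL72_RACK) -> int:
    """Aggregate throughput if every chip in the rack hits its quoted peak."""
    return per_chip_tflops * chips


if __name__ == "__main__":
    q = QUOTED_TFLOPS
    print("brain / GPT-4:", round(ratio(q["human_brain"], q["gpt4_inference"]), 1))
    print("Blackwell / brain:", ratio(q["blackwell_chip"], q["human_brain"]))
    print("NVL72 rack (TFLOPS):", rack_teraflops(q["blackwell_chip"]))
```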
> GPT4 does inference at 560 teraflops. Human brain goes 10,000 teraflops

AFAICT, both are guesses. The estimates I've seen for human brains range from ~162 GFLOPS[0] to 10^28 FLOPS[1]; even just the model size for GPT-4 isn't confirmed, merely a combination of human inference of public information with a rumour widely described as a "leak", likewise the compute requirements.

[0] https://geohot.github.io//blog/jekyll/update/2022/02/17/brai...

[1] https://aiimpacts.org/brain-performance-in-flops/

They're not guesses. We know they use A100s and we know how fast an A100 goes. You can cut a brain open and see how many neurons it has and how often they fire. Kurzweil's 10 petaflops for the brain (100e9 neurons * 1000 connections * 200 calculations) is a bit high for me honestly. I don't think connections count as flops. If a neuron only fires 5-50 times a second then that'd put the human brain at .5 to 5 teraflops it seems to me. That would explain why GPT is so much smarter and faster than people. The other estimates like 1e28 are measuring different things.
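
The two back-of-the-envelope estimates in this comment can be reproduced in a few lines of Python. This sketch uses only the numbers stated above; the variable and function names are made up. Note that the product as written (100e9 * 1000 * 200) comes to 2e16, which is 20 petaflops rather than 10, while the firing-rate version does land on 0.5 to 5 teraflops.

```python
TERA = 1e12
PETA = 1e15

NEURONS = 100e9                      # neuron count used in the comment
CONNECTIONS_PER_NEURON = 1000        # from the comment
CALCS_PER_CONNECTION_PER_SEC = 200   # from the comment


def connection_based_flops(neurons: float, connections: float, calcs: float) -> float:
    """Kurzweil-style estimate: every connection counts as work."""
    return neurons * connections * calcs


def firing_based_flops(neurons: float, firings_per_sec: float) -> float:
    """The commenter's alternative: one operation per neuron firing."""
    return neurons * firings_per_sec


if __name__ == "__main__":
    kurzweil = connection_based_flops(
        NEURONS, CONNECTIONS_PER_NEURON, CALCS_PER_CONNECTION_PER_SEC
    )
    print(f"connection-based: {kurzweil / PETA:.1f} petaflops")
    for rate in (5, 50):  # firings per second, range given in the comment
        flops = firing_based_flops(NEURONS, rate)
        print(f"firing-based at {rate}/s: {flops / TERA:.1f} teraflops")
```

The two methods differ by several orders of magnitude, which is the gap the surrounding replies are arguing about.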
They might generate improvements, but I’m not sure why people think those improvements would be unbounded. Think of it like improvements to jet engines or internal combustion engines - rapid improvements followed by decades of very tiny improvements. We’ve gone from 32-bit LLM weights down to 16, then 8, then 4 bit weights, and then a lot of messy diminishing returns below that. Moore’s law is running on fumes for process improvements, so each new generation of chips that’s twice as fast manages to get there by nearly doubling the silicon area and nearly doubling the power consumption. There’s a lot of active research into pruning models down now, but mostly better models == bigger models, which is also hitting all kinds of practical limits. Really good engineering might get to the same endpoint a little faster than mediocre engineering, but they’ll both probably wind up at the same point eventually. A super smart LLM isn’t going to make sub-atomic transistors, or sub-bit weights, or eliminate power and cooling constraints, or eliminate any of the dozen other things that eventually limit you.
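
The diminishing returns from shrinking weights show up in simple arithmetic. A minimal Python sketch, assuming a hypothetical 70-billion-parameter model (the parameter count is an illustration, not a figure from this thread) and counting only weight storage:

```python
HYPOTHETICAL_PARAMS = 70_000_000_000  # assumption: illustrative model size
BIT_WIDTHS = [32, 16, 8, 4]           # the progression named in the comment


def weight_bytes(num_params: int, bits_per_weight: int) -> int:
    """Storage needed for the weights alone, ignoring any overhead."""
    return num_params * bits_per_weight // 8


def savings_per_step(num_params: int, widths: list[int]) -> list[int]:
    """Bytes saved by each successive halving of the bit width."""
    sizes = [weight_bytes(num_params, w) for w in widths]
    return [before - after for before, after in zip(sizes, sizes[1:])]


if __name__ == "__main__":
    for width in BIT_WIDTHS:
        gb = weight_bytes(HYPOTHETICAL_PARAMS, width) / 1e9
        print(f"{width:>2}-bit weights: {gb:.0f} GB")
    for saved in savings_per_step(HYPOTHETICAL_PARAMS, BIT_WIDTHS):
        print(f"saved by this halving: {saved / 1e9:.0f} GB")
```

Each halving saves half as much as the one before it, and there are not many halvings left below 4 bits.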
Saying that AI hardware is near a dead end because Moore's law is running out of steam is silly. Even GPUs are very general purpose, we can make a lot of progress in the hardware space via extreme specialization, approximate computing and analog computing.
I'm mostly saying that unless a chip-designing AI model is an actual magical wizard, it's not going to have a lot of advantage over teams of even mediocre human engineers. All of the stuff you're talking about is Moore's Law limited after 1-2 generations of wacky architectural improvements.
Bro, Jensen Huang just unveiled a chip yesterday that goes 20 petaflops. Intel's latest Raptor Lake CPU goes 800 gigaflops. Can you really explain 25000x progress by the 2x larger die size? I'm sure reactionary America wanted Moore's law to run out of steam but the Taiwanese betrayal made up for all the lost Moore's law progress and then some.
That speedup compared to Nvidia's previous generation came nearly entirely from: 1) a small process technology improvement from TSMC, 2) more silicon area, 3) more power consumption, and 4) moving to FP4 from FP8 (halving the precision). They aren't delivering the 'free lunch' between generations that we had for decades in terms of "the same operations faster and using less power." They're delivering increasingly exotic chips for increasingly crazy amounts of money.
Pro tip: If you want to know who is the king of AI chips, compare FLOPS (or TOPS) per chip area, not FLOPS/chip.

As long as the bottleneck is fab capacity in wafers per hour, the number of operations per second per chip area determines who will produce more compute at the best price. It's a good measure even between different technology nodes and superchips.

Nvidia is leader for a reason.

If manufacturing capacity increases to match the demand in the future, FLOPS or TOPS per Watt may become relevant, but now it's fab capacity.
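
To make the per-area comparison concrete, here is a small Python sketch. Both chips and all their numbers are hypothetical, chosen only to show how the ranking can flip between FLOPS per chip and FLOPS per area.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    name: str
    tflops: float        # peak throughput per chip
    die_area_mm2: float  # total silicon area per chip

    @property
    def tflops_per_mm2(self) -> float:
        """Compute density: throughput per unit of silicon area."""
        return self.tflops / self.die_area_mm2


def best_by_density(chips: list[Chip]) -> Chip:
    """Pick the chip that yields the most compute per unit of wafer area."""
    return max(chips, key=lambda c: c.tflops_per_mm2)


def best_by_raw_flops(chips: list[Chip]) -> Chip:
    """Pick the chip with the highest per-chip throughput."""
    return max(chips, key=lambda c: c.tflops)


# Hypothetical chips, not real products.
CHIPS = [
    Chip("chip_a", tflops=1000.0, die_area_mm2=800.0),
    Chip("chip_b", tflops=600.0, die_area_mm2=400.0),
]

if __name__ == "__main__":
    print("highest FLOPS per chip:", best_by_raw_flops(CHIPS).name)
    print("highest FLOPS per mm^2:", best_by_density(CHIPS).name)
```

With wafer supply fixed, the denser chip turns the same silicon into more total compute even though each individual chip is slower.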

Taiwanese betrayal? I’m not sure I understand the reference.
There's no reference. It's just a bad joke. What they did was actually very good.
LLMs are so much more than you are assuming… text, images, code are merely abstractions to represent reality. Accurate prediction requires no less than usefully generalizable models and deep understanding of the actual processes in the world that produced those representations.

I know they can provide creative new solutions to totally novel problems from firsthand experience… instead of assuming what they should be able to do, I experimented to see what they can actually do.

Focusing on the simple mechanics of training and prediction is to miss the forest for the trees. It’s as absurd as saying how can living things have any intelligence? They’re just bags of chemicals oxidizing carbon. True but irrelevant- it misses the deeper fact that solving almost any problem deeply requires understanding and modeling all of the connected problems, and so on, until you’ve pretty much encompassed everything.

Ultimately it doesn’t even matter what problem you’re training for- all predictive systems will converge on general intelligence as you keep improving predictive accuracy.

LLM != AI.

An LLM is not going to suggest a reasonable improvement to itself, except by sheerest luck.

But then next generation, where the LLM is just the language comprehension and generation model that feeds into something else yet to be invented, I have no guarantees about whether that will be able to improve itself. Depends on what it is.

Yes, eventually one gets a series of software improvements which eventually result in the best possible performance on currently available hardware --- if one can consistently get an LLM to suggest improvements to itself.

Until we get to a point where an AI has the wherewithal to create a fab to make its own chips and then do assembly w/o human intervention (something along the lines of Steve Jobs's vision of a computer factory where sand goes in at one end and finished product rolls out the other) it doesn't seem likely to amount to much.

That may happen more easily than you're suggesting. LLMs are masters at generating plausible sounding ideas with no regard to their factual underpinnings. So some of those computational bong hits might come up with dozens of plausible looking suggestions (maybe featuring made up literature references as well).

It would be left to human researchers to investigate them and find out if any work. If they succeed, the LLM will get all the credit for the idea, if they fail, it's them who will have wasted their time.

It has, anyway, already had a profound effect on the IT job market.