

by visarga 104 days ago

> Whenever I see claims about AGI being reachable through large language models, it reminds me of the miasma theory of disease.

Whenever I see people claim that model architecture matters much, I think they have a magical view of AI. Progress comes from high-quality data; the models are good enough as they are now. Of course you can still improve the models, but you get much more upside from data, or even better - from interactive environments. The path to AGI is not based on pure thinking, it's based on scaling interaction.

To stay with the miasma theory of disease analogy: if you think architecture is the key, then look at how humans dealt with pandemics... The Black Death in the 14th century killed up to half of Europe's population, and no one came up with the germ theory of disease. Think about it - it was as desperate a situation as it gets, and no one had the simple spark to practice hygiene.

The fact is we are also not smart from the brain alone, we are smart from our experience. Interaction and environment are the scaffolds of intelligence, not the model. For example, 1B users do more for an AI company than a better model does; they act like human-in-the-loop curators of LLM work.

4 comments

awakeasleep 104 days ago

If I'm understanding you, it seems like you're struck by hindsight bias. No one knew the miasma theory was wrong... it could have been right! Only with hindsight can we say it was wrong. Seems like we're in the same situation with LLMs and AGI.

nradov 104 days ago

The miasma theory of disease was "not even wrong" in the sense that it was formulated before we even had the modern scientific method to define the criteria for a theory in the first place. And it was sort of accidentally correct in that some non-infectious diseases are caused by airborne toxins.

scarmig 104 days ago

Plenty of scientific authorities believed in it through the 19th century, and they didn't blindly believe it: it had good arguments for it, and intelligent people weighed the pros and cons of it and often ended up on the side of miasma over contagionism. William Farr was no idiot, and he had sophisticated statistical arguments for it. And, as evidence that it was a scientific theory, it was abandoned by its proponents once contagionism had more evidence on its side.

It's only with hindsight that we think contagionism is obviously correct.

psychoslave 100 days ago

> It's only with hindsight that we think contagionism is obviously correct.

We, as mere median citizens on any specific topic outside our expertise, certainly not. And this also has an impact, as social pressure, on which theory gets more credit.

That's not actually specific to science. Even theological arguments can be dumb as hell or super refined by the smartest people able to thrive in their society of the time.

The correctness of theories, and how well they match the collected data, is only part of what drives mass adoption of any theory, and not necessarily the most heavily weighted part. It's interdependence with feedback loops everywhere, so even the data collected, the tools used to collect and analyze it, and the metatheoretical frameworks used to evaluate different models are nothing like absolute objective givens.

0x3f 104 days ago

> Only with hindsight can we say it was wrong

It really depends what you mean by 'we'. Laymen? Maybe. But people said it was wrong at the time with perfectly good reasoning. It might not have been accessible to the average person, but that's hardly to say that only hindsight could reveal the correct answer.

ainch 104 days ago

It's unintuitive to me that architecture doesn't matter - deep learning models, for all their impressive capabilities, are still deficient compared to human learners as far as generalisation, online learning, representational simplicity and data efficiency are concerned.

Just because RNNs and Transformers both work with enormous datasets doesn't mean that architecture/algorithm is irrelevant, it just suggests that they share underlying primitives. But those primitives may not be the right ones for 'AGI'.

ordu 104 days ago

> Of course you can still improve the models, but you get much more upside from data, or even better - from interactive environments.

I, on the contrary, believe that the hunt for better data is an attempt to climb the local hill and get stuck there without reaching the global maximum. Interactive environments are good, they can help, but they are just one of the possible ways to learn about causality. Is it the best way? I don't think so, it is the easier way: just throw money at the problem and eventually you'll get something that you'll claim to be the goal you chased all this time. And yes, it will have something in it you will be able to call "causal inference" in your marketing.

But current models are notoriously difficult to teach. They eat an enormous amount of training data; a human needs much less. They eat an enormous amount of energy to train; a human needs much less. It means that the very approach is deficient. It should be possible to do the same with a tiny fraction of the data and money.

> The fact is we are also not smart from the brain alone, we are smart from our experience. Interaction and environment are the scaffolds of intelligence, not the model.

Well, I learned English almost all the way to B2 by reading books. I was too lazy to use a dictionary most of the time, so it was not interactive: I didn't even interact with a dictionary, I was just reading books. How many books did I read to get to B2? ~10 or so. Well, I read a lot of English on the Internet too, and watched some movies. But let's multiply 10 books by 10. Strictly speaking it was not B2: I was almost completely unable to produce English, and my pronunciation was not just bad, it was worse. Even now I sometimes stumble on words I cannot pronounce. I know the words and I have mentally constructed a sentence with them, but I cannot say it, because I don't know how. So to pass B2 I spent some time practicing speaking, listening and writing. And learning some stupid topics like "travel" to have the vocabulary to talk about them at length.

How many books does an LLM need to consume to get to B2 in a language unknown to it? How many audio recordings does it need to consume? A lifetime wouldn't be enough for me to read and/or listen to that much.

If there were a human who needed to consume as much information as an LLM to learn, they would be the stupidest person in all of human history.

famouswaffles 103 days ago

>With only instructional materials (a 500-page reference grammar, a dictionary, and ≈400 extra parallel sentences) all provided in context, Gemini 1.5 Pro and Gemini 1.5 Flash are capable of learning to translate from English to Kalamang— a Papuan language with fewer than 200 speakers and therefore almost no online presence—with quality similar to a person who learned from the same materials

https://arxiv.org/abs/2403.05530

ordu 103 days ago

I'm not entirely sure that I'm totally convinced, but yeah, it is better than me. I mean, I could do the same, but it would take me ages to go through 500 pages and to use them for the actual translation.

I'm not sure, because Gemini knows a lot of languages. The third language is easier to learn than the second one, so I suppose the 100th language is even easier? But still, Gemini does better than I believed.

bethekidyouwant 104 days ago

Are you asking how many books a large language model would need to read to learn a new language if it was only trained on a different language? Probably just 1 (the dictionary).

suddenlybananas 103 days ago

Do you know anything about how languages work? A dictionary doesn't have sufficient information to speak a language.

bethekidyouwant 102 days ago

Actually, I do know how latent space works. If you meant achieving excellence in syntax and grammar, then, much like for us, more examples are better.

suddenlybananas 102 days ago

LLMs need vastly more examples than humans do, many orders of magnitude more.

0x3f 104 days ago

If model arch doesn't matter much how come transformers changed everything?

visarga 104 days ago

Luck. RNNs can do it just as well, as can Mamba, S4, etc., for a given budget of compute and data. The larger the model, the less difference architecture makes. It will learn in any of the 10,000 variations that have been tried, and come within about 10-15% of the best. What you need is a data loop, or a data source of exceptional quality and size; data has more leverage. Architecture games are more about efficiency: one method can be 10x more efficient than another.

0x3f 104 days ago

That's not how I read the transformer stuff around the time it was coming out: they had concrete hypotheses that made sense, not just random attempts at striking it lucky. In other words, they called their shots in advance.

I'm not aware that we had notably different data sources before or after transformers, so what confounding event are you suggesting transformers 'lucked' into being contemporaneous with?

Also, why are we seeing diminishing returns if only the data matters? Are we running out of data?

jsnell 104 days ago

The premise is wrong, we are not seeing diminishing returns. By basically any metric that has a ratio scale, AI progress is accelerating, not slowing down.

0x3f 104 days ago

For example?

jsnell 104 days ago

For example:

The METR time-horizon benchmark shows steady exponential growth. Frontier lab revenue has been growing exponentially from basically the moment the labs had any revenue. (The latter has confounding factors. For example, it doesn't just depend on the quality of the model but also on the quality of the apps and products using the model. But model quality is still the main component; the products seem to pop into existence the moment the necessary model capabilities exist.)