| HN Mirror


by airgapstopgap 1158 days ago

We won't hit the wall.

Somewhat counterintuitively, scaling datasets is the lazy and economical approach. If you have the compute already, you might as well dig up an order of magnitude (OOM) more text tokens.

But there are other sources of data, and slightly different ways to utilize them. Multimodality, in very large training runs, will almost inevitably increase sample efficiency (for obvious reasons of context richness); synthetic data is already very effective [1]; and other ways to do more as raw text resources diminish exist and will keep being discovered. But a thorough abandonment of the scaling strategy is very unlikely.

Sutton's Bitter Lesson [2] points at a very powerful rule of thumb: we shouldn't turn AI engineering into a contest of smartness, we should allow complex smartness to emerge from generic low-level algorithms. What will be seen as laughable in decades to come is not the scaling strategy, but the Godlike conceit of people who thought they could devise generally applicable rules of reasoning from first principles.

1: https://arxiv.org/abs/2304.08466 2: http://www.incompleteideas.net/IncIdeas/BitterLesson.html

3 comments

sanxiyn 1158 days ago

I don't think your "synthetic data on ImageNet" reference shows "synthetic data is already very effective". Since many people won't read the paper, here's what it says:

Training ResNet-50 on real ImageNet gives 73.09% top-1 accuracy, while training it on synthetic data (same resolution, same number of images) generated by this work gives 64.96%, which is SOTA compared to previous work's 63.02%. Therefore, synthetic data is worse than real data for now.

But synthetic data is not useless, because training on real data plus synthetic data is a bit better than training on either alone. (Accuracy here is different due to different methodology.) Using real and synthetic data at 1:1 improves accuracy from 76.39% to 77.61%. But 1:2 is worse than 1:1 (77.16%), even though the dataset is 50% larger. With 1:4, the result is worse than not using synthetic data at all. So synthetic data can at best enlarge the dataset by 5x, more likely just 2x.
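The arithmetic behind those ratios can be checked with a short sketch. The accuracy figures are the ones quoted in this comment; the names `BASELINE_REAL_ONLY`, `RESULTS`, `dataset_multiplier` and `summarize` are made up for illustration and are not from the paper. No accuracy is quoted here for 1:4, so it is left out of the table.

```python
# Figures quoted in the comment above: real:synthetic ratio -> top-1 accuracy (%).
BASELINE_REAL_ONLY = 76.39
RESULTS = {
    (1, 1): 77.61,
    (1, 2): 77.16,
}


def dataset_multiplier(real, synthetic):
    """Total dataset size relative to the real-only dataset."""
    return (real + synthetic) / real


def summarize():
    """One row per ratio: size multiplier and accuracy change vs. real-only."""
    rows = []
    for (real, synth), acc in RESULTS.items():
        rows.append({
            "ratio": f"{real}:{synth}",
            "size_multiplier": dataset_multiplier(real, synth),
            "delta_vs_real_only": round(acc - BASELINE_REAL_ONLY, 2),
        })
    return rows


for row in summarize():
    print(row)
```

A 1:2 mix is 3x the real-only dataset, which is 50% larger than the 2x of a 1:1 mix, and a 1:4 mix is 5x, which is where the "5x" and "2x" figures come from.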

PoignardAzur 1157 days ago

I wonder how much you can improve that scaling factor by using data augmentation techniques (noise, rescaling, recropping, rotation, changing colors, using normal maps, etc).

nologic01 1158 days ago

You are passing off personal preferences (and possibly professional interests) as rules of nature. If anything, Godlike conceit definitely applies to some ML acolytes.

In any case, with your last point "we should allow complex smartness to emerge" you essentially agree with my point that new levels will emerge from orthogonal (new) directions.

The good thing about brute force is that it summons so many resources it primes the way for smarter approaches.

For those not conceited, the objective is not some deus ex machina but "algorithms that work".

airgapstopgap 1154 days ago

It is interesting that you don't even hide having a strong personal emotional preference at stake. Now, does this not suggest that your predictions are a priori less credible, by your logic?

No, I don't think "orthogonal" directions will be fruitful.

I also disagree on evaluations. What you call brute search is not brute search at all, nor a deus ex machina; it is a lawful and honest method of algorithmic discovery of true regularities. "Smarter approaches", meanwhile, usually amount to stilted expressions of narcissism of researchers overly proud of having come up with shallow tricks aping some aspect of explicit human reasoning. They're not actually smart, nor do they work far outside of the toy distribution for which they were developed.

auggierose 1157 days ago

But we can devise generally applicable rules of reasoning from first principles. It's called logic. I am pretty sure the next step is to properly combine machine learning and logic.

eru 1157 days ago

Seems unlikely, that never worked in the past. And humans don't actually use logic (especially formal logic) to come up with anything. They just use it to justify what they came up with.

Not even mathematicians think in terms of logic when trying to solve problems.

auggierose 1157 days ago

Of course mathematicians also think in terms of logic. It’s what you learn when you study mathematics; you soak it up automatically, although few study logic explicitly. And before 2015 a machine beating the world's best Go player also seemed pretty unlikely.

eru 1156 days ago

I've studied mathematics.

You only do (formal) logic as an afterthought when communicating your proofs to other people or writing them down. Otherwise it's mostly intuition.

auggierose 1156 days ago

I've studied mathematics, too. Yes, formal logic is an afterthought when you do mathematics. But formal logic is just an explicit representation of what goes on internally in a mathematician. Or at least that's how I approach formal logic (most logicians don't). I would describe these internal processes inside a mathematician (and outside, when used for communication) as intuition + "logic to keep intuition in check". Sounds like ML + logic to me.

subjectsigma 1157 days ago

There are already tons of systems (for example Google Translate) that combine rule-based reasoning with probabilistic reasoning. Looks to be working to me.

eru 1157 days ago

Interesting. Do you have any sources on Google Translate using rule-based reasoning?

subjectsigma 1157 days ago

Machine Translation, by Thierry Poibeau, 2017.

eru 1156 days ago

Alas, that was around the time Google Translate switched to Neural Networks:

See https://blog.google/products/translate/found-translation-mor... and https://en.wikipedia.org/wiki/Google_Neural_Machine_Translat...

It doesn't look like they are still using any rule-based reasoning?

The blog post says:

> With this update, Google Translate is improving more in a single leap than we’ve seen in the last ten years combined. [...]

Which seems pretty strong evidence to me that moving away from rule-based reasoning, or even from a hybrid approach that includes rule-based reasoning, was a clear win?

blatant303 1157 days ago

First principles don't work in the space of systems geared towards extreme generalization such as LLMs. You need to be ready to compare anything with anything and build bridges between many principles. In fact there is a deep link between the progress of structuralism in mathematics culminating with homotopy type theory and its parallel (r)evolution in the humanities with the discovery of manuscripts by the founder of structural linguistics, Ferdinand de Saussure.

> Identity is what provides the irreducible basis, in the sense that we cannot enter into the consideration of specific facts that are placed under this identity, and it is this identity that becomes for us the true concrete fact, beyond which there is nothing more.

> ...

> For example, for a musical composition, compared to a painting. Where does a musical composition exist? It is the same question as to know where 'aka' exists. In reality, this composition only exists when it is performed; but to consider this performance as its existence is false. Its existence is the identity of the performances.

> ...

> For each of the things we have considered as a truth, we have arrived through so many different paths that we confess we do not know which one should be preferred. To properly present the entirety of our propositions, it would be necessary to adopt a fixed and defined starting point. But what we are trying to establish is that it is false to admit in linguistics a single fact as defined in itself. There is, therefore, a necessary absence of any starting point, and if some reader is willing to follow our thoughts carefully from one end to the other of this volume, they will recognize, we are convinced, that it was, so to speak, impossible to follow a very rigorous order. We will allow ourselves to present, up to three or four times in different forms, the same idea to the reader because there really is no starting point more appropriate than another on which to base the demonstration.

> ...

> As language offers no substance under any of its manifestations, but only combined or isolated actions of physiological, physical, and mental forces, and as nevertheless all our distinctions, our terminology, and all our ways of speaking are based on this involuntary assumption of a substance, we cannot refuse, first and foremost, to recognize that the most essential task of the theory of language will be to untangle what our primary distinctions are all about.

> ...

> There are different types of identity. This is what creates different orders of linguistic facts. Outside of any identity relationship, a linguistic fact does not exist. However, the identity relationship depends on a variable point of view that one decides to adopt; therefore, there is no rudiment of a linguistic fact outside the defined point of view that presides over distinctions.

Source: http://www.revue-texto.net/docannexe/file/116/saussure255_6....

TL;DR: identity is equivalent to equivalence

auggierose 1156 days ago

There is no reason why logic cannot follow various different threads of reasoning, interweave them, merge them, split them again, etc. Logic constitutes a first principle of utmost generality; actually, I cannot imagine anything more general. Identity is not equivalent to equivalence: equivalence is a quotient of identity, consisting of two classes, those values which are identical to True and those which are not.

blatant303 1156 days ago

> Identity is not equivalent to equivalence

When talking about identity and equivalence of types in the context of homotopy type theory, it is. This is literally what the univalence axiom states.
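For reference, the usual statement of the univalence axiom, for types $A$ and $B$ in a universe $\mathcal{U}$, can be written as follows (the notation is the common textbook one, not something taken from this thread):

```latex
(A =_{\mathcal{U}} B) \;\simeq\; (A \simeq B)
```

Read loosely: the type of identifications between $A$ and $B$ is itself equivalent to the type of equivalences between $A$ and $B$. More precisely, the canonical map sending an identification to an equivalence is itself an equivalence.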

Auggierose, I'm curious about your thoughts on how we can provide more rigor to LLMs when it comes to large-scale program transformations and proof synthesis. Given the complexity and versatility of these systems, what kind of foundational framework do you believe would enable GPT and similar models to synthesize and execute proofs rigorously? How can we ensure that they are both reliable and adaptable while dealing with various mathematical and logical domains?

More importantly, how would this relate to NLP tasks such as: alright, the story is good, but can you rewrite it in the style of Auggierose?

auggierose 1156 days ago

I am not a fan of HoTT, as nobody has managed to explain to me its supposed advantages in terms that didn't border on mysticism.

Anyway, your question is very interesting! :-)