Hacker News
by nologic01 1158 days ago
Brute force approaches always hit some wall. ML will be no different. In the decades to come it is quite likely that algorithms will develop in directions orthogonal to current approaches. The idea that you improve performance by throwing gazillions of data into gargantuan models might even come to be seen as laughable.

Keep in mind (pun) that the only real intelligence here is us, and we are pretty good at figuring out when a tool has exhausted its utility.

2 comments

We won't hit the wall.

Somewhat counterintuitively, scaling datasets is the lazy and economical approach. If you have the compute already, you might as well dig up an OOM (order of magnitude) more text tokens.

But there are other sources of data, and slightly different ways to utilize it. Multimodality, in very large training runs, will almost inevitably increase sample efficiency (for obvious reasons of context richness); synthetic data is already very effective [1]; and other ways to do more with diminishing raw text resources exist or will be discovered. But a thorough abandonment of the scaling strategy is very unlikely.

Sutton's Bitter Lesson [2] points at a very powerful rule of thumb: we shouldn't turn AI engineering into a contest of smartness, we should allow complex smartness to emerge from generic low-level algorithms. What will be seen as laughable in decades to come is not the scaling strategy, but the Godlike conceit of people who thought they could devise generally applicable rules of reasoning from first principles.

1: https://arxiv.org/abs/2304.08466
2: http://www.incompleteideas.net/IncIdeas/BitterLesson.html

I don't think your "synthetic data on ImageNet" reference shows "synthetic data is already very effective". Since many people won't read the paper, here's what it says:

Training ResNet-50 on real ImageNet gives 73.09% top-1 accuracy, while training it on synthetic data (same resolution, same number of images) generated by this work gives 64.96%, which is SOTA compared to previous work's 63.02%. Therefore, synthetic data is worse than real data for now.

But synthetic data is not useless, because training on real plus synthetic data is a bit better than training on either alone. (Accuracy here is different due to different methodology.) Using real and synthetic data at 1:1 improves accuracy from 76.39% to 77.61%. But 1:2 is worse than 1:1 (77.16%), even though the dataset is 50% larger. With 1:4, the result is worse than not using synthetic data at all. So synthetic data can at best enlarge the dataset by 5x, more likely just 2x.
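For anyone who wants to check the arithmetic, here is a rough sketch using only the figures quoted in this comment. The function names are made up for illustration, and the 1:4 accuracy is left out because no number is quoted for it.

```python
# Top-1 accuracy (%) figures as quoted in the comment above.
BASELINE = 76.39  # real data only

# Key: synthetic images per real image; value: accuracy with that mix.
# The 1:4 setting is omitted: the comment only says it is below the baseline.
MIXED = {1: 77.61, 2: 77.16}


def size_multiplier(synthetic_per_real):
    """Total dataset size relative to the real-only dataset."""
    return 1 + synthetic_per_real


def gain_over_baseline(synthetic_per_real):
    """Accuracy change in percentage points versus real data only."""
    return MIXED[synthetic_per_real] - BASELINE


for ratio in MIXED:
    print(f"1:{ratio} -> dataset x{size_multiplier(ratio)}, "
          f"gain {gain_over_baseline(ratio):+.2f} points")

# Going from 1:1 to 1:2 grows the dataset by this factor (the "50% larger").
print(size_multiplier(2) / size_multiplier(1))
```

So the 2x dataset gains about 1.22 points, the 3x dataset gains about 0.77, and the 5x dataset (1:4) is reported as a net loss.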

I wonder how much you can improve that scaling factor by using data augmentation techniques (noise, rescaling, recropping, rotation, changing colors, using normal maps, etc).
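To make the augmentation idea concrete, a minimal pure-Python sketch of three of the techniques mentioned (recropping, flipping as a stand-in for rotation, and noise) might look like the following. Everything here is an assumption for illustration: the helper names, the toy 8x8 grayscale image, and the parameter values. It is not the pipeline from the paper.

```python
import random


def hflip(img):
    """Mirror a 2D grayscale image (list of rows) left to right."""
    return [row[::-1] for row in img]


def random_crop(img, size, rng):
    """Cut a random size x size window out of the image."""
    top = rng.randint(0, len(img) - size)
    left = rng.randint(0, len(img[0]) - size)
    return [row[left:left + size] for row in img[top:top + size]]


def add_noise(img, sigma, rng):
    """Add Gaussian noise to each pixel, clamped to the 0..255 range."""
    return [[min(255, max(0, round(p + rng.gauss(0, sigma)))) for p in row]
            for row in img]


def augment(img, n, crop_size, sigma=5.0, seed=0):
    """Produce n perturbed views of one image."""
    rng = random.Random(seed)
    views = []
    for _ in range(n):
        view = random_crop(img, crop_size, rng)
        if rng.random() < 0.5:
            view = hflip(view)
        views.append(add_noise(view, sigma, rng))
    return views


# Toy 8x8 image standing in for a real training example.
image = [[(x * 16 + y) % 256 for x in range(8)] for y in range(8)]
views = augment(image, n=4, crop_size=6)
print(len(views), len(views[0]), len(views[0][0]))  # prints "4 6 6"
```

Whether views like these would shift the real-to-synthetic ratio at which accuracy peaks is exactly the open question; the sketch only shows how cheaply such views can be produced.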
You are masquerading personal preferences (and possibly professional interests) as rules of nature. If anything, Godlike conceit definitely applies to some ML acolytes.

In any case, with your last point "we should allow complex smartness to emerge" you essentially agree with my point that new levels will emerge from orthogonal (new) directions.

The good thing about brute force is that it summons so many resources it primes the way for smarter approaches.

For those not conceited, the objective is not some deus-ex-machina but "algorithms that work".

It is interesting that you don't even hide having a strong personal, emotional preference at stake. Now, does this not suggest that your predictions are a priori less credible, by your logic?

No, I don't think "orthogonal" directions will be fruitful.

I also disagree on evaluations. What you call brute search is not brute search at all, nor a deus ex machina; it is a lawful and honest method of algorithmic discovery of true regularities. "Smarter approaches", meanwhile, usually amount to stilted expressions of narcissism of researchers overly proud of having come up with shallow tricks aping some aspect of explicit human reasoning. They're not actually smart, nor do they work far outside of the toy distribution for which they were developed.

But we can devise generally applicable rules of reasoning from first principles. It's called logic. I am pretty sure the next step is to properly combine machine learning and logic.

Seems unlikely, that never worked in the past. And humans don't actually use logic (especially formal logic) to come up with anything. They just use it to justify what they came up with.

Not even mathematicians think in terms of logic when trying to solve problems.

Of course mathematicians also think in terms of logic. It’s what you learn when you study mathematics, you soak it up automatically, although few study logic explicitly. And before 2015 a machine beating the world's best Go player also seemed pretty unlikely.

I've studied mathematics.

You only do (formal) logic as an afterthought when communicating your proofs to other people or writing them down. Otherwise it's mostly intuition.

I've studied mathematics, too. Yes, formal logic is an afterthought when you do mathematics. But formal logic is just an explicit representation of what goes on internally in a mathematician. Or at least that's how I approach formal logic (most logicians don't). I would describe these internal processes inside a mathematician (and outside, when used for communication) as intuition + "logic to keep intuition in check". Sounds like ML + logic to me.

There are already tons of systems (for example Google Translate) that combine rule-based reasoning with probabilistic reasoning. Looks to be working to me.

Interesting. Do you have any sources on Google Translate using rule-based reasoning?

Machine Translation, by Thierry Poibeau, 2017.

First principles don't work in the space of systems geared towards extreme generalization such as LLMs. You need to be ready to compare anything with anything and build bridges between many principles. In fact there is a deep link between the progress of structuralism in mathematics culminating with homotopy type theory and its parallel (r)evolution in the humanities with the discovery of manuscripts by the founder of structural linguistics, Ferdinand de Saussure.

Identity is what provides the irreducible basis, in the sense that we cannot enter into the consideration of specific facts that are placed under this identity, and it is this identity that becomes for us the true concrete fact, beyond which there is nothing more.

...

For example, for a musical composition, compared to a painting. Where does a musical composition exist? It is the same question as to know where 'aka' exists. In reality, this composition only exists when it is performed; but to consider this performance as its existence is false. Its existence is the identity of the performances.

...

For each of the things we have considered as a truth, we have arrived through so many different paths that we confess we do not know which one should be preferred. To properly present the entirety of our propositions, it would be necessary to adopt a fixed and defined starting point. But what we are trying to establish is that it is false to admit in linguistics a single fact as defined in itself. There is, therefore, a necessary absence of any starting point, and if some reader is willing to follow our thoughts carefully from one end to the other of this volume, they will recognize, we are convinced, that it was, so to speak, impossible to follow a very rigorous order. We will allow ourselves to present, up to three or four times in different forms, the same idea to the reader because there really is no starting point more appropriate than another on which to base the demonstration.

...

As language offers no substance under any of its manifestations, but only combined or isolated actions of physiological, physical, and mental forces, and as nevertheless all our distinctions, our terminology, and all our ways of speaking are based on this involuntary assumption of a substance, we cannot refuse, first and foremost, to recognize that the most essential task of the theory of language will be to untangle what our primary distinctions are all about.

...

There are different types of identity. This is what creates different orders of linguistic facts. Outside of any identity relationship, a linguistic fact does not exist. However, the identity relationship depends on a variable point of view that one decides to adopt; therefore, there is no rudiment of a linguistic fact outside the defined point of view that presides over distinctions.

Source: http://www.revue-texto.net/docannexe/file/116/saussure255_6....

TL;DR: identity is equivalent to equivalence

There is no reason why logic cannot follow various different threads of reasoning, interweave them, merge them, split them again, etc. Logic constitutes a first principle of utmost generality; actually, I cannot imagine anything more general. Identity is not equivalent to equivalence; equivalence is a quotient of identity, consisting of two classes: those values which are identical to True, and those which are not.

> Identity is not equivalent to equivalence

When talking about identity/equivalence of types in the context of homotopy type theory, yes. This is literally what the univalence axiom states.
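For readers who haven't met it, the axiom is usually stated roughly as below. This is a sketch in the notation of the HoTT book; the symbols (the universe \mathcal{U} and the map name idtoeqv) are assumed here and do not come from this thread.

```latex
% Univalence, for types A and B in a universe \mathcal{U}:
% the canonical map from identifications to equivalences is itself an equivalence.
\[
  \mathsf{idtoeqv} : (A =_{\mathcal{U}} B) \to (A \simeq B)
  \quad\text{is an equivalence, hence}\quad
  (A =_{\mathcal{U}} B) \simeq (A \simeq B).
\]
```

Read informally: for types, identity is equivalent to equivalence, which is the sense of the TL;DR above.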

Auggierose, I'm curious about your thoughts on how we can provide more rigor to LLMs when it comes to large-scale program transformations and proof synthesis. Given the complexity and versatility of these systems, what kind of foundational framework do you believe would enable GPT and similar models to synthesize and execute proofs rigorously? How can we ensure that they are both reliable and adaptable while dealing with various mathematical and logical domains?

More importantly, how would this relate to NLP tasks such as: alright, the story is good, but can you rewrite it in the style of Auggierose?

I am not a fan of HOTT, as nobody managed to explain to me its supposed advantages in terms that didn't border on mysticism.

Anyway, your question is very interesting! :-)

AI had a winter of many decades because the hardware wasn't there and there were better alternatives, especially for neural nets. Now ChatGPT etc. comes out, with unbelievable results, decades in the making. And a couple of months in, we're already writing it off because of the next limitation? Maybe let's give it more than a month or two to figure out if we even need all that data. I heard they're already talking about trying to significantly reduce the model's parameter count, even though a large increase in model size was apparently the reason GPT-4 was so much better than GPT-3. Give it a minute, IMHO, before making generalizations like this so soon.

Well, I imagine the commenter actually understands the domain, the techniques, and is offering an informed opinion.

It is possible to form opinions by knowing the domain, rather than drawing an exponential curve of newspaper headlines which trails off "..."