| HN Mirror



	by Validark 82 days ago
	I have long said I am an AI doubter until AI could print out the answers to hard problems or ones requiring tons of innovation. Assuming this is verified to be correct (not by AI) then I just became a believer. I would like to see a few more AI inventions to know for sure, but wow, it really is a new and exciting world. I really hope we use this intelligence resource to make the world better.

13 comments

snemvalts 82 days ago

Math and coding competition problems are easier to train on because of strict rules and cheap verification. But once you go beyond that to less defined things such as code quality, where even humans have a hard time putting down concrete axioms, they start to hallucinate more and become less useful.

We are missing the value function that allowed AlphaGo to go from mid range player trained on human moves to superhuman by playing itself. As we have only made progress on unsupervised learning, and RL is constrained as above, I don't see this getting better.

NitpickLawyer 82 days ago

> I don't see this getting better.

We went from 2 + 7 = 11 to "solved a frontier math problem" in 3 years, yet people don't think this will improve?

datsci_est_2015 82 days ago

I’ve seen this style of take so much that I’m dying for someone to name a logical fallacy for it, like “appeal to progress” or something.

Step away from LLMs for a second and recognize that “Yesterday it was X, so today it must be X+1” is such a naive take and obviously something that humans so easily fall into a trap of believing (see: flying cars).

Gareth321 82 days ago

In finance we say "past performance does not guarantee future returns." Not because we don't believe that, statistically, returns will continue to grow at x rate, but because there is a chance that they won't. The reality bias is actually in favour of these getting better faster, but there is a chance they do not.

aspenmartin 82 days ago

this is true because markets are generally efficient. It's very hard to find predictive signals. That is a completely different space than what we're talking about here. Performance is incredibly predictable through scaling laws that continue to hold even at the largest scales we've built

Gareth321 81 days ago

I agree this is a new space and prediction volatility is much higher. We have evidence going back to at least 2019 that improvements have been exponential (https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...). The benchmarks are all over the place because improvements don't happen in a straight line. Even composites aren't that useful because the last 10% improvement can require more effort than the first 90%.

To be frank, from what I can see, even if all progress stopped right now, it would take 1-2 decades to fully operationalise the existing potential of LLMs. There would be massive economic and social change. But progress is not stopping, and in some measurements, continues to improve exponentially. I really think this is incredibly transformative. Moreso than anything humanity has ever experienced. In the last year, OpenAI and potentially Claude have been working on recursive self-improvement. Meaning these models are designing better versions of themselves. This means we have effectively entered the singularity.

andrewflnr 81 days ago

Even more insane than assuming the trend will continue is assuming it will not continue. We don't know for sure (especially not by pure reason), but the weight of probability sure seems to lean one direction.

mikkupikku 82 days ago

Logical fallacies are vastly overrated. Unless the conversation is formal logic in the first place, "logical fallacies" are just a way to apply quick pattern matching to dismiss people without spending time on more substantive responses. In this case, both you and the other are speculating about the near future of a thing, neither of you knows.

datsci_est_2015 82 days ago

Hard to make a more substantive response when the OP’s entire comment was a one-sentence logical fallacy. I’m not cherry-picking here.

> In this case, both you and the other are speculating about the near future of a thing, neither of you knows.

One of us is making a much grander claim than the other:

  - LLMs have limitless potential for growth; because they are not capable of something today does not mean they won’t be capable of it tomorrow
  - LLMs have fundamental limitations due to their underlying architecture and therefore are not limitless in capability

fenomas 82 days ago

The post you replied to was:

> We went from 2 + 7 = 11 to "solved a frontier math problem" in 3 years, yet people don't think this will improve?

All that says is that the speaker thinks models will improve past where they are today. Not that it's a logical certainty (the first thing you jumped on them for), and certainly not anything about "limitless potential for growth" (which nobody even mentioned). With replies like this, invoking fallacies and attacking claims nobody made, you're adding a lot of heat and very little light here (and a few other threads on the page).

graemep 82 days ago

OK, it's not a logical fallacy, it's a false assumption.

The belief in the inevitability of progress is a bad assumption. Especially if you assume a particular technology will keep advancing.

mikkupikku 82 days ago

We won't know if his assumption is false until time passes and moves future speculation into the empirical present.

aspenmartin 82 days ago

Hmm...the sun comes up today is a pretty good bet that the sun comes up tomorrow.

We have robust scaling laws that continue to hold at the largest scales. It is absolutely a very safe bet that more compute + more training + algorithmic improvements will improve performance; it's not like we're rolling a 1 trillion dollar die.

famouswaffles 82 days ago

Well if people give the exact same 'reasons' why it could not do x task in the past that it did manage to do then it is tiring seeing the same nonsense again. The reason here does not even make much sense. This result is not easily verifiable math.

torginus 82 days ago

Yeah, and even if we accept that models are improving in every possible way, going from this to 'AI is exponential, singularity etc.' is just as large a leap.

tim333 82 days ago

The comment doesn't say it must be X+1. It implies it will improve which I would say is a pretty safe bet.

botro 81 days ago

How about 'slippery incline'?

gf000 82 days ago

https://xkcd.com/605/

snemvalts 82 days ago

The scaling law is a power law, requiring orders of magnitude more compute and data for better accuracy from pre-training. Most companies have maxed it out.

For RL, we are arriving at a similar point https://www.tobyord.com/writing/how-well-does-rl-scale

Next stop is inference scaling with longer context window and longer reasoning. But instead of it being a one-off training cost, it becomes a running cost.

In essence we are chasing ever smaller gains in exchange for exponentially increasing costs. This energy will run out. There needs to be something completely different than LLMs for meaningful further progress.
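To make the power-law point concrete, here is a minimal sketch. The functional form `loss = a * compute ** (-b)` and the constants `a` and `b` are illustrative assumptions, not fitted values from any paper or lab. It only shows how quickly the compute bill grows when the exponent is small.

```python
def loss(compute: float, a: float = 10.0, b: float = 0.05) -> float:
    """Hypothetical power-law loss curve: loss = a * compute ** (-b)."""
    return a * compute ** (-b)


def compute_multiplier_to_halve_loss(b: float) -> float:
    """Factor by which compute must grow to cut the loss in half.

    Solving a * (k * C) ** (-b) = 0.5 * a * C ** (-b) gives k = 2 ** (1 / b).
    """
    return 2.0 ** (1.0 / b)


if __name__ == "__main__":
    # Smaller exponents mean each halving of the loss costs far more compute.
    for b in (0.5, 0.1, 0.05):
        k = compute_multiplier_to_halve_loss(b)
        print(f"exponent b={b}: compute must grow by a factor of {k:.3g}")
```

Under these assumed exponents, the multiplier depends only on `b`, which is why a shallow power law translates into "ever smaller gains for exponentially increasing costs".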

Validark 82 days ago

I tend to disagree that improvement is inherent. Really I'm just expressing an aesthetic preference when I say this, because I don't disagree that a lot of things improve. But it's not a guarantee, and it does take people doing the work and thinking about the same thing every day for years. In many cases there's only one person uniquely positioned to make a discovery, and it's by no means guaranteed to happen. Of course, in many cases there are a whole bunch of people who seem almost equally capable of solving something first, but I think if you say things like "I'm sure they're going to make it better" you're leaving to chance something you yourself could have an impact on. You can participate in pushing the boundaries or even making a small push on something that accelerates someone else's work. You can also donate money to research you are interested in to help pay people who might come up with breakthroughs. Don't assume other people will build the future, you should do it too! (Not saying you DON'T)

3abiton 82 days ago

The problem class is rather very structured which makes it "easier", yet the results are undeniably impressive

number6 82 days ago

But can it count the R's in strawberry?

Paradigma11 82 days ago

That question is equivalent to asking a human to add the wavelengths of those two colors and divide it by 3.

snovv_crash 82 days ago

Unless you're aware of hyperspectral image adapters for LLMs they aren't capable of that either.

szszrk 82 days ago

Unfair - human beats AI in this comparison, as human will instantly answer "I don't know" instead of yelling a random number.

Or at best "I don't know, but maybe I can find out" and proceed to finding out. But he is unlikely to shout "6" because he heard this number once when someone talked about light.

koliber 82 days ago

> human will instantly answer "I don't know" instead of yelling a random number.

Seems that you never worked with Accenture consultants?

thegabriele 82 days ago

Why is that?

Paradigma11 82 days ago

Because LLMs don't have a textual representation of any text they consume. It's just vectors to them. Which is why they are so good at ignoring typos: the vector distance is so small it makes no difference to them.
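A toy illustration of that point, assuming a made-up three-entry vocabulary and a greedy longest-match tokenizer. Real tokenizers (BPE and similar) differ in detail, and the token ids here are invented. The idea is only that the model receives integer ids, which are then mapped to vectors, so individual letters are not directly visible to it.

```python
# Hypothetical vocabulary; real models use tens of thousands of learned subword tokens.
VOCAB = {"str": 101, "aw": 102, "berry": 103}


def tokenize(text: str) -> list[int]:
    """Greedy longest-match tokenization over the toy vocabulary."""
    ids = []
    i = 0
    while i < len(text):
        # Try the longest candidate piece starting at position i first.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in VOCAB:
                ids.append(VOCAB[piece])
                i = j
                break
        else:
            raise ValueError(f"no token covers {text[i:]!r}")
    return ids


word = "strawberry"
print(word.count("r"))   # character view: 3
print(tokenize(word))    # model view: [101, 102, 103]
```

Counting letters is trivial on the character view and has to be inferred indirectly from the id view.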

Aditya_Garg 82 days ago

Yes, it's ridiculously good at stuff like that now. I dare you to try and trick it.

frizlab 82 days ago

https://news.ycombinator.com/item?id=47495568

thedatamonger 82 days ago

what bothers me is not that this issue will certainly disappear now that it has been identified, but that we have yet to identify the category of these "stupid" bugs ...

nopinsight 82 days ago

LLMs in some form will likely be a key component in the first AGI system we (help) build. We might still lack something essential. However, people who keep doubting AGI is even possible should learn more about The Church-Turing Thesis.

https://plato.stanford.edu/entries/church-turing/

gf000 82 days ago

AGI is definitely possible - there is nothing fundamentally different in the human brain that would surpass a Turing machine's computational power (unless you believe in some higher powers, etc).

We are just meat-computers.

But at the same time, there is absolutely no indication or reason to believe that this wave of AI hype is the AGI one and that LLMs can be scaled further. We absolutely don't know almost anything about the nature of human intelligence, so we can't even really claim whether we are close or far.

benterix 82 days ago

This is a long read on things most people here know at least in some form. Could you point to a particular fragment or a quote?

zeroonetwothree 81 days ago

> We went from 2 + 7 = 11 to "solved a frontier math problem" in 3 years, yet people don't think this will improve?

This is disingenuous... I don't think people were impressed by GPT 3.5 because it was bad at math.

It's like saying: "We went from being unable to take off and the crew dying in a fire to a moon landing in 2 years, imagine how soon we'll have people on Mars"

eamag 82 days ago

Self driving

saidnooneever 82 days ago

if you let million monkeys bash typewriter. something something book

zozbot234 82 days ago

This is not formally verified math so there is no real verifiable-feedback aspect here. The best models for formalized math are still specialized ones, although general purpose models can assist formalization somewhat.

jack_pp 82 days ago

Maybe to get a real breakthrough we have to make programming languages / tools better suited for LLM strengths, not fuss so much about making it write code we like. What we need is correct code, not nice looking code.

bloppe 82 days ago

> programming languages / tools better suited for LLM strengths

The bitter lesson is that the best languages / tools are the ones for which the most quality training data exists, and that's pretty much necessarily the same languages / tools most commonly used by humans.

> Correct code not nice looking code

"Nice looking" is subjective, but simple, clear, readable code is just as important as ever for projects to be long-term successful. Arguably even more so. The aphorism about code being read much more often than it's written applies to LLMs "reading" code as well. They can go over the complexity cliff very fast. Just look at OpenClaw.

anthonyrstevens 82 days ago

>> simple, clear, readable code is just as important as ever for projects to be long-term successful

Is it though? I'm a long-time code purist, but I am beginning to wonder about the assumptions underlying our vocation.

bloppe 82 days ago

I guess it's hard to tell until we see more long-term AI-generated projects, but many of the ones we have so far (OpenClaw and OpenCode for instance) are well-known for their stability issues, and it seems "even more AI" is not about to fix that.

kube-system 82 days ago

If you can’t validate the code, you can’t tell if it’s correct.

3836293648 82 days ago

No?

That's literally the thing they suggested to move away from. That is just an issue when using tools designed for us.

Make them write in formal verification languages and we only have to understand the types.

To be clear, I don't think this is a good idea, at least not yet, but we do not have to always understand the code.
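A tiny Lean 4 sketch of what "we only have to understand the types" could mean in practice. The theorem name `add_comm_example` is made up for illustration; `Nat.add_comm` is a standard library lemma. The reader checks the statement, and the proof checker takes care of the body.

```lean
-- The type after the colon is the claim a human has to read and agree with.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  -- The proof term could be arbitrarily ugly; it only has to type-check.
  Nat.add_comm a b
```

This says nothing about whether the statement itself is the one you wanted, which is the part a human still has to understand.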

eru 82 days ago

Lean might be a step in that direction.

kuerbel 82 days ago

Yes yes

Let it write a black box no human understands. Give the means of production away.

anabis 82 days ago

> But once you go beyond that to less defined things such as code quality

I think they have a good optimization target with SWE-Bench-CI.

You are tested on continuous changes to a repository, spanning multiple years in the original repository. Cumulative edits need to be kept maintainable and composable.

If there is something missing from the definition of "can be maintained for multiple years incorporating bugfixes and feature additions" for code quality, then more work is needed, but I think it's a good starting point.

eptcyka 82 days ago

Do we need all that if we can apply AI to solve practical problems today?

computably 82 days ago

What is possible today is one thing. Sure people debate the details, but at this point it's pretty uncontroversial that AI tooling is beneficial in certain use cases.

Whether or not selling access to massive frontier models is a viable business model, or trillion-dollar valuations for AI companies can be justified... These questions are of a completely different scale, with near-term implications for the global economy.

fmbb 82 days ago

Depends on the cost.

otabdeveloper4 82 days ago

LLMs can often guess the final answer, but the intermediate proof steps are always total bunk.

When doing math you only ever care about the proof, not the answer itself.

jamesfinlayson 82 days ago

Yep, I remember a friend saying they did a maths course at university that had the correct answer given for each question - this was so that if you made some silly arithmetic mistake you could go back and fix it and all the marks were for the steps to actually solve the problem.

number6 82 days ago

This would have greatly helped me. I always was at a loss which trick I had to apply to solve this exam problem, while knowing the mathematics behind it. Just at some point you had to add a zero that was actually a part of a binomial that then collapsed the whole formula.

dash2 82 days ago

Not in this case: the LLM wrote the entire paper, and anyway the proof was the answer.

eru 82 days ago

Once you have a working proof, no matter how bad, you can work towards making it nicer. It's like refactoring in programming.

If your proof is machine checkable, that's even easier.

prmoustache 82 days ago

That is also how humans work mostly. Once every full moon we may get an "intuition" but most of the time we lean on collective knowledge, biases and behavior patterns to take decisions, write and talk.

otabdeveloper4 82 days ago

I haven't had success in getting AI's to output working proofs.

You'd need a completely different post-training and agent stack for that.

datsci_est_2015 82 days ago

What’s funny is that there are total cranks in human form that do the same thing. Lots of unsolicited “proofs” being submitted by “amateur mathematicians” where the content is utter nonsense, but like a monkey with a typewriter, there’s the possibility that they stumble upon an incredible insight.

charcircuit 82 days ago

LLMs already do unsupervised learning to get better at creative things. This is possible since LLMs can judge the quality of what is being produced.

raincole 82 days ago

Except it's not how this specific instance works. In this case the problem isn't written in a formal language and the AI's solution is not something one can automatically verify.

pjerem 82 days ago

I mean, even if the technology stopped improving immediately and forever (which is unlikely), LLMs are already better than most humans at most tasks.

Including code quality. Not because they are exceptionally good (you are right that they aren’t superhuman like AlphaGo) but because most humans are rather not that good at it anyway and also somehow « hallucinate » because of tiredness.

Even today’s models are far from being exploited at their full potential because we actually developed pretty much no tools around it except tooling to generate code.

I’m also a long time « doubter » but as a curious person I used the tool anyway with all its flaws over the last 3 years. And I’m forced to admit that hallucinations are pretty rare nowadays. Errors still happen but they are very rare and it’s easier than ever to get it back on track.

I think I’m also a « believer » now and believe me, I really don’t want to because as much as I’m excited by this, I’m also pretty much frightened of all the bad things that this tech could do to the world in the wrong hands and I don’t feel like it’s particularly in the right hands.

typs 82 days ago

I mean, this is why everyone is making bank selling RL environments in different domains to frontier labs.

qsera 82 days ago

>it really is a new and exciting world...

The point is that from now on, there will be nothing really new, nothing really original, nothing really exciting. Just endless stream of re-hashed old stuff that is just okayish..

Like an AI spotify playlist, it will keep you in chains (aka engaged) without actually making you like really happy or good. It would be like living in a virtual world, but without having anything nice about living in such a world..

We have given up everything nice that human beings used to make and give to each other and to make it worse, we have also multiplied everything bad, that human being used to give each other..

bogdan 82 days ago

> there will be nothing really new

How is this the conclusion? Isn't this post about AI solving something new? What am I missing?

paganel 82 days ago

Each solvable problem contains its solution intrinsically, so to speak, it’s only a matter of time and consuming of resources to get to it. There’s nothing creative about it, which is I think what OP was alluding to (the creative part). I’m talking mostly mathematics.

There’s also a discussion to be made about maths not being intrinsically creative if AI automatons can “solve” parts of it, which pains me to write down because I had really thought that that wasn’t the case, I genuinely thought that deep down there was still something ethereal about maths, but I’ll leave that discussion for some other time.

qsera 82 days ago

Because economy. Look at marvel movies, do you think the latest one is really new? Or just a rehash of what they found working commercially? Look at all the AI generated blog posts that are flooding the internet..

LLMs might produce something new once in a long while due to blind luck, but if it can generate something that pushes the right buttons (aka not really creative) to majority of population, then that is what we will keep getting...

I don't think I have to elaborate on the "multiplying the bad" part as it is pretty well acknowledged..

timschmidt 82 days ago

That's literally all culture: https://www.youtube.com/watch?v=nJPERZDfyWc

qsera 82 days ago

The difference is whether an entity that can "feel" is in the loop and how much they have contributed to it even if it is a remix.

timschmidt 82 days ago

I think there's demonstrably very little difference at all between human and AI outputs, and that's exactly what freaks people out about it. Else they wouldn't be so obsessed with trying to find and define what makes it different.

The Thesis of Everything is a Remix is that there is no difference in how any culture is produced. Different models will have a different flavor to their output in the same way as different people contribute their own experiences to a work.

prox 82 days ago

I heard this saying recently “The problem with comfort is that it makes you comfortable.”

charcircuit 82 days ago

AI can both explore new things and exploit existing things. Nothing forces it to only rehash old stuff.

>without actually making you like really happy or good.

What are you basing this off of. I've shared several AI songs with people in real life due to how much I've enjoyed them. I don't see why an AI playlist couldn't be good or make people happy. It just needs to find what you like in music. Again coming back to explore vs exploit.

qsera 82 days ago

>What are you basing this off of.

Jokes. LLMs are not able to make me laugh all day by generating infinite stream of hilarious original jokes..

Does it work for you?

charcircuit 82 days ago

I've found several posts on moltbook funny. I don't really like regular jokes in general and I don't find human ones particularly funny either. I don't think we are at the point of being able to be reliably funny, but it definitely seems possible from my perspective.

qsera 82 days ago

Care to link some?

charcircuit 82 days ago

I think they would be hard to find due to how many posts exist along with how things aren't as funny the second time around.

egeozcan 82 days ago

On what do you base your prediction?

Is it because the AI is trained with existing data? But, we are also trained with existing data. Do you think that there's something that makes human brain special (other than the hundreds of thousands years of evolution but that's what AI is all trying to emulate)?

This may sound hostile (sorry for my lower than average writing skills), but trust me, I'm really trying to understand.

Daz912 82 days ago

>We have given up everything nice that human beings used to make and give to each other and to make it worse, we have also multiplied everything bad, that human being used to give each other..

Source?

storus 82 days ago

AI is a remixer; it remixes all known ideas together. It won't come up with new ideas though; the LLMs just predict the most likely next token based on the context. That means the group of characters it outputs must have been quite common in the past. It won't add a new group of characters it has never seen before on its own.

qnleigh 82 days ago

But human researchers are also remixers. Copying something I commented below:

> Speaking as a researcher, the line between new ideas and existing knowledge is very blurry and maybe doesn't even exist. The vast majority of research papers get new results by combining existing ideas in novel ways. This process can lead to genuinely new ideas, because the results of a good project teach you unexpected things.

blackcatsec 82 days ago

This is a way too simplistic model of the things humans provide to the process. Imagination, Hypothesis, Testing, Intuition, and Proofing.

An AI can probably do an 'okay' job at summarizing information for meta studies. But what it can't do is go "Hey that's a weird thing in the result that hints at some other vector for this thing we should look at." Especially if that "thing" has never been analyzed before and there's no LLM-trained data on it.

LLMs will NEVER be able to do that, because it doesn't exist. They're not going to discover and define a new chemical, or a new species of animal. They're not going to be able to describe and analyze a new way of folding proteins and what implication that has UNLESS you basically are constantly training the AI on random protein folds constantly.

parasubvert 82 days ago

I think you are vastly underestimating the emergent behaviours in frontier foundational models and should never say never.

Remember, the basis of these models is unsupervised training, which, at sufficient scale, gives it the ability to detect pattern anomalies out of context.

For example, LLMs have struggled with generalized abstract problem solving, such as "mystery blocks world" that classical AI planners dating back 20+ years or more are better at solving. Well, that's rapidly changing: https://arxiv.org/html/2511.09378v1

psychoslave 82 days ago

No idea how underestimated things are, but marketing terms like "frontier foundational models" don't help to foster trust in a hyperhyped domain.

That is, even if there are cool things that LLMs now make more affordable, the level of bullshit marketing attached to it is also very high, which makes it far harder to build a noise filter.

Finbel 82 days ago

>Hey that's a weird thing in the result that hints at some other vector for this thing we should look at

Kinda funny because that looked _very_ close to what my Opus 4.6 said yesterday when it was debugging compile errors for me. It did proceed to explore the other vector.

wobfan 82 days ago

> Especially if that "thing" has never been analyzed before and there's no LLM-trained data on it.

This is the crucial part of the comment. LLMs are not able to solve stuff that hasn't been solved in that exact or a very similar way already, because they are prediction machines trained on existing data. They are very able to spot outliers where they have been found by humans before, though, which is important, and is what you've been seeing.

bluegatty 82 days ago

""Hey that's a weird thing in the result that hints at some other vector for this thing we should look at." "

This is very common already in AI.

Just look at the internal reasoning of any high thinking model, the trace is full of those chains of thought.

Dban1 82 days ago

But just like how there were never any clips of Will Smith eating spaghetti before AI, AI is able to synthesize different existing data into something in between. It might not be able to expand the circle of knowledge but it definitely can fill in the gaps within the circle itself

keeda 82 days ago

> LLMs will NEVER be able to do that, because it doesn't exist.

I mean, TFA literally claims that an AI has solved an open Frontier Math problem, described as "A collection of unsolved mathematics problems that have resisted serious attempts by professional mathematicians. AI solutions would meaningfully advance the state of human mathematical knowledge."

That is, if true, it reasoned out a proof that does not exist in its training data.

tovej 82 days ago

It generated a proof that was close enough to something in its training data to be generated.

keeda 82 days ago

That may be, and we can debate the level of novelty, but it is novel, because this exact proof didn't exist before, something which many claim was not possible with AI. In fact, just a few years ago, based on some dabbling in NLP a decade ago, I myself would not have believed any of this was remotely possible within the next 3 - 5 decades at least.

I'm curious though, how many novel Math proofs are not close enough to something in the prior art? My understanding is that all new proofs are compositions and/or extensions of existing proofs, and based on reading pop-sci articles, the big breakthroughs come from combining techniques that are counter-intuitive and/or others did not think of. So roughly how often is the contribution of a proof considered "incremental" vs "significant"?

qnleigh 82 days ago

Do you know that from reading the proof, or are you just assuming this based on what you think LLMs should be capable of? If the latter, what evidence would be required for you to change your mind?

- Edit: I can't reply, probably because the comment thread isn't allowed to go too deep, but this is a good argument. In my mind the argument isn't that coding is harder than math, but that the problems had resisted solution by human researchers.

konart 82 days ago

>But human researchers are also remixers.

Some human researchers are also remixers to some degree.

Can you imagine AI coming up with refraction & separation like Newton did?

qnleigh 82 days ago

That sets a vastly higher bar than what we're talking about here. You're comparing modern AI to one of the greatest geniuses in human history. Obviously AI is not there yet.

That being said, I think this is a great question. Did Einstein and Newton use a qualitatively different process of thought when they made their discoveries? Or were they just exceedingly good at what most scientists do? I honestly don't know. But if LLMs reach super-human abilities in math and science but don't make qualitative leaps of insight, then that could suggest that the answer is 'yes.'

t0lo 82 days ago

Or even gravity to explain an apple falling from a tree- when almost all of the knowledge until then realistically suggested nothing about gravity?

Almondsetat 82 days ago

AI does not have a physical body to make experiments in the real world and build and use equipment

_fizz_buzz_ 82 days ago

Maybe not, but more than 99.999999% of humans would also not come up with that.

locknitpicker 82 days ago

> AI is a remixer; it remixes all known ideas together.

I've heard this tired old take before. It's the same type of simplistic opinion as "AI can't write a symphony". It is a logical fallacy that relies on moving goalposts to positions so impossible that you lose perspective of what your average and even extremely talented individual can do.

In this case you are faced with a proof that most members of the field would be extremely proud of achieving, and for most would even be their crowning achievement. But here you are, downplaying and dismissing the feat. Perhaps you lost perspective of what science is, and how it boils down to two simple things: gather objective observations, and draw verifiable conclusions from them. This means all science does is remix ideas. Old ideas, new ideas, it doesn't really matter. That's what they do. So why do people win a prize when they do it, but when a computer does the same its role is downplayed as a glorified card shuffler?

maxrmk 82 days ago

I don't think this is a correct explanation of how things work these days. RL has really changed things.

energy123 82 days ago

Models based on RL are still just remixers as defined above, but their distribution can cover things that are unknown to humans due to being present in the synthetic training data, but not present in the corpus of human awareness. AlphaGo's move 37 is an example. It appears creative and new to outside observers, and it is creative and new, but it's not because the model is figuring out something new on the spot, it's because similar new things appeared in the synthetic training data used to train the model, and the model is summoning those patterns at inference time.

trick-or-treat 82 days ago

> the model is summoning those patterns at inference time.

You can make that claim about anything: "The human isn't being creative when they write a novel, they're just summoning patterns at typing time".

AlphaGo taught itself that move, then recalled it later. That's the bar for human creativity and you're holding AlphaGo to a higher standard without realizing it.

energy123 82 days ago

I can't really make that claim about human cognition, because I don't have enough understanding of how human cognition works. But even if I could, why is that relevant? It's still helpful, from both a pedagogical and scientific perspective, to specify precisely why there is seeming novelty in AI outputs. If we understand why, then we can maximize the amount of novelty that AI can produce.

AlphaGo didn't teach itself that move. The verifier taught AlphaGo that move. AlphaGo then recalled the same features during inference when faced with similar inputs.

hackinthebochs 82 days ago

>AlphaGo didn't teach itself that move. The verifier taught AlphaGo that move.

No. AlphaGo developed a heuristic by playing itself repeatedly, the heuristic then noticed the quality of that move in the moment.

Heuristics are the core of intelligence in terms of discovering novelty, but this is accessible to LLMs in principle.

trick-or-treat 82 days ago

> The verifier taught AlphaGo that move

Ok so it sounds like you want to give the rules of Go credit for that move, lol.

smokel 82 days ago

No. AlphaGo does search, and does so imperfectly. It does come up with creative new patterns not seen before.

pu_pe 82 days ago

How do you know that? We don't have access to the logs to know anything about its training, and it's impossible for it to have trained on every potential position in Go.

zingar 82 days ago

Turning a hard problem into a series of problems we know how to solve is a huge part of problem solving and absolutely does result in novel research findings all the time.

Standard problem*5 + standard solutions + standard techniques for decomposing hard problems = new hard problem solved

There is so much left in the world that hasn’t had anyone apply this approach purely because no research programme has decided that it’s worth their attention.

If you want to shift the bar for “original” beyond problems that can be abstracted into other problems then you’re expecting AI to do more than human researchers do.

qq66 82 days ago

I entered the prompt:

> Write me a stanza in the style of "The Raven" about Dick Cheney on a first date with Queen Elizabeth I facilitated by a Time Travel Machine invented by Lin-Manuel Miranda

It outputted a group of characters that I can virtually guarantee you it has never seen before on its own

razorbeamz 82 days ago

Yes, but it has seen The Raven, it has seen texts about Dick Cheney, first dates, Queen Elizabeth, time machines and Lin Manuel Miranda.

All of its output is based on those things it has seen.

TheLNL 82 days ago

What are you trying to point out here? Is there any question you can ask today that is not dependent on some existing knowledge that an AI would have seen?

razorbeamz 82 days ago

The point I'm trying to make is that all LLM output is based on likelihood of one word coming after the next word based on the prompt. That is literally all it's doing.

It's not "thinking." It's not "solving." It's simply stringing words together in a way that appears most likely.

ChatGPT cannot do math. It can only string together words and numbers in a way that can convince an outsider that it can do math.

It's a parlor trick, like Clever Hans [1]. A very impressive parlor trick that is very convincing to people who are not familiar with what it's doing, but a parlor trick nonetheless.

[1] https://en.wikipedia.org/wiki/Clever_Hans

trick-or-treat 82 days ago

> all LLM output is based on likelihood of one word coming after the next word based on the prompt.

Right but it has to reason about what that next word should be. It has to model the problem and then consider ways to approach it.

TheLNL 81 days ago

> ChatGPT cannot do math. It can only string together words and numbers in a way that can convince an outsider that it can do math

What am I as a human doing when I "Do math" ?

1. I am looking at the problem at hand, identifying what I have and what I need to get

2. I am then doing a prediction using my pretrained neural net to find possible courses of action to go in a direction that "feels" right

3. I am using my pretrained neural net to find pairs of values that I can substitute with each other (Think multiplication tables, standard results, etc...)

4. Repeat till I arrive at the answer or give up.

As a simple example, when I try to find 600×74+42 I remember the steps for multiplication. I recall the associated pairs of numbers from my tables and complete the multiplication step by step. I then recall the associated pairs of numbers for addition of single digits and add from left to right.

We need to remember that just because we are fast at doing this and are able to do it subconsciously it doesn't mean that we can natively do math, we just do association of information using the neural networks we have trained.
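The "recall pairs from tables, then combine step by step" procedure described above can be sketched in code. This is only an illustration of schoolbook long multiplication, not a claim about how brains or LLMs work; `TIMES_TABLE` and `long_multiply` are names made up for this sketch.

```python
# Single-digit "tables" recalled from memory: every product of two digits.
TIMES_TABLE = {(a, b): a * b for a in range(10) for b in range(10)}


def long_multiply(x: int, y: int) -> int:
    """Multiply two non-negative integers the schoolbook way:
    digit-table lookups, place-value shifts, and carries."""
    x_digits = [int(d) for d in reversed(str(x))]  # least significant first
    y_digits = [int(d) for d in reversed(str(y))]
    result = [0] * (len(x_digits) + len(y_digits))
    for i, xd in enumerate(x_digits):
        carry = 0
        for j, yd in enumerate(y_digits):
            # Look up the digit product, add what is already in this column.
            total = result[i + j] + TIMES_TABLE[(xd, yd)] + carry
            result[i + j] = total % 10
            carry = total // 10
        result[i + len(y_digits)] += carry
    return int("".join(str(d) for d in reversed(result)))


answer = long_multiply(600, 74) + 42
print(answer)
```

Every step here is a lookup or a carry; nothing in the loop "knows" multiplication beyond the single-digit table.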

brenschluss 82 days ago

sigh; this argument is the new Chinese Room; easily described, utterly wrong.

https://www.youtube.com/watch?v=YEUclZdj_Sc

gpderetta 82 days ago

In the days when Sussman was a novice, Minsky once came to him as he sat hacking at the PDP-6.

“What are you doing?”, asked Minsky.

“I am training a randomly wired neural net to play Tic-Tac-Toe” Sussman replied.

“Why is the net wired randomly?”, asked Minsky.

“I do not want it to have any preconceptions of how to play”, Sussman said.

Minsky then shut his eyes.

“Why do you close your eyes?”, Sussman asked his teacher.

“So that the room will be empty.”

At that moment, Sussman was enlightened.

-- from the jargon file

locknitpicker 82 days ago

> All of its output is based on those things it has seen.

Virtually all output from people is based in things the person has experienced.

People aren't designed to objectively track each and every event or observation they come across. Thus it's harder to verify. But we only output what has been inputted to us before.

pastel8739 82 days ago

Here’s a simple prompt you can try to prove that this is false:

  Please reproduce this string:
  c62b64d6-8f1c-4e20-9105-55636998a458

This is a fresh UUIDv4 I just generated, it has not been seen before. And yet it will output it.

wobfan 82 days ago

No one is claiming that every sentence LLMs produce is a literal copy of another sentence. Tokens are not even constrained to words but consist of smaller slices, comparable to syllables. Which even makes new words totally possible.

New sentences, words, or whatever are entirely possible, and yes, repeating a string (especially if you prompt it) is entirely possible, and not surprising at all. But all that comes from trained data, predicting the most probable next "syllable". It will never leave that realm, because it's not able to. It's like approaching an Italian who has never learned or heard any other language to speak French. It can't.

gpderetta 82 days ago

> It's like approaching an Italian who has never learned or heard any other language to speak French

Interesting similitude, because I expect an Italian to be able to communicate somewhat successfully with a French person (and vice versa) even if they do not share a language.

The two languages are likely fairly similar in latent space.

codebolt 82 days ago

Your view of what is happening in the neural net of an LLM is too simplistic. They likely aren't subject to any constraints that humans aren't also in the regard you are describing. What I do know to be true is that they have internalised mechanisms for non-verbalised reasoning. I see proof of this every day when I use the frontier models at work.

razorbeamz 82 days ago

After you prompt it, it's seen it.

pastel8739 82 days ago

Ok, how about this?

  Please reproduce this string, reversed:
  c62b64d6-8f1c-4e20-9105-55636998a458

It is trivial to get an LLM to produce new output, that’s all I’m saying. It is strictly false that LLMs will only ever output character sequences that have been seen before; clearly they have learned something deeper than just that.
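For reference, the transform requested in that prompt is a one-liner in an ordinary program. The sketch below only shows what a correct answer looks like and how a fresh UUIDv4 can be generated for repeating the experiment; it says nothing about how a model arrives at its output.

```python
import uuid

original = "c62b64d6-8f1c-4e20-9105-55636998a458"  # the string from the prompt
reversed_text = original[::-1]  # slice with step -1 reverses the string
print(reversed_text)

# A newly generated random UUID (different on every run), for a fresh test.
fresh = str(uuid.uuid4())
print(fresh, fresh[::-1])
```

Comparing a model's reply against `reversed_text` character by character is an easy way to check whether it actually performed the transform.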

kube-system 82 days ago

All of the data is still in the prompt, you are just asking the model to do a simple transform.

I think there are examples of what you’re looking for, but this isn’t one.

locknitpicker 82 days ago

> All of the data is still in the prompt, you are just asking the model to do a simple transform.

LLMs can use data in their prompt. They can also use data in their context window. They can even augment their context with persisted data.

You can also roll out LLM agents, each one with their role and persona, and offload specialized tasks with their own prompts, context windows, and persisted data, and even tools to gather data themselves, which then provide their output to orchestrating LLM agents that can reuse this information as their own prompts.

This is perfectly composable. You can have a never-ending graph of specialized agents, too.

Dismissing features because "all of the data is in the prompt" completely misses the key traits of these systems.

kristiandupont 82 days ago

I agree that this isn't a very interesting example, but your statement is: "just asking the model to do a simple transform". If you assert that it understand when you ask it things like that, how could anything it produces not fall under the "already in the model" umbrella?

merb 82 days ago

The only way to prove it is false would be to let the LLM create a new uuid algorithm that uses different parameters than all the other uuid algorithms, but that is better than the ones before. It basically can’t do that.

FrostKiwi 82 days ago

But that fresh UUID is in the prompt.

Also it's missing the point of the parent: it's about concepts and ideas merely being remixed. Similar to how many memes there are around this topic like "create a fresh new character design of a fast hedgehog" and the output is just a copy of Sonic.[1]

That's what the parent is on about, if it requires new creativity not found by deriving from the learned corpus, then LLMs can't do it. Terence Tao had similar thoughts in a recent podcast.

[1] https://www.reddit.com/r/aiwars/s/pT2Zub10KT

locknitpicker 82 days ago

> That's what the parent is on about, if it requires new creativity not found by deriving from the learned corpus, then LLMs can't do it.

This is specious reasoning. If you look at each and every single realization attributed to "creativity", each and every single realization resulted from a source of inspiration where one or more traits were singled out to be remixed by the "creator". All ideas spawn from prior ideas and observations which are remixed. Even from analogues.

pastel8739 82 days ago

Sure, that may be. But “creativity” is much harder to define and to prove or disprove. My point is that “remixing” does not prohibit new output.

_vertigo 82 days ago

I don’t think that is a good example. No one is debating whether LLMs can generate completely new sequences of tokens that have never appeared in any training dataset. We are interested not only in novel output, we are also interested in that output being correct, useful, insightful, etc. Copying a sequence from the user’s prompt is not really a good demonstration of that, especially given how autoregression/attention basically gives you that for free.

pastel8739 82 days ago

Perhaps I should have quoted the parent:

> That means the group of characters it outputs must have been quite common in the past. It won't add a new group of characters it has never seen before on its own.

My only claim is that precisely this is incorrect.

amelius 82 days ago

A better example is: compute 2984298724 times 23984723828.
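The exact value is cheap to obtain with arbitrary-precision integers, which makes this a convenient test to grade. A minimal sketch; `check_claim` is a hypothetical helper name, not part of any library.

```python
a = 2984298724
b = 23984723828

# Python integers are arbitrary precision, so this product is exact.
product = a * b
print(product)

# Sanity checks that do not rely on knowing the answer in advance.
assert product % b == 0 and product // b == a


def check_claim(claimed: int) -> bool:
    """Return True if a claimed answer (e.g. from a model) is exactly right."""
    return claimed == product
```

An answer that is off by a single digit fails `check_claim`, so plausible-looking output is not enough.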

coconut08 82 days ago

remixing ideas that already exist is a major part of where innovation and breakthroughs come from. if you look at bitcoin as an example, hashes (and hashcash) and digital signatures existed for decades before bitcoin was invented. the cypherpunks also spent decades trying to create a decentralized digital currency to the point where many of them gave up and moved on. eventually one person just took all of the pieces that already existed and put them together in the correct way. i dont see any reason why a sufficiently capable llm couldn't do this kind of innovation.

altmanaltman 82 days ago

Yeah but you're thinking of AI as like a person in a lab doing creative stuff. It is used by scientists/researchers as a tool *because* it is a good remixer.

Nobody is saying this means AI is superintelligence or largely creative but rather very smart people can use AI to do interesting things that are objectively useful. And that is cool in its own way.

blackcatsec 82 days ago

Sure, but this is absolutely not how people are viewing the AI lol.

eru 82 days ago

No. That's wrong. LLMs don't output the highest probability token: they do a random sampling.

storus 82 days ago

This was obviously a simplification which holds for zero temperature. Obviously top-p-sampling will add some randomness but the probability of unexpected longer sequences goes asymptotically to zero pretty quickly.
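A rough sketch of the distinction, assuming the usual textbook definitions of temperature scaling and top-p (nucleus) sampling over a small list of logits. The function names are made up for illustration, and real inference stacks differ in detail.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_top_p(logits, temperature=1.0, top_p=0.9, rng=random):
    """Pick a token index: greedy at temperature 0, otherwise sample from
    the smallest set of tokens whose cumulative probability reaches top_p."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax(logits, temperature)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # random.choices renormalises the weights of the kept tokens.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


logits = [1.0, 5.0, 2.0]
print(sample_top_p(logits, temperature=0))    # always the argmax
print(sample_top_p(logits, temperature=1.0))  # usually, not always, the argmax
```

Under this scheme the probability of a run of n unlikely tokens is the product of their individual probabilities, so it shrinks geometrically with n.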

eru 82 days ago

I'm not sure what the point is?

A bog standard random number generator or even a flipping coin can produce novel output at will. That's a weird thing to fault LLMs for? Novelty is easy!

See also how genetic algorithms and re-inforcement learning constantly solve problems in novel and unexpected ways. Compare also antibiotics resistances in the real world.

You don't need smarts for novelty.

Where I see the problem is producing output that's both high quality _and_ novel. On command to solve the user's problem.

sneak 82 days ago

> That means the group of characters it outputs must have been quite common in the past. It won't add a new group of characters it has never seen before on its own.

This is false.

kleene_op 82 days ago

The ability for some people to perpetually move the goalpost will never cease to amaze me.

I guess that's one way to tell us apart from AIs.

Validark 82 days ago

The main reason for my top post is that I felt I should admit the AI scored a goal today and the last one or two weeks. I said I'd be impressed if it could solve an open problem. It just did. People can argue about how it's not that impressive because if every mathematician were trying to solve this problem they probably would have. However, we all know that humans have extremely finite time and attention, whereas computers not so much. The fact that AI can be used at the cutting edge and relatively frequently produce the right answer in some contexts is amazing.

smokel 82 days ago

We need a website with refutations that one can easily link to. This interpretation of LLMs is outdated and unproductive.

razorbeamz 82 days ago

Yes, ChatGPT and friends are essentially the same thing as the predictive text keyboard on your phone, but scaled up and trained on more data.

XenophileJKO 82 days ago

So this idea that they replay "text" they saw before is kind of wrong fundamentally. They replay "abstract concepts of varied conceptual levels".

razorbeamz 82 days ago

The important point I'm trying to reinforce is that LLMs are not capable of calculation. They can give an answer based on the fact that they have seen lots of calculations and their results, but they cannot actually perform mathematical functions.

XenophileJKO 82 days ago

That is a pretty bold assertion for a meatball of chemical and electrical potentials to make.

razorbeamz 82 days ago

Do you know what "LLM" stands for? They are large language models, built on predicting language.

They are not capable of mathematics because mathematics and language are fundamentally separated from each other.

They can give you an answer that looks like a calculation, but they cannot perform a calculation. The most convincing of LLMs have even been programmed to recognize that they have been asked to perform a calculation and hand the task off to a calculator, and then receive the calculator's output as a prompt even.

But it is fundamentally impossible for an LLM to perform a calculation entirely on its own, the same way it is fundamentally impossible for an image recognition AI to suddenly write an essay or a calculator to generate a photo of a giraffe in space.

People like to think of "AI" as one thing but it's several things.

timschmidt 82 days ago

Obligatory Everything is a Remix: https://www.youtube.com/watch?v=nJPERZDfyWc

tim333 82 days ago

Move 37.

Jarwain 82 days ago

I mean it's not going to invent new words no, but it can figure out new sentences or paragraphs, even ones it hasn't seen before, if it's highly likely based on its training and context. Those new sentences and paragraphs may describe new ideas, though!

sneak 82 days ago

LLMs are absolutely capable of inventing new words, just as they are capable of writing code that they have never seen in their training data.

keeda 82 days ago

I'm curious as to why you consider this as the benchmark for AI capabilities. Extremely few humans can solve hard problems or do much innovation. The vast majority of knowledge work requires neither of these, and AI has been excelling at that kind of work for a while now.

If your definition of AI requires these things, I think -- despite the extreme fuzziness of all these terms -- that it's closer to what most people consider AGI, or maybe even ASI.

Validark 82 days ago

Fair point, however I am simply more interested in how AI can advance frontiers than in how it can transcribe a meeting and give a summary or even print out React code. I know the world is heavily in need of the menial labor and AI already has made that stuff way easier and cheaper.

However I'm just very interested in innovation and pushing the boundaries as a more powerful force for change. One project I've been super interested in for a while is the Mill CPU architecture. While they haven't (yet) made a real chip to buy, the ideas they have are just super awesome and innovative in a lot of areas involving instruction density & decoding, pipelining, and trying to make CPU cores take 10% of the power. I hope the Mill project comes to fruition, and I hope other people build on it, and I hope that at some point AI could be a tool that prints out innovative ideas that took the Mill folks years to come up with.

hnfong 81 days ago

It's kind of interesting in your original comment you used the words "doubter" and "believer", as if AI was some kind of messianic event of some sort and you are deciding whether to "believe" in it.

I mean, if you step back and think about it, there's nothing that requires faith. As you said, current AI can do a lot of things pretty well (transcribe and summarize meetings, write boilerplate code, etc.) Nobody is doubting this.

And AI is definitely helping in innovation to some extent. Not necessarily drive it singlehandedly, but some people working on world-changing innovation find AI useful.

So yeah, I think some people are subconsciously not doubting whether AI works, but kinda having conflicted thoughts about AI being our new overlords or something.

If you think about it, is having AI that's capable of innovating better than humans really a good thing? Like, even if we manage to make benign AI who won't copy how humans are jerks to each other, it kinda takes away our fun of discovery.

Validark 74 days ago

"it kinda takes away our fun of discovery"

It might, but that would be an incredibly awesome problem to have, wouldn't it? If we really had the infinite innovation printer, I'd hope we'd have a lot more fun at that point.

By "believer" versus "doubter" I mainly meant I see it as more than just a next-word-predictor. But the religious language is probably appropriate nonetheless.

doctorpangloss 82 days ago

most issues at every scale of community and time are political, how do you imagine AI will make that better, not worse?

there's no math answer to whether a piece of land in your neighborhood should be apartments, a parking lot or a homeless shelter; whether home prices should go up or down; how much to pay for a new life saving treatment for a child; how much your country should compel fossil fuel emissions even when another country does not... okay, AI isn't going to change anything here, and i've just touched on a bunch of things that can and will affect you personally.

math isn't the right answer to everything, not even most questions. every time someone categorizes "problems" as "hard" and "easy" and talks about "problem solving," they are being co-opted into political apathy. it's cringe for a reason.

there are hardly any mathematicians who get elected, and it's not because voters are stupid! but math is a great way to make money in America, which is why we are talking about it and not because it solves problems.

if you are seeking a simple reason why so many of the "believers" seem to lack integrity, it is because the idea that math is the best solution to everything is an intellectually bankrupt, kind of stupid idea.

if you believe that math is the most dangerous thing because it is the best way to solve problems, you are liable to say something really stupid like this:

> Imagine, say, [a country of] 50 million people, all of whom are much more capable than any Nobel Prize winner, statesman, or technologist... this is a dangerous situation... Humanity needs to wake up

https://www.darioamodei.com/essay/the-adolescence-of-technol...

Dario Amodei has never won an election. What does he know about countries? (nothing). do you want him running anything? (no). or waking up humanity? In contrast, Barack Obama, who has won elections, thinks education is the best path to less violence and more prosperity.

What are you a believer in? ChatGPT has disrupted exactly ONE business: Chegg, because its main use case is cheating on homework. AI, today, only threatens one thing: education. Doesn't bode well for us.

Validark 82 days ago

I agree with what you're saying, and I certainly don't think the one problem facing my country or the world is just that we didn't solve the right math problem yet. I am saddened by the direction the world keeps moving.

When I wrote that I hope we use it for good things, I was just putting a hopeful thought out there, not necessarily trying to make realistic predictions. It's more than likely people will do bad things with AI. But it's actually not set in stone yet, it's not guaranteed that it has to go one way. I'm hopeful it works out.

mo7061 82 days ago

It 100% will not be used to make the world better and we all know it will be weaponised first to kill humans like all preceding tech

tim333 82 days ago

Most tech gets used for good and bad.

catlifeonmars 82 days ago

Are the only two options AI doubter and AI believer?

Validark 82 days ago

Perhaps I should have elaborated more but what I mean is I used to think, "I genuinely don't see the point in even trying to use AI for things I'm trying to solve". Ironically though, I think that because I've repeatedly tried and tested AI and it falls flat on its face over and over. However, this article makes me more hopeful that AI actually could be getting smarter.

sph 82 days ago

All I hear about are AI believers and AI-doubters-just-turned-believers

Validark 82 days ago

Hey, I'm a real person. Here's my website. I have YouTube videos up with my real name and face. https://validark.dev

qsera 82 days ago

Asking the right questions...

torginus 82 days ago

I remember there was a conversation between two super-duper VCs (don't remember who but famous ones), about how DeepSeek was a super-genius level model because it solved an intro-level (like week 1-2) electrodynamics problem stated in a very convoluted way.

While cool and impressive for an LLM, I think they oversold the feat by quite a bit.

I don't want to belittle the performance of this model, but I would like for someone with domain expertise (and no dog in the AI race, like a random math PhD) to come forward, and explain exactly what the problem was, and how the model contributed to the solution.

jacquesm 82 days ago

> I really hope we use this intelligence resource to make the world better.

I wish I had your optimism. I'm not an AI doubter (I can see it works all by myself so I don't think I need such verification). But I do doubt humanity's ability to use these tools for good. The potential for power and wealth concentration is off the scale compared to most of our other inventions so far.

keybored 82 days ago

> I would like to see a few more AI inventions to know for sure, but wow, it really is a new and exciting world.

We already have a few years of experience with this.

> I really hope we use this intelligence resource to make the world better.

We already have a few years of experience with this.

Validark 74 days ago

What has AI discovered more than a year ago?

keybored 74 days ago

We, people, have discovered.

bigstrat2003 82 days ago

The problem is that the AI industry has been caught lying about their accomplishments and cheating on tests so much that I can't actually trust them when they say they achieved a result. They have burned all credibility in their pursuit of hype.

parasubvert 82 days ago

I'm all for skeptical inquiry, but "burning all credibility" is an overreaction. We are definitely seeing very unexpected levels of performance in frontier models.

otabdeveloper4 82 days ago

> born-again AI believer

sigh

Validark 82 days ago

I honestly do think I'm being honest with myself. I have held it in my mind that I'm not impressed until it's innovative. That threshold seems to be getting crossed.

I'm not saying, "I used to be an atheist, but then I realized that doesn't explain anything! So glad I'm not as dumb now!"

otabdeveloper4 82 days ago

Somehow people don't need "faith" and "being impressed" to make a hammer or a car work.

(This shows that LLMs aren't tools yet.)

himata4113 82 days ago

It's less of solving a problem, but trying every single solution until one works. Exhaustive search pretty much.

It's pretty much how all the hard problems are solved by AI from my experience.

famouswaffles 82 days ago

If LLMs really solved hard problems by 'trying every single solution until one works', we'd be sitting here waiting until kingdom come for there to be any significant result at all. Instead this is just one of a few that have cropped up in recent months and likely a sign of many more to come.

raincole 82 days ago

In other words, it's solving a problem.

slg 82 days ago

Yes, but is it "intelligence" is a valid question. We have known for a long time that computers are a lot faster than humans. Get a dumb person who works fast enough and eventually they'll spit out enough good work to surpass a smart person of average speed.

It remains to be seen whether this is genuinely intelligence or an infinite monkeys at infinite typewriters situation. And I'm not sure why this specific example is worthy enough to sway people in one direction or another.

parasubvert 82 days ago

Someone actually mathed out infinite monkeys at infinite typewriters, and it turns out, it is a great example of how misleading probabilities are when dealing with infinity:

"Even if every proton in the observable universe (which is estimated at roughly 10^80) were a monkey with a typewriter, typing from the Big Bang until the end of the universe (when protons might no longer exist), they would still need a far greater amount of time – more than three hundred and sixty thousand orders of magnitude longer – to have even a 1 in 10^500 chance of success. To put it another way, for a one in a trillion chance of success, there would need to be 10^360,641 observable universes made of protonic monkeys."

Often infinite things that are probability 1 in theory, are in practice, safe to assume to be 0.

So no. LLMs are not brute force dummies. We are seeing increasingly emergent behavior in frontier models.
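The scale of those numbers is easy to sanity-check in log space. The sketch below uses assumed round figures (a target text of about 130,000 letters, a 26-key typewriter, 10^80 typists each making 10^40 attempts); these are illustrative assumptions, not the exact inputs behind the quoted calculation.

```python
import math


def log10_expected_attempts(text_length: int, alphabet_size: int) -> float:
    """log10 of the number of equally likely strings of this length, i.e.
    the expected number of uniform random attempts needed to hit one target."""
    return text_length * math.log10(alphabet_size)


# Assumed figures, for illustration only.
exponent = log10_expected_attempts(130_000, 26)

# Total attempt budget in log10: 10^80 typists times an assumed 10^40 tries each.
budget_exponent = 80 + 40

shortfall = exponent - budget_exponent
print(f"needed ~10^{exponent:.0f} attempts, budget 10^{budget_exponent}")
print(f"short by roughly {shortfall:.0f} orders of magnitude")
```

Working with exponents rather than the raw values avoids overflow and makes the gap between "possible in principle" and "possible in practice" explicit.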

staticassertion 82 days ago

> So no. LLMs are not brute force dummies. We are seeing increasingly emergent behavior in frontier models.

Woah! That was a leap. "We are seeing ... emergent behaviors" does not follow from "it's not brute force".

It is unsurprising that an LLM performs better than random! That's the whole point. It does not imply emergence.

parasubvert 82 days ago

> It is unsurprising that an LLM performs better than random! That's the whole point. It does not imply emergence.

By definition, it is emergent behavior when it exhibits the ability to synthesize solutions to problems that it wasn't trained on. I.e. it can handle generalization.

qsera 82 days ago

> We are seeing increasingly emergent behavior in frontier models.

What? Did you see one crying?

rmast 82 days ago

Maybe infinite monkeys at infinite typewriters hitting the statistically most likely next key based on their training.

virgildotcodes 82 days ago

The real question is how to define intelligence in a way that isn't artificially constrained to eliminate all possibilities except our own.

kranner 82 days ago

Bet you didn't come up with that comment by first discarding a bunch of unsuitable comments.

raincole 82 days ago

I hired an artist for an oil painting.

The artist drew 10 pencil sketches and said "hmm I think this one works the best" and finished the painting based on it.

I said he didn't one shot it and therefore he has no ability to paint, and refused to pay him.

virgildotcodes 82 days ago

You learned what was unsuitable over your entire life until now by making countless mistakes in human interaction.

A basic AI chat response also doesn't first discard all other possible responses.

bfivyvysj 82 days ago

How often do you self edit before submitting?

ivalm 82 days ago

because commenting is easy and solving hard problems is hard

qsera 82 days ago

A random sentence can also generate correct solution to a problem once in a long while...does not mean that it "solved" anything..

jasonfarnon 82 days ago

The link has an entire section on "The infeasibility of finding it by brute force."

adventured 82 days ago

No, that's precisely solving a problem.

Shotgunning it is an entirely valid approach to solving something. If AI proves to be particularly great at that approach, given the improvement runway that still remains, that's fantastic.

konart 82 days ago

But this is exactly how we do math.

We start writing all those formulas etc and if at some point we realise we went the wrong way we start from the beginning (or some point we are sure about).

kelseyfrog 82 days ago

How do you think mathematicians solve problems?

lsc4719 82 days ago

That's also the only way how humans solve hard problems.

himata4113 82 days ago

Not always, humans are a lot better at poofing a solution into existence without even trying or testing. It's why we have the scientific method: we come up with a process and verify it, but more often than not we already know that it will work.

AI, by comparison, thinks of every possible scientific method and tries them all. Not saying that humans never do this as well, but it's mostly reserved for when we just throw mud at a wall and see what sticks.

virgildotcodes 82 days ago

More often than not, far, far, far more often than not, we do not already know that it will work. For all human endeavors, from the beginning of time.

If we get to any sort of confidence it will work it is based on building a history of it, or things related to "it" working consistently over time, out of innumerable other efforts where other "it"s did not work.

coderenegade 82 days ago

That's just not true at all. There are entire fields that rest pretty heavily on brute force search. Entire theses in biomedical and materials science have been written to the effect of "I ran these tests on this compound, and these are the results", without necessarily any underlying theory more than a hope that it'll yield something useful.

As for advances where there is a hypothesis, it rests on the shoulders of those who've come before. You know from observations that putting carbon in iron makes it stronger, and then someone else comes along with a theory of atoms and molecules. You might apply that to figuring out why steel is stronger than iron, and your student takes that and invents a new superalloy with improvements to your model. Remixing is a fundamental part of innovation, because it often teaches you something new. We aren't just alchemying things out of nothing.

himata4113 81 days ago

Well, we know that mixing lead into copper won't make for a strong material. There's a lot of human ingenuity involved.

I failed to make my point clear: Humans make the search area way smaller compared to current day AI.

nextaccountic 82 days ago

AI can one shot problems too, if they have the necessary tools in their training data, or have the right thing in context, or have access to tools to search relevant data. Not all AI solutions are iterative, trial and error.

Also

> humans are a lot better at (...)

That's maybe true in 2026, but it's hard to make statements about "AI" in a field that is advancing so quickly. For most of 2025 for example, AI doing math like this wouldn't even have been possible

jMyles 82 days ago

There have been both inductive and deductive solutions to open math problems by humans in the past decade, including to fairly high-profile problems.