| HN Mirror


by Eisenstein 407 days ago

> if these tools are capable of reasoning and are able to solve advanced logic, math, and programming challenges as shown in benchmarks

The benchmarks are made of questions that humans created and can answer, and are not composed of anything which a human hasn't been able to answer.

> then surely they must be more capable of understanding and improving their own codebases with assistance from humans than humans could do alone.

I don't think that logic follows. The models have proven that they can have more breadth of knowledge than a single human, but not more capability.

Also, they have no particular insight into their own codebases. They only know what is in their training data -- they can use that to form patterns and solve new problems, but they still only have that and whatever information is given with the question as base knowledge.

> My point is that if this was being done, we should be seeing much greater progress than we've seen so far.

The point is taken, but I think your reasoning is weak.

> Either these tools are intelligent, or they're highly overrated. Which wouldn't mean that they can't be useful, just not to the extent that they're being marketed as.

I may have missed the marketing you have seen, but I don't see the big AI companies claiming that they are anything but tools that can help humans do things or replace certain human tasks. They do not advertise superhuman capability in intelligence tasks.

I suspect you are seeing a lot of hype and unfounded expectations, and using that as a basis for a calculation. The formula might be right, but the variables are incorrect.

We have seen a LOT of progress with AI and language models in the last few years, but expecting them to go from 'can understand language and solve complicated novel problems' to 'making better versions of themselves using solutions that humans haven't been able to come up with yet' is a bit much.

I don't know if one would call them intelligent, but something can be intelligent and at the same time unable to make substantial leaps forward in emerging fields.

1 comments

imiric 407 days ago

> The benchmarks are made of questions that humans created and can answer, and are not composed of anything which a human hasn't been able to answer.

Sure, but they do it at superhuman speeds, and if they truly can reason and come up with novel solutions as some AI proponents claim, then they would be able to come up with better answers as well.

So, yes, they do have more capability in certain aspects than a human. If nothing else, they should be able to draw from their vast knowledgebase in ways that a single human never could. So we should expect to see groundbreaking work in all fields of science. Not just in pattern matching applications as we've seen in some cases already, but in tasks that require actual reasoning and intelligence, particularly programming.

> Also, they have no particular insight into their own codebases.

Why not? Aren't most programming languages in their training datasets, and isn't Python, the language most AI tools are written in, one of the easiest languages to generate? Furthermore, can't AI developers feed the model's own codebase into it via context, RAG, etc. in the same way that most other programmers do?

> I may have missed the marketing you have seen, but I don't see the big AI companies claiming that they are anything but tools that can help humans do things or replace certain human tasks. They do not advertise super human capability in intelligence tasks.

You are downplaying the claims being made by AI companies and their proponents.

According to Sam Altman just a few days ago[1]:

> We are past the event horizon; the takeoff has started. Humanity is close to building digital superintelligence

> we have recently built systems that are smarter than people in many ways, and are able to significantly amplify the output of people using them

> We already hear from scientists that they are two or three times more productive than they were before AI.

If a human assisted by AI can be more productive than a human alone, then why isn't this productivity boost producing improvements at a faster rate than what the tech industry has been able to deliver so far? Why aren't AI companies dogfooding their products and delivering actual value to humanity beyond benchmark results and shiny demos?

Again, none of this requires actual superhuman levels of intelligence or reaching the singularity. But just based on what they're telling us their products are capable of, the improvements to their own capabilities should be exponential by now.

[1]: https://blog.samaltman.com/the-gentle-singularity


TeMPOraL 407 days ago

FWIW, we're barely two years into useful LLMs, less than half a year into the AI coding frenzy. Stuff takes time; there's organizational inertia.

Karpathy himself gave a perfect example in the talk with that restaurant-menu-to-pictures app - it took a few hours of AI-assisted coding to make it, and a week of devops bullshit to publish it. This is the case for everyone, so it slows down the feedback cycles right now.

Give it a couple of months; if we don't have clear evidence of recursive improvements by this time next year, I'll concede something is really off about it all.


imiric 407 days ago

Cool, but I'm not trying to convince you of anything. Believe what you want to believe.

I'm simply pointing out the dissonance between what AI companies have been telling us for the past 2+ years, and the results that we would expect to see if their claims were true. I'm not holding my breath that their promises will magically materialize given more time. If they were honest, they would acknowledge that the current tech simply won't get us there because of fundamental issues that still haven't been addressed (e.g. hallucination). But instead it's more profitable to promote software that "thinks", "reasons", and is "close to digital superintelligence".

That menu app is an example of vibe coding, not of increased productivity. He sidestepped the bulk of the work that a human still needs to do to ensure that the software works beyond the happy path scenario he tested, has no security issues, and so on. Clearly, the reason the DevOpsy tasks took him much longer is that he's not an ops person. The solution to this isn't to offload these tasks to AI as well, and ignore all the issues that this could cause on the operational side. It's to either offload them to a competent ops engineer, or become familiar with the tools and processes yourself so that it doesn't take you a week to do them next time.

If you want to use AI to assist you with mindless mechanical tasks, that's fine, I frequently do so too. But don't tell me it's making you more productive when you ignore fundamental processes of software engineering.
