| HN Mirror


by surgical_fire 246 days ago

"AI will take over the world".

I hear that. Then I try to use AI for a simple coding task: writing unit tests for a class, very similar to other existing unit tests. It fails miserably. It forgets to add an annotation and enters a death loop of bullshit code generation, generating test classes that test failed test classes that test failed test classes, and so on. Fascinating to watch. I wonder how much CO2 it generated while frying some Nvidia GPU in an overpriced data center.

The AI singularity may happen, but Mother Brain will be a complete moron anyway.

4 comments

alecbz 246 days ago

Regularly trying to use LLMs to debug coding issues has convinced me that we're _nowhere_ close to the kind of AGI some are imagining is right around the corner.

surgical_fire 246 days ago

At least Mother Brain will praise your prompt to generate yet another image in the style of Studio Ghibli as proof that your mind is a tour de force in creativity, and only a borderline genius would ask for such a thing.

ben_w 246 days ago

Sure, but the METR study also showed that t doubles every 7 months, where t ~= «duration of human time needed to complete a task, such that SOTA AI can complete the same task with 50% success»: https://arxiv.org/pdf/2503.14499

I don't know how long that exponential will continue for, and I have my suspicions that it stops before week-long tasks, but that's the trend-line we're on.
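A minimal sketch of the trend-line described above, assuming a clean exponential with a fixed 7-month doubling period (real measurements are noisier, and the trend may stop). The 60-minute starting point and the 40-hour "week-long task" are hypothetical values chosen for illustration, not figures from the paper; the function names are likewise made up.

```python
import math

# Assumption: t follows a clean exponential with a fixed doubling period.
DOUBLING_MONTHS = 7.0


def projected_task_length(t0: float, months_elapsed: float,
                          doubling_months: float = DOUBLING_MONTHS) -> float:
    """Hypothetical helper: task length t (human time, same unit as t0) at
    which SOTA AI succeeds ~50% of the time after `months_elapsed` months."""
    return t0 * 2 ** (months_elapsed / doubling_months)


def months_until(t0: float, target: float,
                 doubling_months: float = DOUBLING_MONTHS) -> float:
    """Hypothetical helper: months for t to grow from t0 to target,
    i.e. doubling_months * log2(target / t0)."""
    return doubling_months * math.log2(target / t0)


if __name__ == "__main__":
    t0 = 60.0  # hypothetical starting point: 60 minutes of human time
    print(projected_task_length(t0, 14))  # two doublings: 240.0 minutes
    # Hypothetical "week-long task": 40 working hours = 2400 minutes.
    print(round(months_until(t0, 2400.0), 1))  # roughly three years
```

Under these assumptions, each extra doubling of task length costs another 7 months, so going from one hour to a working week takes a little over five doublings.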

alecbz 246 days ago

Only skimmed the paper, but I'm not sure how to think about "length of task" as a metric here.

The cases I'm thinking about are things that could be solved in a few minutes by someone who knows what the issue is and how to use the tools involved. I spent around two days trying to debug one recent issue. A coworker who was a bit more familiar with the library involved figured it out in an hour or two. But in parallel with that, we also asked the library's author, who immediately identified the issue.

I'm not sure how to fit a problem like that into this "duration of human time needed to complete a task" framework.

conception 245 days ago

This is an excellent example of human “context windows”, though, and it could be that the LLM would have solved the easy problem with better context engineering. Despite 1M-token windows, things still start to get progressively worse after about 100k tokens. LLMs would be amazingly better overnight with a reliable 1M window.

alecbz 241 days ago

What does "better context engineering" mean here? How/why are the existing token windows "unreliable"?

ben_w 245 days ago

Fair comment.

While I think they're trying to cover that by getting experts to solve problems, it is definitely the case that humans learn much faster than current ML approaches, so "expert in one specific library" != "expert in writing software".

Pulcinella 246 days ago

But will it actually get better or will it just get faster and more power efficient at failing to pair parentheses/braces/brackets/quotes?

ben_w 246 days ago

Read the linked METR study please.

Or watch the Computerphile video summary/author interview, if you prefer: https://m.youtube.com/watch?v=evSFeqTZdqs

bobsmooth 246 days ago

Most reasonable AI alarmists are not concerned with sentient AI, but with an AI attached to the nukes that gets into one of those repeating death loops and fires all the missiles.

Ray20 246 days ago

In reality, this isn't a very serious threat. Rather, we're concerned about AI as a tool for strengthening totalitarian regimes.

beyarkay 245 days ago

Given that AI couldn't even speak English 6 years ago, do you really think it's going to struggle with unit tests for the next 20 years?

It's well worth looking at https://progress.openai.com/, here's a snippet:

> human: Are you actually conscious under anesthesia?

> GPT-1 (2018): i did n't . " you 're awake .

> GPT-3 (2021): There is no single answer to this question since anesthesia can be administered [...]

nofriend 245 days ago

The improvements since 2021 are minor at best. AI thus far has been trained to imitate humans by training it on text written by humans. It's unlikely that you will make something as smart as a human by training it to imitate a human. Imitation is a lossy process: you lose knowledge of the "why" and only imitate the outcome. To get beyond this state, we'll need a new technique. So far we've used gradient descent to teach an AI to reproduce a function; teaching it new behaviours will probably take evolutionary approaches, which will take orders of magnitude more compute to get to the same point. So yes, it could take 20 years.

surgical_fire 244 days ago

> Given that AI couldn't even speak English 6 years ago, do you really think it's going to struggle with unit tests for the next 20 years?

Yes.

LLMs are a very interesting technology for getting machines to understand and generate natural language. It is a difficult problem that they sort of solve.

It does not understand things beyond that. Developing software is not simply a natural language problem.

troupo 246 days ago

"Just one more prompt, bro", and your problems will be solved.