Hacker News
by surgical_fire 246 days ago
"AI will take over the world".

I hear that. Then I try to use AI for a simple coding task: writing unit tests for a class, very similar to other unit tests. It fails miserably. It forgets to add an annotation and enters a death loop of bullshit code generation, generating test classes that test failed test classes that test failed test classes, and so on. Fascinating to watch. I wonder how much CO2 it generated while frying some Nvidia GPU in an overpriced data center.

AI singularity may happen, but the Mother Brain will be a complete moron anyway.


Regularly trying to use LLMs to debug coding issues has convinced me that we're _nowhere_ close to the kind of AGI some are imagining is right around the corner.
At least Mother Brain will praise your prompt to generate yet another image in the style of Studio Ghibli as proof that your mind is a tour de force in creativity, and only a borderline genius would ask for such a thing.
Sure, but the METR study also showed that t doubles roughly every 7 months, where t ≈ «the duration of human time needed to complete a task, such that SOTA AI can complete the same task with 50% success»: https://arxiv.org/pdf/2503.14499
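Written as a formula, that trend is an exponential in calendar time. A minimal rendering, where t_0 (the horizon at some reference date) and m (months elapsed since that date) are symbols introduced here for illustration rather than notation taken from the paper, and 7 is the doubling period quoted above:

```latex
t(m) = t_0 \cdot 2^{m/7}
\qquad\Longleftrightarrow\qquad
m = 7\,\log_2\!\left(\frac{t(m)}{t_0}\right)
```

The second form says each further doubling of task length costs the same 7 months, as long as the trend holds.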

I don't know how long that exponential will continue for, and I have my suspicions that it stops before week-long tasks, but that's the trend-line we're on.
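To put rough numbers on the "week-long tasks" question: with a 7-month doubling time, growing the horizon by a factor k takes 7·log2(k) months. A small sketch, assuming a hypothetical 1-hour starting horizon and treating a "week-long task" as a 40-hour work week (both are assumptions for illustration, not figures from the paper):

```python
import math

DOUBLING_MONTHS = 7  # doubling period quoted from the METR study


def months_until(target_hours: float, start_hours: float,
                 doubling_months: float = DOUBLING_MONTHS) -> float:
    """Months for a horizon growing as start * 2**(m / doubling) to reach target."""
    if target_hours <= 0 or start_hours <= 0:
        raise ValueError("horizons must be positive")
    return doubling_months * math.log2(target_hours / start_hours)


# Hypothetical inputs: a 1-hour starting horizon and a 40-hour work week.
months = months_until(target_hours=40, start_hours=1)
print(f"{months:.1f} months (~{months / 12:.1f} years)")  # prints: 37.3 months (~3.1 years)
```

Because the relationship is logarithmic, the answer is much more sensitive to whether the doubling period holds than to the exact starting horizon.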

Only skimmed the paper, but I'm not sure how to think about "length of task" as a metric here.

The cases I'm thinking about are things that could be solved in a few minutes by someone who knows what the issue is and how to use the tools involved. I spent around two days trying to debug one recent issue. A coworker who was a bit more familiar with the library involved figured it out in an hour or two. But in parallel with that, we also asked the library's author, who immediately identified the issue.

I'm not sure how to fit a problem like that into this "duration of human time needed to complete a task" framework.

This is an excellent example of human “context windows”, though, and it could be that the LLM would have solved the easy problem with better context engineering. Despite 1M-token windows, things still start to get progressively worse after 100k tokens. LLMs would be amazingly better overnight with a reliable 1M window.
What does "better context engineering" mean here? How/why are the existing token windows "unreliable"?
Fair comment.

While I think they're trying to cover that by getting experts to solve problems, it is definitely the case that humans learn much faster than current ML approaches, so "expert in one specific library" != "expert in writing software".

But will it actually get better or will it just get faster and more power efficient at failing to pair parentheses/braces/brackets/quotes?
Read the linked METR study please.

Or watch the Computerphile video summary/author interview, if you prefer: https://m.youtube.com/watch?v=evSFeqTZdqs

Most reasonable AI alarmists are not concerned with sentient AI, but with an AI attached to the nukes that gets into one of those repeating death loops and fires all the missiles.
In reality, this isn't a very serious threat. Rather, we're concerned about AI as a tool for strengthening totalitarian regimes.
Given that AI couldn't even speak English 6 years ago, do you really think it's going to struggle with unit tests for the next 20 years?

It's well worth looking at https://progress.openai.com/; here's a snippet:

> human: Are you actually conscious under anesthesia?

> GPT-1 (2018): i did n't . " you 're awake .

> GPT-3 (2021): There is no single answer to this question since anesthesia can be administered [...]

The improvements since 2021 are minor at best. Thus far, AI has been trained to imitate humans by being trained on text written by humans. It's unlikely that you will make something as smart as a human by training it to imitate a human. Imitation is a lossy process: you lose knowledge of the "why" and only imitate the outcome. To get beyond this state, we'll need a new technique. So far we've used gradient descent to teach an AI to reproduce a function. Teaching it new behaviours will probably take evolutionary approaches, and those will take orders of magnitude more compute to get to the same point. So yes, it could take 20 years.
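A toy contrast of the two techniques named above (nothing like how large models are actually trained): both fit one slope parameter w so that w*x reproduces the target y = 3x. Gradient descent follows the analytic gradient of the mean squared error; the evolutionary variant is a simple (1+1) evolution strategy that mutates w at random and keeps the mutant only if the loss does not get worse. The target slope, learning rate, mutation scale, step counts, and seed are arbitrary choices for illustration.

```python
import random

# Toy dataset: reproduce the function y = 3x (target slope chosen arbitrarily).
XS = [x / 10 for x in range(-10, 11)]
YS = [3.0 * x for x in XS]


def loss(w: float) -> float:
    """Mean squared error of the model y_hat = w * x on the toy dataset."""
    return sum((w * x - y) ** 2 for x, y in zip(XS, YS)) / len(XS)


def grad(w: float) -> float:
    """Analytic derivative of the MSE with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(XS, YS)) / len(XS)


def gradient_descent(w: float = 0.0, lr: float = 0.5, steps: int = 100) -> float:
    for _ in range(steps):
        w -= lr * grad(w)  # step against the gradient
    return w


def one_plus_one_es(w: float = 0.0, sigma: float = 0.5, steps: int = 2000,
                    seed: int = 0) -> float:
    rng = random.Random(seed)
    best = loss(w)
    for _ in range(steps):
        candidate = w + rng.gauss(0.0, sigma)  # random mutation, no gradient used
        cand_loss = loss(candidate)
        if cand_loss <= best:  # selection: keep the mutant only if it is no worse
            w, best = candidate, cand_loss
    return w


print(f"gradient descent: w = {gradient_descent():.4f}")
print(f"(1+1)-ES:         w = {one_plus_one_es():.4f}")
```

The evolution strategy never sees the gradient; it only learns whether a random change helped, which is why such methods generally need many more evaluations per unit of progress. A one-parameter toy like this neither supports nor refutes the orders-of-magnitude estimate above.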
> Given that AI couldn't even speak English 6 years ago, do you really think it's going to struggle with unit tests for the next 20 years?

Yes.

LLM technology is very interesting for getting machines to understand and generate natural language. That is a difficult problem, and it sort of solves it.

It does not understand things beyond that. Developing software is not simply a natural language problem.

"Just one more prompt, bro", and your problems will be solved.