Hacker News
by mikert89 39 days ago
I've come to the conclusion that if AI can do it, it's not hard. None of the complicated software I work on can be reliably written by AI yet.
4 comments

SotA models have cracked a handful of research-level math problems though.

The default Claude Code style harness is bad for complicated problems as well. Just taking the specific class or function you're working on, and putting it into a deep research style loop yields way better results. Limiting the initial context by hand is still the way to go in a lot of cases.
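
One way to limit the initial context by hand is to pull out only the function or class in question before handing it to a model. A rough sketch, assuming the code is Python; `extract_definition` and `build_prompt` are hypothetical helper names, not part of any harness mentioned here:

```python
import ast


def extract_definition(source: str, name: str) -> str:
    """Return the source text of the function or class called `name`.

    Note: decorators above the definition are not included in the segment.
    Raises KeyError if no definition with that name is found.
    """
    tree = ast.parse(source)
    for node in ast.walk(tree):
        is_def = isinstance(
            node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
        )
        if is_def and node.name == name:
            segment = ast.get_source_segment(source, node)
            if segment is not None:
                return segment
    raise KeyError(name)


def build_prompt(source: str, name: str, question: str) -> str:
    """Assemble a prompt that contains only the one definition under discussion."""
    snippet = extract_definition(source, name)
    return f"{question}\n\n{snippet}\n"
```

The point is only that the model sees one definition instead of the whole file; what you do with the prompt afterwards depends on your own setup.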

No, it's the same for math from what I've seen, i.e. it can do some of the easy things, usually with a lot of help. People usually mean the Erdős problems (aka "a list of things Paul Erdős thought were neat") and, well, here:

> While Erdős generated a huge number of problems, they are not all equally significant and important. I have, unfortunately, seen some mathematicians grow dismissive of Erdős problems recently, perhaps because they have seen reports of AI solving problems on this site that turned out to be quite simple, and wrongly generalised this to assume that all problems posed by Erdős are amusing novelties, of the level of olympiad problems.

From: https://www.erdosproblems.com/forum/thread/blog:5

The rest of the article isn't about AI at all, but I did think it was funny that it describes mathematicians as having more or less the same opinion as SWEs.

You're right, definitely as a helper, not a one shot thing.

Here's an example unrelated to the Erdős problems: https://arxiv.org/abs/2510.23513

I flat out don't take arxiv papers seriously without a lot of corroboration. Basically ~anyone can post ~anything to it. They do spam moderation but no content review.

Also, that paper admits the problem turned out to be pretty trivial and was only unsolved because nobody had bothered to try that hard (page 11).

There's a lot of problems with paid scientific journals being a walled garden and I am by no means defending that system, buuuut it's also true that anything published to an open repository is almost certainly there because it wasn't good enough for anything else.

That probably took a lot more tokens and iteration than normally invested in generating code.

Yeah this is the same conclusion I have. I primarily use AI for UI code, and guess what, it's all basically mechanical drudgery anyways. Put a div here, or put a Box here, apply some style rules, etc. This shit should have been automated decades ago yet for some reason we're still writing the same stuff with a different "twist" today.

Now if your career is built on writing out the same boilerplate code in its infinite slight variations every day, congrats, you've been automated. Thank god we can free up our intellects to focus on the actual hard problems, the ones that are somewhat cutting edge, the ones that actually push our field and humanity forward.

Literally every example of AI generated code (without significant human input) is just basic stuff that is wholly unimpressive. Oh wow, you had an AI generate a Next.js app? It's writing HTML for you? It made a generic SaaS? Guess I'll become a farmer now.

Or, wait, I'll continue to write my multithreaded real-time multiplayer network for an MMO, since the AI currently generates something that would get me fired 10 seconds ago if I tried to push it to production.

It's amazing how you introduce just the slightest difficulty or novelty to an AI and it just craps the bed. And then you go online and apparently we're gonna be replaced -6 months ago or something.

People need a reality check.

I genuinely appreciated this comment—it made me chuckle. That said, I think there are better approaches to working with AI besides “here’s a big vague thing to work on, go write some code”. I think you have to iterate somewhat closely with the AI to write a doc describing exactly what you want the system to do and then scope out very narrow tickets and then have a separate agent do the TDD to actually produce the thing. The key insights here are (1) don’t let a code writing agent have too much scope—just a narrowly scoped ticket, (2) keep the coding agent’s context minimal, (3) don’t let the coding agent write much code without testing it. The agent should make very small changes at a time and then test that everything still works.

You will still need to QA stuff and review PRs, but I think AI done properly can genuinely make some tasks better.
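
The "very small changes, then test" loop above can be sketched roughly like this. Everything here is an assumption for illustration: `work_ticket`, `apply_change`, `revert_change` and `tests_pass` are hypothetical callables you would wire up to your own agent and test runner, not a real tool's API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TicketResult:
    applied: List[str] = field(default_factory=list)   # changes that kept the tests green
    rejected: List[str] = field(default_factory=list)  # changes that were rolled back


def work_ticket(
    changes: List[str],
    apply_change: Callable[[str], None],
    revert_change: Callable[[str], None],
    tests_pass: Callable[[], bool],
) -> TicketResult:
    """Apply one small change at a time and run the tests after each one.

    A change that breaks the tests is reverted immediately, so the next
    change always starts from a state where everything still works.
    """
    result = TicketResult()
    for change in changes:
        apply_change(change)
        if tests_pass():
            result.applied.append(change)
        else:
            revert_change(change)
            result.rejected.append(change)
    return result
```

In practice `changes` would come from the coding agent one at a time rather than as a fixed list, and a human still reviews whatever lands in `applied`.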

> don’t let a code writing agent have too much scope—just a narrowly scoped ticket

it's interesting cuz my intuition is to give the language model writing the files as much context as possible, which means all of the previous planning thread. but I also thought you should plan with a small model and implement with a large one, and the meta seems to be plan with an expensive one and delegate code output to smaller ones. so what do I know.

> The agent should make very small changes at a time and then test that everything still works.

yeah I think if it's treated like a codegen machine it's basically just outputting code as if you're using a dsl, except the dsl is natural language and the output is meant to be edited, no `// this is generated code, do not edit` headers

> I think AI done properly can genuinely make some tasks better

thank god I don't need to write html by hand anymore, what a pita

Models seem to perform worse if you give them too much context. Even if you have a large context window, it seems like they’re only “smart” in the first few tens of thousands of tokens (including the system prompt, which is often huge). Also, it seems like they do better if you start a fresh agent off with a very narrow task and give it access to more context as necessary rather than shoving everything you have into its context window and wishing it well.

But I should also emphasize my limited experience and the rapid pace at which this stuff is evolving.

I had it throwing in free advice about code of mine that works as intended but doesn't follow a normal pattern. It was something like: "Bonus! This bug exists!" And I had to tell it to stop doing that. Or, for generated SQL that renames deeply linked table columns to keep them human readable via comments, it was: "You can't have a comment of this style here." It works perfectly, so yes, yes I can.

I can certainly get it to do things that are reasonably common it seems like.

As for the article itself, I can agree with much of it.

I had AI fuck up writing a scraper[0]. A scraper. It hit a snag with cookies and spiraled into a tizzy. I liked the part where it assured me it could resume from the point of failure, while starting over for the 10th time because it had written no such code lol

[0] For those with AI scraping PTSD, it was a government site with public domain info and I know how to scrape politely
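
For reference, "resume from the point of failure" does not need much code. A minimal sketch, assuming a plain list of URLs and a JSON checkpoint file; `scrape_all`, `load_done` and the `fetch` callable are hypothetical names, and this is not the code from the anecdote above:

```python
import json
from pathlib import Path
from typing import Callable, Iterable, Set


def load_done(checkpoint: Path) -> Set[str]:
    """Read the set of already-fetched URLs, or an empty set on first run."""
    if not checkpoint.exists():
        return set()
    return set(json.loads(checkpoint.read_text()))


def scrape_all(
    urls: Iterable[str],
    fetch: Callable[[str], None],
    checkpoint: Path,
) -> int:
    """Fetch every URL not yet recorded in the checkpoint file.

    Progress is written after each successful fetch, so if `fetch` raises,
    a later call picks up at the URL that failed instead of starting over.
    Returns the number of URLs fetched during this call.
    """
    done = load_done(checkpoint)
    fetched = 0
    for url in urls:
        if url in done:
            continue
        fetch(url)  # may raise; earlier progress is already on disk
        done.add(url)
        checkpoint.write_text(json.dumps(sorted(done)))
        fetched += 1
    return fetched
```

Polite scraping (rate limits, backoff) would live inside `fetch` and is left out here.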

I mean that’s been my line every time someone makes impressed noises when I say I’m a programmer - it’s really not that hard, it’s really just a question of whether you like it enough to put the work in, like anything else. “Don’t you have to be a math whiz?” No dude, 95% of the time whatever you’re trying to do already has a very well researched approach, a lot of times you’re just picking which pre-vetted solution to adapt to your needs.

no, I mean the opposite, some programming is actually hard

Right. Like anywhere the conceptual problems haven't all been figured out yet, or where higher order effects happen with scale or particular shapes of data/substrate and you don't know them in advance.

Sometimes hard like interesting and you get to do really novel thinking. A load of p2p/decentralised things are hard like this.

Also sometimes hard like you get to a particular challenge and it turns out to be a notoriously unsolved mathematical thing, or you push against subtle boundaries of core libraries, runtimes, systems etc. Working with metagenome assemblies is this kind of hard.

Honestly the hard code I've done made such a difference to my brain. There's plenty of trivial stuff I'm happy to have automated, but if I can't work on the hard problems I may as well not be involved at all.

What type of software are you talking about?