Hacker News
by Nition 23 days ago
With AI at its current level of ability, I've personally found it useful to think of it as something like a very good search over existing knowledge: another step up in searchability in the lineage of reference books, Stack Overflow, GitHub, etc.

Programmers are rewriting and reinventing the same techniques more often than any other vocation I can think of, and so we were primed for a really good search over prior art. The fact that AI can also adapt that prior art to your particular use case makes it even more powerful.

But much like how great success never came from cobbling together bits of copy-pasted code from Stack Overflow, current AI can't really build your whole project.

3 comments

> Programmers are rewriting and reinventing the same techniques more often than any other vocation I can think of

And the answer to that is clearly a tool that makes rewriting/reinventing cheaper than actually packaging nice reusable libraries

Nice reusable libraries are still a core part of most AI projects, but honestly I think it's not a terrible approach, given all the malware issues around updating dependencies in ecosystems like NPM.
> I think it's not a terrible approach, given all the malware issues around updating dependencies in ecosystems like NPM

I think in this instance, the only thing worse than a zero day in your dependency tree is a zero day you don't know your LLM vendored directly into your codebase...

Personally I feel a vulnerability in local code (unshared AI slop) is much less likely to be exploited than, say, an npm package update that will pwn you as soon as it loads.
And on the other hand, I really do not understand how basic project boilerplate templating wasn't already a fully solved problem. Surely it should have been doable...
I guess the nice thing about AI is it points to the things that we really need to figure out how to abstract instead of rewriting from scratch all the time.
I'm all for packaging nice reusable libraries, but someone has to actually do it. A lot of them just don't exist (yet).
I sort of hold the opposite view: pretty much anything I want to do, there are about ten competing libraries/frameworks/languages, and the lack of commonality across them means I'm often wedged into weird ecosystem choices
Yes, I don't have anything important to say other than I 100% agree with this comment. AI in its current state is akin to Stack Overflow and Google on steroids, but from my experience, it doesn't do well building out full-scale applications other than perhaps some initial scaffolding.

If I were to use it against a legacy, rather poorly written codebase, where the code may be hard to understand without some in-depth analysis, I could certainly ask an AI agent to read the code ("How does application X do Y?", for example), but I wouldn't have it start hammering out features or do any type of refactoring. That would cause far too many commits and confusion amongst the development team, leading to even more slop than whatever we'd already be dealing with.

Just leaving this comment here so I can come back to your comment. I've been getting a bit discouraged by AI lately, but this sums up my experience with it well enough.

> Yes, I don't have anything important to say other than I 100% agree with this comment. AI in its current state is akin to Stack Overflow and Google on steroids, but from my experience, it doesn't do well building out full-scale applications other than perhaps some initial scaffolding.

We're currently using it to build out a full-scale application. It does as well as you care to coax it into doing, tbh. You have to invest heavily in harness engineering, and at least my experience has been that as you do that, the results improve.

> It does as well as you care to coax it into doing, tbh. You have to invest heavily in harness engineering, and at least my experience has been that as you do that, the results improve.

That is also my experience.

When starting a project, I observe how the agent fails, add new rules to the harness to prevent it from failing that way again, and repeat the process until I am happy with the output.

I'm unfamiliar with harness engineering. Is there any good documentation about the subject you could point me to?
https://openai.com/index/harness-engineering/

https://www.anthropic.com/engineering/harness-design-long-ru...

https://www.anthropic.com/engineering/effective-harnesses-fo...

These were some of the first major articles on it. It's becoming a popular topic, so there's more content on it all the time.

I can't point you to good, complete documentation, because the field is changing very fast as people make new discoveries.

I learned by reading articles, success stories, and failure stories, but mostly by doing: trying stuff, seeing how it works, adjusting it, and burning a lot of tokens along the way.

In your shoes, I would ask an AI chat to find new articles on the matter (including on HN) and to explain how Codex, Claude, and Pi manage agents.

My compressed view is: you need a great specification, both business-wise and architecture-wise, that doesn't leave anything important for the model to guess, because chances are it will make the wrong choices. That comprehensive spec should not be one huge chunk. Divide your plan into phases that each fit in a context window, and have a spec for each phase.

Use TDD and strive for 100% coverage. Force the model to behave: if it doesn't do what it is supposed to, give it feedback, ask it to retry, and don't allow it to progress to the next stage unless everything is perfect.

I also like to write comprehensive integration tests before building anything. The agents are not allowed to touch or read the integration tests, only to run them, and they get feedback on where the tests fail. I like to build the integration tests in a different language than the software I am building, to make sure the tests don't rely on anything platform-specific. I use C#, Go, Rust and Zig for development and Python for the integration tests.
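To make the gating idea above concrete, here is a minimal sketch of that loop in Python. It is only an illustration under assumptions: `Phase`, `run_phases`, the `agent` callable, and the `run_tests` callable are hypothetical names, not part of any real harness or vendor API. The agent gets the spec for one phase plus the output of the last failed test run, and it cannot move on until the black-box tests pass.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Phase:
    name: str
    spec: str  # spec for this phase only, sized to fit one context window


# Hypothetical agent interface: receives the phase spec and, on retries,
# the output of the last failed test run. It edits the working tree.
Agent = Callable[[str, Optional[str]], None]
# Returns (passed, output). The agent only ever sees the output text,
# never the test sources.
TestRunner = Callable[[], Tuple[bool, str]]


def run_integration_tests(cmd: List[str]) -> Tuple[bool, str]:
    """Run an external test command and capture its output as feedback."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr


def run_phases(phases: List[Phase], agent: Agent, run_tests: TestRunner,
               max_attempts: int = 3) -> List[str]:
    """Work through phases in order; a phase must pass before the next starts."""
    completed = []
    for phase in phases:
        feedback = None
        for _ in range(max_attempts):
            agent(phase.spec, feedback)
            passed, output = run_tests()
            if passed:
                completed.append(phase.name)
                break
            feedback = output  # fed back to the agent on the next attempt
        else:
            # Gate: never continue to later phases on top of a failing one.
            raise RuntimeError(f"phase {phase.name!r} failed after {max_attempts} attempts")
    return completed
```

In practice `run_tests` could be something like `lambda: run_integration_tests(["pytest", "integration/"])`, with the test directory kept outside whatever the agent is allowed to read; that path is an assumption for illustration.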

For now, to get good results, I can't just copy and paste the setup from one project to another; I have to work a lot to tailor the process to each new codebase.

And that's why I am working on an agent harness that tries to force the agents to do the right things in the most common development scenarios without wasting many tokens. Covering common development scenarios in general is a large goal; right now I am working towards backend web development and microservices.

Sounds like bag pipes to me LOL
In my experience, you’ll eventually hit a context window issue and it will just start spouting gibberish/doing wrong things, and nothing will significantly improve it. But hey, maybe it’s improved.
Well, auto-compaction is a thing in Claude Code now. Plus we have the /goal command and some automated review stuff, so you can kinda just get it to loop until the automated reviews are satisfied and CI is passing. That does most of the heavy lifting.
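For anyone unfamiliar with the idea, compaction can be sketched roughly like this. This is not how Claude Code implements it; it is a toy illustration under assumptions. `compact`, `estimate_tokens`, and the `summarize` callable are hypothetical names, and the token estimate is just a word count. Once the history goes over budget, older messages are replaced by a single summary and the most recent ones are kept verbatim.

```python
from typing import Callable, List


def estimate_tokens(text: str) -> int:
    # Crude stand-in: whitespace word count. A real harness would use
    # the model's own tokenizer.
    return len(text.split())


def compact(history: List[str], budget: int, keep_recent: int,
            summarize: Callable[[List[str]], str]) -> List[str]:
    """Replace older messages with one summary once the history exceeds budget."""
    total = sum(estimate_tokens(message) for message in history)
    if total <= budget:
        return history
    split = max(len(history) - keep_recent, 0)
    old, recent = history[:split], history[split:]
    if not old:
        return history  # nothing older than the recent window to summarize
    # `summarize` is assumed to be a model call in practice.
    return ["[summary] " + summarize(old)] + recent
```

The trade-off is that whatever the summary drops is gone for the rest of the session, which is one reason to keep each phase's spec small enough to restate.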
"The fact that AI can also adapt that prior art to your particular use case makes it even more powerful."

Well that's what everyone is claiming anyway