	by paulmooreparks 26 days ago
	I don't know. I find that I'm moving up a level and improving my product-management skills while delegating most of the code to the agents. I'm still very much hands-on with the design and requirements, and I'm asking questions like, "What's our security story for XYZ?", "Are we accounting for colour-blindness?", etc. Not being down in the code allows me to prairie-dog a bit more and see the landscape better.

3 comments

xantronix 26 days ago

One thing I've noticed is that LLMs have allowed middle managers trapped inside the role of a developer to finally self-actualise.

keeda 25 days ago

Funnily enough, my LinkedIn feed is full of managers who are ecstatic at being able to "code" again, so it applies to developers trapped in the role of a manager as well!

chef's kiss

I'm about 50% that way. However, when the AI is done coding I step back and review to find places where the code quality is unacceptable. I also have to stop the AI once in a while because it forgets the point and does something stupid. Junior engineer learn, AI does not.

paulmooreparks 26 days ago

I don't abandon the code to the agent entirely. I have my own... I wouldn't call it a harness as such, but rather a shared Kanban board, and it'll be the subject of a "Show HN" soon. It suffices to say that I define Kanban cards for each feature or bug, and I have clearly defined review points for each card, post-spec and post-code, where I step in. On top of that, after my review, there is an agentic review, and agents can and do catch things that I missed. The quality of the software has improved quite a bit since I instituted that flow.
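
For illustration only, the flow described above could be sketched roughly as follows. This is not the commenter's actual tool: the `Card` class, the `Stage` names, and the rule that only a `"human"` actor may clear the two review points are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    SPEC = "spec"
    SPEC_REVIEW = "spec_review"    # human review point, post-spec
    CODE = "code"
    CODE_REVIEW = "code_review"    # human review point, post-code
    AGENT_REVIEW = "agent_review"  # agentic review after the human one
    DONE = "done"


# Assumption: only a human may sign off on these two stages.
HUMAN_GATES = {Stage.SPEC_REVIEW, Stage.CODE_REVIEW}
ORDER = list(Stage)  # Enum members iterate in definition order


@dataclass
class Card:
    title: str
    stage: Stage = Stage.SPEC
    history: list = field(default_factory=list)

    def advance(self, actor: str) -> Stage:
        """Move the card to the next stage, enforcing the human gates."""
        if self.stage is Stage.DONE:
            raise ValueError("card is already done")
        if self.stage in HUMAN_GATES and actor != "human":
            raise PermissionError(f"{self.stage.value} needs a human sign-off")
        self.history.append((self.stage, actor))
        self.stage = ORDER[ORDER.index(self.stage) + 1]
        return self.stage
```

An agent calling `advance("agent")` on a card sitting in `SPEC_REVIEW` or `CODE_REVIEW` is rejected, which is one simple way to make the review points non-skippable.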

dpoloncsak 26 days ago

> Junior engineer learn, AI does not.

This is technically true, but let's not act like we haven't seen immense improvement in both the models and the harnesses for these models over the past few years. They may not be learning, but they are getting better.

nyrikki 26 days ago

They are getting better at historical data, not at the fundamental issue.

As a recent example, I had to abandon the multi-LLM reviewer/verifier model I was using because Zig 0.16 was released with major changes.

I actually reverted to fully self-hosted because the foundation models were trying too hard to revert to the older versions of the language.

It is going to be a balancing act and there is fundamentally no way for LLMs to get around this.

We will have to develop methods to do so, most likely by focusing agents on problems that are more static.

smj-edison 26 days ago

Question for you, since I also use Zig 0.16: how do you get it to use Zig idioms? I use Kimi 2.6, and I feel like whenever I try to get my agent to write modern Zig based on a C reference, it decides to start writing everything in a C style (doesn't use defer, doesn't use opaque enums even when I explicitly tell it to, doesn't use Zig's error unions, swallows errors instead of asserting, and some more). It's quite frustrating, and a lot of catchable errors crop up until I've beaten modern practices into it.

nyrikki 26 days ago

I don’t, it is mostly used for ideas, review, etc…

Getting the agent to grep std, example code, comments that reference inaccessible security issues or bugs, etc. helps a little.

But for my needs, not refactoring would just be stepping over dollars to pick up pennies.

But yes it is a problem.

askonomm 26 days ago

I find great success in not relying on the LLM's built-in knowledge, but giving it links to the necessary docs/manuals and having it read them before doing anything.

nyrikki 26 days ago

Currently, with Zig 0.16, the agent has to have access to the Zig compiler and std library to even produce code that will compile.

If you have zig installed, you can run ‘zig std’ to see that.

You still have the limitations of attention etc…

Even Zed’s agent will leverage that built-in tarball, but it doesn’t solve the problem, especially as some of the language’s killer features are unavailable in C and other languages.

embedding-shape 26 days ago

Also, add "no assumptions or guesses" and if you use a model with really strong prompt adherence (most SOTA models), they'll figure out the right version first, then look up docs, then implement.

bigstrat2003 25 days ago

Pretend? I don't have to pretend, I haven't seen any real improvement. I wouldn't let the models of today write code one bit more than the models of several years ago, because they still suck at it.

dpoloncsak 23 days ago

Models from several years ago would struggle to produce code that would compile, and needed to be fed whatever errors were thrown to be able to resolve them.

Today's models often output working code. I've had OpenClaw instances one-shot simple static web-page HTML, Apache installation, and deployment. It may not meet modern standards or be as secure as you'd like, but fundamentally this is an improvement over previous models.

antonios_makro 22 days ago

Agreed, the "one-shot a static site" demo is the new "hello world" for agents. It's a real step up.

goran-j 22 days ago

yep, they need something a bit more challenging than printing two words on the screen

lanstin 25 days ago

I find that is the case for production code that will be running 24x7 unattended, but Claude also lets me build a lot more highly specific dashboards or visualization tools where I really don’t give a fig what the code is, as long as the numbers sum up and the links work. So with my batch job I am careful; with the dashboard I check every morning to see which batches and lambdas failed, eh, I can wait the two minutes it takes to populate all the data. Better to have time to top off my coffee than to have to understand modern JavaScript, canvas, D3, etc. and web frameworks. I do force it to use Python and Flask for the web serving and SQLite for caching/memoization, but everything else is carte blanche.
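
As a rough sketch of what "SQLite for caching/memoization" might look like, using only Python's standard library: the `sqlite_memoize` helper, the `cache` table, and the JSON-based keys are assumptions made for this example, not the commenter's code, and the Flask layer is left out.

```python
import json
import sqlite3
from typing import Callable


def sqlite_memoize(conn: sqlite3.Connection, fn: Callable[..., object]) -> Callable[..., object]:
    """Wrap fn so its results are cached in a SQLite table keyed by its arguments."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value TEXT NOT NULL)"
    )

    def wrapper(*args):
        # Assumption: arguments and results are JSON-serialisable.
        key = json.dumps([fn.__name__, list(args)])
        row = conn.execute("SELECT value FROM cache WHERE key = ?", (key,)).fetchone()
        if row is not None:
            return json.loads(row[0])  # cache hit: skip the slow call
        result = fn(*args)
        conn.execute(
            "INSERT OR REPLACE INTO cache (key, value) VALUES (?, ?)",
            (key, json.dumps(result)),
        )
        conn.commit()
        return result

    return wrapper
```

Pointing `sqlite3.connect` at a file instead of `":memory:"` would let the cache survive restarts, which is presumably the appeal for a dashboard that is slow to populate.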

seunosewa 26 days ago

Unless you log its mistakes and how they were solved in decisions.log
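
One way this could work, as a minimal sketch: the JSON-lines layout, the `log_decision` and `build_preamble` helpers, and the idea of prepending the result to each prompt are all assumptions here; only the `decisions.log` file name comes from the comment.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_decision(path: Path, mistake: str, fix: str) -> None:
    """Append one mistake/fix pair to the log as a JSON line."""
    entry = {
        "when": datetime.now(timezone.utc).isoformat(),
        "mistake": mistake,
        "fix": fix,
    }
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def build_preamble(path: Path, limit: int = 20) -> str:
    """Render the most recent entries as text to prepend to a prompt."""
    if not path.exists():
        return ""
    lines = path.read_text(encoding="utf-8").splitlines()
    entries = [json.loads(line) for line in lines[-limit:] if line.strip()]
    return "\n".join(
        f"- Avoid: {e['mistake']} | Do instead: {e['fix']}" for e in entries
    )
```

The model itself still does not learn anything; the log just puts earlier corrections back into its context on every run, so the `limit` parameter matters once the file grows.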

gameshot911 25 days ago

I think the right comparison is between AI model versions, not intra-AI-model growth (although even that can 'learn' with persistent memory & contexts).

slopinthebag 26 days ago

> What's our security story for XYZ?

lmao I hope I never use your products with anything sensitive ever

paulmooreparks 25 days ago

I think you missed the point. I don't abandon security to whatever the agent decides to write.
