by alexpotato 7 days ago

I've posted this before but worth posting again:

I work in DevOps at a firm that has been very enthusiastic about using LLMs (in the good sense).

The phases were basically:

- try out having the LLM do "a lot"

- now even more

- now run multiple agents

- back to single agents but have the agents build tools

- tools that are deterministic AND usable by both the humans (EDIT: and the LLMs)

The reasons:

1. Deterministic tools (for both deployments and testing) give you a binary answer, and that answer is repeatable

2. In the event of an outage, you can always fall back to the tool that a human can run

3. It's faster. A quick script can run in <30 seconds but "confabulating" always seemed to take 2-3 minutes.
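
To make point 1 concrete, here is a minimal sketch of what such a tool might look like. Everything in it is hypothetical (the `check_replicas` check, the `desired`/`ready` fields, the JSON input shape); the only point is that the same input always yields the same pass/fail answer, and that a human at a shell and an LLM can both read the result.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class CheckResult:
    name: str
    passed: bool
    detail: str


def check_replicas(desired: int, ready: int) -> CheckResult:
    """Pure function: identical inputs always produce an identical result."""
    passed = ready >= desired
    return CheckResult("replicas_ready", passed, f"{ready}/{desired} replicas ready")


def run_checks(state: dict) -> list[CheckResult]:
    # Hypothetical state shape: {"desired": 3, "ready": 3}
    return [check_replicas(state["desired"], state["ready"])]


def main(argv: list[str]) -> int:
    """Print machine-readable results and return a shell-style exit code."""
    state = json.loads(argv[1])
    results = run_checks(state)
    print(json.dumps([asdict(r) for r in results]))
    # 0 means every check passed, 1 means at least one failed.
    return 0 if all(r.passed for r in results) else 1
```

Wired up as a script entry point with `sys.exit(main(sys.argv))`, the exit code is the binary answer, and the JSON on stdout is what an agent would read.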

Really, we are back to this article: https://spawn-queue.acm.org/doi/10.1145/3194653.3197520 aka "make a list of tasks, write scripts for each task, combine the scripts into functions, functions become a system"

-- END of original post --

What I would add:

If you let LLMs do whatever they want, they will happily make code. You can add tests to confirm that the code works (which you used to do with human code, right?). You can also read the code.

When you read the code, you'll find that they sometimes do totally bananas things that still produce working code (I've seen humans do this too but that's another story).

In other words, you still need to make sure the system being built makes sense.

More succinctly:

Coding may be dead but software engineering is alive and kicking.

3 comments

esalman 6 days ago

This is pretty much how I've been operating. While the C-suite has always encouraged everyone, technical and non-technical folks alike, to use AI, the ask from my manager and skip level has always been for deterministic output. Before last December or January I was mainly using LLMs for autocomplete, whereas now it looks more like "given this input, write a script to generate this output", and after some corrections, "summarize/update this session into a skill". Script for future humans, skill for future agents.


theshrike79 6 days ago

This is the way to do it.

You can have the Big Boy LLM do _everything_. It can and it will do it. It will also cost fucktons of money and take a long time.

But if you build tools (with AI) that do as many tasks in the process deterministically as possible and let the AI use those, it'll be a lot faster and cheaper to run it.
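
A rough sketch of the shape this can take, assuming a harness that hands over tool requests as plain dicts like `{"tool": ..., "args": {...}}`. That request format, the `TOOLS` registry, and both example tools are made up for illustration and are not any particular harness's API.

```python
from typing import Callable

# Registry of deterministic tools; names and signatures are hypothetical.
TOOLS: dict[str, Callable[..., str]] = {}


def tool(fn):
    """Register a plain function so a human or a model can call it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def disk_usage_percent(used_gb: float, total_gb: float) -> str:
    return f"{used_gb / total_gb * 100:.1f}"


@tool
def next_version(current: str) -> str:
    # Bump the patch component of a MAJOR.MINOR.PATCH string.
    major, minor, patch = (int(p) for p in current.split("."))
    return f"{major}.{minor}.{patch + 1}"


def dispatch(request: dict) -> dict:
    """Run the requested tool and report success or failure explicitly."""
    name = request.get("tool")
    if name not in TOOLS:
        return {"ok": False, "error": f"unknown tool: {name}"}
    try:
        return {"ok": True, "result": TOOLS[name](**request.get("args", {}))}
    except (TypeError, ValueError, ZeroDivisionError) as exc:
        return {"ok": False, "error": str(exc)}
```

The model only has to pick a tool name and arguments; the arithmetic and parsing happen in ordinary code, which is the fast and cheap part.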

As a bonus you can eventually drop the expensive cloud AI and run a small/medium sized local model instead.


chasd00 6 days ago

You know, I think harnesses and tools are the next "webapps" for the industry. Everyone is going to be making their own. Some will be great, some will be meh, but I think that's where things are headed.


theshrike79 6 days ago

Yep, there are actual papers written on how you can have a mid-tier model punch way above its weight with an optimised harness and toolset.

Claude is still the best commercially available harness IMO, pi.dev is super good but not something I'd give to non-enthusiasts or would recommend using in an enterprise environment.

I see companies writing their own custom harnesses on top of opencode/pi.dev/crush later down the line. Instead of having a set of skills or MCPs you can just have all of the default stuff built in and automatically updated via normal IT workflows.


oblio 6 days ago

LLMs, Jenkins for the entire software universe :-D

For those unaware, Jenkins (formerly Hudson) is a CI server that supports all sorts of pipelines. Those pipelines can very easily be turned into huge balls of mud by putting logic in them. The proper way to do it is to put that logic into simple scripts and tools and have Jenkins do only the high-level orchestration.
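
The same split, sketched in plain Python rather than actual Jenkins pipeline syntax: the orchestrator knows only the order of steps and when to stop, and each step is a standalone unit with its own logic. The step names and the `run_pipeline` helper are hypothetical stand-ins for real scripts.

```python
from typing import Callable, NamedTuple


class StepResult(NamedTuple):
    name: str
    ok: bool


# Each step would normally be its own script; plain functions stand in here.
def lint() -> bool:
    return True


def unit_tests() -> bool:
    return True


def deploy() -> bool:
    return True


def run_pipeline(steps: list[tuple[str, Callable[[], bool]]]) -> list[StepResult]:
    """Orchestration only: run steps in order and stop at the first failure."""
    results = []
    for name, step in steps:
        ok = step()
        results.append(StepResult(name, ok))
        if not ok:
            break
    return results
```

Calling `run_pipeline([("lint", lint), ("test", unit_tests), ("deploy", deploy)])` returns one result per executed step, and anything after a failing step is never run.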
