
by rbalicki 107 days ago

You can lessen your dependence on the specific details of how /loop, code routines, etc. work by asking the LLM to do simpler tasks, and instead, having a proper workflow engine be in charge of the workflow aspects.

For example, this demo (https://github.com/barnum-circus/barnum/tree/master/demos/co...) converts a folder of files from JS to TS. It's something an LLM could (probably) do a decent job of on its own, but with a workflow engine in charge, 1. the result is more reliable, 2. you can write a much more complicated workflow (e.g. retry logic, timeout logic, additional checks like "don't use as casts", etc.), 3. you can be much more token-efficient, and 4. you can be LLM-agnostic.

So, IMO, in the presence of tools like that, you shouldn't bother using /loop, code routines, etc.
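As a rough illustration of the retry, timeout, and extra-check logic mentioned above, here is a minimal Python sketch. It is not taken from the linked demo: `convert_with_retries`, `no_as_casts`, and the `llm` callable are hypothetical names, and the `as` cast check is a crude text match rather than a real TypeScript parse.

```python
import re
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout
from typing import Callable, List, Optional

# Hypothetical stand-in for any LLM client: takes a prompt, returns text.
LLMCall = Callable[[str], str]
# A check returns an error message, or None if the output is acceptable.
Check = Callable[[str], Optional[str]]


def no_as_casts(ts_source: str) -> Optional[str]:
    """Crude textual check for `as` casts (assumption: regex is good enough here)."""
    for line in ts_source.splitlines():
        stripped = line.strip()
        # Skip `import * as x` / `export * as x`, which are not casts.
        if stripped.startswith(("import ", "export ")):
            continue
        if re.search(r"\bas\s+[A-Za-z_]", stripped):
            return "output contains an `as` cast"
    return None


def convert_with_retries(
    prompt: str,
    llm: LLMCall,
    checks: List[Check],
    max_attempts: int = 3,
    timeout_s: float = 60.0,
) -> str:
    """Call the LLM until its output passes every check, or give up."""
    feedback = ""
    for _ in range(max_attempts):
        pool = ThreadPoolExecutor(max_workers=1)
        try:
            # The workflow, not the LLM, enforces the time limit.
            output = pool.submit(llm, prompt + feedback).result(timeout=timeout_s)
        except FutureTimeout:
            feedback = "\n\nPrevious attempt timed out."
            continue
        finally:
            # Do not block on a hung call; the worker thread is abandoned, not killed.
            pool.shutdown(wait=False)
        errors = [err for err in (check(output) for check in checks) if err]
        if not errors:
            return output
        # Feed the rejection reasons back into the next attempt.
        feedback = "\n\nPrevious attempt was rejected: " + "; ".join(errors)
    raise RuntimeError(f"no acceptable output after {max_attempts} attempts")
```

Because `llm` is just a callable, swapping providers means swapping one function, which is the LLM-agnostic part of the argument.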

2 comments

danudey 107 days ago

One thing my team lead is working on is using Claude to 'generate' integration tests/add new tests to e2e runs.

Straight up asking Claude to run the tests, or to generate a test, can produce inconsistencies between runs, between tests, between models, and so on. So instead he created a tool which defines a test, its inputs and outputs, and some details. Now we have a directory full of markdown files describing a test suite, parameters, test cases, error cases, etc., and Claude generates calls to the tool instead.

This means that whatever variation Claude, or any other LLM, might have run-to-run or drift over time, it all still has to be funneled through a strictly defined filter to ensure we're doing the same things the same way over time.
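A "strictly defined filter" like the one described above could look roughly like this Python sketch. The actual tool is not shown in the thread, so everything here is an assumption: the JSON format, the field names (`name`, `inputs`, `expected`), and the `parse_case` function are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict

# Hypothetical schema: the only keys an LLM-generated test case may contain.
ALLOWED_KEYS = {"name", "inputs", "expected"}


@dataclass(frozen=True)
class CaseSpec:
    name: str
    inputs: Dict[str, Any]
    expected: Dict[str, Any]


def parse_case(raw: str) -> CaseSpec:
    """Accept LLM output only if it matches the schema exactly."""
    data = json.loads(raw)  # malformed JSON raises ValueError (JSONDecodeError)
    if not isinstance(data, dict):
        raise ValueError("test case must be a JSON object")
    extra = set(data) - ALLOWED_KEYS
    missing = ALLOWED_KEYS - set(data)
    if extra or missing:
        raise ValueError(
            f"unexpected keys {sorted(extra)}, missing keys {sorted(missing)}"
        )
    if not isinstance(data["name"], str) or not data["name"]:
        raise ValueError("name must be a non-empty string")
    if not isinstance(data["inputs"], dict) or not isinstance(data["expected"], dict):
        raise ValueError("inputs and expected must be JSON objects")
    return CaseSpec(data["name"], data["inputs"], data["expected"])
```

Whatever the model emits, only values that survive `parse_case` reach the test runner, so run-to-run variation shows up as a rejected case rather than as a silently different test.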


latentsea 107 days ago

I'm looking at implementing https://github.com/coleam00/Archon as a means to solve this. You can build arbitrary workflows custom to your codebase. Looks to bring a bit of much-needed determinism.


zx8080 107 days ago

What kind of system/area (or product) are you working on?


jplusequalt 106 days ago

>You can lessen your dependence on the specific details of how /loop, code routines, etc. work by asking the LLM to do simpler tasks, and instead, having a proper workflow engine be in charge of the workflow aspects.

Or, you know, by writing the code yourself?


rbalicki 106 days ago

Yes, exactly! Check out https://github.com/barnum-circus/barnum/blob/master/demos/ba...


pc86 106 days ago

"You can lessen your dependence on a specific LLM implementation by not using LLMs" is certainly a take, but it doesn't really address the root issue: models getting nerfed to save resources after they've gained wide adoption.


rbalicki 106 days ago

A simple task ("convert this file from JS to TS, here are the types of all imported things") is much more likely to continue to work with a nerfed model compared to a complicated task ("convert this repo to TS, make sure to run tsc afterward and fix all errors"). The former is a subtask of the latter!

If you take a moment to create a workflow where these steps are separated (or rather, have an LLM build this workflow) and the LLMs are asked to do only minor leaf tasks, you increase your resilience to nerfed models.
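To make the split concrete, here is a minimal Python sketch of the decomposition described above: the workflow walks the files and runs the type check itself, and the LLM only ever sees the single-file leaf task. The names `build_leaf_prompt`, `convert_repo`, `llm`, and `type_check` are hypothetical, and `type_check` stands in for however you would actually invoke tsc.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins: any LLM client, and any wrapper around a type checker.
LLMCall = Callable[[str], str]
TypeCheck = Callable[[Dict[str, str]], List[str]]


def build_leaf_prompt(js_source: str, imported_types: Dict[str, str]) -> str:
    """Build the narrow per-file prompt, including the types of imported things."""
    type_lines = "\n".join(
        f"- {name}: {ts_type}" for name, ts_type in sorted(imported_types.items())
    )
    return (
        "Convert this file from JS to TS.\n"
        "Types of all imported things:\n"
        f"{type_lines}\n\n"
        f"{js_source}"
    )


def convert_repo(
    files: Dict[str, str],
    imported_types: Dict[str, Dict[str, str]],
    llm: LLMCall,
    type_check: TypeCheck,
) -> Tuple[Dict[str, str], List[str]]:
    """Orchestrate the conversion; the LLM never sees the repo-level task."""
    converted: Dict[str, str] = {}
    for path in sorted(files):
        prompt = build_leaf_prompt(files[path], imported_types.get(path, {}))
        new_path = path[:-3] + ".ts" if path.endswith(".js") else path
        converted[new_path] = llm(prompt)
    # The "run tsc afterward" step belongs to the workflow, not to the prompt.
    errors = type_check(converted)
    return converted, errors
```

A weaker model only has to get each small prompt right; ordering, file renaming, and the final check stay deterministic regardless of which model is plugged in.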
