| HN Mirror



	by crystal_revenge 85 days ago
	I've already started moving my personal projects off GitHub and onto Forgejo running on my homelab. I know a lot of people doing the same. With a hermes-agent as a sysadmin I can debug problems from my phone, so I wouldn't be surprised if I end up with more "9s" than GH. But if GH ends up costing extra, especially for work usage, then it's just a simple calculation of "is this worth it?", which I suspect for most cases will be 'yes'.


overfeed 85 days ago

> [...]it's just a simple calculation of "is this worth it?" which I suspect for most cases will be 'yes'

Once the landgrab-stage flat pricing goes away, it will become a case-by-case calculation, because unsupervised agents can (and will) run up your bill with zero understanding of the business value of what they've been instructed to solve.


crystal_revenge 85 days ago

> with zero understanding of the business value

What kind of products/services are you building where you aren't able to tie your eval suite to business value? If you can't, then why are you building whatever it is you're building in the first place?

By far one of the biggest changes I think we'll see in things built by agents is a shrinking gap between code and value. The first stage is making it possible to measure quality (evals); the second stage is aligning measurable quality more closely with value. The business value of the tokens spent by my team was discussed on my first day.

> Once the landgrab-stage flat-pricing goes away

Aside from the above point, I'm already running local LLMs on my homelab that, while not quite what I want for truly production work, have been able to iterate on and solve real, non-trivial research tasks at effectively zero cost (the energy cost was roughly on par with running an old light bulb).

Given the way open, local models have been developing, there will be many cases where, if proprietary providers overcharge, it won't be a deal breaker to just switch to local models. Not to mention that there are plenty of open but non-local models that are already 5x cheaper and roughly on par with the mainstream model providers.


overfeed 85 days ago

> What kind of products/services are you building where you aren't able to tie your eval suite to business value?

There are no evals in my org that can quantify the value of a proposed feature, rank it against ongoing support issues as they pop up, or know when to stop expending effort when no solution has been found or too many unknowns crop up. We still rely on natural intelligence for that, and haven't YOLO'd (ha) on independent agents. I'd rather quit than spend my day herding agents and have my job reduced to that of a code-review monkey.

Benchmark evals are at least three degrees removed from actual business value; maybe less if your tasks are repetitive. None of the harnesses I've used have any notion of a compute budget, beyond a Boolean thinking/no-thinking mode.
