Hacker News
by tjoff 600 days ago
The biggest time sink for me is validating answers so not sure I agree on that take.

Fast iteration is a killer feature, for sure, but at this time I'd rather focus on quality for it to be worth the effort.

If you're using an LLM as a compressed version of a search index, you'll be constantly fighting hallucinations. Respectfully, you're not thinking big-picture enough.

There are LLMs today that are amazing at coding, and when you allow them to iterate (e.g. respond to compiler errors), the quality is pretty impressive. If you can run an LLM 3x faster, you can enable a much bigger feedback loop in the same period of time.
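
To make the feedback loop concrete, here is a minimal sketch in Python. The `generate` callback and `fake_model` are hypothetical stand-ins for a real model call (nothing here is a real LLM API), and Python's built-in `compile()` plays the role of the compiler:

```python
from typing import Callable, Optional, Tuple


def check_syntax(source: str) -> Optional[str]:
    """Return None if the source compiles, otherwise a short error message."""
    try:
        compile(source, "<candidate>", "exec")
        return None
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"


def iterate_until_compiles(
    generate: Callable[[str, Optional[str]], str],
    task: str,
    max_rounds: int,
) -> Tuple[Optional[str], int]:
    """Ask for code, feed the compiler error back, stop at the first clean compile."""
    error: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        candidate = generate(task, error)
        error = check_syntax(candidate)
        if error is None:
            return candidate, round_no
    return None, max_rounds


# Hypothetical stand-in for a model: the first draft is missing a colon,
# and the draft produced after seeing an error fixes it.
def fake_model(task: str, error: Optional[str]) -> str:
    if error is None:
        return "def add(a, b)\n    return a + b\n"
    return "def add(a, b):\n    return a + b\n"


code, rounds = iterate_until_compiles(fake_model, "write add()", max_rounds=3)
```

Here the loop stops on the second round. If generation dominates the time per round, a model that runs 3x faster fits roughly three times as many rounds into the same wall-clock budget.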

There are efforts to enable LLMs to "think" by using chain-of-thought, where the LLM writes out reasoning in a "proof" style list of steps. Sometimes, like a person, they reach a dead end logic-wise. If you can run 3x faster, you can start to run the "thought chain" as more of a "tree" where the logic is critiqued and adapted, and where many different solutions can be tried. This can all happen in parallel (well, each sub-branch).
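
A toy sketch of the chain-versus-tree idea, in Python. Everything here is an assumption for illustration: `propose` stands in for the model suggesting next steps, `score` for the critique, and the "reasoning" is just arithmetic on integers. With a beam of 1 it behaves like a single chain; with a wider beam it keeps several branches alive:

```python
from typing import Callable, List, Optional, Tuple

Path = Tuple[int, ...]


def tree_search(
    start: int,
    propose: Callable[[int], List[int]],
    score: Callable[[int], float],
    is_goal: Callable[[int], bool],
    beam_width: int,
    max_depth: int,
) -> Optional[Path]:
    """Expand every kept branch, critique the results, keep the best few."""
    frontier: List[Path] = [(start,)]
    for _ in range(max_depth):
        candidates: List[Path] = []
        for path in frontier:
            for nxt in propose(path[-1]):
                new_path = path + (nxt,)
                if is_goal(nxt):
                    return new_path
                candidates.append(new_path)
        if not candidates:
            return None
        # Higher score means "looks more promising" to the critic.
        candidates.sort(key=lambda p: score(p[-1]), reverse=True)
        frontier = candidates[:beam_width]
    return None


# Toy problem: reach 11 from 3 using only "+1" and "*2".
TARGET = 11
kwargs = dict(
    start=3,
    propose=lambda n: [n + 1, n * 2],
    score=lambda n: -abs(TARGET - n),
    is_goal=lambda n: n == TARGET,
    max_depth=6,
)
chain = tree_search(beam_width=1, **kwargs)  # single chain of thought
tree = tree_search(beam_width=2, **kwargs)   # two branches kept alive
```

In this toy, the single chain greedily jumps to 12, overshoots, and never recovers, while the two-branch search still finds a path. The expansions inside each depth are independent of one another, which is the part that could run in parallel.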

Then there are "agent" use cases, where an LLM has to take actions on its own in response to real-world situations. Speed really impacts user-perception of quality.

> There are LLMs today that are amazing at coding, and when you allow them to iterate (e.g. respond to compiler errors), the quality is pretty impressive. If you can run an LLM 3x faster, you can enable a much bigger feedback loop in the same period of time.

Well, now the compiler is the bottleneck, isn't it? And you would still need a human to check for bugs that aren't caught by the compiler.

Still nice to have inference speed improvements tho.

Something will always be the bottleneck, and it probably won’t be the speed of electrons for a while ;)

Some compilers (Go) are faster than others (javac), and some languages are interpreted and can only be checked through tests. Moving the bottleneck from the AI code gen step to the same bottleneck a person faces seems like a win.

Spelling out the code in the editor is not really the bottleneck.
And yet it takes a non-zero amount of time. I think an apt comparison is a language like C++ vs Python. Yeah, technically you can write the same logic in both, but you can't genuinely say that "spelling out the code" takes the same amount of time in each. It becomes a meaningful difference across weeks of work.

With LLM pair programming, you can basically say "add a button to this widget that calls this callback" or "call this API with the result of this operation", and the LLM will spit out code that does that thing. If your change is entirely within 1-2 files and < 300 LOC, it takes a few seconds, it can land right in your IDE, and it's probably syntactically correct.

It's human-driven, and the LLM just handles the writing. The LLM isn't doing large refactors, nor is it designing scalable systems on its own. A human is doing that still. But it does speed up the process noticeably.

If the speed is used to get better quality with no more input from the user then sure, that is great. But that is not the only way to get better quality (though I agree that there are some low hanging fruit in the area).
To be honest, most LLMs are reasonable at coding; they're not great. Sure, they can code small stuff. But they can't refactor large software projects, or upgrade them.
Upgrading large Java projects is exactly what AWS want you to believe their tooling can do, but the ergonomics aren't great.

I think most of the capability problems with coding agents aren't the AI itself, it's that we haven't cracked how to let them interact with the codebase effectively yet. When I refactor something, I'm not doing it all at once, it's a step by step process. None of the individual steps are that complicated. Translating that over to an agent feels like we just haven't got the right harness yet.
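
As a rough sketch of what such a harness might look like (my assumption, not an existing tool): apply one small step at a time, run a check after each, and throw away any step that breaks things. The codebase is an in-memory dict here, and `all_files_compile` stands in for a real test suite:

```python
from typing import Callable, Dict, List, Tuple

Codebase = Dict[str, str]  # file name -> source text
Step = Callable[[Codebase], Codebase]


def all_files_compile(codebase: Codebase) -> bool:
    """Cheap stand-in for a test suite: every file must at least parse."""
    for name, source in codebase.items():
        try:
            compile(source, name, "exec")
        except SyntaxError:
            return False
    return True


def run_steps(
    codebase: Codebase,
    steps: List[Step],
    check: Callable[[Codebase], bool],
) -> Tuple[Codebase, List[bool]]:
    """Apply steps one by one; keep a step only if the check still passes."""
    current = dict(codebase)
    kept: List[bool] = []
    for step in steps:
        candidate = step(dict(current))  # each step works on a copy
        ok = check(candidate)
        if ok:
            current = candidate
        kept.append(ok)
    return current, kept


# Two hypothetical steps: a rename across files, and an edit that breaks a file.
def rename_calc(cb: Codebase) -> Codebase:
    return {name: src.replace("calc", "compute_total") for name, src in cb.items()}


def broken_edit(cb: Codebase) -> Codebase:
    cb["main.py"] = cb["main.py"] + "print("
    return cb


repo = {
    "util.py": "def calc(xs):\n    return sum(xs)\n",
    "main.py": "from util import calc\nprint(calc([1, 2]))\n",
}
result, kept = run_steps(repo, [rename_calc, broken_edit], all_files_compile)
```

The rename is kept and the broken edit is rolled back, so the agent never builds on a bad intermediate state. None of the individual pieces are complicated, which is sort of the point.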

Honestly, most software tasks aren’t refactoring large projects, so it’s probably OK.

As the world gets more internet connected and more online, we’ll have an ever expanding list of “small stuff”: glue code that mixes an ever-growing list of data sources/sinks and visualizations together. Many of these are “write once” and leave running.

Big companies (e.g. Google) have built complex build systems (e.g. Bazel) to isolate small reusable libraries within a larger repo, which was a necessity to help unbelievably large development teams manage a shared repository. An LLM acting in its small corner of the world seems well suited to this sort of tooling, even if it can’t refactor large projects spanning large changes.

I suspect we’ll develop even more abstractions and layers to isolate LLMs and their knowledge of the world. We already have containers and orchestration enabling “serverless” applications, and embedded webviews for GUIs.

Think about ChatGPT and their python interpreter or Claude and their web view. They all come with nice harnesses to support a boilerplate-free playground for short bits of code. That may continue to accelerate and grow in power.

What's your favorite orchestration solution for this kind of lightweight task?
> The biggest time sink for me is validating answers so not sure I agree on that take.

But you're assuming that it'll always be validated by humans. I'd imagine that most validation (and subsequent processing, especially going forward) will be done on machines.

If that is the way to get quality, sure.

Otherwise I feel that power consumption is the bigger issue than speed, though in this case they are interlinked.

Humans consume a lot of power and resources.
The basic efficiency is pretty high.
How does the next machine/LLM know what’s valid or not? I don’t really understand the idea behind layers of hallucinating LLMs.
By comparison with reality. The initial LLMs had "reality" be "a training set of text"; when ChatGPT came out everyone rapidly expanded into RLHF (reinforcement learning from human feedback); and now that there are vision and text models, the training and feedback are grounded in a much broader aspect of reality than just text.
Given that there are more and more AI generated texts and pictures that ground will be pretty unreliable.
Perhaps. But CCTV cameras and smartphones are huge sources of raw content of the real world.

Unless you want to take the argument of Morpheus in The Matrix and ask "what is real?"

So let’s crank up total surveillance for better auto descriptions of a picture.

We aren’t exchanging freedom for security anymore, which could be reasonable under certain conditions; we just get convenience. Bad deal.

Could you link to a paper or working POC that shows how this “turtles all the way down” solution works?
I don't understand your question.

This isn't turtles all the way down, it's grounded in real world data, and increasingly large varieties of it.

How does the AI know it’s reality and not a fake image or text fed to the system?
And who validates the validation?
The compiler/interpreter is assumed to work in this scenario.
Exactly, validating and rewriting the prompt are the real time consuming tasks.