by nyrikki 205 days ago
I am running a container on an old 7700K with a 1080 Ti that gives me VS Code completions with RAG, with similar latency and enough accuracy to be useful for boilerplate, etc.

That is something I would possibly pay for, but since the failures on complex tasks are so expensive, this seems to be a major use case, and it will just be a commodity.

Creating the scaffolding for a JWT or other similar tasks will be a race to the bottom IMHO, although it is valuable and tractable.
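For a sense of how small that kind of scaffolding is, here is a rough sketch of HS256 JWT signing and verification using only the Python standard library. The function names (`sign_jwt`, `verify_jwt`) and the claims are made up for illustration; in real code you would probably reach for a maintained JWT library instead.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 with the trailing "=" padding stripped.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that _b64url removed before decoding.
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)


def sign_jwt(claims: dict, secret: bytes) -> str:
    """Return a compact HS256 token: header.payload.signature (hypothetical helper)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_jwt(token: str, secret: bytes) -> dict:
    """Check signature, algorithm, and optional "exp"; return the claims or raise ValueError."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token")
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    # Constant-time comparison, checked before trusting any decoded content.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    header = json.loads(_b64url_decode(header_b64))
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    claims = json.loads(_b64url_decode(payload_b64))
    if "exp" in claims and time.time() >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```

Something like `verify_jwt(sign_jwt({"sub": "alice"}, b"secret"), b"secret")` round-trips the claims, which is about as much as the scaffolding needs to do.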

IMHO they are going to have to find ways to build a moat, and what these tools are really bad at is the problem domains that make your code valuable.

Basically anything that can be vibe coded can be trivially duplicated and the big companies will just kill off the small guys who are required to pay the bills.

Something like surveillance capitalism will need to be found to generate the revenue needed at the scale of Microsoft, etc.

2 comments

Given how every CPU vendor seems to push for some kind of NPU, locally run models will probably be far more common in the next 5 years. And convincing everyone to pay a subscription for very minimal improvements in functionality is going to be hard.

The NPUs integrated into CPU SoCs are very small compared to even integrated GPUs, much less discrete or datacenter GPUs.

NPUs seem to be targeted towards running tiny ML models at very low power, not running large AI models.

Have you documented your VSCode setup somewhere? I've been looking to implement something like that. Does your setup provide next edit suggestions too?

I keep idly wondering what the market would be for a plug-and-play LLM runner. Some toaster-sized box with the capability to run exclusively offline/local. Plug it into your network, give your primary machine the IP, and away you go.

Of course, the market segment that would be most interested probably has the expertise and funds to set up something with better horsepower than could be offered in a one-size-fits-all solution.

Ooof, right idea but $4k is definitely more than I would be comfortable paying for a dedicated appliance.

Still, glad to see someone is making the product.

I am working on a larger project about containers and isolation stronger than current conventions but short kata etc…

But if you follow the Podman instructions for CUDA, the llama.cpp project shows you how to use their plugin here:

https://github.com/ggml-org/llama.vscode
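If you want to poke at a local server from outside the editor, something like the sketch below might do. It assumes a llama.cpp server that exposes a fill-in-the-middle `/infill` endpoint accepting `input_prefix` and `input_suffix` and returning JSON with a `content` field; the address `http://127.0.0.1:8012` and the helper names are assumptions for illustration, so check them against your own server before relying on any of it.

```python
import json
import urllib.request

# Assumption: address of a locally running llama.cpp server; adjust to your setup.
DEFAULT_BASE_URL = "http://127.0.0.1:8012"


def build_infill_request(prefix: str, suffix: str,
                         base_url: str = DEFAULT_BASE_URL,
                         n_predict: int = 64) -> urllib.request.Request:
    """Build (but do not send) a fill-in-the-middle request (hypothetical helper)."""
    payload = {
        "input_prefix": prefix,   # code before the cursor
        "input_suffix": suffix,   # code after the cursor
        "n_predict": n_predict,   # cap on generated tokens
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/infill",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def request_completion(prefix: str, suffix: str,
                       base_url: str = DEFAULT_BASE_URL,
                       timeout: float = 10.0) -> str:
    """Send the request and return the suggested text, or "" if none is present."""
    req = build_infill_request(prefix, suffix, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    # Assumption: the server answers with a JSON object carrying a "content" field.
    return body.get("content", "")
```

Pointing `base_url` at another machine on your network is the only change needed if the model runs on a separate box.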