| HN Mirror



	by spmurrayzzz 770 days ago
> having to install Ollama + Cuda to get locally working LLM didn't felt right to me when there's all what's needed in the browser

Was there something specifically about the install that didn't feel right? I ask because ollama is just a thin Go wrapper around llama.cpp (it's actually starting a modified version of the llama.cpp server in the background, not even going through the Go FFI, likely for perf reasons). In that sense, you could just install the CUDA toolkit via your package manager and run `make LLAMA_CUDA=1; ./server` from the llama.cpp repo root to get effectively the same thing in two simple steps with no extra overhead.


littlestymaar 769 days ago

I'm never gonna have my non-tech friend do any of this when they can just go to chat.openai.com and call it a day.

Most people value convenience at the expense of almost everything else when it comes to technology.


spmurrayzzz 769 days ago

> I'm never gonna have my non-tech friend do any of this

Who was making that assertion? I certainly wasn't.

In the same way I am never going to tell my non-engineer friends to build their own todo app instead of just using something like Todoist. But if they told me they cared about data privacy/security, I'd walk them through the steps if they cared to hear them.


littlestymaar 769 days ago

> Who was making that assertion? I certainly wasn't.

But you were responding to my comment, and that was the implied part in it (which I later clarified to answer your question).

> In the same way I am never going to tell my non-engineer friends to build their own todo app instead of just using something like Todoist. But if they told me they cared about data privacy/security, I'd walk them through the steps if they cared to hear them.

Fortunately for most apps there's a middle ground between “use a spyware” and “build your own”, and that's exactly why this tool is much needed for LLM in my opinion.


spmurrayzzz 768 days ago

> Fortunately for most apps there's a middle ground between “use a spyware” and “build your own”, and that's exactly why this tool is much needed for LLM in my opinion.

Sure, I think I understand the motivation; the big tradeoff is performance. If your original point about people privileging convenience holds true across the end-to-end user experience here, I would say that single-digit tokens-per-second rates probably qualify as inconvenient for many folks and thus cannibalize whatever ease-of-setup value you get at the outset.

There's a reason CUDA/ROCm is needed for the acceleration: a ton of work has gone into optimization via custom kernels to get the palatable throughput/latency consumers are used to when using frontier model APIs (or GPU-accelerated local stacks).
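
To put the tokens-per-second point in rough numbers, here is a back-of-the-envelope sketch. The reply length and both decode rates are made-up illustrative values, not benchmarks of any particular stack, and the helper name `generation_seconds` is hypothetical. It also ignores prompt processing time, so real waits would be somewhat longer.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to stream num_tokens at a steady decode rate (prompt processing ignored)."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# Assumed values for illustration only: a 400-token reply at two decode rates.
REPLY_TOKENS = 400
RATES = [
    ("single-digit rate (assumed 5 tok/s)", 5.0),
    ("accelerated rate (assumed 50 tok/s)", 50.0),
]

for label, rate in RATES:
    seconds = generation_seconds(REPLY_TOKENS, rate)
    print(f"{label}: {seconds:.0f} s for {REPLY_TOKENS} tokens")
```

Under those assumptions the same reply takes over a minute at the slow rate versus a few seconds at the fast one, which is the kind of gap I mean by "inconvenient".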
