by refulgentis 875 days ago
It's very, very, very annoying how much some people are tripping over themselves to pretend a llama.cpp wrapper is some gift of love from saints to the hoi polloi. Y'all need to chill. It's good work. It's not great or the best thing ever or particularly high on either simple user friendliness or power user friendly. It's young. Let it breathe. Let people speak.

What troubles me is how many projects are using ollama. I can't stand that I have to create a model file for every model using ollama. I have a terabyte of models that are mostly GGUF, which is somewhere around 70 models of various sizes. I rotate in and out of new versions constantly. GGUF is a ~container~ that already has most of the information needed to run the models! I felt like I was taking crazy pills when so many projects started using it for their backend.
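
To make the "container" point concrete, here is a minimal Python sketch that reads only the fixed-size header at the start of a GGUF file. It assumes a little-endian file at GGUF version 2 or later (magic bytes, a 32-bit version, then 64-bit tensor and metadata key/value counts); the function name `read_gguf_header` is made up for illustration and is not part of llama.cpp or ollama. It does not parse the metadata entries themselves, it only shows that the file announces them up front.

```python
import struct


def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) for a GGUF file.

    Assumption: little-endian GGUF v2+ layout, i.e. 4 magic bytes,
    uint32 version, uint64 tensor count, uint64 metadata key/value count.
    """
    with open(path, "rb") as f:
        raw = f.read(24)  # 4 + 4 + 8 + 8 bytes
    if len(raw) < 24 or raw[:4] != b"GGUF":
        raise ValueError(f"{path} does not look like a GGUF file")
    # "<IQQ" = little-endian uint32 followed by two uint64 values
    version, tensor_count, kv_count = struct.unpack("<IQQ", raw[4:24])
    return version, tensor_count, kv_count
```

Pointing this at each file in a model directory gives a quick sanity check that a download is a real GGUF and how many metadata entries it carries, without writing any per-model config file.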

Text-generation-webui is leagues ahead in terms of plug and play. Just load the model and it will get you within 98% of what you need to run any model from HF. Making adjustments to generation settings, prompt and more is done with a nice GUI that is easily saved for future use.

Using llama.cpp is also very easy. It takes seconds to build on my Windows computer with cmake. Compiling llama.cpp with different parameters for older/newer/non-existent GPUs is very, very simple... even on Windows, even for a guy that codes in Python 97% of the time and doesn't really know a thing about C++. The examples folder in llama.cpp is a gold mine of cool things to run, and they get packaged up into *.exe files for dead simple use.

Thank you for sharing, it's sooooo rare to get signal amongst noise here re: LLMs.

I'm really, really surprised to hear this:

- I only committed in a big way to local a week ago. TL;DR: Stable LM 3B doing RAG meant my every-platform app needed to integrate local finally.

- Frankly didn't hear of Ollama till I told someone about Nitro a couple weeks back and they celebrated they didn't have to Ollama anymore.

- I can't even imagine what the case for another container would be.

- I'm very appreciative of anyone doing work. No shade on Ollama.

- But I don't understand the seemingly strong uptake of it if it's the case you need to go get specially formatted models for it. There are other GUIs, so it can't be because it's a GUI. Maybe it's the blend of GUI + OpenAI API server? Any idea?? There's clearly some product-market fit here* but I'm at as complete a loss as you.

* maybe not? HN has weird voting behavior lately and this got to like #3 with 0 comments last night, then it sorta stays there once it has momentum.

- p.s. hear hear on the examples folder. 4 days, that's it, from 0 to on Mac / iOS / Windows / Android / Linux. I'm shocked how many other Dart projects kinda just threw something together quick for one or two platforms and just...ran with it. At half-speed of what they could have. All you have to do is pattern after the examples to get the speed. Wrestling with Flutter FFI...I understand avoiding lol. Last 4 days were hell. https://github.com/Telosnex/fllama

"It's not great or the best thing ever or particularly high on either *simple user friendliness* or power user friendly."

But there are multiple reports in this thread about how easy of an install it was. I'm adding my own in. It was super simple.

It was way easier than installing Automatic1111. It's easier than building llama.cpp.

SnowLprd had some good points for power users although I think he was overly critical in his phrasing. But what's got y'all tripping thinking this is hard?