by monkmartinez
876 days ago
What troubles me is how many projects are using ollama. I can't stand that I have to create a model file for every model when using ollama. I have a terabyte of models, mostly GGUF, which comes to around 70 models of various sizes. I rotate new versions in and out constantly. GGUF is a ~container~ that already has most of the information needed to run the models! I felt like I was taking crazy pills when so many projects started using ollama for their backend. Text-generation-webui is leagues ahead in terms of plug and play. Just load the model and it will get you within 98% of what you need to run any model from HF. Making adjustments to generation settings, the prompt, and more is done in a nice GUI, and the settings are easily saved for future use. Using llama.cpp is also very easy. It takes seconds to build on my Windows computer with cmake. Compiling llama.cpp with different parameters for older/newer/non-existent GPUs is very, very simple... even on Windows, even for a guy who codes in Python 97% of the time and doesn't really know a thing about C++. The examples folder in llama.cpp is a gold mine of cool things to run, and they get packaged up into *.exe files for dead simple use.
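To make the "container" point concrete, here is a minimal sketch (not taken from ollama or llama.cpp) that reads only the fixed-size header of a GGUF file. It assumes a little-endian file at GGUF version 2 or later, where the 4-byte magic is followed by a 32-bit version and two 64-bit counts; the function name `read_gguf_header` is made up for illustration. The metadata key/value pairs that follow the header are where the model's own settings live, and this sketch does not parse them.

```python
import struct

GGUF_MAGIC = b"GGUF"  # first four bytes of every GGUF file


def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF file.

    Assumption: little-endian layout, GGUF version >= 2 (64-bit counts).
    """
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != GGUF_MAGIC:
            raise ValueError(f"not a GGUF file: magic={magic!r}")
        # uint32 format version
        (version,) = struct.unpack("<I", f.read(4))
        if version < 2:
            raise ValueError("GGUF v1 uses 32-bit counts; not handled in this sketch")
        # uint64 tensor count, then uint64 metadata key/value count
        tensor_count, kv_count = struct.unpack("<QQ", f.read(16))
    return version, tensor_count, kv_count
```

Calling `read_gguf_header("some-model.gguf")` (the path is a placeholder) returns the three header fields; a non-zero metadata count is the part a loader can use instead of a hand-written model file.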
I'm really, really surprised to hear this:
- I only committed to local in a big way a week ago. TL;DR: Stable LM 3B doing RAG meant my every-platform app finally needed to integrate local.
- Frankly, I hadn't heard of Ollama until I told someone about Nitro a couple of weeks back and they celebrated that they didn't have to use Ollama anymore.
- I can't even imagine what the case for another container would be.
- I'm very appreciative of anyone doing work. No shade on Ollama.
- But I don't understand the seemingly strong uptake if you need to go get specially formatted models for it. There are other GUIs, so it can't be because it's a GUI. Maybe it's the blend of GUI + OpenAI API server? Any idea? There's clearly some product-market fit here*, but I'm at as complete a loss as you.
* Maybe not? HN has had weird voting behavior lately: this got to like #3 with 0 comments last night, and a post sorta stays there once it has momentum.
- P.S. Hear, hear on the examples folder. 4 days, that's it, from 0 to running on Mac / iOS / Windows / Android / Linux. I'm shocked how many other Dart projects kinda just threw something together quick for one or two platforms and just...ran with it, at half the speed they could have had. All you have to do is pattern after the examples to get the speed. Wrestling with Flutter FFI...I understand avoiding it lol. Last 4 days were hell. https://github.com/Telosnex/fllama