by Evan-Almloff 1070 days ago
1. I would love to support additional model runners, including ExLlama and API-based models like ChatGPT. I'm less familiar with how ctransformers and GPTQ compare to llama.cpp. GPTQ used to run faster because it supported GPU acceleration, but llama.cpp now supports the GPU as well, so that may have changed. Feel free to open a GitHub issue to discuss this: https://github.com/floneum/floneum/issues/new/choose

2. There are a few differences: a) Floneum doesn't require any setup. There's no need to install Python, CUDA, or pip; just download the executable and run it. b) It has first-class support for quantized local models. c) It supports fully isolated WASM plugins (not arbitrary Python code).

3. Floneum is fully Open Source!

2 comments

Thanks for your clarifications. I added it to my awesome list:

https://github.com/underlines/awesome-marketing-datascience/...

ExLlama is significantly faster if you can fit the whole model in VRAM.
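
For a rough sense of whether a model fits, here is a back-of-the-envelope sketch. The weight size is just parameters times bits per weight; the function names and the 2 GiB overhead allowance (KV cache, activations, runtime buffers) are my own assumptions for illustration, not numbers from ExLlama or llama.cpp, and real usage varies with context length and backend.

```python
def weights_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


def fits_in_vram(n_params: float, bits_per_weight: float,
                 vram_gib: float, overhead_gib: float = 2.0) -> bool:
    """True if weights plus an assumed fixed overhead fit in vram_gib.

    overhead_gib is a guess, not a measured value; tune it for your setup.
    """
    return weights_gib(n_params, bits_per_weight) + overhead_gib <= vram_gib


if __name__ == "__main__":
    # Hypothetical example: a 13B-parameter model at 4 bits per weight.
    print(f"weights: {weights_gib(13e9, 4):.2f} GiB")
    print("fits in 12 GiB:", fits_in_vram(13e9, 4, vram_gib=12))
    print("fits in 8 GiB:", fits_in_vram(13e9, 4, vram_gib=8))
```

Treat the result as a first filter only; if the estimate is close to your card's capacity, the model may still spill out of VRAM in practice.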