by oneshtein 629 days ago
You need ollama[1][2] and hardware that can run 20-70B models at Q4 quantization or better to get an experience similar to commercially hosted models. I use codestral:22b, gemma2:27b, gemma2:27b-instruct, and aya:35b.
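
For anyone who wants to script against a local ollama instead of using a GUI, here is a minimal sketch. It assumes ollama is running on its default local address (http://localhost:11434) and that the model has already been pulled; the helper names `build_request` and `generate` are my own, not part of ollama.

```python
import json
import urllib.request

# Assumption: ollama is listening on its default local address.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(model, prompt, url=OLLAMA_URL):
    """Build a non-streaming POST request for ollama's generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt):
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]


# Usage (needs a running ollama server with the model pulled):
#   print(generate("gemma2:27b", "Translate 'good morning' into Ukrainian."))
```

With `"stream": False` the server sends one JSON object back instead of a stream of chunks, which keeps the example short.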

Smaller models are useless for me, because my native language is Ukrainian (it's easier to spot mistakes made by a model in a language with more complex grammar rules).

As a GUI, I use the Page Assist[3] plugin for Firefox, or aichat[4], a command-line and WebUI tool.

[1]: https://github.com/ollama/ollama/releases

[2]: https://ollama.com/

[3]: https://github.com/n4ze3m/page-assist

[4]: https://github.com/sigoden/aichat


What's the hardware needed to make it run reasonably fast?
I have no idea what "reasonably fast" means for you. Performance is best when the model fits entirely in the graphics card's memory. An Nvidia 4090 with 24 GB will be more than enough to start learning. I use a gaming notebook with an Nvidia 3080 Ti equipped with 16 GB of video memory.
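
To get a rough feel for whether a model fits in a given card's memory, a back-of-the-envelope estimate like the one below can help. It is only a sketch under my own assumptions: about 4.5 bits per weight for a Q4-style quantization (real quant formats vary), and a guessed 20% on top of the weights for context cache and runtime buffers. The function names are hypothetical.

```python
def estimate_weights_gib(params_billions, bits_per_weight):
    """Approximate size of the model weights alone, in GiB."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits_in_memory(params_billions, bits_per_weight, memory_gib,
                   overhead_fraction=0.2):
    """True if weights plus an assumed overhead fit in memory_gib.

    overhead_fraction is a guess, not a measured value; long contexts
    can need considerably more.
    """
    needed = estimate_weights_gib(params_billions, bits_per_weight)
    needed *= 1 + overhead_fraction
    return needed <= memory_gib


# Compare a few model sizes against 16 GiB and 24 GiB cards.
for size_b in (13, 22, 27, 35, 70):
    weights = estimate_weights_gib(size_b, 4.5)
    print(f"{size_b}b: ~{weights:.1f} GiB of weights, "
          f"fits 16 GiB: {fits_in_memory(size_b, 4.5, 16)}, "
          f"fits 24 GiB: {fits_in_memory(size_b, 4.5, 24)}")
```

A model that does not fit completely can still run with part of it in system RAM, just slower, so treat a "does not fit" result as "will not run fully on the GPU" rather than "will not run".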
I have no issues using just the CPU on smaller (<= 13b) models, and it's fast enough for me. Even 70b models still work if you have the RAM; they're just much slower.