by carom 1042 days ago
Just set things up locally last night. If you're a developer, llama.cpp is a pleasure to build and run. I wanted to run the original weights from Meta and couldn't figure out how to do that with text-generation-webui, which seemed geared toward grabbing models off Hugging Face.

I'm running on a 3090. The 13B chat model quantized to 8-bit gives about 42 tok/s.
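For anyone wondering why 8-bit is a comfortable choice for a 13B model on this card, here is a rough back-of-envelope sketch. It counts the weights only and ignores the KV cache, activations, and per-block quantization overhead. The 13e9 parameter count and 24 GiB of VRAM are my assumed round numbers, and `weight_footprint_gib` is just a helper name I made up for illustration.

```python
def weight_footprint_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


# Assumed round numbers: 13e9 parameters, 24 GiB of VRAM on a 3090.
N_PARAMS = 13e9
VRAM_GIB = 24

for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    size = weight_footprint_gib(N_PARAMS, bits)
    print(f"{label}: {size:.1f} GiB, fits in VRAM: {size < VRAM_GIB}")
```

By this estimate the 16-bit weights alone slightly exceed the card's memory, while 8-bit leaves roughly half of it free for context.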