HN Mirror

	by carom 1090 days ago
	Just set things up locally last night. If you're a developer, llama.cpp was a pleasure to build and run. I wanted to run the weights from Meta and couldn't figure out text-generation-webui; it seemed optimized for grabbing models off Hugging Face. Running on a 3090, the 13B chat model quantized to fp8 gives about 42 tok/s.