HN Mirror



	by junrushao1994 1085 days ago
	yeah, we tried out popular solutions like exllama and llama.cpp, among others that support inference of 4-bit quantized models