| HN Mirror

	by brucethemoose2 1037 days ago
	Apache TVM, with (for instance) mlc-llm. It compiles to CPU, Vulkan, and other, more esoteric backends, and its autotuning is like black magic. Llama.cpp is still SOTA on CPU, as far as I know, especially with a small discrete GPU to help with long prompt ingestion. Llama.cpp also has tons of features (like grammar-constrained sampling, context extension, and good quantization) that other frameworks are still missing.