HN Mirror

Hacker News


	by brucethemoose2 846 days ago
	exl2 (the ExLlamaV2 format) is Nvidia/AMD only. But a GGUF quant of Mixtral should fit in 32GB... just not with the full 32K context. Long context is very memory-intensive in llama.cpp, at least until flash attention and a quantized KV cache are fully implemented.
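	To see why context length matters so much, here is some back-of-the-envelope arithmetic. It is a rough sketch, not llama.cpp's actual allocator. It assumes the public Mixtral 8x7B config (32 layers, 8 KV heads via grouped-query attention, head_dim 128, about 46.7B total params) and a hypothetical ~4.5 bits-per-weight GGUF quant. The effective bits per cache element (8.5 for q8_0, 4.5 for q4_0, which include per-block scales) are also assumptions. The function names are illustrative only.

```python
# Rough memory estimate for Mixtral 8x7B weights + KV cache (sketch only).
# Assumed architecture numbers from the public Mixtral 8x7B config:
N_LAYERS = 32       # transformer layers
N_KV_HEADS = 8      # KV heads (grouped-query attention)
HEAD_DIM = 128      # per-head dimension
N_PARAMS = 46.7e9   # approximate total parameters (all experts)

GIB = 1024 ** 3


def kv_cache_bytes(n_ctx: int, bits_per_elem: float) -> float:
    """Bytes for the K and V caches across all layers for n_ctx tokens."""
    elems = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * n_ctx  # 2 = one K plus one V
    return elems * bits_per_elem / 8


def weight_bytes(bits_per_weight: float) -> float:
    """Approximate bytes for quantized weights (ignores metadata and overheads)."""
    return N_PARAMS * bits_per_weight / 8


if __name__ == "__main__":
    # Assumed effective bits per cache element; q8_0/q4_0 include block scales.
    cache_types = {"f16": 16.0, "q8_0": 8.5, "q4_0": 4.5}
    weights = weight_bytes(4.5)  # hypothetical ~4.5 bpw GGUF quant
    for n_ctx in (4096, 32768):
        for name, bits in cache_types.items():
            kv = kv_cache_bytes(n_ctx, bits)
            total = (weights + kv) / GIB
            print(f"ctx={n_ctx:>6} cache={name:<5} "
                  f"kv={kv / GIB:5.2f} GiB total~{total:5.1f} GiB")
```

	At 32K context, the f16 cache alone comes to 4 GiB, compared with about 1.1 GiB for a q4_0 cache. These figures leave out compute buffers, which grow with context when attention scores are materialized without flash attention, as well as OS overhead. That is why the full 32K gets tight on a 32GB machine.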