Hacker News
by _ea1k 819 days ago
`ollama run mixtral` defaults to a quantized version (4-bit, IIRC). I'd guess that's why it fits on two 3090s.
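Here's a rough sketch of the arithmetic behind that guess. It assumes Mixtral 8x7B's published figure of about 46.7B total parameters and 24 GiB of VRAM per 3090. It counts weights only, with no KV cache, activations, or runtime overhead. The 4.5 bits/weight value approximates a q4_0-style format (4-bit weights plus one fp16 scale per block of 32). The exact quantization Ollama ships may differ.

```python
# Back-of-envelope VRAM estimate for Mixtral 8x7B weights.
# Assumptions (not from the comment): ~46.7B total parameters,
# 24 GiB per RTX 3090, weights only (KV cache, activations and
# runtime overhead are ignored).

GIB = 2**30
MIXTRAL_PARAMS = 46.7e9
VRAM_PER_3090_GIB = 24
NUM_GPUS = 2


def weight_gib(params: float, bits_per_weight: float) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return params * bits_per_weight / 8 / GIB


def fits(bits_per_weight: float, gpus: int = NUM_GPUS) -> bool:
    """True if the weights alone fit in the combined VRAM of `gpus` cards."""
    return weight_gib(MIXTRAL_PARAMS, bits_per_weight) <= gpus * VRAM_PER_3090_GIB


# 16 bits = fp16/bf16; 4.5 bits approximates a q4_0-style 4-bit format.
for label, bits in [("fp16", 16), ("4-bit (q4_0-style)", 4.5)]:
    gib = weight_gib(MIXTRAL_PARAMS, bits)
    print(f"{label:>20}: {gib:5.1f} GiB, fits on 2x3090: {fits(bits)}")
```

Under these assumptions, the 4-bit weights need roughly 24.5 GiB. That is slightly more than one card holds but comfortably within two. At fp16 the weights alone would need about 87 GiB.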