Hacker News
by hughw 1 day ago
Just this morning I tweaked my single 3090 setup too:

  OLLAMA_FLASH_ATTENTION=1
  OLLAMA_KV_CACHE_TYPE=q8_0
  OLLAMA_CONTEXT_LENGTH=180000
and that fits in 23GB of VRAM.

[edited for format]
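Here is a back-of-the-envelope KV-cache estimate that shows why the quantized cache helps at this context length. It is a sketch under stated assumptions, not Ollama's own memory accounting. The comment doesn't say which model is loaded, so the layer count, KV-head count and head dimension below are hypothetical. The formula assumes standard attention, where every layer caches one key and one value vector per KV head for every token. Model weights and compute buffers are ignored. The q8_0 cost uses GGML's block layout: 32 int8 values plus one fp16 scale, i.e. 34 bytes per 32 elements.

```python
"""Rough KV-cache size estimate (a sketch; ignores weights and compute buffers)."""

# Cost per cached element as an exact fraction (numerator, denominator) in bytes.
# f16: 2 bytes each. q8_0 (GGML): 32 int8 values + one fp16 scale = 34 bytes per 32.
BYTES_PER_ELEMENT = {
    "f16": (2, 1),
    "q8_0": (34, 32),
}


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_length: int, cache_type: str = "f16") -> int:
    """Approximate KV-cache size in bytes.

    Keys and values are both cached (the factor of 2) for every layer,
    every KV head, and every token position in the context window.
    """
    elements = 2 * n_layers * n_kv_heads * head_dim * context_length
    num, den = BYTES_PER_ELEMENT[cache_type]
    return elements * num // den


if __name__ == "__main__":
    # Hypothetical model shape, chosen only for illustration.
    shape = dict(n_layers=48, n_kv_heads=4, head_dim=128, context_length=180_000)
    for cache_type in ("f16", "q8_0"):
        gib = kv_cache_bytes(**shape, cache_type=cache_type) / 2**30
        print(f"{cache_type:>5}: {gib:.2f} GiB")
```

With that made-up shape, the cache is about 16.48 GiB at f16 and about 8.75 GiB at q8_0, so quantizing the cache roughly halves it. Plug in your own model's shape to see how much of a 24GB card remains for weights.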


Friends don't let friends use Ollama: https://sleepingrobots.com/dreams/stop-using-ollama/