Hacker News
by throwaway4aday 778 days ago
Can you select a context length that fits in your GPU, though? I suppose even a 128k model would be more than enough for almost everyone running these models on their own hardware.
1 comment

No, you can't right now. Hopefully they will add this to ollama.
256k (actually 262k) is also up on HF: https://huggingface.co/gradientai/Llama-3-8B-Instruct-262k
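
For a rough sense of whether a given context length fits on a GPU, here is a back-of-envelope sketch of the KV cache size. It assumes the standard Llama 3 8B shape (32 layers, 8 KV heads, head dimension 128), an unquantized fp16 cache, and that "262k" means 262,144 tokens. The function name `kv_cache_bytes` is made up for illustration, and the estimate ignores the model weights and activation memory, so real usage will be higher.

```python
# Rough KV cache size estimate. Assumed values (not from the thread):
# Llama 3 8B shape of 32 layers, 8 KV heads, head dim 128, fp16 cache.
def kv_cache_bytes(context_tokens, n_layers=32, n_kv_heads=8,
                   head_dim=128, bytes_per_value=2):
    # Factor of 2 covers one key and one value vector per token,
    # per layer, per KV head.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens

GIB = 1024 ** 3

for label, tokens in [("8k", 8_192), ("128k", 131_072), ("262k", 262_144)]:
    print(f"{label}: {kv_cache_bytes(tokens) / GIB:.1f} GiB")
```

Under those assumptions the cache alone comes to about 16 GiB at 128k and 32 GiB at 262k, which is why being able to pick a smaller context length matters on consumer cards.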