by simonw
466 days ago
Sadly, the hardest part of running local models with tools like Ollama appears to be long-context prompts. Models that respond really quickly to a short, one-sentence prompt need vastly more RAM and CPU/GPU time for significantly longer inputs. I'm finding this really damages their utility for me.
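
As a rough illustration of why RAM use climbs with prompt length, here is a minimal sketch of how a transformer's KV cache grows with the number of tokens. The model dimensions are hypothetical defaults (roughly the shape of an 8B-class model with grouped-query attention, 16-bit cache entries), not figures from any particular Ollama model, and the estimate ignores model weights, activations, and runtime optimizations such as cache quantization.

```python
def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_elem=2):
    """Approximate KV-cache size in bytes.

    Each token stores one key and one value vector (the factor of 2)
    per layer and per KV head. All defaults are assumed, illustrative values.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem


for tokens in (32, 4096, 32768):
    mib = kv_cache_bytes(tokens) / 2**20
    print(f"{tokens:>6} tokens -> {mib:,.0f} MiB")
```

Under these assumptions the cache costs 128 KiB per token, so a 32-token prompt needs about 4 MiB while a 32,768-token prompt needs about 4 GiB, on top of the weights. Prompt processing time grows as well, since every input token has to be run through the model before the first output token appears.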