by simonw 466 days ago
Sadly, the hardest part of running local models with tools like Ollama appears to be handling longer-context prompts.

Models that respond really quickly to a short, one-sentence prompt need vastly more RAM and CPU/GPU time for significantly longer inputs. I'm finding this really damages their utility for me.
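
For a rough sense of the RAM side of this, here is a back-of-the-envelope sketch of how the key/value cache of a decoder-only transformer grows with prompt length. The function name and the default model dimensions (32 layers, 8 KV heads, head size 128, 16-bit values) are illustrative assumptions, not figures from any particular model or from Ollama itself.

```python
def kv_cache_bytes(n_tokens, n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_value=2):
    """Approximate KV cache size for a decoder-only transformer.

    The defaults are hypothetical, chosen only to make the arithmetic concrete.
    """
    # Factor of 2: each layer stores one key vector and one value vector per token.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * n_tokens


# Compare a one-sentence prompt with two much longer inputs.
for n_tokens in (32, 4096, 32768):
    mib = kv_cache_bytes(n_tokens) / 2**20
    print(f"{n_tokens:>6} tokens -> {mib:,.0f} MiB of KV cache")
```

Under these assumptions the cache costs 128 KiB per token, so it grows linearly with the input and sits on top of the memory already used by the model weights. Compute grows too, since every prompt token has to be processed before the first output token appears.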