| HN Mirror

	by simonw 466 days ago
	Sadly, the hardest part of running local models with tools like Ollama appears to be longer context prompts. Models that respond really quickly to a short sentence prompt need vastly more RAM and CPU/GPU time for significantly longer inputs. I'm finding this really damages their utility for me.
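	A rough back-of-the-envelope sketch of one reason memory grows with input length: a transformer keeps a key/value cache entry for every token in the context, so the cache scales linearly with prompt size. The function name and the model dimensions below (32 layers, 8 KV heads, head size 128, 16-bit values) are illustrative assumptions, not figures from the comment or from any specific Ollama model, and the estimate ignores model weights, activations, and any cache quantization.

```python
def kv_cache_bytes(n_tokens, n_layers=32, n_kv_heads=8, head_dim=128, bytes_per_value=2):
    """Rough key/value cache size for a single sequence of n_tokens.

    All default dimensions are assumed values for illustration only.
    """
    # Each token stores one key vector and one value vector (hence the 2x)
    # per KV head, per layer.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_value
    return n_tokens * per_token


# Compare a one-sentence prompt with progressively longer contexts.
for n in (32, 2048, 8192, 32768):
    print(f"{n:>6} tokens -> {kv_cache_bytes(n) / 2**20:8.1f} MiB")
```

	Under these assumptions the cache costs 128 KiB per token, so a 32-token prompt needs about 4 MiB while an 8192-token prompt needs about 1 GiB, on top of the model weights. Prompt processing time also rises with length, since every input token has to be run through the model before the first output token appears.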