HN Mirror



	by numberless 570 days ago
	Not the author, but on a Pi 5, small LLMs such as tinyllama or qwen2.5:0.5b run at over 25 tokens per second (I haven't tested the performance boost with it yet). It could be useful if you want a local assistant, or an AI server for devices you don't want to run LLMs on.
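
For anyone who wants to check a tokens-per-second figure like this on their own board, here is a rough sketch. It assumes the models are served by Ollama on its default local port, which the comment does not actually state; the model tags just happen to match Ollama's naming. `measure` and `tokens_per_second` are names made up for this example. The speed is computed from the `eval_count` and `eval_duration` (nanoseconds) fields that Ollama returns for a non-streaming generate request.

```python
import json
import urllib.request

# Assumption: an Ollama server is listening on its default local port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Convert a token count and a duration in nanoseconds into tokens/s."""
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration_ns must be positive")
    return eval_count / (eval_duration_ns / 1e9)


def measure(model: str, prompt: str, url: str = OLLAMA_URL) -> float:
    """Send one non-streaming generate request and return the generation speed."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        stats = json.load(response)
    # eval_count is the number of generated tokens; eval_duration is in ns.
    return tokens_per_second(stats["eval_count"], stats["eval_duration"])
```

With the server running, something like `measure("qwen2.5:0.5b", "Write a Python argparse template.")` would return the generation speed for that one request. A single short prompt is a noisy measurement, so averaging a few runs is probably wise.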

2 comments

Just curious, what can a 0.5B model be used for?

I use mine mostly as a context-free code copilot. It's not perfect, but it knows how to write a template, and for most tasks that's all I need.

Are there any vision models that will work on the Pi 5? Something similar to minicpm-v?