by numberless 570 days ago
Not the author, but on a Pi 5, small LLMs such as tinyllama or qwen2.5:0.5b run at over 25 tokens per second (I haven't tested the performance boost with it yet). It could be useful if you want a local assistant or an AI server for devices you don't want to run LLMs on.
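If you want to check the tokens-per-second figure on your own board, here is a rough sketch. It assumes the models are served through Ollama on its default port (the model tags look like Ollama tags, but that is my guess, not something stated above). The helper names `tokens_per_second` and `generate` are made up for illustration; `eval_count` and `eval_duration` (in nanoseconds) are fields Ollama returns from `/api/generate`.

```python
import json
import urllib.request

# Assumption: Ollama is running on the Pi itself, on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def tokens_per_second(response: dict) -> float:
    """Generation speed from an Ollama /api/generate response.

    eval_count is the number of generated tokens and
    eval_duration is the generation time in nanoseconds.
    """
    return response["eval_count"] / (response["eval_duration"] / 1e9)


def generate(model: str, prompt: str, url: str = OLLAMA_URL) -> dict:
    """Send one non-streaming prompt and return the parsed JSON reply."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "stream": False}
    ).encode()
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)
```

On the Pi, something like `tokens_per_second(generate("qwen2.5:0.5b", "Write a haiku."))` should give a number to compare against. Other devices on the network could use the same call with the Pi's address in `url`, provided Ollama is configured to listen on more than localhost.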
2 comments

Just curious, what can a 0.5B model be used for?
I use mine mostly as a context-free code copilot. It's not perfect, but it knows how to write a template and for most tasks that's all I need.
Are there any vision models that will work on the Pi 5? Something similar to minicpm-v?