by numberless
570 days ago
Not the author, but on a Pi 5, small LLMs such as tinyllama or qwen2.5:0.5b run at over 25 tokens per second (I haven't tested the performance boost with it yet). It could be useful if you want a local assistant, or an AI server for devices you don't want to run LLMs on.