Hacker News
manmal
39 days ago
It probably wouldn’t be useful with your setup: you’d likely get only 3-4 tokens per second.
DeathArrow
39 days ago
Yep, maybe I can open a feature request if it makes sense technically.
zozbot234
39 days ago
Arguably it makes more sense technically to get the model support into llama.cpp, which provides many options for GPU+CPU split inference already.