Hacker News
by brucethemoose2, 1106 days ago
Have you tried the most recent CUDA offload? A dev claims they are getting 26.2 ms/token (about 38 tokens per second) on a 13B model with a 4080.
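
For anyone checking the arithmetic, here is a minimal sketch of the latency-to-throughput conversion. The helper name `tokens_per_second` is made up for illustration and is not part of any tool mentioned above; the only input taken from the claim is the 26.2 ms/token figure.

```python
def tokens_per_second(ms_per_token: float) -> float:
    """Convert per-token latency (milliseconds) to throughput (tokens/second)."""
    if ms_per_token <= 0:
        raise ValueError("ms_per_token must be positive")
    # 1000 ms in a second, so throughput is the reciprocal scaled by 1000.
    return 1000.0 / ms_per_token


# Reported figure: 26.2 ms/token
print(f"{tokens_per_second(26.2):.1f} tokens/s")  # prints "38.2 tokens/s"
```

So the quoted 38 tokens per second is just 1000 / 26.2 rounded down.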