by brucethemoose2 1106 days ago
Have you tried the most recent CUDA offload? A dev claims they are getting 26.2 ms/token (about 38 tokens per second) on a 13B model with a 4080.
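
For anyone checking the numbers, here is a quick sketch of the ms/token to tokens/s conversion. The function name `tokens_per_second` is made up for illustration and is not part of any tool mentioned here.

```python
def tokens_per_second(ms_per_token: float) -> float:
    """Convert a per-token latency in milliseconds to throughput in tokens/s."""
    if ms_per_token <= 0:
        raise ValueError("ms_per_token must be positive")
    # 1000 ms in a second, so throughput is the reciprocal scaled by 1000.
    return 1000.0 / ms_per_token


if __name__ == "__main__":
    # The figure quoted above: 26.2 ms/token.
    print(f"{tokens_per_second(26.2):.1f} tokens/s")  # prints "38.2 tokens/s"
```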