Hacker News
by ranger_danger 98 days ago
With regular llama.cpp on a 3070 Ti I get 60 tok/s token generation (TG) with the 9B model, which is quite impressive.