Hacker News
by novaomnidev 925 days ago
Why is this faster than running llama.cpp main directly? I'm getting 7 tokens/sec with this, but only 2 with llama.cpp by itself.