Hacker News
by pinstripes 1 day ago
Trying to scale text inference to 1 million tok/s on cheap hardware.