by ein0p 472 days ago
What if you're doing bulk inference? The efficiency and throughput of bs=1 s=1 are truly abysmal.
1 comment

People want to talk to their computer, not service requests for a thousand users.