by Der_Einzige 472 days ago
No one running this at home cares about anything except batch size 1, sequence length 1.

What if you're doing bulk inference? The efficiency and throughput of bs=1 s=1 is truly abysmal.
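To put rough numbers on that, here is a back-of-the-envelope roofline sketch. Everything in it is an assumption for illustration: a hypothetical 7B-parameter model in fp16, 1 TB/s of memory bandwidth, 100 TFLOP/s of peak compute, about 2 FLOPs per parameter per token, and no KV-cache traffic. The function name and defaults are made up, not taken from any library.

```python
def decode_tokens_per_sec(
    batch_size,
    weight_bytes=14e9,      # assumed: 7B params * 2 bytes (fp16)
    mem_bw=1e12,            # assumed: 1 TB/s memory bandwidth
    flops_per_token=14e9,   # assumed: ~2 FLOPs per parameter per token
    peak_flops=100e12,      # assumed: 100 TFLOP/s peak compute
):
    """Crude roofline estimate of decode throughput in tokens/s."""
    # Each decode step streams the weights once, whatever the batch size.
    mem_time = weight_bytes / mem_bw
    # Compute per step grows linearly with the batch size.
    compute_time = batch_size * flops_per_token / peak_flops
    # The step takes as long as the slower of the two.
    step_time = max(mem_time, compute_time)
    return batch_size / step_time


for bs in (1, 8, 64, 200):
    print(bs, round(decode_tokens_per_sec(bs)))
```

Under these assumptions the step time stays pinned at the weight-read time until the batch reaches about 100, so throughput grows almost linearly with batch size while bs=1 leaves nearly all of the compute idle.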
People want to talk to their computer, not service requests for a thousand users.