Hacker News
by born-jre 114 days ago
I think this matters more at lower batch sizes (local LLMs and private enterprise deployments, where there won't be enough concurrent users at any given time to fill a big batch), since it moves you from a memory I/O bottleneck to a compute bottleneck.
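
To make the batch-size point concrete, here is a rough back-of-envelope sketch using a roofline-style estimate. Everything in it is an assumption for illustration: the function names are made up, the hardware figures (100 TFLOP/s peak, 1 TB/s memory bandwidth) are hypothetical and not any real device, and the model only counts weight reads (about 2 FLOPs per parameter per sequence, weights read once per decode step), ignoring KV cache and activation traffic.

```python
def arithmetic_intensity(batch_size: int, bytes_per_param: float = 2.0) -> float:
    """FLOPs performed per byte of weights read in one decode step.

    Assumption: ~2 FLOPs per parameter per sequence in the batch, and the
    weights are streamed from memory once per step regardless of batch size.
    KV cache and activation traffic are ignored.
    """
    return 2.0 * batch_size / bytes_per_param


def bottleneck(
    batch_size: int,
    peak_flops: float = 100e12,    # hypothetical accelerator: 100 TFLOP/s
    mem_bandwidth: float = 1e12,   # hypothetical accelerator: 1 TB/s
    bytes_per_param: float = 2.0,  # fp16/bf16 weights
) -> str:
    """Return 'memory' or 'compute' depending on which limit binds first."""
    # Ridge point of the roofline: FLOPs per byte needed to saturate compute.
    ridge = peak_flops / mem_bandwidth
    intensity = arithmetic_intensity(batch_size, bytes_per_param)
    return "memory" if intensity < ridge else "compute"


if __name__ == "__main__":
    for b in (1, 8, 64, 256):
        print(f"batch={b:>3}  intensity={arithmetic_intensity(b):6.1f}  bound={bottleneck(b)}")
```

Under these assumed numbers the ridge point is 100 FLOPs/byte, so small batches sit far on the memory-bound side and only large batches become compute-bound. Real crossover points depend on the actual hardware, quantization, and sequence lengths.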