Hacker News
by borzunov 1265 days ago
The theoretical best case for RAM offloading is 5.5 sec/token; for SSD offloading it is 22 sec/token. The implementations we've tested are no faster than 10 sec/token, though. See details in our paper: https://arxiv.org/pdf/2209.01188.pdf
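
For a rough sense of where best-case figures like these can come from, here is a back-of-the-envelope sketch. It assumes offloading is bandwidth-bound, i.e. every weight has to cross the link once per generated token. The model size (176 GB) and the link speeds (32 GB/s for RAM, 8 GB/s for SSD) are illustrative assumptions picked because they reproduce the quoted numbers; they are not stated in this comment, so check the paper for the actual setup.

```python
def best_case_sec_per_token(weights_gb: float, bandwidth_gb_per_s: float) -> float:
    """Lower bound on latency if all weights cross the link once per token."""
    return weights_gb / bandwidth_gb_per_s


# Assumed figures, not taken from the comment above.
WEIGHTS_GB = 176.0          # hypothetical size of the weights to stream
RAM_LINK_GB_PER_S = 32.0    # hypothetical RAM-to-GPU bandwidth
SSD_LINK_GB_PER_S = 8.0     # hypothetical SSD read bandwidth

ram = best_case_sec_per_token(WEIGHTS_GB, RAM_LINK_GB_PER_S)
ssd = best_case_sec_per_token(WEIGHTS_GB, SSD_LINK_GB_PER_S)

print(f"RAM offloading: {ram:.1f} sec/token")
print(f"SSD offloading: {ssd:.1f} sec/token")
```

This only bounds the transfer time; compute and any overlap between transfer and compute are ignored, so real implementations would be expected to land at or above these values.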