by JohannaAlmeida 76 days ago
Full attention O(n²): 17.96s / 5.6 tok/s

HybridAttention O(n·W + n·D): 0.35s / 286.6 tok/s
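For scale, the two reported timings work out to roughly a 51x speedup on both latency and throughput. A minimal sketch of that arithmetic follows, along with the operation-count ratio implied by the two complexity terms. W and D are not defined in this excerpt; reading W as a local window size and D as a state or feature dimension is an assumption, and `op_ratio` and the values passed to it are hypothetical, not measurements.

```python
# Figures copied from the benchmark lines above.
full_seconds, full_tok_per_s = 17.96, 5.6
hybrid_seconds, hybrid_tok_per_s = 0.35, 286.6

# Speedup measured two ways; they should roughly agree.
latency_speedup = full_seconds / hybrid_seconds
throughput_speedup = hybrid_tok_per_s / full_tok_per_s


def op_ratio(n: int, w: int, d: int) -> float:
    """Ratio of O(n^2) work to O(n*W + n*D) work, ignoring constants.

    Assumption: w is a window size and d a state/feature dimension;
    the post does not define either symbol in this excerpt.
    """
    return (n * n) / (n * w + n * d)


print(f"latency speedup:    {latency_speedup:.1f}x")
print(f"throughput speedup: {throughput_speedup:.1f}x")

# Illustrative values only (not taken from the post).
print(f"op ratio at n=1000, W=100, D=100: {op_ratio(1000, 100, 100):.1f}")
```

The ratio simplifies to n / (W + D), so under this cost model the gap widens linearly with sequence length as long as W and D stay fixed.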