Hacker News
by alekandreev 726 days ago
This is mostly about inference speed while maintaining long-context performance.