MoE inference optimizations: 15% lower expert load by request reordering (blog.doubleword.ai)
3 points by mezark 25 days ago