Helix Parallelism: Sharding Strategies for Multi-Million-Token LLM Decoding (research.nvidia.com)
2 points by h6d_100c 344 days ago