Hacker News
by kasmura 678 days ago
I disagree. Ring attention and tree attention are general enough that their core ideas don't depend on the details of modern GPUs. That dependence may hold for flash attention, but not for these. I also disagree because these algorithms are fundamentally about enabling long context by distributing the computation across GPUs, and that would not be enabled by a “Moore's law for GPU hardware”.