by kasmura
678 days ago
I disagree. Ring attention and tree attention are so general that their core ideas are independent of the details of modern GPUs. That may be true of flash attention, but not of these. I also disagree because these algorithms are fundamentally about enabling long context by distributing the computation across GPUs, and that would not be enabled by a “Moore's law for GPU hardware”.