by kasmura 678 days ago
Yes, it is just a way of computing self-attention in a distributed manner.
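To illustrate what "computing self-attention in a distributed way" can mean, here is a minimal sketch. It assumes the scheme splits the keys and values into shards, one per device. Each shard computes a partial softmax result, and the partial results are merged with the log-sum-exp trick, which is the idea behind blockwise or ring-style attention. The function names and the even sharding are hypothetical, the "devices" are simulated in a single process, and only one query vector is handled.

```python
import math

def _scores(q, keys):
    # Scaled dot-product scores q.k / sqrt(d) for one query against a list of keys.
    scale = 1.0 / math.sqrt(len(q))
    return [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]

def full_attention(q, keys, values):
    """Reference: ordinary softmax attention over all keys/values at once."""
    s = _scores(q, keys)
    m = max(s)
    w = [math.exp(x - m) for x in s]
    total = sum(w)
    dim = len(values[0])
    return [sum(wi * v[j] for wi, v in zip(w, values)) / total for j in range(dim)]

def partial_attention(q, keys, values):
    """Work done by one (hypothetical) device on its local K/V shard.
    Returns the local max score, the local sum of exp-scores, and the
    unnormalized weighted sum of values, all relative to the local max."""
    s = _scores(q, keys)
    m = max(s)
    w = [math.exp(x - m) for x in s]
    dim = len(values[0])
    numer = [sum(wi * v[j] for wi, v in zip(w, values)) for j in range(dim)]
    return m, sum(w), numer

def combine(partials):
    """Merge per-shard results. Rescaling each shard by exp(m_local - m_global)
    puts all shards on a common reference, so the result equals full attention."""
    m_global = max(m for m, _, _ in partials)
    dim = len(partials[0][2])
    denom = 0.0
    numer = [0.0] * dim
    for m, d, n in partials:
        factor = math.exp(m - m_global)
        denom += d * factor
        numer = [a + b * factor for a, b in zip(numer, n)]
    return [x / denom for x in numer]

def distributed_attention(q, keys, values, num_devices):
    """Split K/V into contiguous shards (hypothetical sharding scheme),
    compute each shard's partial result, then merge."""
    shard = math.ceil(len(keys) / num_devices)
    partials = [
        partial_attention(q, keys[i:i + shard], values[i:i + shard])
        for i in range(0, len(keys), shard)
    ]
    return combine(partials)
```

The merged output matches `full_attention` up to floating-point error for any number of shards. This is why such schemes count as a different way of *computing* the same attention rather than as a different attention mechanism. In a real system, each `partial_attention` call would run on a separate accelerator, and the partial triples would be exchanged over the interconnect.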