by davidatbu 1457 days ago
I wouldn't rule out that transformers being very amenable to parallel computation is the reason.
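
To make the parallelism point concrete, here is a minimal sketch, assuming single-head scaled dot-product self-attention and a toy tanh recurrence as a stand-in for an RNN. The names `attend`, `attention_parallel`, and `recurrent_sequential` are made up for illustration and do not come from any library. The idea it tries to show: each attention output depends only on the fixed inputs, so positions can be computed independently, while each recurrent step has to wait for the previous one.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def attend(i, queries, keys, values):
    """Attention output for position i (hypothetical helper)."""
    d = len(queries[i])
    # Scaled dot product of query i against every key.
    scores = [
        sum(q * k for q, k in zip(queries[i], key)) / math.sqrt(d)
        for key in keys
    ]
    # Numerically stable softmax over the scores.
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    # Weighted sum of the value vectors, one output channel at a time.
    return [
        sum(w / total * v[c] for w, v in zip(weights, values))
        for c in range(len(values[0]))
    ]


def attention_parallel(queries, keys, values):
    # No position reads another position's output, so the work can be
    # handed out in any order (here: to a thread pool).
    with ThreadPoolExecutor() as pool:
        return list(
            pool.map(lambda i: attend(i, queries, keys, values),
                     range(len(queries)))
        )


def recurrent_sequential(inputs, decay=0.5):
    # Toy recurrence (an assumption, not a real RNN cell): step t needs
    # the state from step t-1, so this loop cannot be split across workers.
    state = [0.0] * len(inputs[0])
    states = []
    for x in inputs:
        state = [math.tanh(decay * s + xi) for s, xi in zip(state, x)]
        states.append(state)
    return states


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
parallel_out = attention_parallel(x, x, x)
in_order_out = [attend(i, x, x, x) for i in range(len(x))]
print(parallel_out == in_order_out)
```

The thread pool is only there to show that the per-position work is independent; in standard CPython it will not give a real speedup. On GPUs the same independence is what lets the whole thing run as a few large matrix multiplications.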