by brrrrrm 774 days ago
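At small input sizes, yes, the MLP dominates compute. At large input sizes attention matters more, since its cost grows quadratically with sequence length while the MLP's grows only linearly.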
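A rough back-of-the-envelope sketch of where the crossover sits, assuming a standard dense transformer layer with model width `d`, a 4x MLP expansion, full (non-causal) attention, and 2 FLOPs per multiply-add. Softmax, layer norm, and biases are ignored, and the function names and the `d = 4096` width are just illustrative picks, not taken from any particular model:

```python
def mlp_flops(n, d, expansion=4):
    """Per-layer MLP FLOPs for n tokens (assumed shape: d -> expansion*d -> d)."""
    # Two matmuls, each n * d * (expansion * d) multiply-adds, 2 FLOPs apiece.
    return 2 * 2 * n * d * (expansion * d)


def attention_flops(n, d):
    """Per-layer attention FLOPs for n tokens, projections included."""
    proj = 2 * 4 * n * d * d   # Q, K, V and output projections (linear in n)
    scores = 2 * n * n * d     # Q @ K^T (quadratic in n)
    mix = 2 * n * n * d        # softmax(scores) @ V (quadratic in n)
    return proj + scores + mix


if __name__ == "__main__":
    d = 4096  # hypothetical model width
    for n in (512, 8192, 131072):
        ratio = attention_flops(n, d) / mlp_flops(n, d)
        print(f"n={n:>6}  attention/MLP FLOPs = {ratio:.2f}")
```

Under these assumptions the ratio works out to 0.5 + n / (4d), so attention overtakes the MLP around n = 2d tokens. If you count only the quadratic part as "attention" and lump the projections in elsewhere, the crossover moves out to roughly n = 4d.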