Y
Hacker News
new
|
ask
|
show
|
jobs
by
brrrrrm
774 days ago
At small input size, yes the MLP dominates compute. At large input attention matters more