Hacker News
by lumost, 852 days ago
For small values of N, the linear terms of the transformer dominate. At the end of the day, a two-layer feed-forward block of 764*2048 is still north of 3.1 million flops/token/layer (2 * 764 * 2048 = 3,129,344, counting each multiply-add as one flop).
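A rough back-of-the-envelope sketch of that arithmetic, in Python. The function names are made up for illustration, and the attention estimate is my own assumption (it only counts the QK^T scores and the weighted sum over values, and ignores the Q/K/V/output projections, which are linear terms too). Each multiply-add is counted as one operation, which is what reproduces the 3.1 million figure.

```python
def ffn_ops_per_token(d_model: int, d_ff: int) -> int:
    """Multiply-adds per token for two dense layers: d_model -> d_ff -> d_model."""
    return 2 * d_model * d_ff


def attention_ops_per_token(d_model: int, n_ctx: int) -> int:
    """Multiply-adds per token for the N-dependent part of attention.

    Assumption: only the QK^T scores and the weighted sum over values are
    counted, each costing n_ctx * d_model multiply-adds per token.
    """
    return 2 * n_ctx * d_model


D_MODEL, D_FF = 764, 2048  # sizes taken from the comment above

ffn = ffn_ops_per_token(D_MODEL, D_FF)
print(f"feed-forward: {ffn:,} multiply-adds/token/layer")

for n_ctx in (128, 512, 2048, 8192):
    attn = attention_ops_per_token(D_MODEL, n_ctx)
    print(f"N={n_ctx:5d}: attention {attn:>12,}  ratio to FFN {attn / ffn:.2f}")
```

Under these assumptions the N-dependent attention cost only catches up with the feed-forward block once N reaches the feed-forward width (2048 here), so for shorter contexts the linear terms are the bigger share.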