Hacker News
by areddyyt 643 days ago
The non-linear layers, particularly softmax(QK^T), will be crucial to getting ultra-low latency and high throughput. We're considering some custom silicon just for that portion of every transformer block.
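For readers less familiar with the step being discussed, here is a rough pure-Python sketch of softmax(QK^T). It is only an illustration, not anything from the commenter's stack: the function names, the `scale` parameter, and the toy inputs are all assumptions. Real implementations usually scale the scores by 1/sqrt(d_k) and run fused kernels on accelerators.

```python
import math

def softmax(row):
    # Subtract the row max before exponentiating for numerical stability.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(Q, K, scale=1.0):
    # Q is n x d and K is m x d (lists of lists).
    # Returns the n x m matrix softmax(scale * Q K^T), applied row by row.
    # scale=1.0 matches softmax(QK^T) as written; 1/sqrt(d) is the common choice.
    scores = [
        [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in K]
        for q in Q
    ]
    return [softmax(row) for row in scores]

# Toy inputs (hypothetical): two queries and two keys of dimension 2.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
W = attention_weights(Q, K)
```

The exponentials and the per-row normalization are the parts that do not reduce to plain matrix multiplies, which is presumably why they are singled out as a candidate for dedicated hardware.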